Table of Contents

  1. Preface
  2. Introduction to Data Integration Hub
  3. Getting Started with Data Integration Hub
  4. Creating Topics
  5. Creating Publications
  6. Creating Subscriptions
  7. Appendix A: Glossary

Getting Started

Creating Topics Overview

In this section, you create topics to which applications publish data and from which applications consume data. You must first complete the chapter "Getting Started with Data Integration Hub."
When you create a topic, you choose the topic type and the type of repository on which to store data for the topic, define the data structure and the data retention period, select a data storage location, and assign topic permissions.

Chapter Concepts

Data Integration Hub can manage and store topic data on the following types of publication repository:
  • Relational database. Choose this type of repository to store published data in a relational database structure that represents the structure in which you want to keep the data, for example, data that is published from a relational database or from files. A relational database publication repository usually stores the data for a short intermediate period after the data is consumed by all subscribers. Data Integration Hub supports Oracle and Microsoft SQL Server databases for storing relational database topic data.
  • Big Data. Choose this type of repository if you publish high volumes of data that you want to store for a long period of time, or if you do not want Data Integration Hub to delete published data after the data is consumed. The Hadoop repository is available only if the Hadoop component is installed on your system.
    To publish and subscribe to a Hadoop-based repository with custom publications and subscriptions, you must use workflows that are based on a Data Engineering Integration mapping and workflow. When you create a custom publication, if one of the topics that you select for the publication is a Hadoop-based topic, only workflows that are based on a Data Engineering Integration mapping or workflow are listed for selection as the publication mapping.
    When you create a compound subscription that subscribes to multiple topics, all topics that you select must be Hadoop-based, and only workflows that are based on a Data Engineering Integration mapping or workflow are listed for selection as the subscription mapping. You can also enable the mandatory option for topics in a compound subscription to prioritize some topics in the subscription over others. Data Integration Hub triggers subscription processing after the publication events for all topics are complete. If the wait time of the publication event elapses and Data Integration Hub has not published all mandatory topics, an error event is generated at run time.
    Before you use a Hadoop-based publication repository for publications and subscriptions, consider the following restrictions:
    • You cannot assign a pre-process to a custom publication that publishes to a Hadoop-based repository.
    • You cannot configure a custom publication that publishes files to a Hadoop-based repository to run immediately when the files are ready to be published.
    • You cannot use a Hadoop repository to publish and subscribe to pass-through files and Hadoop Distributed File System (HDFS) files.
  • File Store. Choose this type of repository to publish files that you want to keep as-is, without loading the data into a relational database. For example, if you publish PDF or .zip files into a file repository, Data Integration Hub delivers the files without processing them.
  • Real-time. Choose this type of repository to monitor real-time Apache Kafka data streaming. Apache Kafka is a distributed streaming platform that can publish and subscribe to streams of records, and store and process streams of records. To track the Apache Kafka flows, you must configure the Apache Kafka server URL in the Data Integration Hub system properties.
    You must then create a topic with the publication repository type of Real-time and create an application to define the publisher and subscriber. You must also create a workflow that maps to the Apache Kafka server. The Data Integration Hub publication and subscription are associated with the source and target of the Kafka server.
    Data Integration Hub records the streaming of data in the Apache Kafka server at regular intervals. The Data Integration Hub operator configures the interval at which Data Integration Hub records the data streaming value in the topic. The Events List stores the log of events. The Processing Information tab in the Events List stores the Offset and LogEndOffset values, which define the difference between data values at intervals in each partition; a sketch of how these offset values relate appears after this list.
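The Offset and LogEndOffset values follow standard Apache Kafka consumer bookkeeping: the offset is the position of the last record that a consumer group has consumed in a partition, and the log end offset is the position of the next record to be written to that partition, so their difference is the consumer lag. The following minimal sketch, independent of Data Integration Hub, reads both values per partition with the kafka-python client; the broker address, topic name, and consumer group name are assumptions for illustration.

    from kafka import KafkaConsumer, TopicPartition

    # Assumed broker URL and consumer group; replace with your environment.
    consumer = KafkaConsumer(
        bootstrap_servers="localhost:9092",
        group_id="dih-monitor",
        enable_auto_commit=False,
    )

    topic = "orders"  # hypothetical topic name
    partitions = [TopicPartition(topic, p)
                  for p in sorted(consumer.partitions_for_topic(topic))]

    # LogEndOffset per partition: position of the next record to be written.
    end_offsets = consumer.end_offsets(partitions)

    for tp in partitions:
        committed = consumer.committed(tp) or 0  # last consumed Offset, 0 if none
        lag = end_offsets[tp] - committed        # records not yet consumed
        print(f"partition {tp.partition}: offset={committed}, "
              f"logEndOffset={end_offsets[tp]}, lag={lag}")

    consumer.close()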
When you create the structure of a topic, you define the data structure on the Data Integration Hub publication repository to which the publications that are associated with the topic publish data, and from which subscribers to the topic consume the data. The topic structure must contain at least one table and can consist of multiple tables.
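For illustration only, the following sketch shows what a minimal multi-table topic structure might look like as relational tables. The Orders topic, its tables, and its columns are hypothetical, and SQLite stands in here for the Oracle or Microsoft SQL Server publication repository on which Data Integration Hub actually creates the tables.

    import sqlite3

    # Hypothetical two-table structure for an "Orders" topic, kept in an
    # in-memory SQLite database only to make the sketch self-contained.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ORDERS (              -- a topic needs at least one table
            ORDER_ID   INTEGER PRIMARY KEY,
            CUSTOMER   TEXT NOT NULL,
            ORDER_DATE TEXT NOT NULL
        );
        CREATE TABLE ORDER_LINES (         -- and can consist of multiple tables
            ORDER_ID   INTEGER REFERENCES ORDERS(ORDER_ID),
            PRODUCT    TEXT NOT NULL,
            QUANTITY   INTEGER NOT NULL
        );
    """)
    conn.close()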
The data retention period defines how long Data Integration Hub retains the data in the publication repository after the data is consumed.
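As a rough model of retention, the sketch below checks whether consumed data has outlived its retention period and is therefore eligible for deletion; the 14-day period and the timestamps are assumptions, not Data Integration Hub defaults.

    from datetime import datetime, timedelta, timezone

    RETENTION_PERIOD = timedelta(days=14)  # hypothetical retention period

    def eligible_for_deletion(consumed_at: datetime, now: datetime) -> bool:
        # Data may be deleted once the retention period has elapsed after
        # all subscribers consumed it.
        return now - consumed_at > RETENTION_PERIOD

    now = datetime.now(timezone.utc)
    print(eligible_for_deletion(now - timedelta(days=20), now))  # True
    print(eligible_for_deletion(now - timedelta(days=3), now))   # False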
Topic permissions control who can access the topic. The Data Integration Hub administrator creates categories and assigns categories to user groups to determine which users can view or change topics. You assign categories to a topic to permit users to view or change the topic. Because publications and subscriptions are associated with topics, they inherit the permissions from the associated topic. When you configure permissions for a topic, only user groups with permissions to the topic can access the associated subscriptions and publications.
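To make the inheritance rule concrete, here is a minimal sketch of the category-based model described above: a user group can access a topic when it shares at least one category with the topic, and publications and subscriptions simply reuse the topic's check. All category, group, and topic names are illustrative assumptions.

    # Hypothetical category assignments made by the administrator.
    topic_categories = {"Orders": {"Sales", "Finance"}}
    group_categories = {"sales_users": {"Sales"}, "hr_users": {"HR"}}

    def can_access_topic(group: str, topic: str) -> bool:
        # Access requires at least one category in common.
        return bool(group_categories[group] & topic_categories[topic])

    def can_access_publication(group: str, topic_of_publication: str) -> bool:
        # Publications and subscriptions inherit permissions from their topic.
        return can_access_topic(group, topic_of_publication)

    print(can_access_topic("sales_users", "Orders"))      # True
    print(can_access_publication("hr_users", "Orders"))   # False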

Chapter Objectives

In this chapter, you perform the following tasks:
  • Create a topic where the published data is stored on a relational database.
  • Create a topic where the published data is stored on a Hadoop repository.
  • Create a topic where the published data is stored on a file repository.
  • Create a topic where the published data is stored on an Apache Kafka server.
