Table of Contents

  1. Preface
  2. Introduction to Data Integration Hub
  3. Catalog
  4. Applications
  5. Topics
  6. Creating Topics
  7. Topic Properties
  8. Publications
  9. Creating Publications
  10. Publication Properties
  11. Subscriptions
  12. Creating Subscriptions
  13. Subscription Properties
  14. Events and Event Monitoring
  15. Dashboard and Reports
  16. Glossary

Operator Guide

Publication Repository Types

When you create a topic, you choose the type of publication repository in which Data Integration Hub manages and stores published data for the topic. Data Integration Hub can store topic data in the following types of publication repositories:
  • Relational database. Choose this type of repository to store published data in a relational database structure that represents the structure in which you want to keep the data, for example, data that is published from a relational database or from files. A relational database publication repository usually stores the data for a short intermediate period after all subscribers have consumed it. Data Integration Hub supports the following databases for relational database topic data: Oracle and Microsoft SQL Server.
  • Big Data. Choose this type of repository if you publish high volumes of data that you want to store for a long period of time, or if you do not want Data Integration Hub to delete published data after the data is consumed. The availability of the Hadoop repository depends on whether the Hadoop component is installed on your system.
    To publish and subscribe to a Hadoop-based repository with custom publications and subscriptions, you must use workflows that are based on a Data Engineering Integration mapping and workflow. When you create a custom publication, if one of the topics that you select for the publication is a Hadoop-based topic, only workflows that are based on a Data Engineering Integration mapping or workflow are listed for selection as the publication mapping.
    When you create a compound subscription, that is, a subscription that subscribes to multiple topics, all topics that you select must be Hadoop-based, and only workflows that are based on a Data Engineering Integration mapping or workflow are listed for selection as the subscription mapping. You can also enable the Mandatory option for topics in a compound subscription to prioritize some topics over others.
    Data Integration Hub triggers subscription processing after the publication events for all topics are complete. If the wait time of the publication event elapses and Data Integration Hub has not published all mandatory topics, an error event is generated at run time.
    Before you use a Hadoop-based publication repository for publications and subscriptions, consider the following restrictions:
    • You cannot assign a pre-process to a custom publication that publishes to a Hadoop-based repository.
    • You cannot configure a custom publication that publishes files to a Hadoop-based repository to run immediately when the files are ready to be published.
    • You cannot use a Hadoop repository to publish and subscribe to pass-through files and Hadoop Distributed File System (HDFS) files.
  • File Store. Choose this type of repository to publish files that you want to keep as-is, without loading the data into a relational database. For example, if you publish PDF or .zip files into a file repository, Data Integration Hub delivers the files without processing them.
  • Real-time. Choose this type of repository to monitor real-time Apache Kafka data streaming. Apache Kafka is a distributed streaming platform that can publish and subscribe to streams of records, and store and process streams of records.
    To track the Kafka flows, you must configure the Apache Kafka server URL in the Data Integration Hub system properties. You must then create a topic with the Real-time publication repository type and create an application to define the publisher and subscriber. Also, create a workflow that maps to the Apache Kafka server. The Data Integration Hub publication and subscription are associated with the source and target of the Kafka server.
    Data Integration Hub records the streaming of data in the Apache Kafka server at regular intervals. The Data Integration Hub operator configures the interval at which Data Integration Hub must record the data streaming value in the topic. The Events List stores the log of events. The Processing Information tab in the Events List stores the Offset and LogEndOffset values that define the difference between data values at intervals in each partition. For more information about events, see Managing Events on the Event List Page. A sketch of the offset arithmetic appears after this list.
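The Offset and LogEndOffset values follow standard Kafka consumer-lag arithmetic: in each partition, the log end offset minus the last consumed offset is the number of published records that have not yet been consumed. The following Python sketch reads those per-partition values with the open-source kafka-python client. The broker address, consumer group, and topic name are illustrative assumptions; Data Integration Hub performs this tracking internally, so the sketch only models the arithmetic behind the Processing Information tab.

    from kafka import KafkaConsumer, TopicPartition

    # Assumed broker URL and hypothetical group and topic names.
    consumer = KafkaConsumer(
        bootstrap_servers="localhost:9092",
        group_id="dih-monitor",
        enable_auto_commit=False,
    )

    topic = "orders"
    partitions = [TopicPartition(topic, p)
                  for p in consumer.partitions_for_topic(topic)]

    # LogEndOffset: the offset of the next record to be written in each partition.
    end_offsets = consumer.end_offsets(partitions)

    for tp in partitions:
        committed = consumer.committed(tp) or 0  # Offset: last committed consumer position
        lag = end_offsets[tp] - committed        # records published but not yet consumed
        print(f"partition {tp.partition}: offset={committed}, "
              f"logEndOffset={end_offsets[tp]}, lag={lag}")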
When you create a compound subscription, that is, a subscription that consumes data sets from multiple topics with a single batch workflow, all topics must be of the same type. The Data Integration Hub operator can enable the Mandatory option for topics in a compound subscription to prioritize some topics over others. Data Integration Hub triggers subscription processing after the publication events for all topics are complete. If the wait time of the publication event elapses and Data Integration Hub has not published all mandatory topics, an error event is generated at run time.
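The trigger rule for compound subscriptions can be summarized as a small decision check. The following Python sketch is illustrative only, not a Data Integration Hub API; it assumes, as the text implies, that processing can proceed without non-mandatory topics once the wait time elapses.

    import time

    def evaluate_subscription(all_topics, mandatory_topics, published_topics,
                              started_at, wait_time):
        # All publication events are complete: trigger subscription processing.
        if all_topics <= published_topics:
            return "process"
        # The wait time has elapsed before every topic was published.
        if time.time() - started_at >= wait_time:
            if mandatory_topics <= published_topics:
                # Assumption: missing non-mandatory topics do not block processing.
                return "process"
            # A mandatory topic is missing: generate an error event at run time.
            return "error-event"
        # Otherwise, keep waiting for the remaining publication events.
        return "wait"

    # Example: three topics, two mandatory, evaluated after the wait time elapsed.
    state = evaluate_subscription(
        all_topics={"sales", "inventory", "returns"},
        mandatory_topics={"sales", "inventory"},
        published_topics={"sales", "inventory"},
        started_at=time.time() - 600,  # evaluation started 10 minutes ago
        wait_time=300,                 # 5-minute wait time
    )
    print(state)  # "process": all mandatory topics were published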
