The Informatica CDC Publisher is a Java-based tool that streams change data to a target messaging system, such as Apache Kafka.
The CDC Publisher contains the following components that move data:
The CDC Publisher Extractor consumes a stream of change data from the source. The incoming data records include schema information, row-based data changes, and transactional boundary metadata. The Extractor performs the following functions:
Assigns each change data record a sequence ID that is both repeatable and increasing.
Interacts with the component that supplies the streamed data.
Ignores records that are older than the current restart point.
Verifies that data is in an expected format.
Places the results on an outbound queue for Formatter processing.
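The Extractor functions listed above can be illustrated with a minimal Java sketch. It is not the CDC Publisher's implementation: the `ChangeRecord` and `ExtractorSketch` classes, the use of a numeric source position as both restart point and sequence ID, and the three operation names are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical record shape; not the CDC Publisher's internal format.
final class ChangeRecord {
    final long sequenceId;
    final String table;
    final String operation;

    ChangeRecord(long sequenceId, String table, String operation) {
        this.sequenceId = sequenceId;
        this.table = table;
        this.operation = operation;
    }
}

final class ExtractorSketch {
    // Assumed set of valid operations, used only for the format check.
    private static final Set<String> OPERATIONS =
            new HashSet<>(Arrays.asList("INSERT", "UPDATE", "DELETE"));

    private final long restartPoint;
    private final BlockingQueue<ChangeRecord> outbound = new LinkedBlockingQueue<>();

    ExtractorSketch(long restartPoint) {
        this.restartPoint = restartPoint;
    }

    // Returns true if the record was queued, false if it was skipped as too old.
    boolean accept(long sourcePosition, String table, String operation) {
        // Ignore records older than the restart point (assumed: strictly lower position).
        if (sourcePosition < restartPoint) {
            return false;
        }
        // Verify that the record is in the expected format.
        if (table == null || table.isEmpty() || !OPERATIONS.contains(operation)) {
            throw new IllegalArgumentException(
                    "Unexpected record format at position " + sourcePosition);
        }
        // Derive the sequence ID from the source position: the same input always
        // yields the same ID (repeatable), and later positions yield larger IDs.
        long sequenceId = sourcePosition;
        // Place the result on the outbound queue for the next stage.
        outbound.add(new ChangeRecord(sequenceId, table, operation));
        return true;
    }

    BlockingQueue<ChangeRecord> outbound() {
        return outbound;
    }

    public static void main(String[] args) {
        ExtractorSketch extractor = new ExtractorSketch(100L);
        extractor.accept(99L, "ORDERS", "INSERT");   // skipped: older than restart point
        extractor.accept(100L, "ORDERS", "UPDATE");  // queued
        extractor.accept(101L, "ORDERS", "DELETE");  // queued
        System.out.println("Queued records: " + extractor.outbound().size());
    }
}
```

Deriving the sequence ID from the source position, rather than from a counter held in memory, is what makes the ID repeatable in this sketch: replaying the same source data after a restart produces the same IDs.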
The Filter component optionally filters the extracted change data based on lists of source objects that you specify to include or exclude.
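As a rough illustration of include and exclude filtering, the following Java sketch decides whether a source object passes. The class name `SourceObjectFilterSketch` is hypothetical, and two behaviors are assumptions rather than documented CDC Publisher rules: an exclude entry takes precedence over an include entry, and an empty include list lets every non-excluded object through.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class SourceObjectFilterSketch {
    private final Set<String> include;
    private final Set<String> exclude;

    SourceObjectFilterSketch(Collection<String> include, Collection<String> exclude) {
        // Copy into HashSets so lookups are fast and later caller changes have no effect.
        this.include = new HashSet<>(include);
        this.exclude = new HashSet<>(exclude);
    }

    boolean accepts(String sourceObject) {
        // Assumption: exclusion wins when an object appears in both lists.
        if (exclude.contains(sourceObject)) {
            return false;
        }
        // Assumption: an empty include list means "include everything".
        return include.isEmpty() || include.contains(sourceObject);
    }

    public static void main(String[] args) {
        SourceObjectFilterSketch filter = new SourceObjectFilterSketch(
                Arrays.asList("SALES.ORDERS", "SALES.CUSTOMERS"),
                Collections.singletonList("SALES.CUSTOMERS"));
        for (String name : Arrays.asList("SALES.ORDERS", "SALES.CUSTOMERS", "HR.EMPLOYEES")) {
            System.out.println(name + " accepted: " + filter.accepts(name));
        }
    }
}
```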
The Formatter receives change data from the CDC Publisher Extractor, formats the data based on the generated Avro schema of the selected format (flat, nested, or generic) for inclusion in messages, and sends the formatted messages to the Connector.
The Connector reads the formatted messages from the Formatter and connects to the target messaging system to apply the messages. The Connector applies the message data in a consistent, ordered, and recoverable manner.
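One common way to get ordered, recoverable delivery is to track the sequence ID of the last message applied and to advance that checkpoint only after a send succeeds. The Java sketch below shows that general idea under stated assumptions; it does not describe how the Connector is actually implemented. `ConnectorSketch` and its `Target` interface are hypothetical stand-ins for the target messaging system, and the checkpoint is held in memory where a real implementation would persist it.

```java
import java.util.ArrayList;
import java.util.List;

final class ConnectorSketch {
    // Hypothetical stand-in for a client of the target messaging system.
    interface Target {
        void send(long sequenceId, String payload) throws Exception;
    }

    // Sequence ID of the last message known to be applied (the checkpoint).
    private long lastAppliedSequenceId;

    ConnectorSketch(long lastAppliedSequenceId) {
        this.lastAppliedSequenceId = lastAppliedSequenceId;
    }

    // Returns true if the message was sent, false if it was already applied.
    boolean apply(long sequenceId, String payload, Target target) throws Exception {
        // On replay after a restart, skip anything at or below the checkpoint.
        if (sequenceId <= lastAppliedSequenceId) {
            return false;
        }
        // If send throws, the checkpoint is untouched, so the caller can retry
        // the same message before moving on, which preserves order.
        target.send(sequenceId, payload);
        lastAppliedSequenceId = sequenceId;
        return true;
    }

    long checkpoint() {
        return lastAppliedSequenceId;
    }

    public static void main(String[] args) throws Exception {
        List<String> delivered = new ArrayList<>();
        ConnectorSketch connector = new ConnectorSketch(0L);
        connector.apply(1L, "insert ORDERS", (id, payload) -> delivered.add(payload));
        connector.apply(1L, "insert ORDERS", (id, payload) -> delivered.add(payload)); // replay, skipped
        connector.apply(2L, "update ORDERS", (id, payload) -> delivered.add(payload));
        System.out.println("Delivered: " + delivered + ", checkpoint: " + connector.checkpoint());
    }
}
```

In this sketch, a failure between a successful send and a durable checkpoint update would cause that one message to be sent again after restart, so the target would need to tolerate or detect duplicates.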
The following image shows the basic architecture of the Java-based CDC Publisher: