If Informatica Big Data Streaming Will Consume Data from a Target
PowerExchange CDC Publisher streams change data that PowerExchange captured to target messaging systems, such as Apache Kafka, in near real time. Informatica Big Data Streaming can then consume the change data from the target message queue and use it for a variety of purposes. For example, Big Data Streaming can use the change data to generate near-real-time fraud detection alerts or to customize sales offers at the point of sale.
If the Big Data Streaming product will consume the change data that the PowerExchange CDC Publisher sends to a target messaging system, use the following PowerExchange CDC Publisher configuration guidelines:
Big Data Streaming cannot consume data from fields that have a binary data type. Configure the PowerExchange CDC Publisher to send data from binary fields as string data by setting the following properties in the cdcPublisherAvro.cfg configuration file:
Formatter.avroBinaryAsString=true. With this setting, binary data is represented as string data in the generated Avro messages.
Formatter.avroBinaryStringRepresentationType={base64|hexadecimal}. When Formatter.avroBinaryAsString=true, this property determines whether base64 or hexadecimal strings are used to represent the binary data. Default is base64.
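For illustration, the following sketch shows how a consumer might convert a string-represented binary value back to raw bytes after an Avro message has been decoded. The record layout and the raw_payload field name are assumptions made for this example only; they are not part of the CDC Publisher message format.

    import base64

    # Hypothetical decoded Avro record; "raw_payload" is an assumed field
    # name for a source column that has a binary data type.
    record = {"raw_payload": "3q2+7w=="}   # base64 for the bytes 0xDE 0xAD 0xBE 0xEF

    # With Formatter.avroBinaryStringRepresentationType=base64 (the default):
    original_bytes = base64.b64decode(record["raw_payload"])

    # With Formatter.avroBinaryStringRepresentationType=hexadecimal, the field
    # would instead hold a hex string such as "deadbeef":
    # original_bytes = bytes.fromhex(record["raw_payload"])

    print(original_bytes)  # b'\xde\xad\xbe\xef'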
Big Data Streaming cannot consume JSON-encoded Avro messages. To use binary-encoded messages, specify Formatter.avroEncodingType=binary in the cdcPublisherAvro.cfg configuration file.
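Taken together, the message-format guidelines above might appear in the cdcPublisherAvro.cfg configuration file as in the following excerpt. This is an illustration only; base64 is shown, but the hexadecimal representation is equally valid, and any other properties that your configuration requires remain unchanged.

    Formatter.avroEncodingType=binary
    Formatter.avroBinaryAsString=true
    Formatter.avroBinaryStringRepresentationType=base64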
As a consumer application, Big Data Streaming must have copies of the Avro schemas for the source tables to properly interpret the change data in the messages. You can use the REPORT=FORMAT parameter of the PwxCDCAdmin utility to report the existing Avro schemas in a legible format for use by consumer applications. If no Avro schemas have been generated for the source tables, the utility attempts to create the Avro schemas based on the properties in the cdcPublisherAvro.cfg configuration file. For more information, see PwxCDCAdmin Utility - Command and Parameters.
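As a rough illustration of how a consumer application can apply a reported schema, the following sketch decodes one binary-encoded Avro message with a schema that was saved to a local file. The file name, the use of the open-source fastavro library, and the way the message bytes are obtained are all assumptions made for this example; Big Data Streaming performs its own schema handling.

    import io
    import json

    from fastavro import parse_schema, schemaless_reader

    # Assumed local copy of a schema reported by PwxCDCAdmin with
    # REPORT=FORMAT; the file name is an example only.
    with open("orders_table.avsc", "r") as schema_file:
        schema = parse_schema(json.load(schema_file))

    def decode_change_message(message_bytes: bytes) -> dict:
        """Decode one binary-encoded Avro change record using the table schema."""
        return schemaless_reader(io.BytesIO(message_bytes), schema)

    # message_bytes would typically come from the Kafka topic that the
    # CDC Publisher writes to; how it is fetched is outside this sketch.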
If you import into Big Data Streaming an Avro schema that the PowerExchange CDC Publisher generated for a very large table and that is larger than 65535 bytes, the Scala compiler issues a Java exception related to the scala.tools.asm package. This problem occurs because the Scala code does not handle literals greater than 65535 bytes in size. To circumvent this problem, you can configure the PowerExchange CDC Publisher to generate Avro schemas in a minimized format by specifying some or all of the following properties in the cdcPublisherAvro.cfg configuration file:
Formatter.avroSchemaPrintPretty={true|false}. Set this property to false to omit the spaces and line feeds that are intended to improve legibility in the generated Avro schemas. Default value is true, which causes the spaces and line feeds to be included.
Formatter.avroSchemaPrintDocFields={true|false}. Set this property to false to omit the "doc" fields from the generated Avro schemas. The doc fields include metadata such as the CDC and PowerExchange datatypes, precision, and scale. Default value is true, which causes this information to be included.
Formatter.avroSchemaPrintDefaultFields={true|false}. Set this property to false to omit the "default" fields from the generated Avro schemas. Default value is true, which causes the default fields to be included.