User Guide

Formatter Configuration Properties

The sample cdcPublisherAvro.cfg file contains configuration properties that define the format of the generated Avro schema, the encoding type to use for serializing the Avro records to be included in messages, and several optional Formatter settings.

Property Descriptions

The following properties are in the sample cdcPublisherAvro.cfg file:
Note: Formatter.avroSchemaFormat=avroFlatSchemaFormatV1 is the only setting that is supported for custom patterns.
Formatter.formatterType
The type of data serialization formatter to use for messages. The only valid value is Avro.
Formatter.avroSchemaFormat
Required. The Avro schema format that the PowerExchange CDC Publisher uses to generate the Avro schema that will determine the structure of the message values. Valid values are:
  • avroFlatSchemaFormatV1. Structures messages by using a flat Avro schema format, which lists all Avro fields in one Avro record. A unique Avro schema is generated for each source object, which contains the Avro field definitions.
  • avroNestedSchemaFormatV1. Structures messages by using a nested Avro schema format, which provides a main Avro record that contains a separate nested record for each type of Avro field.
  • avroGenericSchemaFormatV1. Structures messages in a generic manner that accommodates any source object definition. All source columns are represented by an array. Each array entry contains column data and metadata. The source column names are included in each data record, allowing the generic schema to be independent of the source table.
    Do not use this format type with the Custom Pattern Formatter.
No default value is provided.
You can "wrap" a flat, nested, or generic schema by setting the Formatter.avroWrapperSchemaFormat property to avroWrapperSchemaFormatV1. The schema then consists of four fields for each source object.
Use a generic or wrapper schema to allow a single Avro schema to represent multiple source tables. For more information about the schema formats, see Appendix B: Avro Schema Formats.
Formatter.avroEncodingType
Required. The Avro encoding type that the CDC Publisher Formatter uses to serialize the Avro records to be included in messages. Valid values are:
  • binary. Use binary encoding to serialize Avro records.
  • json. Use JSON to serialize Avro records.
  • none. Do not use any explicit encoding type. Specify this option only if you use Confluent Schema Registry in a Kafka target environment.
No default value is provided.
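A quick pre-flight check of the settings described so far can be sketched as follows. This is a hypothetical helper, not part of PowerExchange CDC Publisher. It assumes that the cfg file holds one property=value pair per line, that lines starting with # are comments, and that values are compared as exact strings.

```python
# Hypothetical validator for the Formatter settings described above.
# The cfg syntax (key=value per line, '#' comments) is an assumption.
VALID_VALUES = {
    "Formatter.formatterType": {"Avro"},
    "Formatter.avroSchemaFormat": {
        "avroFlatSchemaFormatV1",
        "avroNestedSchemaFormatV1",
        "avroGenericSchemaFormatV1",
    },
    "Formatter.avroEncodingType": {"binary", "json", "none"},
}
# The guide marks these two properties as required.
REQUIRED = ("Formatter.avroSchemaFormat", "Formatter.avroEncodingType")


def parse_cfg(text):
    """Collect key=value pairs, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_formatter_settings(props):
    """Return a list of human-readable problems; empty means no issue found."""
    problems = [f"{key} is required but missing" for key in REQUIRED if key not in props]
    for key, allowed in VALID_VALUES.items():
        if key in props and props[key] not in allowed:
            problems.append(f"{key}={props[key]} is not one of {sorted(allowed)}")
    return problems


sample = """
Formatter.formatterType=Avro
Formatter.avroSchemaFormat=avroFlatSchemaFormatV1
Formatter.avroEncodingType=binary
"""
print(check_formatter_settings(parse_cfg(sample)))
```

An empty list means that the required properties are present and every checked value is one of the documented options.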
The following additional properties can also be included in the cdcPublisherAvro.cfg file at your discretion:
Formatter.avroBinaryAsString
Controls whether change data with a binary datatype is represented as string data in Avro messages. Set this property to true if the data will be consumed by applications that do not support binary data, such as Informatica Data Engineering Streaming. The default value is false.
Formatter.avroExcludeDTLColumns
Excludes all or selected PowerExchange-generated metadata columns that have the DTL__ prefix from the messages.
  • To exclude all DTL__ columns from the formatted results, enter only the asterisk (*) wildcard character in the following format:
    Formatter.avroExcludeDTLColumns=*
  • To exclude one or more individual DTL__ columns from the formatted results, enter the column names using a comma (,) separator. For example:
    Formatter.avroExcludeDTLColumns=(DTL__CAPXACTION, DTL__CAPXRESTART1, DTL__CAPXRESTART2, DTL__CAPXUSER, DTL__CAPXUOW, DTL__CAPXTIMESTAMP, DTL__CAPXROWID)
Formatter.avroIncludeBeforeImage
Controls whether the generated Avro schema and messages include a field for before-image data. Set this property to true to include this field. Set this property to false to not include this field.
If you include the before-image field, the field is populated with data only for UPDATE operations, and only if you set the Extract.pwxUpdateImageOption property to enable the extraction of before-image data from the PowerExchange change stream. For DELETE and INSERT operations, the field is not populated with data.
The default value is true.
Formatter.avroIncludeInfaBigIntSequence
Controls whether the Avro Formatter generates an internal sequence value for each captured change record, which you can use to filter or sort messages in a change data stream. The generated value is in string format but can fit in a big integer column, if needed. For each captured change record, the Avro Formatter adds a column named INFA_BIGINT_SEQUENCE in the output messages to hold the generated sequence string. Valid values are:
  • false. Do not generate a sequence string and do not add the INFA_BIGINT_SEQUENCE column in the formatted messages.
  • true. Generate the sequence string and include it in the INFA_BIGINT_SEQUENCE column.
The default value is false.
When Formatter.avroIncludeInfaBigIntSequence is set to true, CDC Publisher generates a sequence string in the following form:
timestamp overflow + timestamp + sequence overflow + sequence
For example: 0158596108968700000000000000000003
The following elements make up the string:
  • Timestamp overflow. A 1-byte value that indicates whether the timestamp has overflowed its maximum value. The value 1 indicates an overflow occurred, and the value 0 indicates no overflow occurred.
  • Timestamp. A 13-byte timestamp value in string format. The timestamp is set once and remains consistent as long as the CDC Publisher process exists.
  • Sequence overflow. A 1-byte value that indicates whether the sequence number has overflowed its maximum value. The value 1 indicates an overflow occurred, and the value 0 indicates no overflow occurred.
  • Sequence. A 19-byte ascending sequential value for each record that is created. This number starts from 0 when the CDC Publisher process starts.
The combination of the timestamp overflow, timestamp, sequence overflow, and sequence number provides a unique ascending value for each record.
Usage considerations:
  • The generated sequence strings are not repeated if you restart the CDC Publisher process. The strings are unique for each CDC Publisher run. At the target, you can compare the sequence strings to determine the original order in which the changes were processed at the source database.
  • Start time values on restart are sequential and ascending but not consecutive. Within one CDC Publisher run, the sequence number is ascending but might have gaps.
  • You cannot use the sequence strings to determine if operations are missing, delivered, or received at the target.
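The layout described above (1 + 13 + 1 + 19 = 34 characters) can be split on the consumer side with a few slices. The following is a minimal sketch, assuming the value arrives as a 34-character string of ASCII digits; the function and class names are hypothetical.

```python
from typing import NamedTuple


class InfaSequence(NamedTuple):
    timestamp_overflow: int  # 1 character: 1 = overflow, 0 = no overflow
    timestamp: str           # 13 characters, kept as a string
    sequence_overflow: int   # 1 character: 1 = overflow, 0 = no overflow
    sequence: int            # 19 characters, ascending within one run


def parse_infa_sequence(value: str) -> InfaSequence:
    """Split an INFA_BIGINT_SEQUENCE string into its four elements."""
    if len(value) != 34 or not (value.isascii() and value.isdigit()):
        raise ValueError(f"expected 34 digits, got {value!r}")
    return InfaSequence(
        timestamp_overflow=int(value[0]),
        timestamp=value[1:14],
        sequence_overflow=int(value[14]),
        sequence=int(value[15:34]),
    )


parsed = parse_infa_sequence("0158596108968700000000000000000003")
print(parsed)

# Because every string has the same width and contains only digits,
# plain string comparison yields the same order as numeric comparison.
earlier = "0158596108968700000000000000000003"
later = "0158596108968700000000000000000010"
print(sorted([later, earlier]) == [earlier, later])
```

Sorting on the raw string is usually enough for ordering. Parsing is useful mainly when you want to inspect the overflow flags or the per-run sequence number.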
Formatter.avroBinaryStringRepresentationType
If you set the Formatter.avroBinaryAsString property to true or use a generic Avro format, indicates whether binary data is represented as a hexadecimal string or base64 string. Valid values are:
  • hexadecimal
  • base64
The default value is base64.
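A consumer that receives binary data as a string has to reverse the chosen representation. The following is a minimal sketch, assuming standard base64 and plain hexadecimal digits without a prefix such as 0x; the function name is hypothetical.

```python
import base64


def decode_binary_field(text: str, representation: str = "base64") -> bytes:
    """Turn a string-encoded binary column value back into bytes."""
    if representation == "base64":
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(text, validate=True)
    if representation == "hexadecimal":
        return bytes.fromhex(text)
    raise ValueError(f"unknown representation: {representation}")


print(decode_binary_field("3q2+7w=="))
print(decode_binary_field("DEADBEEF", "hexadecimal"))
```

Both calls decode the same four bytes, which shows that the two representations carry identical data and differ only in string length.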
Formatter.avroCheckZeroScale
By default, before writing data to Kafka messages, PowerExchange CDC Publisher converts numeric data types to decimal data types for fields that have a scale value of zero or null. If you want to retain the original data type, as defined in PowerExchange, set this property to true. Valid values are:
  • true. Do not convert a numeric data type to a decimal data type when a field has a zero or null scale value. Retain the original data type.
  • false. Convert a numeric data type to a decimal data type when a field has a zero or null scale value.
The default value is false.
Formatter.avroDisplaySchemaWithEscapedQuotes
If you use Confluent Schema Registry in a Kafka target environment and need to manually add an Avro schema to the registry as a single string that is delimited by double-quotation marks, set this property to true to use a backslash (\) as the escape character that precedes the double-quotation marks. Then run the PwxCDCAdmin utility with the REPORT=FORMAT parameter to generate a schema definition that includes the escape character before each delimiter, for example, \"schema_string\". You can then use the generated schema definition to add the schema to Confluent Schema Registry. The default value is false, which disables the use of escaped double-quotation marks in the generated schema.
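To see what an escaped schema string looks like, the same transformation can be reproduced by hand. The following sketch uses a small hypothetical schema and is only an illustration of the escaping, not the output of the PwxCDCAdmin utility. It assumes that the schema text contains no backslashes of its own.

```python
import json

# Hypothetical schema, used only to illustrate the escaping.
schema = {
    "type": "record",
    "name": "EXAMPLE",
    "fields": [{"name": "ID", "type": "int"}],
}

compact = json.dumps(schema, separators=(",", ":"))


def escape_quotes(schema_text: str) -> str:
    """Put a backslash in front of every double-quotation mark."""
    return schema_text.replace('"', '\\"')


escaped = escape_quotes(compact)
print(escaped)

# Wrapping the escaped text in double quotes yields a valid JSON string
# literal that decodes back to the original schema text.
print(json.loads('"' + escaped + '"') == compact)
```

For text like this, json.dumps(compact) produces the same quoted and escaped literal, which can be convenient when you build a registry request body in code instead of pasting the string.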
Formatter.avroSchemaPrintDefaultFields
Controls whether Avro schemas include the "default" fields. If you need to reduce the schema size, you can set this property to false to exclude the default fields. The default value is true, which includes the default fields.
Formatter.avroSchemaPrintDocFields
Controls whether Avro schemas include the "doc" fields. The doc fields include metadata such as the CDC and PowerExchange datatypes, precision, and scale. If you need to reduce the schema size, you can set this property to false to exclude the doc fields. The default value is true, which includes the doc fields.
Formatter.avroSchemaPrintPretty
Controls whether Avro schemas include spaces and line feeds to improve legibility. If you need to reduce the schema size, you can set this property to false to exclude the spaces and line feeds. The default is true, which includes the spaces and line feeds.
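The combined effect of the three schema-printing properties on schema size can be approximated with ordinary JSON tooling. The following sketch uses a hypothetical schema and mimics the options by removing the "doc" and "default" attributes and the whitespace. The schema that CDC Publisher actually generates may differ.

```python
import json

# Hypothetical schema; the doc text is invented for illustration.
schema = {
    "type": "record",
    "name": "EXAMPLE_TABLE",
    "fields": [
        {"name": "ID", "type": ["null", "int"], "default": None,
         "doc": "hypothetical doc text for ID"},
        {"name": "DEPT", "type": ["null", "string"], "default": None,
         "doc": "hypothetical doc text for DEPT"},
    ],
}


def strip_keys(node, keys):
    """Recursively drop the given attribute names from a schema tree."""
    if isinstance(node, dict):
        return {k: strip_keys(v, keys) for k, v in node.items() if k not in keys}
    if isinstance(node, list):
        return [strip_keys(item, keys) for item in node]
    return node


# Everything enabled: pretty-printed, with doc and default attributes.
pretty = json.dumps(schema, indent=2)
# Everything disabled: no doc, no default, no extra whitespace.
compact = json.dumps(strip_keys(schema, {"doc", "default"}), separators=(",", ":"))

print(len(pretty), len(compact))
```

The field names and types survive in the compact form, so a consumer can still resolve the record structure from it.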
Formatter.avroUseOriginalLogicalTimeConvertToUtc
PowerExchange CDC Publisher represents Avro logical date and time values in UTC by default. Previously, CDC Publisher converted Avro logical date and time values to long epoch values by using a UTC time zone. If you want to revert to the previous behavior, set the Formatter.avroUseOriginalLogicalTimeConvertToUtc property to true. Valid values are:
  • true. Use the original behavior which converts Avro logical date and time values to long epoch values by using a UTC time zone.
  • false. Recommended. CDC Publisher does not convert values.
The default value is false.
Formatter.avroWrapperSchemaFormat
Enables the use of an Avro "wrapper" schema format. The wrapper schema can be used to describe any source object. The wrapper, or parent, schema consists of four fields for each source object: the sequence number of the change record, source table name, change operation type, and the "wrapped" Avro child schema expressed as a large string. The consumer application can then parse the underlying data and put it in the proper Avro format for the source object. To use a wrapper schema format, set this property to avroWrapperSchemaFormatV1. No default value is provided. For more information, see Avro Wrapper Schema Format.
Formatter.avroUseLogicalDateType
Formatter.avroUseLogicalDecimalType
Formatter.avroUseLogicalTimeMillisType
Formatter.avroUseLogicalTimeMicrosType
Formatter.avroUseLogicalTimestampMillisType
Formatter.avroUseLogicalTimestampMicrosType
If you use Avro logical types for dates, decimal values, times, or timestamps and want the CDC Publisher to make a best-effort attempt to process these logical types, set the corresponding property to true. The properties in each of the following pairs are mutually exclusive, so specify one property or the other but not both:
  • Formatter.avroUseLogicalTimeMillisType and Formatter.avroUseLogicalTimeMicrosType
  • Formatter.avroUseLogicalTimestampMillisType and Formatter.avroUseLogicalTimestampMicrosType
The default value for each of these properties is false.
If you set a property to true, make sure that the source fields are defined in the extraction map with a compatible data type, scale, and precision.
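The mutual-exclusion rule is easy to check before you start the CDC Publisher. The following is a minimal sketch of such a check over a dictionary of already parsed properties. The helper is hypothetical, and it assumes that a property counts as enabled only when its value is the string true, ignoring case.

```python
# Pairs that the guide describes as mutually exclusive.
MUTUALLY_EXCLUSIVE = [
    ("Formatter.avroUseLogicalTimeMillisType",
     "Formatter.avroUseLogicalTimeMicrosType"),
    ("Formatter.avroUseLogicalTimestampMillisType",
     "Formatter.avroUseLogicalTimestampMicrosType"),
]


def conflicting_pairs(props):
    """Return every pair in which both properties are set to true."""
    def enabled(key):
        # An absent property falls back to the documented default, false.
        return props.get(key, "false").strip().lower() == "true"

    return [pair for pair in MUTUALLY_EXCLUSIVE if enabled(pair[0]) and enabled(pair[1])]


props = {
    "Formatter.avroUseLogicalDateType": "true",
    "Formatter.avroUseLogicalTimeMillisType": "true",
    "Formatter.avroUseLogicalTimeMicrosType": "true",
    "Formatter.avroUseLogicalTimestampMicrosType": "true",
}
print(conflicting_pairs(props))
```

In this example only the time pair is reported, because the timestamp pair has just one of its two properties enabled.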
Formatter.captureColumnValuesFile
Identifies the full path and file name of the file in which you optionally define rules that the Formatter uses to generate a composite message key for each source table. Each rule specifies the column or columns to be included in the generated message key. PowerExchange CDC Publisher includes the generated message key in the messages that it sends to the target messaging system. The target messaging system writes the messages that contain the same key value to the same partition in the target topics. For example, if the rule identifies the DEPT column as the message key for a source table, all records that contain a specific DEPT value, such as "Finance," will be sent to the same topic partition. When using a message key to write messages to the same topic partition, the target messaging system can maintain the order in which the messages were received from the CDC Publisher. For information about defining rules, see Generating Composite Message Keys for Source Tables.
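The reason a message key preserves per-key ordering is that the partition is derived from the key alone. The following sketch illustrates the idea with CRC32 as a stand-in hash. The target messaging system uses its own partitioning logic (Kafka's default partitioner, for example, hashes keys with murmur2), so the partition numbers here are illustrative only. The records and function name are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition. CRC32 is a stand-in hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical change records keyed by the DEPT column.
records = [
    ("Finance", "rec1"),
    ("Sales", "rec2"),
    ("Finance", "rec3"),
    ("Finance", "rec4"),
]

partitions = defaultdict(list)
for dept, payload in records:
    partitions[partition_for(dept, 3)].append((dept, payload))

# Every Finance record lands in one partition, in arrival order.
finance = [
    payload
    for contents in partitions.values()
    for dept, payload in contents
    if dept == "Finance"
]
print(finance)
```

Records with different keys may share a partition, but records with the same key never spread across partitions, which is what allows the order to be maintained.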
Formatter.formatterAddTimestampColumn
Indicates whether the PowerExchange CDC Publisher adds a timestamp column to the generated Avro schema and formatted output messages to represent the date and time at which the Formatter processed the incoming change records. Valid values are:
  • false. Do not add the timestamp column.
  • true. Add the timestamp column.
Default is false.
If you set this property to true, you can optionally specify the column name, timestamp format, time zone, and source in the following properties: Formatter.formatterAddedTimestampColumnName, Formatter.formatterAddedTimestampColumnFormat, Formatter.formatterAddedTimestampColumnTimezone, and Formatter.formatterAddedTimestampUseSource.
Formatter.formatterAddedTimestampColumnFormat
If Formatter.formatterAddTimestampColumn is set to true, you can use this property to specify a date and time pattern string that indicates the format of the timestamp values in the added timestamp metadata column. Enter any date and time pattern that the Java class SimpleDateFormat supports for formatting dates and times. For more information, see https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html. Default is yyyy/MM/dd HH:mm:ss.SSS.
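A consumer that is not written in Java needs an equivalent pattern to read the added timestamp column. The following sketch shows a Python counterpart of the default yyyy/MM/dd HH:mm:ss.SSS pattern. The sample value is hypothetical, and the result is a naive datetime because the time zone depends on the Formatter.formatterAddedTimestampColumnTimezone setting.

```python
from datetime import datetime

# Python counterpart of the default pattern yyyy/MM/dd HH:mm:ss.SSS.
# %f accepts the three millisecond digits when parsing.
PY_FORMAT = "%Y/%m/%d %H:%M:%S.%f"


def parse_added_timestamp(text: str) -> datetime:
    """Parse a value written with the default timestamp pattern."""
    return datetime.strptime(text, PY_FORMAT)


def format_added_timestamp(moment: datetime) -> str:
    """Format with millisecond precision; %f emits six digits, so trim three."""
    return moment.strftime(PY_FORMAT)[:-3]


value = "2020/04/04 00:44:49.687"  # hypothetical column value
moment = parse_added_timestamp(value)
print(moment.isoformat())
print(format_added_timestamp(moment) == value)
```

If you change the pattern in the cfg file, the Python format string has to be changed to match, because the two pattern languages use different letters.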
Formatter.formatterAddedTimestampColumnTimezone
If Formatter.formatterAddTimestampColumn is set to true, you can use this property to control the time zone in which the timestamp value in the added timestamp metadata column is reported. Valid values are:
  • local. The local time zone where the CDC Publisher runs.
  • UTC. Coordinated Universal Time.
Default is local.
Formatter.formatterAddedTimestampUseSource
If Formatter.formatterAddTimestampColumn is set to true, you can use this property to indicate whether the added timestamp column contains PowerExchange-generated DTL__CAPXTIMESTAMP timestamp values included in captured source records. Valid values are:
  • false. Do not write the DTL__CAPXTIMESTAMP values to the added timestamp column.
  • true. Write the DTL__CAPXTIMESTAMP values to the added timestamp column.
Default is false.
Formatter.formatterAddedTimestampColumnName
If the Formatter.formatterAddTimestampColumn property is set to true, you can use this property to specify the name of the added timestamp metadata column. This column will appear in the generated Avro schema and formatted output messages. Enter an alphanumeric string. Default is INFA_TIME_CREATED.
Formatter.generateCommitDML
Indicates whether the Formatter generates messages for transaction commit operations. Also indicates whether the Formatter generates a commit message for each source table that was updated by the committed transaction or generates one commit message for all of the updated tables by using the schema of the last updated table. Valid values are:
  • none. Do not generate messages for commit operations.
  • LAST_TABLE. Generate a single commit message for all source tables that the transaction updated. The Formatter generates the commit message by using the Avro schema of the last source table that was updated by the transaction.
  • ALL_TABLES. Generate a commit message for each source table that was updated by the transaction. Consider using this option if you configured CDC Publisher to generate one topic per source table.
Default is none.
If you enable the generation of commit messages, you can optionally set the Connector.kafkaCommitDmlTopic and Connector.kafkaCommitDmlTopicFiltering properties in the cdcPublisherKafka.cfg file.
