A Kafka data object is a physical data object that represents data in a Kafka stream. After you configure a Messaging connection, create a Kafka data object to read from or write to Apache Kafka brokers.
Kafka runs as a cluster of one or more servers, each of which is called a broker. Kafka brokers stream data in the form of messages, which are published to a topic. When you configure the Kafka data object, specify the name of the topic that you read from. Similarly, when you write data to a Kafka messaging stream, specify the name of the topic that you publish to. You can also read from or write to a Kerberised Kafka cluster.
Kafka topics are divided into partitions, and Spark Streaming can read the partitions of a topic in parallel. Parallel reads increase throughput and let you scale the number of messages processed. Kafka guarantees message ordering only within a partition, not across the partitions of a topic. For optimal performance, use multiple partitions.
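To illustrate why ordering holds only within a partition, the following is a minimal stdlib-only simulation, not Kafka or Spark Streaming code. It assumes messages are routed to partitions by hashing their key, and it uses `zlib.crc32` as a stand-in hash; Kafka's default partitioner uses murmur2, so real partition numbers differ. The names `choose_partition`, `publish`, and `consume_in_parallel` are hypothetical.

```python
import zlib
from collections import defaultdict


def choose_partition(key: bytes, num_partitions: int) -> int:
    # Stand-in hash (zlib.crc32); Kafka's default partitioner uses murmur2,
    # so only the "same key -> same partition" idea carries over.
    return zlib.crc32(key) % num_partitions


def publish(messages, num_partitions=3):
    # Append each (key, value) pair to its partition's log in publish order.
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[choose_partition(key, num_partitions)].append((key, value))
    return partitions


def consume_in_parallel(partitions):
    # Simulate one reader per partition by taking one message from each
    # partition in turn. Order within a partition is kept; the interleaving
    # across partitions is arbitrary.
    readers = [iter(partitions[p]) for p in sorted(partitions)]
    merged = []
    while readers:
        active = []
        for reader in readers:
            item = next(reader, None)
            if item is not None:
                merged.append(item)
                active.append(reader)
        readers = active
    return merged


msgs = [(b"sensor-1", 1), (b"sensor-2", 1), (b"sensor-1", 2),
        (b"sensor-3", 1), (b"sensor-1", 3), (b"sensor-2", 2)]
out = consume_in_parallel(publish(msgs))
print([v for k, v in out if k == b"sensor-1"])  # [1, 2, 3]
```

Because every message with the same key lands in the same partition, each key's values come back in publish order, while messages with different keys may be interleaved in any order.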
When you write to Kafka brokers, you can use the
output ports. You can override these ports when you create the mapping.
You can create or import a Kafka data object.
After you create a Kafka data object, a read operation and a write operation are created. You can use the Kafka data object read operation as a source and the Kafka data object write operation as a target in Streaming mappings. To configure high availability for the mapping, ensure that the Kafka cluster is highly available.
When you configure the data object read operation properties, you can specify the time from which the Kafka source starts reading messages from a Kafka topic.
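Starting from a point in time amounts to finding, in each partition, the first message whose timestamp is at or after the requested time. The sketch below shows that lookup on an in-memory partition log; it assumes timestamps never decrease along the log so that a binary search is valid. The Kafka Java consumer exposes a comparable lookup, `KafkaConsumer#offsetsForTimes`, but whether the Kafka data object uses it internally is not stated here, and the helper names are hypothetical.

```python
from bisect import bisect_left


def first_offset_at_or_after(log, start_ms):
    # log: list of (offset, timestamp_ms, value), ordered by offset.
    # Assumes timestamps never decrease along the log, so binary search is valid.
    timestamps = [ts for _, ts, _ in log]
    i = bisect_left(timestamps, start_ms)
    return log[i][0] if i < len(log) else None


def read_from(log, start_ms):
    # Return the values from the first message at or after start_ms onward.
    start = first_offset_at_or_after(log, start_ms)
    if start is None:
        return []
    return [value for offset, _, value in log if offset >= start]


log = [(0, 1_000, "a"), (1, 2_000, "b"), (2, 2_000, "c"), (3, 3_500, "d")]
print(read_from(log, 1_500))  # ['b', 'c', 'd']
```

If the requested time is later than every message in the partition, nothing is read until new messages arrive.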
When you configure the data object operation properties, specify the format in which the Kafka data object reads or writes data. You can specify XML, JSON, Avro, or flat as the format. When you specify XML format, you must provide an XSD file. When you specify JSON or Avro format, you must provide a sample file.
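The format setting determines how each message's bytes are interpreted. As a rough illustration only, the hypothetical `decode` helper below parses one payload as JSON, XML, or a flat delimited record using the Python standard library. In the product, the XSD or sample file defines the columns; here the structure is simply inferred from the payload, and XML handling is limited to one level of child elements. Avro is omitted because it needs a third-party library and a writer schema.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def decode(payload: bytes, fmt: str, flat_delimiter: str = ","):
    # Hypothetical helper: decode one message payload according to a format name.
    text = payload.decode("utf-8")
    if fmt == "json":
        return json.loads(text)
    if fmt == "xml":
        # One level of child elements only; a real XSD would drive parsing.
        root = ET.fromstring(text)
        return {child.tag: child.text for child in root}
    if fmt == "flat":
        # One delimited record per message; quoted fields may contain the delimiter.
        return next(csv.reader(io.StringIO(text), delimiter=flat_delimiter))
    raise ValueError(f"unsupported format: {fmt}")
```

For example, `decode(b'7,pump,"a,b"', "flat")` yields three fields because the quoted field keeps its comma.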
You can pass any payload format directly from source to target in Streaming mappings. You can project columns in binary format to pass a payload from source to target in its original form, or to pass a payload in a format that is not supported.
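The point of a binary projection is that the payload is never decoded, so its format does not matter. The minimal sketch below models this with a single binary column; the column name `payload` and both helper functions are hypothetical and stand in for the source projection and the target write.

```python
def project_binary(payloads):
    # Project each message into one binary column (hypothetical name "payload").
    # The bytes are never decoded, so any format passes through.
    return [{"payload": bytes(p)} for p in payloads]


def write_rows(rows):
    # The target writes the binary column back out unchanged.
    return [row["payload"] for row in rows]


raw = [b"\x00\xff\x10", "héllo".encode("utf-16"), b'{"id": 7}']
assert write_rows(project_binary(raw)) == raw
```

Trying to decode the first payload as UTF-8 would raise `UnicodeDecodeError`, because the byte `0xff` never occurs in valid UTF-8; the binary projection sidesteps that by not decoding at all.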
Streaming mappings can read, process, and write hierarchical data. You can use array, struct, and map complex data types to process the hierarchical data. You assign complex data types to ports in a mapping to flow hierarchical data. Ports that flow hierarchical data are called complex ports.
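As a loose Python analogue of the three complex data types (not Informatica's type system), the sketch below models a struct as a dataclass, an array as a list, and a map as a dict, then explodes the array into flat rows. The `Customer` and `Address` types and their field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Address:  # struct
    city: str
    zip_code: str


@dataclass
class Customer:  # struct containing complex fields
    name: str
    address: Address  # struct nested in a struct
    phones: List[str] = field(default_factory=list)  # array
    attributes: Dict[str, str] = field(default_factory=dict)  # map


def from_json(obj: dict) -> Customer:
    # Build the typed hierarchy from a parsed JSON message.
    return Customer(
        name=obj["name"],
        address=Address(**obj["address"]),
        phones=list(obj.get("phones", [])),
        attributes=dict(obj.get("attributes", {})),
    )


def flatten(c: Customer) -> List[dict]:
    # Emit one flat row per array element, pulling fields from the struct and map.
    return [
        {"name": c.name, "city": c.address.city, "phone": p,
         "tier": c.attributes.get("tier")}
        for p in c.phones
    ]
```

A customer with two phone numbers produces two flat rows that share the name, city, and tier values.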
For more information about processing hierarchical data, see the