Confluent Kafka Data Objects

A Confluent Kafka data object is a physical data object that represents data in a Kafka stream or a Confluent Kafka stream. After you configure a Messaging connection, create a Confluent Kafka data object to write data to Kafka brokers or Confluent Kafka brokers using schema registry.
Confluent Kafka runs as a cluster of one or more servers, each of which is called a broker. Confluent Kafka brokers stream data in the form of messages. These messages are published to a topic.
Confluent Kafka topics are divided into partitions. The Spark engine can write to the partitions of a topic in parallel to achieve better throughput and to scale the number of messages processed. Message ordering is guaranteed only within a partition, not across partitions. For optimal performance, configure topics with multiple partitions.
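The ordering guarantee can be illustrated with a small simulation that does not connect to a broker. This is a minimal sketch under stated assumptions: the partition count, the message keys, and the CRC32-based `assign_partition` helper are hypothetical, and CRC32 is only a stand-in for whatever partitioner the client actually uses. It shows that messages with the same key land in the same partition and keep their relative order there, while nothing is implied about order across partitions.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for this illustration


def assign_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition with a stable hash.

    CRC32 is a stand-in chosen for this sketch; it is not the hash
    that Kafka clients use.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def write_messages(messages):
    """Append (key, value) pairs to per-partition logs in arrival order."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[assign_partition(key)].append((key, value))
    return dict(partitions)


# Hypothetical keyed messages; the value is a per-key sequence number.
messages = [
    ("sensor-a", 1), ("sensor-b", 1), ("sensor-a", 2),
    ("sensor-c", 1), ("sensor-b", 2), ("sensor-a", 3),
]
logs = write_messages(messages)

# Each partition log preserves the arrival order of the messages it holds.
for partition_id in sorted(logs):
    print(partition_id, logs[partition_id])
```

Within each printed partition log, the sequence numbers for a given key are increasing, which is the per-partition guarantee described above.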

Write Operation in Confluent Kafka

You can use the Confluent Kafka data object write operation as a target in streaming mappings. By default, the write operation is created for Confluent Kafka.

File Format in Confluent Kafka

When you configure the data operation properties, specify the format in which the Confluent Kafka data object writes data.
You can specify XML, JSON, Avro, or Flat as the format for Kafka data objects. When you specify XML format, you must provide an XSD file. When you specify JSON or Flat format, you must provide a sample file. When you specify Avro format, you must provide a sample Avro schema in an .avsc file.
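For reference, an .avsc file is a JSON document that describes an Avro schema. The following is a minimal sketch that builds one and prints it; the record name, namespace, and fields are hypothetical and are not taken from the product's sample files.

```python
import json

# Hypothetical record schema; all names and fields are illustrative only.
sample_schema = {
    "type": "record",
    "name": "SensorReading",
    "namespace": "com.example.streaming",
    "fields": [
        {"name": "sensor_id", "type": "string"},
        {"name": "reading", "type": "double"},
        {"name": "event_time", "type": "long"},
        # An Avro array of strings, as an example of a non-primitive field.
        {"name": "tags", "type": {"type": "array", "items": "string"}},
    ],
}

# Serialize the schema to the JSON text that an .avsc file would contain.
avsc_text = json.dumps(sample_schema, indent=2)
print(avsc_text)
```

Saving the printed text to a file with the .avsc extension yields a schema file of the kind the Avro format option expects.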
You can specify Avro as the format for Confluent Kafka data objects using schema registry.
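As background on what a schema registry changes on the wire: serializers that follow the Confluent wire format prefix each Avro payload with a magic byte of 0 and a 4-byte big-endian schema ID, so that a consumer can look up the schema in the registry. This guide does not describe how the data object handles that framing, so the sketch below is a general illustration only. The `frame` and `unframe` helpers, the schema ID, and the payload bytes are hypothetical.

```python
import struct

MAGIC_BYTE = 0
# 1-byte magic value followed by a 4-byte big-endian unsigned schema ID.
HEADER = struct.Struct(">bI")


def frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an already-encoded Avro payload with the 5-byte header."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + avro_payload


def unframe(message: bytes):
    """Split a framed message into (schema_id, avro_payload)."""
    if len(message) < HEADER.size:
        raise ValueError("message is shorter than the 5-byte header")
    magic, schema_id = HEADER.unpack_from(message)
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[HEADER.size:]


# Placeholder bytes stand in for a real Avro-encoded record.
framed = frame(42, b"placeholder-avro-bytes")
schema_id, payload = unframe(framed)
print(schema_id, payload)
```

The payload itself is not decoded here; decoding requires the schema that the ID refers to.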
Streaming mappings can read, process, and write hierarchical data. You can use array, struct, and map complex data types to process the hierarchical data. You assign complex data types to ports in a mapping to flow hierarchical data. Ports that flow hierarchical data are called complex ports.
For more information about processing hierarchical data, see the Data Engineering Integration User Guide.
In the Databricks environment, a streaming mapping fails if you enable schema registry in the connection properties of the Confluent Kafka data object.
