Sizing Guidelines and Performance Tuning for Big Data Streaming 10.2.1

Recommendations for Tuning the Kafka Cluster

Consider the following recommendations to tune the Kafka cluster:
  • Configure the Kafka cluster so that Big Data Streaming can produce and consume messages at the needed message ingestion rate.
  • To increase the rate of message consumption in Big Data Streaming, increase the number of Kafka brokers in the Kafka cluster and in the Kafka connection.
  • Increase the number of partitions on the Kafka topic. Ideally, the number of partitions equals the number of CPU cores allocated to the executors. For example, if you set spark.executor.instances to 6 and spark.executor.cores to 3, 18 cores are allocated. Set the number of Kafka partitions to 18 so that 18 parallel tasks read from the Kafka source.
    For example, you can use the following command to specify the number of partitions:
    ./kafka-topics.sh --create --zookeeper zookeeper_host_name1:zookeeper_port_number,zookeeper_host_name2:zookeeper_port_number,zookeeper_host_name3:zookeeper_port_number --replication-factor 1 --partitions 18 --topic NewOSConfigSrc
  • Ensure that the Kafka producer publishes messages evenly across all partitions of the topic.
  • Reduce the number of network hops between Big Data Streaming and the Kafka cluster. Ideally, run the Kafka broker on the same machine as the data node, or run the Kafka cluster on its own machines connected by a near-zero-latency network.
  • Configure the batch.size and linger.ms properties to increase throughput. For each partition, the producer maintains a buffer of unsent records. The batch.size property specifies the maximum size, in bytes, of a batch in that buffer. To accumulate as many messages as possible in a batch, configure a high value for the batch.size property.
    By default, the producer sends messages immediately. To make the producer wait briefly so that more messages accumulate in each batch, set the linger.ms property to 5 milliseconds.
  • Kafka scalability depends on disk and network performance. The test setup included 12 disks per node on a 10 Gbps network with an open file limit of 65000.
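
The partition sizing and producer settings described in the list above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the product: the helper names (partition_count, create_topic_command, producer_properties) are hypothetical, and the batch.size value of 65536 bytes is an assumed example of a "high value", not a tested recommendation. The ZooKeeper host strings are the same placeholders used in the command above.

```python
from typing import Sequence

# Illustrative sketch: all helper names here are hypothetical, and the
# batch.size value used below is an assumption, not a tested recommendation.


def partition_count(executor_instances: int, executor_cores: int) -> int:
    """One Kafka partition per executor core, so each core runs one read task."""
    return executor_instances * executor_cores


def create_topic_command(topic: str, zookeeper_hosts: Sequence[str],
                         partitions: int, replication_factor: int = 1) -> str:
    """Build the kafka-topics.sh command line for the given partition count."""
    return (
        "./kafka-topics.sh --create"
        f" --zookeeper {','.join(zookeeper_hosts)}"
        f" --replication-factor {replication_factor}"
        f" --partitions {partitions}"
        f" --topic {topic}"
    )


def producer_properties(batch_size_bytes: int, linger_ms: int = 5) -> str:
    """Render the two producer settings as lines for a producer properties file."""
    return f"batch.size={batch_size_bytes}\nlinger.ms={linger_ms}\n"


# spark.executor.instances = 6 and spark.executor.cores = 3, as in the example.
partitions = partition_count(executor_instances=6, executor_cores=3)

zookeeper_hosts = [
    "zookeeper_host_name1:zookeeper_port_number",
    "zookeeper_host_name2:zookeeper_port_number",
    "zookeeper_host_name3:zookeeper_port_number",
]

print(create_topic_command("NewOSConfigSrc", zookeeper_hosts, partitions))
print(producer_properties(batch_size_bytes=65536))  # 65536 is an assumed value
```

Deriving the partition count from the executor settings keeps the two in step: if spark.executor.instances or spark.executor.cores changes, the same calculation gives the new partition count to aim for.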
