Sizing Guidelines and Performance Tuning for Big Data Streaming 10.2.1

Tune Spark Parameters

Tune the Spark parameters in the Hadoop connection.
You can configure the following parameters based on the input data rate, mapping complexity, and concurrency of mappings:
spark.executor.cores
The number of cores to use on each executor.
Recommended value: Specify 3 to 4 cores for each executor. Specifying a higher number of cores might lead to performance degradation.
spark.executor.memory
The amount of memory to use for each executor process.
Recommended value: Specify a value of 8 GB.
spark.driver.memory
The amount of memory to use for the driver process.
Recommended value: Specify a value of 8 GB.
spark.driver.cores
The number of cores to use for each driver process.
Recommended value: Specify 8 cores.
spark.executor.instances
The total number of executors to start. This number depends on the number of machines in the cluster, the memory allocated, and the number of cores per machine.
Configure the number of executor instances based on the following deployment types (a rough resource estimate follows the list):
  • Sandbox deployment. 4
  • Small deployment. 14
  • Medium deployment. 27
  • Large deployment. 262
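As a rough check of what these executor counts imply for cluster capacity, the following estimates multiply the instance counts above by the per-executor recommendations given earlier in this section (4 cores per executor, the upper end of the recommended range, and 8 GB per executor). These totals are approximations, not exact cluster requirements:
  • Sandbox deployment. 4 executors x 4 cores = 16 executor cores; 4 x 8 GB = 32 GB executor memory
  • Small deployment. 14 executors x 4 cores = 56 executor cores; 14 x 8 GB = 112 GB executor memory
  • Medium deployment. 27 executors x 4 cores = 108 executor cores; 27 x 8 GB = 216 GB executor memory
  • Large deployment. 262 executors x 4 cores = 1048 executor cores; 262 x 8 GB = 2096 GB executor memory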
spark.sql.shuffle.partitions
The total number of partitions used for a SQL shuffle operation.
Recommended value: Specify a value equal to the total number of executor cores allocated, up to a maximum of 200.
Configure the partitions based on the following deployment types (the arithmetic behind these values is shown after the list):
  • Sandbox deployment. 16
  • Small deployment. 56
  • Medium deployment. 108
  • Large deployment. 200
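These partition counts follow from the executor sizing above: total executor cores = executor instances x cores per executor (assuming 4 cores per executor, the upper end of the recommended range), capped at 200.
  • Sandbox deployment. 4 executors x 4 cores = 16 partitions
  • Small deployment. 14 executors x 4 cores = 56 partitions
  • Medium deployment. 27 executors x 4 cores = 108 partitions
  • Large deployment. 262 executors x 4 cores = 1048, capped at the maximum of 200 partitions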
spark.kryo.registrationRequired
Indicates whether registration with Kryo is required.
Recommended value: True
spark.kryo.classesToRegister
The comma-separated list of custom class names to register with Kryo if you use Kryo serialization.
Specify the following value for all deployment types:
org.apache.spark.sql.catalyst.expressions.GenericRow, [Ljava.lang.Object;, org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema, org.apache.spark.sql.types.StructType, [Lorg.apache.spark.sql.types.StructField;, org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StringType$, org.apache.spark.sql.types.Metadata, scala.collection.immutable.Map$EmptyMap$, [Lorg.apache.spark.sql.catalyst.InternalRow;, scala.reflect.ClassTag$$anon$1, java.lang.Class
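As an illustration, the recommended settings for a small deployment might be entered as the following key=value pairs in the Spark advanced properties of the Hadoop connection. The values mirror the recommendations above; the exact entry format depends on your Hadoop connection configuration, and spark.kryo.classesToRegister must contain the full class list shown earlier:
  spark.executor.cores=4
  spark.executor.memory=8g
  spark.driver.memory=8g
  spark.driver.cores=8
  spark.executor.instances=14
  spark.sql.shuffle.partitions=56
  spark.kryo.registrationRequired=true
  spark.kryo.classesToRegister=<full class list shown above>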
ExecutionContextOptions.Spark.StreamingDropEmptyBatches
To prevent Spark from creating jobs and tasks when there are no messages to be processed in a batch, set this parameter to true. You can configure this property on the Custom Properties tab of the Data Integration Service.
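For example, the custom property might be specified as the following name/value pair; the exact entry format depends on how custom properties are defined for your Data Integration Service:
  ExecutionContextOptions.Spark.StreamingDropEmptyBatches=true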
