Sizing Guidelines and Performance Tuning for Big Data Streaming 10.2.1

Lookup Transformation

Consider the following restrictions when you optimize the Lookup transformation to run on the Spark engine:
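Data skew
Data skew refers to an uneven distribution of data. Spark engine optimization might lead to data skew among executors because of where the data is located. To avoid this issue, you can set the spark.shuffle.reduceLocality.enabled property to false.
Setting the spark.shuffle.reduceLocality.enabled property to false changes the shuffle behaviour. Spark no longer computes locality preferences for reduce tasks, so it does not favour the executors that already hold most of the shuffle data. This can spread the work more evenly, possibly at the cost of moving more data across the network.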
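To decide whether the data skew workaround is worth applying, you can measure the skew first. The following Python sketch assumes that you have collected per-executor record counts, for example from the Spark UI. The helper names and the 2.0 threshold are hypothetical. Only the property name comes from this guide.

```python
def skew_ratio(records_per_executor):
    """Return max / mean records per executor; 1.0 means perfectly even."""
    if not records_per_executor:
        raise ValueError("need at least one executor")
    mean = sum(records_per_executor) / len(records_per_executor)
    if mean == 0:
        return 1.0  # no records anywhere, so nothing is skewed
    return max(records_per_executor) / mean


# The property described above, as a name/value pair.
REDUCE_LOCALITY_OFF = {"spark.shuffle.reduceLocality.enabled": "false"}


def suggested_properties(records_per_executor, threshold=2.0):
    """Suggest disabling reduce locality when skew exceeds a threshold.

    The threshold is a hypothetical cut-off, not an Informatica or Spark default.
    """
    if skew_ratio(records_per_executor) > threshold:
        return dict(REDUCE_LOCALITY_OFF)
    return {}


print(skew_ratio([60, 10, 20, 10]))            # → 2.4
print(suggested_properties([60, 10, 20, 10]))  # → {'spark.shuffle.reduceLocality.enabled': 'false'}
```

The ratio compares the busiest executor with the average load. It only indicates skew; it does not tell you whether data locality is the cause.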
Inefficient lookup partitioning
Mapping performance might degrade because of inefficient lookup partitioning and caching.
To configure cache partitioning for a Lookup transformation, perform the following steps:
  • Set the value of the infaspark.lookup.repartition.partitions property equal to the number of source topic partitions. For example, if a Kafka topic has 18 partitions, set the infaspark.lookup.repartition.partitions property to 18.
  • Set the value of the infaspark.lookup.persist.enabled property to true.
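The two steps above derive both property values from the source topic. The following Python sketch shows that derivation. The helper names are hypothetical. Rendering the properties as name=value lines is also an assumption about how you enter them, so check where your environment expects these properties before you copy the output.

```python
def lookup_cache_properties(topic_partitions):
    """Build the lookup cache partitioning properties for a source topic.

    topic_partitions: number of partitions in the source Kafka topic.
    """
    if isinstance(topic_partitions, bool) or not isinstance(topic_partitions, int) \
            or topic_partitions < 1:
        raise ValueError("topic_partitions must be a positive integer")
    return {
        # Match the lookup repartition count to the source topic partitions.
        "infaspark.lookup.repartition.partitions": str(topic_partitions),
        # Enable persistence, as the second step requires.
        "infaspark.lookup.persist.enabled": "true",
    }


def as_name_value_lines(props):
    """Render properties as name=value lines (an assumed input format)."""
    return "\n".join(f"{name}={value}" for name, value in props.items())


# The 18-partition Kafka topic from the example above.
print(as_name_value_lines(lookup_cache_properties(18)))
```

Deriving the repartition count from the topic, instead of hard-coding it, keeps the two values in step if the topic is later repartitioned.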
Data duplication
Avoid duplicate data in the lookup source. If the lookup data is unique, configure the Lookup transformation to return all rows on multiple matches.
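Before you choose the return-all-rows policy, you can check that the lookup keys really are unique. The following Python sketch works on in-memory rows. The helper names are hypothetical. The returned policy label is illustrative, so confirm the exact option name in the Lookup transformation properties.

```python
from collections import Counter


def duplicate_keys(rows, key_fields):
    """Return lookup keys that occur more than once, in first-seen order."""
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def multiple_match_policy(rows, key_fields):
    """Hypothetical helper: return the policy the text recommends for unique data."""
    dupes = duplicate_keys(rows, key_fields)
    if dupes:
        # The guidance is to avoid duplicates in the lookup source first.
        raise ValueError(f"lookup source has duplicate keys: {dupes}")
    return "Return all rows"


print(duplicate_keys([{"id": 1}, {"id": 2}, {"id": 1}], ["id"]))  # → [(1,)]
print(multiple_match_policy([{"id": 1}, {"id": 2}], ["id"]))       # → Return all rows
```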