Tuning and Sizing Guidelines for Data Engineering Integration (10.4.x)


Joiner Transformation

You can tune Joiner transformations so that the Spark engine performs full outer joins efficiently.

To increase memory for a full outer join and to determine shuffle partitions, perform the following two-step tuning process:

1. Ensure that every executor core has at least 3 GB of memory.

   For example, set spark.executor.memory=6g and spark.executor.cores=2, which gives each of the two cores 3 GB.

2. Set spark.sql.shuffle.partitions = <master splits> + <detail partitions>.

   The spark.sql.shuffle.partitions property determines the number of partitions that Spark uses when it shuffles data for joins or aggregations.

   For example, with a DFS block size of 256 MB, 100 GB of master data has 400 splits and 200 GB of detail data has 800 partitions, so set spark.sql.shuffle.partitions=1200.
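Both rules reduce to simple arithmetic, so you can check a planned configuration before you run the mapping. The following Python sketch is a hypothetical helper, not an Informatica or Spark API. It assumes binary units (1 GB = 1024 MB) and assumes that both the master and the detail data produce one split per DFS block, rounded up. It rejects configurations with less than 3 GB per executor core and prints the property values for the example above.

```python
# Hypothetical sizing helper for a Spark full outer join (not a product API).
# Assumptions: 1 GB = 1024 MB, and master and detail data each produce one
# split per DFS block, rounded up.
import math

MIN_GB_PER_CORE = 3  # step 1: at least 3 GB of memory per executor core


def memory_per_core_gb(executor_memory_gb: float, executor_cores: int) -> float:
    """Executor memory divided evenly across the executor's cores, in GB."""
    return executor_memory_gb / executor_cores


def split_count(data_gb: float, block_size_mb: int) -> int:
    """Number of splits for a data set, assuming one split per DFS block."""
    return math.ceil(data_gb * 1024 / block_size_mb)


def shuffle_partitions(master_gb: float, detail_gb: float, block_size_mb: int) -> int:
    """Step 2: master splits + detail partitions."""
    return split_count(master_gb, block_size_mb) + split_count(detail_gb, block_size_mb)


def join_properties(executor_memory_gb: int, executor_cores: int,
                    master_gb: float, detail_gb: float, block_size_mb: int) -> dict:
    """Return Spark property values, or raise if step 1 is violated."""
    per_core = memory_per_core_gb(executor_memory_gb, executor_cores)
    if per_core < MIN_GB_PER_CORE:
        raise ValueError(f"{per_core:.1f} GB per core is below {MIN_GB_PER_CORE} GB")
    partitions = shuffle_partitions(master_gb, detail_gb, block_size_mb)
    return {
        "spark.executor.memory": f"{executor_memory_gb}g",
        "spark.executor.cores": str(executor_cores),
        "spark.sql.shuffle.partitions": str(partitions),
    }


if __name__ == "__main__":
    # Values from the examples in steps 1 and 2.
    props = join_properties(6, 2, master_gb=100, detail_gb=200, block_size_mb=256)
    for key, value in props.items():
        print(f"{key}={value}")
```

Running the script prints spark.executor.memory=6g, spark.executor.cores=2, and spark.sql.shuffle.partitions=1200. Rounding up keeps a partial final block from being dropped from the partition count.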
