Tuning and Sizing Guidelines for Data Engineering Integration (10.4.x)

Joiner Transformation

You can optimize Joiner transformations to enable the Spark engine to efficiently perform a full outer join.
To increase memory for a full outer join and to determine shuffle partitions, perform the following two-step tuning process:
  1. Ensure every executor core has at least 3 GB of memory.
    For example, set spark.executor.memory=6g and spark.executor.cores=2, which gives each core 3 GB.
  2. Set spark.sql.shuffle.partitions = <master splits> + <detail partitions>.
    The spark.sql.shuffle.partitions property determines the number of partitions to use when shuffling data for joins or aggregations.
    For example, with a DFS block size of 256 MB, 100 GB of master data has 400 splits and 200 GB of detail data has 800 partitions, so set spark.sql.shuffle.partitions=1200.
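
The arithmetic behind both steps can be sketched in a few lines of Python. This is an illustrative calculation only, not part of the product. The function names are hypothetical, and the sketch assumes that each DFS block produces one split and that 1 GB equals 1024 MB.

```python
import math

MB_PER_GB = 1024  # assumption: binary units, 1 GB = 1024 MB


def input_splits(data_size_gb: float, block_size_mb: int = 256) -> int:
    """Estimate the number of splits, assuming one split per DFS block."""
    return math.ceil(data_size_gb * MB_PER_GB / block_size_mb)


def shuffle_partitions(master_gb: float, detail_gb: float,
                       block_size_mb: int = 256) -> int:
    """Step 2: master splits plus detail partitions."""
    return (input_splits(master_gb, block_size_mb)
            + input_splits(detail_gb, block_size_mb))


def memory_per_core_gb(executor_memory_gb: float, executor_cores: int) -> float:
    """Step 1: memory available to each executor core."""
    return executor_memory_gb / executor_cores


if __name__ == "__main__":
    # Values taken from the examples in the steps above.
    print(f"memory per core: {memory_per_core_gb(6, 2)} GB")
    print(f"spark.sql.shuffle.partitions={shuffle_partitions(100, 200)}")
```

With the example values, the sketch yields 3 GB per core and 400 + 800 = 1200 shuffle partitions. For other data sizes or block sizes, treat the result as a starting point for tuning rather than a fixed rule.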
