Data Engineering Administrator Guide

Tuning the Spark Engine

Tune the Spark engine according to a deployment type that defines the big data processing requirements on the Spark engine. When you tune the Spark engine, the autotune command configures the Spark advanced properties in the Hadoop connection.
The following table describes the advanced properties that are tuned:
Property                        Description
spark.driver.memory             The driver process memory that the Spark engine uses to run mapping jobs.
spark.executor.memory           The amount of memory that each executor process uses to run tasklets on the Spark engine.
spark.executor.cores            The number of cores that each executor process uses to run tasklets on the Spark engine.
spark.sql.shuffle.partitions    The number of partitions that the Spark engine uses to shuffle data to process joins or aggregations in a mapping job.
The following table lists the tuned value for each advanced property based on the deployment type:
Property                        Sandbox    Basic    Standard    Advanced
spark.driver.memory             1 GB       2 GB     4 GB        4 GB
spark.executor.memory           2 GB       4 GB     6 GB        6 GB
spark.executor.cores            2          2        2           2
spark.sql.shuffle.partitions    100        400      1500        3000
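The tuned values listed above can also be captured as a small lookup, which may help when you review or document the settings for a deployment type. The following Python sketch is illustrative only: the names TUNED_SPARK_PROPERTIES, tuned_properties, and as_property_lines are hypothetical and are not part of the autotune command or any Informatica API. It also assumes that memory values are written with Spark's size suffix (for example, 1g for 1 GB), which might differ from how your Hadoop connection stores them.

```python
# Tuned values copied from the deployment-type table above.
# All names here are hypothetical helpers, not part of any Informatica API.
# Assumption: memory sizes use Spark's size-suffix notation ("1g" for 1 GB).
TUNED_SPARK_PROPERTIES = {
    "Sandbox": {
        "spark.driver.memory": "1g",
        "spark.executor.memory": "2g",
        "spark.executor.cores": "2",
        "spark.sql.shuffle.partitions": "100",
    },
    "Basic": {
        "spark.driver.memory": "2g",
        "spark.executor.memory": "4g",
        "spark.executor.cores": "2",
        "spark.sql.shuffle.partitions": "400",
    },
    "Standard": {
        "spark.driver.memory": "4g",
        "spark.executor.memory": "6g",
        "spark.executor.cores": "2",
        "spark.sql.shuffle.partitions": "1500",
    },
    "Advanced": {
        "spark.driver.memory": "4g",
        "spark.executor.memory": "6g",
        "spark.executor.cores": "2",
        "spark.sql.shuffle.partitions": "3000",
    },
}


def tuned_properties(deployment_type):
    """Return a copy of the tuned Spark properties for a deployment type."""
    try:
        return dict(TUNED_SPARK_PROPERTIES[deployment_type])
    except KeyError:
        valid = ", ".join(TUNED_SPARK_PROPERTIES)
        raise ValueError(
            f"Unknown deployment type {deployment_type!r}; expected one of: {valid}"
        ) from None


def as_property_lines(deployment_type):
    """Format the tuned properties as key=value lines for review."""
    return [f"{key}={value}" for key, value in tuned_properties(deployment_type).items()]


if __name__ == "__main__":
    for line in as_property_lines("Standard"):
        print(line)
```

Only spark.sql.shuffle.partitions differs between the Standard and Advanced deployment types, so a lookup like this makes it easy to compare two types side by side before you tune.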
