Performance Tuning and Sizing Guidelines for Informatica® Big Data Management 10.2.2


Tune the Spark Engine

When you develop mappings in the Developer tool to run on the Spark engine, consider the following prerequisites, tuning recommendations, and performance best practices.

Meet the following prerequisites:

On the Hadoop cluster, configure the Spark History Server.

On the Hadoop cluster, enable the Spark Shuffle Service.

To run mappings on the Spark engine, configure the Hadoop connection with the location of the Spark HDFS staging directory and the Spark event log directory.

For the Spark event log directory, use the same directory that the Spark History Server reads event logs from. The Spark Event Log Directory is the base directory in which Spark logs events. Within this base directory, Spark creates a subdirectory for each application and logs the events specific to that application there.
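The prerequisites above correspond to standard Apache Spark and YARN properties. The following Python sketch assumes a cluster configured through spark-defaults.conf and yarn-site.xml; many Hadoop distributions manage these through their own administration consoles instead. The path hdfs:///spark-history and the helper names are hypothetical. The sketch renders the event log and shuffle service settings and checks that spark.eventLog.dir matches spark.history.fs.logDirectory, the directory the History Server reads from. Under this assumption, the Spark event log directory in the Hadoop connection would be set to the same path.

```python
from typing import Dict
from urllib.parse import urlparse

# Hypothetical yarn-site.xml entries that register the external Spark shuffle
# service with each NodeManager (standard property names from the Spark on YARN docs).
YARN_SHUFFLE_SITE: Dict[str, str] = {
    "yarn.nodemanager.aux-services": "mapreduce_shuffle,spark_shuffle",
    "yarn.nodemanager.aux-services.spark_shuffle.class":
        "org.apache.spark.network.yarn.YarnShuffleService",
}


def spark_defaults(event_log_dir: str) -> Dict[str, str]:
    """Build hypothetical spark-defaults.conf settings for the prerequisites."""
    return {
        # Write event logs so the Spark History Server can replay applications.
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": event_log_dir,
        # The History Server reads from this directory; keep it identical.
        "spark.history.fs.logDirectory": event_log_dir,
        # Executors fetch shuffle files through the external shuffle service.
        "spark.shuffle.service.enabled": "true",
    }


def normalize(path: str) -> str:
    """Compare by scheme, host, and path, ignoring trailing slashes.

    This does not resolve the cluster's default file system, so
    "/spark-history" and "hdfs:///spark-history" are treated as different.
    """
    parsed = urlparse(path)
    return f"{parsed.scheme}://{parsed.netloc}{parsed.path.rstrip('/')}"


def event_log_dir_matches(conf: Dict[str, str]) -> bool:
    """True if Spark writes event logs where the History Server reads them."""
    return normalize(conf["spark.eventLog.dir"]) == normalize(
        conf["spark.history.fs.logDirectory"]
    )


def render(conf: Dict[str, str]) -> str:
    """Format settings as whitespace-separated key/value lines."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


if __name__ == "__main__":
    conf = spark_defaults("hdfs:///spark-history")
    print(render(conf))
    print("event log directory matches History Server:", event_log_dir_matches(conf))
```

Keeping both directory properties in one generated mapping makes it harder for them to drift apart, which is the mismatch the prerequisite warns against.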

For more information about configuring Spark History Server and Spark Shuffle Service, refer to the Hadoop distribution documentation or the Apache Spark documentation.

For more information about configuring the Hadoop connection, refer to the Informatica Big Data Management User Guide.


