Performance Tuning and Sizing Guidelines for Informatica® Big Data Management 10.2.2

Amazon EMR Distribution Tips

Consider the following performance tuning and troubleshooting tips for the Amazon EMR distribution.

Tuning Amazon EMR Ephemeral Clusters

When you deploy ephemeral Amazon EMR clusters, consider the following best practices to improve workflow performance:
  • Deploy the cluster in the same region and availability zone as the EC2 instance where you deployed the Informatica domain.
  • Deploy the cluster with a small number of core EC2 instances. The default number of core EC2 instances is 2. Scale up or scale down depending on your requirements.
  • Use Amazon S3 buckets to archive all Hadoop and application logs for future analysis.
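
The practices above can be sketched as a cluster request. The following is a minimal illustration, assuming the cluster is created through the boto3 EMR `run_job_flow` call; the function name, cluster name, bucket, instance types, and release label are hypothetical placeholders and are not taken from the Informatica documentation. The sketch only builds the request and does not call AWS.

```python
import json


def build_ephemeral_cluster_request(availability_zone, log_bucket, release_label,
                                    core_instance_count=2):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    Hypothetical helper: names and instance types are illustrative only.
    """
    return {
        "Name": "bdm-ephemeral-cluster",  # hypothetical cluster name
        "ReleaseLabel": release_label,
        # Archive Hadoop and application logs to S3 for later analysis.
        "LogUri": f"s3://{log_bucket}/emr-logs/",
        "Instances": {
            # Same availability zone as the EC2 instance that hosts the domain.
            "Placement": {"AvailabilityZone": availability_zone},
            # Assumption: the workflow that creates the cluster also deletes it,
            # so the cluster is kept alive until then.
            "KeepJobFlowAliveWhenNoSteps": True,
            "InstanceGroups": [
                {
                    "Name": "master",
                    "InstanceRole": "MASTER",
                    "InstanceType": "m5.xlarge",  # placeholder instance type
                    "InstanceCount": 1,
                },
                {
                    "Name": "core",
                    "InstanceRole": "CORE",
                    "InstanceType": "m5.xlarge",  # placeholder instance type
                    # Start small (default 2), then scale up or down as needed.
                    "InstanceCount": core_instance_count,
                },
            ],
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    request = build_ephemeral_cluster_request(
        availability_zone="us-east-1a",   # placeholder: match the domain's zone
        log_bucket="example-log-bucket",  # placeholder bucket name
        release_label="emr-5.20.0",       # placeholder: use a supported release
    )
    print(json.dumps(request, indent=2))
```

With boto3 installed and credentials configured, the resulting dictionary could be passed as `boto3.client("emr").run_job_flow(**request)`. Check the release label and instance types against your own environment first.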

Best Practices for Amazon EMR Auto-Scaling

You can enable auto-scaling to dynamically adjust cluster capacity based on performance thresholds on the Spark engine. Consider the following best practices for auto-scaling rules:
  • Implement auto-scaling rules for Spark applications that run for at least 10 minutes. By default, the AWS CloudWatch auto-scaling evaluation period is 5 minutes, and adding nodes to the cluster takes another 5 to 7 minutes, so new capacity becomes available only 10 to 12 minutes after a threshold is crossed. If Spark applications do not run for at least 10 minutes, it is more cost-efficient to disable auto-scaling on the ephemeral cluster.
  • Always specify MinCapacity and MaxCapacity. MinCapacity and MaxCapacity determine the minimum and maximum number of cluster nodes that auto-scaling rules can create. You can set MinCapacity to the default number of core nodes that are created on the ephemeral cluster.
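
As an illustration of the two practices above, the following minimal sketch builds an Amazon EMR automatic scaling policy with explicit MinCapacity and MaxCapacity, and includes a simple check for the 10-minute guideline. The function names, the chosen metric (`YARNMemoryAvailablePercentage`), and the 15 percent threshold are assumptions made for the example, not values recommended by the source. The sketch only builds the policy and does not call AWS.

```python
import json


def autoscaling_worthwhile(expected_runtime_min, min_runtime_min=10):
    """Return True if a Spark application runs long enough to benefit.

    Applies the 10-minute guideline from the text above.
    """
    return expected_runtime_min >= min_runtime_min


def build_autoscaling_policy(min_capacity, max_capacity):
    """Build an EMR automatic scaling policy with explicit capacity bounds.

    Both bounds are required arguments so that neither is left unspecified.
    """
    if not 0 < min_capacity <= max_capacity:
        raise ValueError("expected 0 < min_capacity <= max_capacity")
    return {
        "Constraints": {
            "MinCapacity": min_capacity,
            "MaxCapacity": max_capacity,
        },
        "Rules": [
            {
                "Name": "ScaleOutOnLowYarnMemory",  # hypothetical rule name
                "Description": "Add one node when available YARN memory is low",
                "Action": {
                    "SimpleScalingPolicyConfiguration": {
                        "AdjustmentType": "CHANGE_IN_CAPACITY",
                        "ScalingAdjustment": 1,
                        "CoolDown": 300,
                    }
                },
                "Trigger": {
                    "CloudWatchAlarmDefinition": {
                        "ComparisonOperator": "LESS_THAN",
                        "EvaluationPeriods": 1,
                        "MetricName": "YARNMemoryAvailablePercentage",
                        "Namespace": "AWS/ElasticMapReduce",
                        "Period": 300,      # 5-minute evaluation period, in seconds
                        "Statistic": "AVERAGE",
                        "Threshold": 15.0,  # illustrative threshold, an assumption
                        "Unit": "PERCENT",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # MinCapacity matches the default of 2 core nodes; MaxCapacity is a placeholder.
    if autoscaling_worthwhile(expected_runtime_min=25):
        print(json.dumps(build_autoscaling_policy(2, 6), indent=2))
```

A policy of this shape can be attached to an instance group when the cluster is created, or afterward with the EMR `put_auto_scaling_policy` API. Choose MaxCapacity according to your own cost and throughput limits.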
