Performance Tuning and Sizing Guidelines for Informatica® Big Data Management 10.2.2

Case Study: Sqoop TDCH Export and Import

The following case study uses simple pass-through mappings that use TDCH for Sqoop to read data from Teradata and write it to HDFS or Hive, and to read data from HDFS or Hive and write it back to Teradata.
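
The sketch below illustrates the kind of Sqoop arguments such a mapping relies on when TDCH connectivity comes through the Cloudera Connector Powered by Teradata. It is a hedged example, not the exact configuration used in this study: the host, database, table, and HDFS paths are placeholders, in Big Data Management the arguments are typically entered in the JDBC connection rather than on a command line, and the connector-specific options after the "--" separator (--output-method, --input-method) should be verified against your connector version.

    # Hypothetical TDCH export (HDFS to Teradata). Placeholder host, database,
    # table, and paths; --num-mappers matches the 144 mappers used in this study.
    sqoop export \
      --connect jdbc:teradata://td-host/DATABASE=tpch \
      --username dbc --password-file /user/infa/td.pwd \
      --table LINEITEM \
      --export-dir /data/tpch/lineitem \
      --num-mappers 144 \
      -- --output-method internal.fastload

    # Hypothetical TDCH import (Teradata to HDFS) with the same mapper count.
    sqoop import \
      --connect jdbc:teradata://td-host/DATABASE=tpch \
      --username dbc --password-file /user/infa/td.pwd \
      --table LINEITEM \
      --target-dir /data/tpch/lineitem_out \
      --num-mappers 144 \
      -- --input-method split.by.amp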

Environment

Chipset:              Intel® Xeon® Processor X5675 @ 3.2 GHz
Cores:                2 x 6 cores
Memory:               256 GB
Operating system:     Red Hat Enterprise Linux 7.0
Hadoop distribution:  Cloudera Enterprise 5.11.1
Hadoop cluster:       7 nodes
Data set:             TPC-H Lineitem, scale factor 10 (~7.5 GB): 16 columns, 600 million rows, ~405 bytes per row

Performance Charts

The following chart shows the execution time for TDCH export:

[Chart: TDCH export execution time]

The following chart shows the execution time for TDCH import:

[Chart: TDCH import execution time]

Conclusions

  • For the Sqoop writer (TDCH export), the number of mappers was increased from the default of 4 to 144. Because the internal.fastLoad method restricts the maximum number of sessions, only 25 sessions were actually created.
  • For the Sqoop reader (TDCH import), the number of mappers was increased from the default of 1 to 144. The default value is 1 because the table has a primary key defined. When you increase the number of mappers, set the spark.executor.instances property equal to the number of mappers for optimal performance, as shown in the sketch after this list.
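
To illustrate the second point, the following is a minimal sketch of aligning Spark executors with the Sqoop mapper count. The spark.executor.instances property is standard Spark; where you set it (for example, as an advanced Spark property on the Hadoop connection) depends on your Big Data Management configuration, and the value 144 simply mirrors the mapper count used in this case study.

    # Assumption: your Spark advanced properties accept standard key=value pairs.
    # Keep executor instances equal to the Sqoop mapper count (144 in this study).
    spark.executor.instances=144

If fewer executors are available than mappers, the remaining map tasks queue until an executor frees up, which lengthens the overall run time.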
