Performance Tuning and Sizing Guidelines for Informatica® Big Data Management 10.2.2

Case Study: Sqoop TDCH Export and Import

The following case study uses simple pass-through mappings that use TDCH for Sqoop to read data from Teradata and write it to HDFS or Hive, and to read data from HDFS or Hive and write it back to Teradata.
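
The sketch below illustrates the kind of Sqoop arguments such a mapping relies on when TDCH connectivity comes through the Cloudera Connector Powered by Teradata. It is a hedged example, not the exact configuration used in this study: the host, database, table, and HDFS paths are placeholders, in Big Data Management the arguments are typically entered in the JDBC connection rather than on a command line, and the connector-specific options after the "--" separator (--output-method, --input-method) should be verified against your connector version.

    # Hypothetical TDCH export (HDFS to Teradata). Placeholder host, database,
    # table, and paths; --num-mappers matches the 144 mappers used in this study.
    sqoop export \
      --connect jdbc:teradata://td-host/DATABASE=tpch \
      --username dbc --password-file /user/infa/td.pwd \
      --table LINEITEM \
      --export-dir /data/tpch/lineitem \
      --num-mappers 144 \
      -- --output-method internal.fastload

    # Hypothetical TDCH import (Teradata to HDFS) with the same mapper count.
    sqoop import \
      --connect jdbc:teradata://td-host/DATABASE=tpch \
      --username dbc --password-file /user/infa/td.pwd \
      --table LINEITEM \
      --target-dir /data/tpch/lineitem_out \
      --num-mappers 144 \
      -- --input-method split.by.amp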

Environment

Chipset:              Intel® Xeon® Processor X5675 @ 3.2 GHz
Cores:                2 x 6 cores
Memory:               256 GB
Operating system:     Red Hat Enterprise Linux 7.0
Hadoop distribution:  Cloudera Enterprise 5.11.1
Hadoop cluster:       7 nodes
Data set:             TPC-H Lineitem, scale factor 10 (~7.5 GB): 16 columns, 600 million rows, ~405 bytes per row

Performance Charts

The following chart shows the execution time for TDCH export:

[Chart: TDCH export execution time]

The following chart shows the execution time for TDCH import:

[Chart: TDCH import execution time]

Conclusions

  • For the Sqoop writer (TDCH export), the number of mappers was increased from the default of 4 to 144. Because the internal.fastLoad method restricts the maximum number of sessions, only 25 sessions were actually created.
  • For the Sqoop reader (TDCH import), the number of mappers was increased from the default of 1 to 144. The default value is 1 because the table has a primary key defined. When you increase the number of mappers, set the spark.executor.instances property equal to the number of mappers for optimal performance, as shown in the sketch after this list.
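
To illustrate the second point, the following is a minimal sketch of aligning Spark executors with the Sqoop mapper count. The spark.executor.instances property is standard Spark; where you set it (for example, as an advanced Spark property on the Hadoop connection) depends on your Big Data Management configuration, and the value 144 simply mirrors the mapper count used in this case study.

    # Assumption: your Spark advanced properties accept standard key=value pairs.
    # Keep executor instances equal to the Sqoop mapper count (144 in this study).
    spark.executor.instances=144

If fewer executors are available than mappers, the remaining map tasks queue until an executor frees up, which lengthens the overall run time.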
