Performance Tuning and Sizing Guidelines for Informatica® Big Data Management 10.2.2


Tune the TDCH for Sqoop Parameters

When you read data from or write data to Teradata, you can use Teradata Connector for Hadoop (TDCH) specialized connectors for Sqoop. When you run Sqoop mappings on the Blaze and Spark engines, you can configure the Cloudera Connector Powered by Teradata and Hortonworks Connector for Teradata.

The following table lists the parameters that you can tune:

Parameter	Applies To	Description
batch.insert	Writer	This Teradata target plugin associates an SQL JDBC session with each mapper in the TDCH job when loading a target table in Teradata.
internal.fastexport	Reader	This Teradata source plugin associates a FastExport JDBC session with each mapper in the TDCH job to retrieve data from the source table in Teradata.
internal.fastload	Writer	This Teradata target plugin associates a FastLoad JDBC session with each mapper in the TDCH job when loading a target table in Teradata.
split.by.amp	Reader	The connector creates one mapper per available Teradata AMP, and each mapper subsequently retrieves data from each AMP. As a result, no staging table is required.
split.by.hash	Reader	This input method is similar to the split.by.partition method. Instead of directly operating on value ranges of one column, this method operates on the hash of the column. Use this method to extract data in situations where split.by.value and split.by.partition are not appropriate.
split.by.partition	Reader	This method is preferred to extract a large amount of data from the Teradata system. Behavior of this method depends on whether the source table is partitioned or not.
split.by.value	Reader	This method creates input splits as ranges on the split by column, which is typically the table’s primary key. Each split is subsequently processed by a single mapper to transfer the data using SELECT queries.

The following image shows the additional Sqoop arguments that you specify at the mapping level:

Tuning Guidelines

Consider the following guidelines when you tune the internal.fastload, internal.fastexport, and batch.insert methods:

internal.fastload. Informatica restricts the number of sessions that can be open at one time to write to Teradata. Use the following formula to determine the number of sessions, with an upper limit of 100 sessions per job:

If the number of AMPs <= 20, then use 1 session per AMP.

If the number of AMPs > 20, then use (20 + (Number of AMPs / 20)) sessions.

internal.fastexport. Uses 1 session per AMP, with an upper limit of 4 sessions per job. If the number of mappers that you specify exceeds the maximum number of sessions that can be opened, TDCH restricts the number of mappers to the maximum number of sessions.

batch.insert. Informatica observed that the batch.insert method performs better when the number of AMPs is high, such as 170, 180, or more.
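
The session limits described above can be worked through with a short calculation. The following Python sketch is illustrative only and is not part of the product. The function names are hypothetical, and it assumes that the division in the internal.fastload formula is integer division, which the guideline does not state.

```python
def fastload_sessions(num_amps: int, cap: int = 100) -> int:
    """Estimate the internal.fastload session count from the guideline formula.

    Assumption: "Number of AMPs / 20" is treated as integer (floor) division.
    """
    if num_amps < 1:
        raise ValueError("num_amps must be at least 1")
    if num_amps <= 20:
        sessions = num_amps              # 1 session per AMP
    else:
        sessions = 20 + num_amps // 20   # 20 + (Number of AMPs / 20)
    return min(sessions, cap)            # upper limit of 100 sessions per job


def fastexport_mappers(num_amps: int, requested_mappers: int, cap: int = 4) -> int:
    """Estimate how many mappers remain after the internal.fastexport limit.

    The method uses 1 session per AMP with an upper limit of 4 sessions per
    job, and the mapper count is restricted to that maximum.
    """
    if num_amps < 1 or requested_mappers < 1:
        raise ValueError("num_amps and requested_mappers must be at least 1")
    max_sessions = min(num_amps, cap)
    return min(requested_mappers, max_sessions)


if __name__ == "__main__":
    for amps in (16, 20, 180, 2000):
        print(amps, "AMPs ->", fastload_sessions(amps), "fastload sessions")
    print("10 mappers on 180 AMPs ->", fastexport_mappers(180, 10), "fastexport mappers")
```

Under these assumptions, a 180-AMP system gets 20 + 180 // 20 = 29 internal.fastload sessions, and a request for 10 internal.fastexport mappers is reduced to 4.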
