Tuning and Sizing Guidelines for Data Engineering Integration (10.4.x)

Tune the TDCH for Sqoop Parameters

When you read data from or write data to Teradata, you can use Teradata Connector for Hadoop (TDCH) specialized connectors for Sqoop. When you run Sqoop mappings on the Blaze and Spark engines, you can configure the Cloudera Connector Powered by Teradata and Hortonworks Connector for Teradata.
The following table lists the parameters that you can tune:
| Parameter | Applies To | Description |
|---|---|---|
| batch.insert | Writer | This Teradata target plugin associates an SQL JDBC session with each mapper in the TDCH job when loading a target table in Teradata. |
| internal.fastexport | Reader | This Teradata source plugin associates a FastExport JDBC session with each mapper in the TDCH job to retrieve data from the source table in Teradata. |
| internal.fastload | Writer | This Teradata target plugin associates a FastLoad JDBC session with each mapper in the TDCH job when loading a target table in Teradata. |
| split.by.amp | Reader | The connector creates one mapper per available Teradata AMP, and each mapper subsequently retrieves data from each AMP. As a result, no staging table is required. |
| split.by.hash | Reader | This input method is similar to the split.by.partition method. Instead of directly operating on value ranges of one column, this method operates on the hash of the column. Use this method to extract data in situations where split.by.value and split.by.partition are not appropriate. |
| split.by.partition | Reader | This method is preferred to extract a large amount of data from the Teradata system. Behavior of this method depends on whether the source table is partitioned or not. |
| split.by.value | Reader | This method creates input splits as ranges on the split by column, which is typically the table’s primary key. Each split is subsequently processed by a single mapper to transfer the data using SELECT queries. |
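To make the split.by.value description more concrete, the following Python sketch divides a key range into contiguous ranges, one per mapper, and builds one SELECT query per range. It is an illustration only and is not the connector's actual implementation. The function names, the integer key, the even division of the range, and the `orders` and `order_id` names in the usage note are all assumptions made for the example.

```python
from typing import List, Tuple


def value_range_splits(min_value: int, max_value: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Divide the inclusive range [min_value, max_value] into contiguous ranges.

    Assumption: the split-by column is an integer key and the ranges are
    made as even as possible. The real connector may choose boundaries
    differently.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_value < min_value:
        raise ValueError("max_value must not be smaller than min_value")

    total = max_value - min_value + 1
    # Never create more splits than there are distinct key values.
    num_splits = min(num_mappers, total)
    base, extra = divmod(total, num_splits)

    splits = []
    start = min_value
    for i in range(num_splits):
        # The first `extra` splits take one additional value each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        splits.append((start, end))
        start = end + 1
    return splits


def split_queries(table: str, column: str, splits: List[Tuple[int, int]]) -> List[str]:
    """Build one SELECT query per split; each query would run in one mapper."""
    return [
        f"SELECT * FROM {table} WHERE {column} BETWEEN {low} AND {high}"
        for low, high in splits
    ]
```

For example, `value_range_splits(1, 100, 4)` yields four ranges of 25 keys each, and `split_queries("orders", "order_id", ...)` turns each range into the query that a single mapper would issue.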
The following image shows the additional Sqoop arguments that you specify at the mapping level:

Tuning Guidelines

Consider the following guidelines when you tune the internal.fastload, internal.fastexport, and batch.insert methods:
  • internal.fastload. Informatica restricts the number of sessions that can be open at one time to write to Teradata. Use the following formula to determine the number of sessions, with an upper limit of 100 sessions per job:
    If the number of AMPs <= 20, use 1 session per AMP. If the number of AMPs > 20, use 20 + (number of AMPs / 20) sessions.
  • internal.fastexport. Uses 1 session per AMP, with an upper limit of 4 sessions per job. If the number of mappers that you specify exceeds the maximum number of sessions that can be opened, TDCH restricts the number of mappers to the maximum number of sessions.
  • batch.insert. Informatica observed that the batch.insert method performs better when the number of AMPs is high, such as 170, 180, or more.
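The session guidelines above can be sketched as a small Python calculation. This is a minimal sketch, not an Informatica or TDCH utility: the function and constant names are hypothetical, and because the guideline does not say how to round (number of AMPs / 20), the sketch assumes integer (floor) division.

```python
FASTLOAD_MAX_SESSIONS = 100   # upper limit per job, from the guideline
FASTEXPORT_MAX_SESSIONS = 4   # upper limit per job, from the guideline


def fastload_sessions(num_amps: int) -> int:
    """Number of sessions for internal.fastload, capped at 100 per job."""
    if num_amps < 1:
        raise ValueError("num_amps must be a positive integer")
    if num_amps <= 20:
        sessions = num_amps              # 1 session per AMP
    else:
        # Assumption: the division is rounded down to a whole number.
        sessions = 20 + num_amps // 20
    return min(sessions, FASTLOAD_MAX_SESSIONS)


def fastexport_sessions(num_amps: int) -> int:
    """Number of sessions for internal.fastexport: 1 per AMP, capped at 4 per job."""
    if num_amps < 1:
        raise ValueError("num_amps must be a positive integer")
    return min(num_amps, FASTEXPORT_MAX_SESSIONS)


def effective_mappers(requested_mappers: int, max_sessions: int) -> int:
    """Mappers that actually run when the request exceeds the session limit."""
    return min(requested_mappers, max_sessions)
```

Under these assumptions, a system with 180 AMPs gets 20 + 180 // 20 = 29 sessions for internal.fastload and 4 sessions for internal.fastexport, so a FastExport job that requests 10 mappers runs with 4.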
