Sqoop Mapping-Level Arguments

In the Sqoop mapping, you can define the arguments that the Sqoop program must use to process the data. The Data Integration Service merges the additional Sqoop arguments that you specify in the mapping with the arguments that you specified in the JDBC connection and constructs the Sqoop command.
The Sqoop arguments that you specify in the mapping take precedence over the arguments that you specified in the JDBC connection. However, the mapping-level settings apply only when the Sqoop connector is enabled in the JDBC connection. If you enable the Sqoop connector in the mapping but not in the JDBC connection, the Data Integration Service does not run the mapping through Sqoop. It runs the mapping through JDBC instead.
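The precedence rule can be pictured with a minimal Python sketch. It is an illustration only, not how the Data Integration Service builds the Sqoop command: the helper names parse_args and merge_sqoop_args are hypothetical, and the sketch assumes every argument is written as "--name value" or as a bare "--flag".

```python
import shlex


def parse_args(arg_string):
    """Parse a string such as '--num-mappers 8 --batch' into {name: value}.

    Hypothetical helper for illustration; flags without a value map to None.
    """
    parsed = {}
    tokens = shlex.split(arg_string)
    i = 0
    while i < len(tokens):
        name = tokens[i]
        # A value follows only if the next token is not another --argument.
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("--"):
            parsed[name] = tokens[i + 1]
            i += 2
        else:
            parsed[name] = None
            i += 1
    return parsed


def merge_sqoop_args(connection_args, mapping_args):
    """Merge both argument sets; mapping-level values override connection-level ones."""
    merged = parse_args(connection_args)
    merged.update(parse_args(mapping_args))
    return merged


# Connection-level arguments (assumed example values).
connection_level = "--num-mappers 4 --split-by id"
# Mapping-level arguments (assumed example values).
mapping_level = "--num-mappers 8 --batch"

merged = merge_sqoop_args(connection_level, mapping_level)
print(merged)
```

In this sketch the mapping-level value 8 replaces the connection-level value 4 for num-mappers, while split-by is carried over from the connection and batch is added from the mapping.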
You can configure the following Sqoop arguments in a Sqoop mapping:
num-mappers
Defines the number of map tasks that the Sqoop program must use to import and export data in parallel.
Use the following syntax:
--num-mappers <number of map tasks>
For example, to use 8 map tasks, configure the num-mappers argument as follows:
--num-mappers 8
If you configure the num-mappers argument, you must also configure the split-by argument to specify the column that the Sqoop program uses to split the work units. If you do not configure the split-by argument, the value of the num-mappers argument defaults to 1.
Use the num-mappers argument to increase the degree of parallelism. You might have to test different values for optimal performance.
split-by
Defines the column that the Sqoop program uses to split the work units.
Use the following syntax:
--split-by <column name>
You can configure the split-by argument to improve performance. If the values of the primary key are not evenly distributed between the minimum and maximum values, configure the split-by argument to specify another column whose values are evenly distributed, so that the work units are balanced.
If you do not configure the split-by argument, the Sqoop program uses the primary key of the table to split the work units.
Consider the following restrictions when you configure the split-by argument:
  • If the column that you specify in the split-by argument contains NULL values, the Sqoop program does not import the rows that contain NULL values. However, the mapping runs successfully and no error is written to the YARN log.
  • If the column that you specify in the split-by argument contains special characters, the Sqoop import process fails.
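The following Python sketch is a simplified model of why the choice of split column matters. It assumes an integer column and equal-width ranges computed from the column's minimum and maximum values. The Sqoop program's actual splitting logic may differ, and the function names split_ranges and assign_rows are hypothetical.

```python
def split_ranges(lo, hi, num_mappers):
    """Divide [lo, hi] into num_mappers contiguous half-open ranges of near-equal width."""
    width = (hi - lo + 1) / num_mappers
    bounds = [lo + round(i * width) for i in range(num_mappers)] + [hi + 1]
    return [(bounds[i], bounds[i + 1]) for i in range(num_mappers)]


def assign_rows(values, num_mappers):
    """Assign each split-column value to the range that contains it.

    Returns (buckets, skipped). NULL values (None) match no range, which
    models rows that are silently left out of the import.
    """
    non_null = [v for v in values if v is not None]
    ranges = split_ranges(min(non_null), max(non_null), num_mappers)
    buckets = [[] for _ in ranges]
    skipped = []
    for v in values:
        if v is None:
            skipped.append(v)  # a NULL satisfies no "start <= value < end" test
            continue
        for idx, (start, end) in enumerate(ranges):
            if start <= v < end:
                buckets[idx].append(v)
                break
    return buckets, skipped


# Assumed sample data: a skewed column with one outlier and one NULL.
skewed = [1, 2, 3, 4, 5, 6, 7, 1000, None]
buckets, skipped = assign_rows(skewed, 4)
print([len(b) for b in buckets], len(skipped))

# Assumed sample data: an evenly distributed column.
even_buckets, even_skipped = assign_rows(list(range(1, 9)), 4)
print([len(b) for b in even_buckets], len(even_skipped))
```

With the skewed sample, almost all rows land in the first range while two ranges stay empty, and the NULL row is assigned to no range. With the evenly distributed sample, each of the four ranges receives the same number of rows.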
batch
Indicates that the Sqoop program must export data in batches.
Use the following syntax:
--batch
You can configure the batch argument to improve export performance.
For a complete list of the Sqoop arguments that you can configure, see the Sqoop documentation.


Updated July 03, 2018