Profile Guide

Column Profiles for Sqoop Data Sources

You can run a column profile on data objects that use Sqoop. After you choose Hadoop as a validation environment, you can select the Blaze engine or Spark engine on the Hadoop connection to run the column profiles.
When you run a column profile on a logical data object or customized data object, you can configure the num-mappers argument to achieve parallelism and optimize performance. You must also configure the split-by argument to specify the column that Sqoop uses to split the work units.
Use the following syntax:
--split-by <column_name>
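The following Python sketch shows how the num-mappers and split-by arguments might be assembled into one argument string, including the table-qualified form that is required when two tables share a column name. The helper name build_sqoop_args is hypothetical and is not part of Informatica or Sqoop. Only the --num-mappers and --split-by arguments come from Sqoop.

```python
from typing import List, Optional


def build_sqoop_args(num_mappers: int, split_by: str,
                     table: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build Sqoop parallelism arguments."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    # Qualify the column with its table name when the data object contains
    # two tables with an identical column, for example CUSTOMER.FULL_NAME.
    column = f"{table}.{split_by}" if table else split_by
    return ["--num-mappers", str(num_mappers), "--split-by", column]


print(" ".join(build_sqoop_args(4, "FULL_NAME", table="CUSTOMER")))
# --num-mappers 4 --split-by CUSTOMER.FULL_NAME
```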
If the values of the primary key are not evenly distributed between their minimum and maximum, you can configure the split-by argument to specify another column that has a balanced distribution of data to split the work units.
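To see why an uneven distribution matters, consider a simplified model of how Sqoop splits an integer column. It finds the minimum and maximum values of the split-by column and divides that range into one equal-width interval per mapper. This sketch is not Sqoop's actual splitter code, and the functions split_ranges and rows_per_mapper are hypothetical. It counts how many rows each mapper would read.

```python
from typing import List, Tuple


def split_ranges(lo: int, hi: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Divide the inclusive range [lo, hi] into equal-width intervals."""
    span = hi - lo + 1
    bounds = [lo + (i * span) // num_mappers for i in range(num_mappers + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(num_mappers)]


def rows_per_mapper(values: List[int], num_mappers: int) -> List[int]:
    """Count the rows that fall into each mapper's interval."""
    ranges = split_ranges(min(values), max(values), num_mappers)
    return [sum(1 for v in values if a <= v <= b) for a, b in ranges]


# Skewed key: 90 values clustered at the low end, 10 at the high end.
skewed = list(range(1, 91)) + list(range(901, 911))
print(rows_per_mapper(skewed, 4))               # [90, 0, 0, 10]

# Balanced column: values spread evenly between minimum and maximum.
print(rows_per_mapper(list(range(1, 101)), 4))  # [25, 25, 25, 25]
```

With the skewed key, one mapper does almost all the work and two mappers receive no rows. A column with evenly spread values keeps every mapper busy.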
If you do not define the split-by column, Sqoop splits work units based on the following criteria:
  • If the data object contains a single primary key, Sqoop uses the primary key as the split-by column.
  • If the data object contains a composite primary key, Sqoop applies its default behavior for composite primary keys, because no split-by argument is defined. See the Sqoop documentation for more information.
  • If a data object contains two tables that have a column with the same name, you must define the split-by column with a table-qualified name. For example, if the table name is CUSTOMER and the column name is FULL_NAME, define the split-by column as follows:
    --split-by CUSTOMER.FULL_NAME
  • If the data object does not contain a primary key, the values of the m and num-mappers arguments default to 1.
When you use Cloudera Connector Powered by Teradata or Hortonworks Connector for Teradata and the Teradata table does not contain a primary key, the split-by argument is required.
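The rules above can be summarized as a small decision function. This is a sketch of the documented behavior, not Informatica's implementation. The DataObject class, its teradata_connector flag, and resolve_split are hypothetical names. Composite keys return no column because the documentation defers to Sqoop's own default behavior.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DataObject:
    primary_key: List[str] = field(default_factory=list)
    # Hypothetical flag: True when Cloudera Connector Powered by Teradata or
    # Hortonworks Connector for Teradata is used.
    teradata_connector: bool = False


def resolve_split(obj: DataObject, split_by: Optional[str],
                  num_mappers: int) -> Tuple[Optional[str], int]:
    """Return (split-by column or None, effective number of mappers)."""
    if split_by:
        return split_by, num_mappers
    if obj.teradata_connector and not obj.primary_key:
        raise ValueError("split-by is required for a Teradata table without a primary key")
    if len(obj.primary_key) == 1:
        # A single primary key becomes the split-by column.
        return obj.primary_key[0], num_mappers
    if len(obj.primary_key) > 1:
        # Composite key: defer to Sqoop's default behavior.
        return None, num_mappers
    # No primary key: m and num-mappers default to 1.
    return None, 1
```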
