Table of Contents

  1. Preface
  2. Introduction to Informatica Big Data Management
  3. Mappings in the Hadoop Environment
  4. Mapping Sources in the Hadoop Environment
  5. Mapping Targets in the Hadoop Environment
  6. Mapping Transformations in the Hadoop Environment
  7. Processing Hierarchical Data on the Spark Engine
  8. Configuring Transformations to Process Hierarchical Data
  9. Processing Unstructured and Semi-structured Data with an Intelligent Structure Model
  10. Stateful Computing on the Spark Engine
  11. Monitoring Mappings in the Hadoop Environment
  12. Mappings in the Native Environment
  13. Profiles
  14. Native Environment Optimization
  15. Cluster Workflows
  16. Connections
  17. Data Type Reference
  18. Function Reference
  19. Parameter Reference

Big Data Management User Guide

Spark Engine Optimization for Sqoop Pass-Through Mappings

When you run a Sqoop pass-through mapping on the Spark engine, the Data Integration Service optimizes mapping performance in the following scenarios:
  • You read data from a Sqoop source and write data to a Hive target that uses the Text format.
  • You read data from a Sqoop source and write data to an HDFS target that uses the Flat, Avro, or Parquet format.
To disable the performance optimization, set the --infaoptimize argument to false in the JDBC connection or in the Sqoop mapping. For example, disable the optimization if you see data type issues after you run an optimized Sqoop mapping.
Use the following syntax:
--infaoptimize false
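
For example, if the Sqoop Arguments field of the JDBC connection already contains other arguments, append --infaoptimize false to the end of the list. The driver class and mapper count in the following line are hypothetical values for illustration:
--driver com.mysql.jdbc.Driver --num-mappers 4 --infaoptimize false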

Rules and Guidelines for Sqoop Spark Engine Optimization

Consider the following rules and guidelines when you run Sqoop mappings on the Spark engine:
  • The Data Integration Service does not optimize mapping performance in the following scenarios:
    • There are unconnected ports between the source and target in the mapping.
    • The data types of the source and target in the mapping do not match.
    • You write data to a partitioned Hive target table.
    • You run the mapping on an Azure HDInsight cluster that uses WASB, and you write data to an HDFS complex file target in Parquet format.
  • If you configure Hive-specific Sqoop arguments to write data to a Hive target, Sqoop ignores the arguments. See the example after this list.
  • If you configure a delimiter for a Hive target table that is different from the default delimiter, Sqoop ignores the delimiter.
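
For example, standard Hive-specific Sqoop arguments such as the following have no effect when the Spark engine writes to the Hive target. The table name is a hypothetical value:
--hive-import --hive-overwrite --hive-table default.customer_stage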
