Table of Contents


  1. Preface
  2. Introduction to Informatica Big Data Management
  3. Mappings
  4. Sources
  5. Targets
  6. Transformations
  7. Data Preview
  8. Cluster Workflows
  9. Profiles
  10. Monitoring
  11. Hierarchical Data Processing
  12. Hierarchical Data Processing Configuration
  13. Hierarchical Data Processing with Schema Changes
  14. Intelligent Structure Models
  15. Stateful Computing
  16. Connections
  17. Data Type Reference
  18. Function Reference

Spark Engine Optimization for Sqoop Pass-Through Mappings

When you run a pass-through mapping with a Sqoop source on the Spark engine, the Data Integration Service optimizes mapping performance in the following scenarios:
  • You write data to a Hive target that uses the Text format.
  • You write data to a Hive target that was created with a custom DDL query.
  • You write data to an existing Hive target that is either partitioned with a custom DDL query or partitioned and bucketed with a custom DDL query.
  • You write data to an existing Hive target that is both partitioned and bucketed.
  • You write data to an HDFS target that uses the Flat, Avro, or Parquet format.
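The scenarios above can be read as a single eligibility check on the mapping target. The following Python sketch encodes them as a predicate. The `Target` fields and the `matches_optimized_scenario` function are hypothetical names chosen for illustration, not part of any Informatica API. The sketch deliberately ignores the exceptions listed under the rules and guidelines for Sqoop Spark engine optimization.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Target:
    """Hypothetical description of a mapping target (not an Informatica API)."""
    kind: str                          # "hive" or "hdfs"
    file_format: Optional[str] = None  # e.g. "text", "flat", "avro", "parquet"
    custom_ddl: bool = False           # table was created with a custom DDL query
    existing: bool = False             # the mapping writes to an existing table
    partitioned: bool = False
    bucketed: bool = False


def matches_optimized_scenario(t: Target) -> bool:
    """Return True if the target matches one of the listed optimized scenarios."""
    if t.kind == "hive":
        if t.file_format == "text":
            return True
        if t.custom_ddl:
            # Covers any Hive target created with a custom DDL query, which
            # includes existing tables partitioned (and optionally bucketed)
            # through a custom DDL query.
            return True
        # Existing Hive target that is both partitioned and bucketed.
        return t.existing and t.partitioned and t.bucketed
    if t.kind == "hdfs":
        return t.file_format in {"flat", "avro", "parquet"}
    return False
```

For example, `matches_optimized_scenario(Target(kind="hdfs", file_format="avro"))` returns `True`, while an HDFS target in any other format returns `False`.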
To disable the performance optimization, set the --infaoptimize argument to false in the JDBC connection or in the Sqoop mapping. For example, you might disable the optimization if you encounter data type issues after you run an optimized Sqoop mapping.
Use the following syntax:
--infaoptimize false
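If you generate Sqoop argument strings with a script, you might want to make sure that the string ends with --infaoptimize false and carries no conflicting value. The following Python sketch shows one way to do that. The function name `disable_infaoptimize` is hypothetical. It only rewrites a string and does not call any Informatica API. It also assumes the space-separated form shown above rather than any other flag syntax.

```python
import shlex


def disable_infaoptimize(sqoop_args: str) -> str:
    """Return sqoop_args with any --infaoptimize setting replaced by false.

    Hypothetical helper: it tokenizes the argument string, drops any existing
    "--infaoptimize <value>" pair, and appends "--infaoptimize false".
    """
    tokens = shlex.split(sqoop_args)
    kept = []
    skip_next = False
    for tok in tokens:
        if skip_next:
            # Skip the value that followed a previous --infaoptimize flag.
            skip_next = False
            continue
        if tok == "--infaoptimize":
            skip_next = True
            continue
        kept.append(tok)
    kept += ["--infaoptimize", "false"]
    return shlex.join(kept)  # requires Python 3.8 or later
```

For example, `disable_infaoptimize("--infaoptimize true --num-mappers 4")` returns `"--num-mappers 4 --infaoptimize false"`.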

Rules and Guidelines for Sqoop Spark Engine Optimization

Consider the following rules and guidelines when you run Sqoop mappings on the Spark engine:
  • The Data Integration Service does not optimize mapping performance in the following scenarios:
    • There are unconnected ports between the source and target in the mapping.
    • The data types of the source and target in the mapping do not match.
    • You write data to an existing Hive target table that is either partitioned or bucketed.
    • You run a mapping on an Azure HDInsight cluster that uses WASB to write data to an HDFS complex file target of the Parquet format.
  • If you configure Hive-specific Sqoop arguments to write data to a Hive target, Sqoop ignores the arguments.
  • If you configure a delimiter for a Hive target table that is different from the default delimiter, Sqoop ignores the delimiter.
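The rules above can be summarized as a review step that lists why the optimization is skipped and which Sqoop settings are ignored. The following Python sketch does this for a hypothetical `SqoopSparkMapping` summary object. The class, its fields, and the `review` function are illustrative assumptions, not part of any Informatica API. The sketch also assumes that "either partitioned or bucketed" means exactly one of the two, because existing Hive targets that are both partitioned and bucketed are listed among the optimized scenarios.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SqoopSparkMapping:
    """Hypothetical summary of a Sqoop pass-through mapping (not an Informatica API)."""
    unconnected_ports: bool = False
    types_match: bool = True
    hive_target: bool = False
    existing_table: bool = False
    partitioned: bool = False
    bucketed: bool = False
    hdinsight_wasb: bool = False        # Azure HDInsight cluster that uses WASB
    hdfs_complex_parquet: bool = False  # HDFS complex file target, Parquet format
    hive_specific_args: bool = False    # Hive-specific Sqoop arguments configured
    custom_hive_delimiter: bool = False # non-default delimiter on the Hive table


def review(m: SqoopSparkMapping) -> Tuple[List[str], List[str]]:
    """Return (reasons the optimization is skipped, settings Sqoop ignores)."""
    blockers: List[str] = []
    if m.unconnected_ports:
        blockers.append("unconnected ports between source and target")
    if not m.types_match:
        blockers.append("source and target data types do not match")
    if m.hive_target and m.existing_table and (m.partitioned != m.bucketed):
        # Assumption: exactly one of partitioned or bucketed.
        blockers.append("existing Hive table is either partitioned or bucketed")
    if m.hdinsight_wasb and m.hdfs_complex_parquet:
        blockers.append("Parquet complex file target on HDInsight with WASB")

    ignored: List[str] = []
    if m.hive_target and m.hive_specific_args:
        ignored.append("Hive-specific Sqoop arguments")
    if m.hive_target and m.custom_hive_delimiter:
        ignored.append("non-default Hive delimiter")
    return blockers, ignored
```

An empty first list means that none of the listed exclusions applies; the mapping must still match one of the optimized scenarios.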


Updated January 20, 2020