Spark Engine Optimization for Sqoop Pass-Through Mappings

When you run a pass-through mapping with a Sqoop source on the Spark engine, the Data Integration Service optimizes mapping performance in the following scenarios:
  • You write data to a Hive target that uses the Text format.
  • You write data to a Hive target that was created with a custom DDL query.
  • You write data to a Hive target that is partitioned with a custom DDL query, or partitioned and bucketed with a custom DDL query (see the example after this list).
  • You write data to an existing Hive target that is both partitioned and bucketed.
  • You write data to an HDFS target that uses the Flat, Avro, or Parquet format.
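For reference, a Hive target that is partitioned and bucketed with a custom DDL query might be created with a statement similar to the following. The table, column, and partition names are illustrative:
CREATE TABLE sales_txn (txn_id INT, amount DOUBLE)
PARTITIONED BY (txn_date STRING)
CLUSTERED BY (txn_id) INTO 8 BUCKETS
STORED AS TEXTFILE;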
If you want to disable the performance optimization, set the --infaoptimize argument to false in the JDBC connection or the Sqoop mapping. For example, you might disable the optimization if you see data type issues after you run an optimized Sqoop mapping.
Use the following syntax:
--infaoptimize false
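For example, if the Sqoop arguments for the JDBC connection or the mapping already include other arguments, append the argument to the existing string. The --num-mappers value shown here is illustrative:
--num-mappers 4 --infaoptimize false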

Rules and Guidelines for Sqoop Spark Engine Optimization

Consider the following rules and guidelines when you run Sqoop mappings on the Spark engine:
  • The Data Integration Service does not optimize mapping performance in the following scenarios:
    • There are unconnected ports between the source and target in the mapping.
    • The data types of the source and target in the mapping do not match.
    • You write data to an existing Hive target table that is either partitioned or bucketed.
    • You run a mapping on an Azure HDInsight cluster that uses WASB to write data to an HDFS complex file target of the Parquet format.
    • The Date or Time data type in the Sqoop source is mapped to the Timestamp data type in the Hive target.
    • The Sqoop source contains a Decimal column and the target is a complex file.
    • The mapping reads Decimal or Double data from a Sqoop source and writes to an HDFS target that uses the Parquet format.
  • If you configure Hive-specific Sqoop arguments to write data to a Hive target, Sqoop ignores the arguments (see the example after this list).
  • If you configure a delimiter for a Hive target table that is different from the default delimiter, Sqoop ignores the delimiter.
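For example, when you write data to a Hive target, Hive-specific Sqoop arguments such as the following are ignored. The column name and data type in the --map-column-hive argument are illustrative:
--hive-drop-import-delims --map-column-hive amount=DOUBLE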
