Rules and Guidelines for Processing Hierarchical Data on the Spark Engine

There are processing differences when you work with complex data types in a mapping that runs on the Spark engine.
Consider the following rules and guidelines when you use complex data types in a mapping that runs on the Spark engine:
  • When you read hierarchical data from a Hive source, you cannot enable Hive LLAP for Hive queries.
  • When you read hierarchical data from a Hive source, the Spark engine converts float type data to double. To prevent precision errors, use the double data type when you read from a Hive source and when you write to a Hive target.
  • When you write date/time data within a complex data type to a Hive target on an HDP 3.1 cluster, set the time zone to UTC. In the Hadoop connection Spark advanced properties, append "-Duser.timezone=UTC" to the end of the value of each of the following properties:
    • spark.driver.extraJavaOptions
    • spark.executor.extraJavaOptions
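The following sketch illustrates why the float-to-double conversion can introduce precision errors. It does not use Spark or Hive. It only assumes that a Hive FLOAT value is stored as a 32-bit IEEE 754 single-precision number, and it shows what happens when such a value is widened to a 64-bit double. The helper name hive_float_as_double is hypothetical.

```python
import struct


def hive_float_as_double(value: float) -> float:
    """Hypothetical helper: store a value as 32-bit FLOAT, then read it back
    as a 64-bit double, mimicking a float column that is widened to double."""
    # Pack as IEEE 754 single precision (4 bytes), then unpack. Python floats
    # are doubles, so the result is the single-precision value widened to double.
    return struct.unpack("<f", struct.pack("<f", value))[0]


original = 0.1
widened = hive_float_as_double(original)
print(widened)              # 0.10000000149011612
print(widened == original)  # False: the value changed during the round trip
```

A value declared as double from the start, such as 0.1, keeps its full double precision because no narrowing to 32 bits takes place.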
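You set these properties in the Hadoop connection, not in code. The following Python sketch only illustrates the string edit that the timezone guideline describes. It assumes that the Spark advanced properties can be modeled as a plain dictionary of property names to value strings. The helper append_utc_timezone and the sample value -XX:+UseG1GC are hypothetical.

```python
TIMEZONE_OPTION = "-Duser.timezone=UTC"
TARGET_PROPERTIES = (
    "spark.driver.extraJavaOptions",
    "spark.executor.extraJavaOptions",
)


def append_utc_timezone(spark_props: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: return a copy of the properties with
    -Duser.timezone=UTC appended to the end of both extraJavaOptions values."""
    updated = dict(spark_props)
    for name in TARGET_PROPERTIES:
        current = updated.get(name, "").strip()
        tokens = current.split()
        # Skip the property if the option is already the last token.
        if tokens and tokens[-1] == TIMEZONE_OPTION:
            continue
        # Append the option to the end of any existing value.
        updated[name] = f"{current} {TIMEZONE_OPTION}".strip()
    return updated


props = {"spark.driver.extraJavaOptions": "-XX:+UseG1GC"}
result = append_utc_timezone(props)
print(result["spark.driver.extraJavaOptions"])    # -XX:+UseG1GC -Duser.timezone=UTC
print(result["spark.executor.extraJavaOptions"])  # -Duser.timezone=UTC
```

The helper appends the option rather than replacing the existing value, so any other JVM options already configured for the driver and executors are preserved.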


Updated September 28, 2020