Rules and Guidelines for Processing Hierarchical Data on the Spark Engine
There are processing differences when you work with complex data types in a mapping that runs on the Spark engine.
Consider the following rules and guidelines when you use complex data types in a mapping that runs on the Spark engine:
You cannot read hierarchical data from or write hierarchical data to a Hive source in a dynamic mapping.
When you read hierarchical data from a Hive source, you cannot enable Hive LLAP for Hive queries.
When you read hierarchical data from a Hive source, the Spark engine converts float type data to double. Use the double data type when you read from and write to a Hive source to prevent precision errors.
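The precision issue behind this guideline can be illustrated outside of Spark. The following is a minimal sketch in plain Python, not the Spark engine's actual conversion code: it assumes a value was stored as a 32-bit float and is then widened to a 64-bit double. The helper name `widen_float32_to_double` is hypothetical and used only for illustration.

```python
import struct


def widen_float32_to_double(value: float) -> float:
    """Round a value to the nearest 32-bit float, then read it back.

    Python floats are 64-bit doubles, so the returned number is the
    32-bit float value widened to double precision.
    """
    packed = struct.pack("<f", value)      # store as a 4-byte IEEE 754 float
    return struct.unpack("<f", packed)[0]  # read back as a Python float (double)


original = 0.1
widened = widen_float32_to_double(original)

# The widened value keeps the rounding error of the 32-bit float,
# so it no longer compares equal to the double literal 0.1.
print(repr(original))
print(repr(widened))
print(widened == original)
```

Values that are exactly representable in 32 bits, such as 0.5, survive the round trip unchanged. Declaring the column as double on both the read and write side avoids the intermediate 32-bit rounding altogether.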
When you write date/time data within a complex data type to a Hive target using HDP 3.1, configure the time zone as UTC. In the Spark advanced properties of the Hadoop connection, append “-Duser.timezone=UTC” to the end of the value for the following properties: