Table of Contents

  1. Preface
  2. Introduction to Informatica Big Data Management
  3. Mappings in the Hadoop Environment
  4. Mapping Sources in the Hadoop Environment
  5. Mapping Targets in the Hadoop Environment
  6. Mapping Transformations in the Hadoop Environment
  7. Processing Hierarchical Data on the Spark Engine
  8. Configuring Transformations to Process Hierarchical Data
  9. Processing Unstructured and Semi-structured Data with an Intelligent Structure Model
  10. Stateful Computing on the Spark Engine
  11. Monitoring Mappings in the Hadoop Environment
  12. Mappings in the Native Environment
  13. Profiles
  14. Native Environment Optimization
  15. Cluster Workflows
  16. Connections
  17. Data Type Reference
  18. Function Reference
  19. Parameter Reference

Big Data Management User Guide

Rules and Guidelines for Hive Engine Processing

Some restrictions and guidelines apply to processing Informatica functions on the Hive engine.
When you push a mapping to the Hadoop environment, the engine that processes the mapping uses a set of rules different from the Data Integration Service. As a result, the mapping results can vary based on the rules that the engine uses. This topic contains some processing differences that Informatica discovered through internal testing and usage. Informatica does not test all the rules of the third-party engines and cannot provide an extensive list of the differences.
Consider the following rules and guidelines for function and data type processing on the Hive engine:
  • The Hive engine and the Data Integration Service process overflow values differently. As a result, mapping results can vary between the native and Hadoop environments when the Hive engine processes an overflow. Consider the following processing variations for Hive:
    • Hive uses a maximum or minimum value for integer and bigint data when there is data overflow during data type conversion.
    • If an expression results in a numerical error, such as division by zero or the square root of a negative number, the Hive engine returns an infinity or NaN value. In the native environment, the expression returns null values and the rows do not appear in the output.
  • The Hive engine and the Data Integration Service process data type conversions differently. As a result, mapping results can vary between the native and Hadoop environment when the Hive engine performs a data type conversion. Consider the following processing variations for Hive:
    • The results of arithmetic operations on floating point types, such as Decimal, can vary up to 0.1 percent between the native environment and a Hadoop environment.
    • You can use the high-precision Decimal data type with Hive 0.11 and later. When you run mappings on the Hive engine, the Data Integration Service converts decimal values with a precision greater than 38 digits to double values. When you run mappings that do not have high precision enabled, the Data Integration Service converts decimal values to double values.
    • When the Data Integration Service converts a decimal with a precision of 10 and a scale of 3 to a string data type and writes to a flat file target, the results can differ between the native environment and a Hadoop environment. For example, for the decimal 19711025 with a precision of 10 and a scale of 3, the output string that the Hive engine writes to HDFS is 1971, while the Data Integration Service writes the output string as 1971.000.
    • The results of arithmetic operations on floating point types, such as a Double, can vary up to 0.1 percent between the Data Integration Service and the Hive engine.
    • When you run a mapping with a Hive target that uses the Double data type, the Data Integration Service processes the double data up to 17 digits after the decimal point.
  • The Hadoop environment treats "\n" values as null values. If an aggregate function contains empty or NULL values, the Hadoop environment includes these values while performing an aggregate calculation.
  • Avoid including single-level and nested functions in an Aggregator transformation. The Data Integration Service fails the mapping in the native environment. It can push the processing to the Hadoop environment, but the results might be unexpected. Instead, create multiple transformations to perform the aggregation.
  • The UUID4 function is supported only when used as an argument in UUID_UNPARSE or ENC_BASE64.
  • The UUID_UNPARSE function is supported only when the argument is UUID4( ).
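The overflow and numerical-error variations above can be sketched in Python. This is an illustrative analogy only, not Informatica or Hive code: the function names, the 32-bit clamping bounds, and the exact error-handling rules are assumptions made for the example.

```python
import math

INT_MAX, INT_MIN = 2**31 - 1, -(2**31)

def hive_like_int_cast(value):
    # Hive-style sketch: on overflow during conversion to an integer type,
    # clamp to the type's maximum or minimum value.
    return max(INT_MIN, min(INT_MAX, int(value)))

def hive_like_divide(a, b):
    # Hive-style sketch: a numerical error yields an infinity or NaN value
    # instead of NULL.
    try:
        return a / b
    except ZeroDivisionError:
        if a > 0:
            return math.inf
        if a < 0:
            return -math.inf
        return math.nan

def native_like_divide(a, b):
    # Native-style sketch: the expression returns NULL (None here), and the
    # row does not appear in the output.
    return None if b == 0 else a / b

rows = [(10.0, 2.0), (1.0, 0.0), (-1.0, 0.0)]
hive_out = [hive_like_divide(a, b) for a, b in rows]
native_out = [r for r in (native_like_divide(a, b) for a, b in rows) if r is not None]
print(hive_out)     # [5.0, inf, -inf]
print(native_out)   # [5.0] -- the error rows are dropped
print(hive_like_int_cast(9_999_999_999))  # 2147483647 (clamped to the int maximum)
```

The point of the sketch is that the same input rows produce a different number of output rows in the two environments, which is why mapping results can vary after pushdown.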
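The decimal-to-string variation described above can also be sketched with Python's decimal module. Again, this is an analogy under stated assumptions, not Informatica code; the helper names are invented for this example, which assumes the native engine pads the string to the declared scale while the Hive engine drops trailing zeros.

```python
from decimal import Decimal

def native_like_to_string(value, scale):
    # Native-style sketch: pad the output string to the declared scale,
    # so a scale of 3 yields '1971.000'.
    return str(Decimal(value).quantize(Decimal(1).scaleb(-scale)))

def hive_like_to_string(value):
    # Hive-style sketch: trailing zeros after the decimal point (and a
    # bare trailing point) are dropped, yielding '1971'.
    s = str(Decimal(value))
    if '.' in s:
        s = s.rstrip('0').rstrip('.')
    return s

print(native_like_to_string('1971', 3))  # 1971.000
print(hive_like_to_string('1971.000'))   # 1971
```

If a downstream system compares these strings byte for byte, the two environments produce rows that do not match, so any reconciliation logic needs to normalize the scale first.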
