Table of Contents

  1. Preface
  2. Introduction to Informatica Big Data Management
  3. Connections
  4. Mappings in a Hadoop Environment
  5. Mappings in the Native Environment
  6. Profiles
  7. Native Environment Optimization
  8. Data Type Reference

Data Types in a Hadoop Environment

When you push mapping logic to a Hadoop environment, some variations apply in the processing and validity of data types because of differences between the environments.
The following variations apply in data type processing and validity:
  • If a transformation in a mapping has a port with a Binary data type, you can validate and run the mapping in a Hadoop environment.
  • You can use the high-precision Decimal data type with Hive 0.11 and above. When you run a mapping with high precision enabled in a Hadoop environment, the Data Integration Service converts decimal values with a precision greater than 38 digits to double values. When you run a mapping with high precision disabled, the Data Integration Service converts all decimal values to double values. The first sketch after this list illustrates these conversion rules.
  • When you run a mapping with a Hive target that uses the Double data type, the Data Integration Service processes the double data up to 17 digits after the decimal point.
  • The results of arithmetic operations on floating point types, such as Double or Decimal, can vary by up to 0.1 percent between the native environment and a Hadoop environment.
  • Hive complex data types in a Hive source or Hive target are not valid.
  • When the Data Integration Service converts a decimal with a precision of 10 and a scale of 3 to a string data type and writes to a flat file target, the results can differ between the native environment and a Hadoop environment. For example, in a Hadoop environment, HDFS writes the output string for the decimal value 1971.000 as 1971 and drops the trailing zeros. In the native environment, the flat file writer writes the output string for the same value as 1971.000 and preserves the scale. The second sketch after this list reproduces this difference.
  • Hive uses a maximum or minimum value for BigInt and Integer data types when there is data overflow during data type conversion. Mapping results can therefore vary between the native and Hadoop environments when data overflows during data type conversion for BigInt or Integer data types. The third sketch after this list models this behavior.
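
The following Java sketch illustrates the high-precision Decimal behavior from the second bullet in the list above. It is a minimal sketch, not the Data Integration Service's actual code: the toHadoopValue helper is hypothetical and only models the 38-digit rule the text describes.

    import java.math.BigDecimal;

    public class DecimalPrecisionSketch {
        // Hypothetical helper: models the conversion rules described above.
        // With high precision disabled, every decimal becomes a double.
        // With high precision enabled, only decimals wider than 38 digits do.
        static Object toHadoopValue(BigDecimal value, boolean highPrecision) {
            if (!highPrecision || value.precision() > 38) {
                return value.doubleValue(); // precision can be lost here
            }
            return value; // kept as a decimal
        }

        public static void main(String[] args) {
            BigDecimal wide = new BigDecimal(
                "12345678901234567890123456789012345678901.23"); // 43 digits
            System.out.println(toHadoopValue(wide, true));  // becomes a double
            System.out.println(toHadoopValue(new BigDecimal("1971.025"), true)); // kept
        }
    }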

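The decimal-to-string difference from the sixth bullet can be mimicked with java.math.BigDecimal. This sketch only reproduces the observable outputs; the two formatting calls are assumptions, not the code paths that HDFS or the flat file writer actually use.

    import java.math.BigDecimal;

    public class DecimalToStringSketch {
        public static void main(String[] args) {
            // A decimal with a scale of 3, as in the example above.
            BigDecimal d = new BigDecimal("1971.000");

            // Hadoop-style output: trailing zeros are dropped.
            System.out.println(d.stripTrailingZeros().toPlainString()); // 1971

            // Native-style output: the scale of 3 is preserved.
            System.out.println(d.toPlainString()); // 1971.000
        }
    }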

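Finally, the overflow behavior from the last bullet: the clampToInt helper below is an assumed model of Hive substituting the type's maximum or minimum value on overflow. Java's own narrowing cast wraps around instead, shown for contrast.

    public class OverflowClampSketch {
        // Assumed model of Hive's behavior: out-of-range values saturate
        // to the target type's maximum or minimum value.
        static int clampToInt(long value) {
            if (value > Integer.MAX_VALUE) return Integer.MAX_VALUE;
            if (value < Integer.MIN_VALUE) return Integer.MIN_VALUE;
            return (int) value;
        }

        public static void main(String[] args) {
            long big = 3_000_000_000L; // does not fit in a 32-bit integer
            System.out.println(clampToInt(big)); // 2147483647 (saturated)
            System.out.println((int) big);       // -1294967296 (Java wraparound)
        }
    }
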
Updated July 03, 2018