Table of Contents

  1. Preface
  2. Introduction to Informatica Big Data Management
  3. Connections
  4. Mappings in a Hadoop Environment
  5. Mappings in the Native Environment
  6. Profiles
  7. Native Environment Optimization
  8. Data Type Reference

Transformations in a Hadoop Environment

Due to the differences between the native environment and the Hadoop environment, only certain transformations are valid, or valid with restrictions, in the Hadoop environment. The Data Integration Service does not process transformations that contain functions, expressions, data types, and variable fields that are not valid in a Hadoop environment.
The Blaze engine might not support all transformations in a Hadoop environment. When the Data Integration Service finds a transformation that is not supported by the Blaze engine, it defaults to the Hive engine and runs the mapping on the Hadoop cluster.
The following list describes the rules and guidelines for each transformation:
Address Validator
You can push mapping logic that includes an Address Validator transformation to Hadoop if you use a Data Quality product license.
The following limitation applies to Address Validator transformations:
  • An Address Validator transformation does not generate a certification report when it runs in a mapping on Hadoop. If you select a certification report option on the transformation, the mapping validation fails when you attempt to push transformation logic to Hadoop.
Aggregator
An Aggregator transformation with pass-through fields is valid if the pass-through fields are group-by fields.
You can use the ANY function in an Aggregator transformation with pass-through fields to return any row.
Case Converter
The Data Integration Service can push a Case Converter transformation to Hadoop.
Comparison
You can push mapping logic that includes a Comparison transformation to Hadoop if you use a Data Quality product license.
Consolidation
You can push mapping logic that includes a Consolidation transformation to Hadoop if you use a Data Quality product license.
The following limitation applies to Consolidation transformations:
  • A Consolidation transformation may process records in a different order in native and Hadoop environments. The transformation may identify a different record as the survivor record in each environment.
Data Masking
You cannot use the following data masking techniques in mapping logic run on Hadoop clusters:
  • Repeatable expression masking
  • Unique repeatable substitution masking
Data Processor
The following limitations apply when a Data Processor transformation directly connects to a complex file reader:
  • Ports cannot be defined as file.
  • Input port must be defined as binary.
  • Output port cannot be defined as binary.
  • Pass-through ports cannot be used.
  • Additional input ports cannot be used.
The following limitations apply when a mapping has a Data Processor transformation:
  • Ports cannot be defined as file.
  • Ports cannot be defined as binary.
  • Streamer cannot be defined as startup component.
The Data Processor transformation can use the following input and output formats:
  • ASN.1
  • Avro
  • Cobol
  • JSON
  • Parquet
  • XML
Decision
You can push mapping logic that includes a Decision transformation to Hadoop if you use a Data Quality product license.
Expression
An Expression transformation with a user-defined function returns a null value for rows that have an exception error in the function.
The Data Integration Service returns an infinite or a NaN (not a number) value when you push transformation logic to Hadoop for expressions that result in numerical errors. For example:
  • Divide by zero
  • SQRT (negative number)
  • ASIN (out-of-bounds number)
In the native environment, the expressions that result in numerical errors return null values and the rows do not appear in the output.
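These results are consistent with standard Java floating-point arithmetic. The following minimal standalone Java sketch is shown only as an illustration of the values you can expect for the three cases above, not as the engine's actual implementation:

public class NumericalErrorValues {
    public static void main(String[] args) {
        // Division by zero on doubles yields Infinity rather than an error.
        System.out.println(1.0 / 0.0);        // Infinity

        // Square root of a negative number yields NaN (not a number).
        System.out.println(Math.sqrt(-1.0));  // NaN

        // ASIN of an out-of-bounds argument (outside -1 to 1) yields NaN.
        System.out.println(Math.asin(2.0));   // NaN
    }
}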
Filter
The Data Integration Service can push a Filter transformation to Hadoop.
Java
You must copy any external JAR files that a Java transformation requires to the following location in the Informatica installation directory on the Hadoop cluster nodes:
[$HADOOP_NODE_INFA_HOME]/services/shared/jars/platform/dtm/
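For illustration only, the following minimal Java sketch copies one such JAR file into that location on a cluster node. It assumes a hypothetical JAR file name and that the HADOOP_NODE_INFA_HOME environment variable resolves to the Informatica installation directory on the node; in practice you can copy the files with whatever file-transfer tooling you prefer.

import java.nio.file.*;

public class CopyExternalJar {
    public static void main(String[] args) throws Exception {
        // Hypothetical external JAR that the Java transformation requires.
        Path source = Paths.get("my-external-library.jar");

        // Destination inside the Informatica installation directory on the Hadoop cluster node.
        Path destination = Paths.get(System.getenv("HADOOP_NODE_INFA_HOME"),
                "services", "shared", "jars", "platform", "dtm");

        Files.copy(source, destination.resolve(source.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
    }
}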
You can optimize the transformation for faster processing when you enable an input port as a partition key and sort key. The data is partitioned across the reducer tasks and the output is partially sorted.
The following limitations apply to the Transformation Scope property:
  • The value Transaction for transformation scope is not valid.
  • If the transformation scope is set to Row, a Java transformation is run by the mapper script.
  • If you enable an input port as a partition key, the transformation scope is set to All Input. When the transformation scope is All Input, a Java transformation is run by the reducer script, and you must set at least one input field as a group-by field for the reducer key.
You can enable the Stateless advanced property when you run mappings in a Hadoop environment.
The Java code in the transformation cannot write output to standard output when you push transformation logic to Hadoop. The Java code can write output to standard error which appears in the log files.
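For example, the following minimal sketch shows diagnostic code that you might enter in the Java transformation, assuming a hypothetical input port named in_value. It illustrates the logging restriction only and is not a complete transformation.

// Messages written to standard error appear in the log files when the mapping runs on Hadoop.
System.err.println("Processing row, in_value = " + in_value);

// Do not rely on standard output; it is not written anywhere when the logic runs on Hadoop.
// System.out.println("This output would be lost.");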
Joiner
A Joiner transformation cannot contain inequality joins or parameters in the outer join condition.
Key Generator
You can push mapping logic that includes a Key Generator transformation to Hadoop if you use a Data Quality product license.
Labeler
You can push mapping logic that includes a Labeler transformation to Hadoop when you configure the transformation to use probabilistic matching techniques.
You can push mapping logic that includes all types of Labeler configuration if you use a Data Quality product license.
Lookup
The following limitations apply to Lookup transformations:
  • An unconnected Lookup transformation is not valid.
  • You cannot configure an uncached lookup source.
  • You cannot configure a persistent lookup cache for the lookup source.
  • You cannot use a Hive source for a relational lookup source.
  • When you run mappings that contain Lookup transformations, the Data Integration Service creates lookup cache JAR files. Hive copies the lookup cache JAR files to the following temporary directory:
    /tmp/<user_name>/hive_resources
    The Hive parameter hive.downloaded.resources.dir determines the location of the temporary directory. After the mapping completes, you can delete the lookup cache JAR files listed in the LDTM log to reclaim disk space.
Match
You can push mapping logic that includes a Match transformation to Hadoop if you use a Data Quality product license.
The following limitation applies to Match transformations:
  • A Match transformation generates cluster ID values differently in native and Hadoop environments. In a Hadoop environment, the transformation appends a group ID value to the cluster ID.
Merge
The Data Integration Service can push a Merge transformation to Hadoop.
Parser
You can push mapping logic that includes a Parser transformation to Hadoop when you configure the transformation to use probabilistic matching techniques.
You can push mapping logic that includes all types of Parser configuration if you use a Data Quality product license.
Rank
A comparison is valid if it is case sensitive.
Router
The Data Integration Service can push a Router transformation to Hadoop.
Sorter
The Data Integration Service ignores the Sorter transformation when you push mapping logic to Hadoop.
SQL
The Data Integration Service can push SQL transformation logic to Hadoop.
You cannot use a Hive connection.
Standardizer
You can push mapping logic that includes a Standardizer transformation to Hadoop if you use a Data Quality product license.
Union
The custom source code in the transformation cannot write output to standard output when you push transformation logic to Hadoop. The custom source code can write output to standard error, which appears in the runtime log files.
Weighted Average
You can push mapping logic that includes a Weighted Average transformation to Hadoop if you use a Data Quality product license.


Updated July 03, 2018