Transformation Support on the Spark Engine

Some restrictions and guidelines apply to processing transformations on the Spark engine.
The following rules and guidelines apply to the transformations that are supported on the Spark engine. Transformations that are not listed are not supported.
Aggregator
Mapping validation fails in the following situations:
  • The transformation contains stateful variable ports.
  • The transformation contains unsupported functions in an expression.
When a mapping contains an Aggregator transformation with an input/output port that is not a group by port, the transformation might not return the last row of each group with the result of the aggregation. Because Hadoop execution is distributed, the engine might not be able to determine the actual last row of each group.
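To see why "the last row" is ambiguous under distributed execution, consider the following minimal Java sketch. It is not Informatica code: the Row record and the input data are hypothetical, and an unordered parallel collector stands in for the distributed aggregation. Because partial results merge in no guaranteed order, the pass-through value that survives can differ from run to run.

import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

public class LastRowDemo {
    // Hypothetical stand-in for a row with a group by port and a
    // pass-through (non-group-by) input/output port.
    record Row(String group, int passThrough) {}

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("A", 1), new Row("A", 2), new Row("A", 3),
            new Row("B", 4), new Row("B", 5));

        // toConcurrentMap is an UNORDERED collector: partial results
        // merge in no guaranteed order, so the "last" value kept for
        // each group is nondeterministic, much like a distributed
        // aggregation on the Spark engine.
        ConcurrentMap<String, Integer> lastSeen = rows.parallelStream()
            .collect(Collectors.toConcurrentMap(
                Row::group, Row::passThrough, (first, second) -> second));

        System.out.println(lastSeen); // e.g. {A=3, B=5}, but not guaranteed
    }
}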
Expression
Mapping validation fails in the following situations:
  • The transformation contains stateful variable ports.
  • The transformation contains unsupported functions in an expression.
If an expression results in a numerical error, such as division by zero or the square root of a negative number, the Spark engine returns an infinity or NaN value. In the native environment, the expression returns null values and the rows do not appear in the output.
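This matches standard IEEE 754 floating-point semantics, which the following minimal Java sketch illustrates. It is illustrative only and is not Informatica expression code.

public class NumericErrorDemo {
    public static void main(String[] args) {
        // On the Spark engine, numerical errors yield special values
        // instead of the nulls that the native environment returns.
        double divByZero = 1.0 / 0.0;        // Infinity
        double sqrtNegative = Math.sqrt(-1); // NaN

        System.out.println(divByZero);                  // Infinity
        System.out.println(sqrtNegative);               // NaN
        System.out.println(Double.isNaN(sqrtNegative)); // true
    }
}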
Filter
Supported without restrictions.
Java
You must copy external .jar files that a Java transformation requires to the Informatica installation directory on the Hadoop cluster, at the following location: [$HADOOP_NODE_INFA_HOME]/services/shared/jars.
To run user code directly on the Spark engine, the JDK version that the Data Integration Service uses must be compatible with the JRE version on the cluster. For best performance, create the environment variable DIS_JDK_HOME on the Data Integration Service in the Administrator tool. The environment variable contains the path to the JDK installation directory on the machine that runs the Data Integration Service. For example, you might enter a value such as /usr/java/default.
The Partitionable property must be enabled in the Java transformation. The transformation cannot run in one partition.
For date/time values, the Spark engine supports precision up to microseconds. If a date/time value contains nanoseconds, the trailing digits are truncated.
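The following minimal Java sketch shows the kind of truncation to expect. It uses the standard java.time API rather than any Informatica API.

import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class PrecisionDemo {
    public static void main(String[] args) {
        Instant withNanos = Instant.parse("2018-07-03T10:15:30.123456789Z");

        // The Spark engine keeps at most microseconds; the trailing
        // nanosecond digits are dropped, as this truncation mimics.
        Instant micros = withNanos.truncatedTo(ChronoUnit.MICROS);

        System.out.println(withNanos); // 2018-07-03T10:15:30.123456789Z
        System.out.println(micros);    // 2018-07-03T10:15:30.123456Z
    }
}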
When you enable high precision and the Java transformation contains a field that is a decimal data type, a validation error occurs.
The following restrictions apply to the Transformation Scope property:
  • The value Transaction for transformation scope is not valid.
  • If you enable an input port for partition key, the transformation scope must be set to All Input.
  • If the transformation scope is set to Row, the Stateless property must be enabled.
The Java code in the transformation cannot write output to standard output when you push transformation logic to Hadoop. The Java code can write output to standard error, which appears in the log files.
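As a minimal sketch, assuming hypothetical diagnostic messages in your transformation logic, route them to standard error so that they survive on the Spark engine:

public class JavaTxLoggingDemo {
    public static void main(String[] args) {
        // System.out is not captured when the logic runs on Hadoop:
        // System.out.println("processed row");  // lost on the cluster
        System.err.println("processed row");     // appears in the log files
    }
}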
Joiner
Mapping validation fails in the following situations:
  • Case sensitivity is disabled.
  • The join condition in the Joiner transformation contains a binary data type or a binary expression.
Lookup
Mapping validation fails in the following situations:
  • Case sensitivity is disabled.
  • The lookup condition in the Lookup transformation contains a binary data type.
  • The transformation is not configured to return all rows that match the condition.
  • The lookup is a data object.
  • The cache is configured to be shared, named, persistent, dynamic, or uncached. The cache must be a static cache.
The mapping fails in the following situations:
  • The transformation is unconnected.
When you use Sqoop and look up data in a Hive table based on a column of the float data type, the Lookup transformation might return incorrect results.
Router
Supported without restrictions.
Sorter
Mapping validation fails in the following situations:
  • Case sensitivity is disabled.
The Data Integration Service logs a warning and ignores the Sorter transformation in the following situations:
  • There is a type mismatch between the Sorter transformation sort keys and the target.
  • The transformation contains sort keys that are not connected to the target.
  • The Write transformation is not configured to maintain row order.
  • The transformation is not directly upstream from the Write transformation.
The Data Integration Service treats null values as high, even if you configure the transformation to treat null values as low.
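In comparator terms, the engine behaves like a nulls-last ascending sort. The following minimal Java sketch, which is not Informatica code, shows the resulting order:

import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NullSortDemo {
    public static void main(String[] args) {
        List<Integer> sortKeys = Arrays.asList(3, null, 1, null, 2);

        // Nulls are treated as high: in an ascending sort they land
        // last, regardless of the null treatment you configure.
        sortKeys.sort(Comparator.nullsLast(Comparator.naturalOrder()));

        System.out.println(sortKeys); // [1, 2, 3, null, null]
    }
}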
Union
Supported without restrictions.


Updated July 03, 2018