Transformation Support on the Spark Engine

This section describes new transformation features on the Spark engine in version 10.2.1.

Transformation Support

Effective in version 10.2.1, the following transformations are supported on the Spark engine:
  • Case Converter
  • Classifier
  • Comparison
  • Key Generator
  • Labeler
  • Merge
  • Parser
  • Python
  • Standardizer
  • Weighted Average
Effective in version 10.2.1, the following transformations are supported with restrictions on the Spark engine:
  • Address Validator
  • Consolidation
  • Decision
  • Match
  • Sequence Generator
Effective in version 10.2.1, the following transformation has additional support on the Spark engine:
  • Java. Supports complex data types such as array, map, and struct to process hierarchical data.
For more information on transformation support, see the "Mapping Transformations in the Hadoop Environment" chapter in the Informatica Big Data Management 10.2.1 User Guide.
For more information about transformation operations, see the Informatica 10.2.1 Developer Transformation Guide.

Python Transformation

Effective in version 10.2.1, you can create a Python transformation in the Developer tool. Use the Python transformation to execute Python code in a mapping that runs on the Spark engine.
You can use a Python transformation to implement a machine learning model on the data that you pass through the transformation. For example, use the Python transformation to write Python code that loads a pre-trained model. You can then use the pre-trained model to classify input data or to make predictions.
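The pattern described above, loading a pre-trained model once and then classifying each row that passes through, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the Python transformation's own API: the model format, the `load_model` and `classify_row` helpers, and the `age` and `income_k` fields are all hypothetical, and a small logistic model stored as JSON stands in for a real pre-trained model so that the example runs with the standard library only.

```python
import json
import math

# Hypothetical serialized model. A real pre-trained model would typically be
# loaded from a file; a JSON string keeps this sketch self-contained.
MODEL_JSON = '{"weights": {"age": 0.04, "income_k": 0.02}, "bias": -3.0, "threshold": 0.5}'


def load_model(serialized):
    """Load the (hypothetical) pre-trained model once, before any rows arrive."""
    return json.loads(serialized)


def classify_row(model, row):
    """Score one input row with a logistic model and return (probability, label)."""
    # Linear combination of the input fields, using the model's weights.
    z = model["bias"] + sum(w * row[name] for name, w in model["weights"].items())
    # Logistic function maps the score to a probability between 0 and 1.
    probability = 1.0 / (1.0 + math.exp(-z))
    label = "yes" if probability >= model["threshold"] else "no"
    return probability, label


if __name__ == "__main__":
    model = load_model(MODEL_JSON)  # load once
    rows = [{"age": 30, "income_k": 40}, {"age": 60, "income_k": 120}]
    for row in rows:  # then classify each row
        probability, label = classify_row(model, row)
        print(row, round(probability, 3), label)
```

Loading the model outside the per-row function avoids deserializing it again for every row, which matters when many rows pass through the transformation.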
The Python transformation is available for technical preview. Technical preview functionality is supported but is not production-ready. Informatica recommends that you use it in non-production environments only.
For more information, see the "Python Transformation" chapter in the Informatica 10.2.1 Developer Transformation Guide.

Update Strategy Transformation

Effective in version 10.2.1, you can use Hive MERGE statements for mappings that run on the Spark engine to perform update strategy tasks. Using MERGE in queries is usually more efficient and can improve performance.
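A MERGE statement applies inserts, updates, and deletes to a target table in a single statement instead of separate INSERT, UPDATE, and DELETE statements. The following minimal sketch simulates those semantics over an in-memory table to show what one MERGE does; it does not generate or run HiveQL, it is not how the Data Integration Service implements MERGE, and the `merge` helper and the "insert"/"update"/"delete" flag values are hypothetical. The HiveQL pattern in the comment is illustrative only.

```python
# Hypothetical sketch: simulates the semantics of one MERGE statement
# over an in-memory table keyed by primary key.
#
# Roughly equivalent HiveQL pattern (illustrative only):
#   MERGE INTO target AS t USING changes AS s ON t.id = s.id
#   WHEN MATCHED AND s.op = 'delete' THEN DELETE
#   WHEN MATCHED THEN UPDATE SET name = s.name
#   WHEN NOT MATCHED AND s.op <> 'delete' THEN INSERT VALUES (s.id, s.name)


def merge(target, changes):
    """Apply flagged changes to a keyed table in a single pass.

    target:  dict mapping key -> row dict; modified in place and returned
    changes: iterable of (op, key, row) tuples, where op is "insert",
             "update", or "delete" (hypothetical flag values). Each key is
             assumed to appear at most once, as MERGE requires.
    """
    for op, key, row in changes:
        if op == "delete":
            # WHEN MATCHED AND op = 'delete' THEN DELETE (no-op if unmatched)
            target.pop(key, None)
        else:
            # WHEN MATCHED THEN UPDATE, or WHEN NOT MATCHED THEN INSERT
            target[key] = dict(row)
    return target


if __name__ == "__main__":
    table = {1: {"name": "Ann"}, 2: {"name": "Bob"}}
    changes = [
        ("update", 1, {"name": "Anne"}),
        ("delete", 2, None),
        ("insert", 3, {"name": "Cy"}),
    ]
    print(merge(table, changes))
```

All three kinds of change are resolved against the target in one pass, which is the property that lets a single MERGE replace three separate statements.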
Hive MERGE statements are supported for the following Hadoop distributions:
  • Amazon EMR 5.10
  • Azure HDInsight 3.6
  • Hortonworks HDP 2.6
To use Hive MERGE, select the option in the advanced properties of the Update Strategy transformation.
Previously, the Data Integration Service used INSERT, UPDATE, and DELETE statements to perform this task on any run-time engine. The Update Strategy transformation still uses these statements in the following scenarios:
  • You do not select the Hive MERGE option.
  • Mappings run on the Hive or Blaze engine.
  • The Hadoop distribution does not support Hive MERGE.
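The conditions above can be summarized as a small decision rule: Hive MERGE is used only when the option is selected, the mapping runs on the Spark engine, and the distribution supports Hive MERGE. The following sketch encodes that rule for illustration. The function names are hypothetical, and matching the listed distribution versions exactly is an assumption, since these notes do not say whether other versions are supported.

```python
# Hypothetical encoding of the rule described in these release notes.
# Assumption: only the exact distribution versions listed support Hive MERGE.
HIVE_MERGE_DISTRIBUTIONS = {
    ("Amazon EMR", "5.10"),
    ("Azure HDInsight", "3.6"),
    ("Hortonworks HDP", "2.6"),
}


def uses_hive_merge(merge_option_selected, engine, distribution, version):
    """Return True if all three conditions for Hive MERGE are met."""
    return (
        merge_option_selected  # Hive MERGE option selected in advanced properties
        and engine == "Spark"  # Hive and Blaze engines do not use MERGE
        and (distribution, version) in HIVE_MERGE_DISTRIBUTIONS
    )


def statements_used(merge_option_selected, engine, distribution, version):
    """Return the statement types the Update Strategy task relies on."""
    if uses_hive_merge(merge_option_selected, engine, distribution, version):
        return ["MERGE"]
    return ["INSERT", "UPDATE", "DELETE"]


if __name__ == "__main__":
    print(statements_used(True, "Spark", "Hortonworks HDP", "2.6"))
    print(statements_used(True, "Blaze", "Hortonworks HDP", "2.6"))
```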
For more information about using a MERGE statement in Update Strategy transformations, see the chapter on the Update Strategy transformation in the Informatica Big Data Management 10.2.1 User Guide.


Updated September 25, 2020