Table of Contents

Search

  1. Preface
  2. Introduction to Informatica Big Data Management
  3. Mappings in the Hadoop Environment
  4. Mapping Sources in the Hadoop Environment
  5. Mapping Targets in the Hadoop Environment
  6. Mapping Transformations in the Hadoop Environment
  7. Processing Hierarchical Data on the Spark Engine
  8. Configuring Transformations to Process Hierarchical Data
  9. Processing Unstructured and Semi-structured Data with an Intelligent Structure Model
  10. Stateful Computing on the Spark Engine
  11. Monitoring Mappings in the Hadoop Environment
  12. Mappings in the Native Environment
  13. Profiles
  14. Native Environment Optimization
  15. Cluster Workflows
  16. Connections
  17. Data Type Reference
  18. Function Reference
  19. Parameter Reference

Big Data Management User Guide

Big Data Management User Guide

Java Transformation Support on the Spark Engine

Java Transformation Support on the Spark Engine

You can use complex data types to process hierarchical data.
Some processing rules for the Spark engine differ from the processing rules for the Data Integration Service.

General Restrictions

The Java transformation is supported with the following restrictions on the Spark engine:
  • The Java code in the transformation cannot write output to standard output when you push transformation logic to Hadoop. The Java code can write output to standard error which appears in the log files.
  • For date/time values, the Spark engine supports the precision of up to microseconds. If a date/time value contains nanoseconds, the trailing digits are truncated.

Partitioning

The Java transformation has the following restrictions when used with partitioning:
  • The Partitionable property must be enabled in the Java transformation. The transformation cannot run in one partition.
  • The following restrictions apply to the Transformation Scope property:
    • The value Transaction for transformation scope is not valid.
    • If you enable an input port for partition key, the transformation scope must be set to All Input.
    • Stateless must be enabled if the transformation scope is row.

Mapping Validation

Mapping validation fails in the following situations:
  • You reference an unconnected Lookup transformation from an expression within a Java transformation.
  • You select a port of a complex data type as the partition or sort key.
  • You enable nanosecond processing in date/time and the Java transformation contains a port of complex data type with an element of a date/time type. For example, a port of type
    array<data/time>
    is not valid if you enable nanosecond processing in date/time.
  • When you enable high precision, a validation error occurs in the following situations:
    • The Java transformation contains a port of a decimal data type.
    • The Java transformation contains a complex data type with an element of a decimal data type.

Using External .jar Files

To use external .jar files in a Java transformation, perform the following steps:
  1. Copy external .jar files to the Informatica installation directory in the Data Integration Service machine at the following location:
    <Informatic installation directory>/services/shared/jars
  2. Recycle the Data Integration Service.
  3. On the machine that hosts the Developer tool where you develop and run the mapping that contains the Java transformation:
    1. Copy external .jar files to a directory on the local machine.
    2. Edit the Java transformation to include an import statement pointing to the local .jar files.
    3. Update the classpath in the Java transformation.
    4. Compile the transformation.

Setting the JDK Path

To use complex ports in the Java transformation and to run Java user code directly on the Spark engine, you must set the JDK path.
In the Administrator tool, configure the following execution option for the Data Integration Service:
Property
Description
JDK Home Directory
The JDK installation directory on the machine that runs the Data Integration Service. Changes take effect after you recycle the Data Integration Service.
The JDK version that the Data Integration Service uses must be compatible with the JRE version on the cluster.
For example, enter a value such as
/usr/java/default
.
Default is blank.

0 COMMENTS

We’d like to hear from you!