Table of Contents

  1. Preface
  2. Introduction to Informatica Big Data Management
  3. Mappings in the Hadoop Environment
  4. Mapping Sources in the Hadoop Environment
  5. Mapping Targets in the Hadoop Environment
  6. Mapping Transformations in the Hadoop Environment
  7. Processing Hierarchical Data on the Spark Engine
  8. Configuring Transformations to Process Hierarchical Data
  9. Processing Unstructured and Semi-structured Data with an Intelligent Structure Model
  10. Stateful Computing on the Spark Engine
  11. Monitoring Mappings in the Hadoop Environment
  12. Mappings in the Native Environment
  13. Profiles
  14. Native Environment Optimization
  15. Cluster Workflows
  16. Connections
  17. Data Type Reference
  18. Function Reference
  19. Parameter Reference

Big Data Management User Guide

Execution Environment

Configure Hadoop properties, Pushdown Configuration properties, and Source Configuration properties in the Execution Environment area.
Configure the following properties in a Hadoop Execution Environment:
Connection
Defines the connection information that the Data Integration Service requires to push the mapping execution to the Hadoop cluster. Select the Hadoop connection to run the mapping in the Hadoop cluster. You can assign a user-defined parameter for the Hadoop connection.
Runtime Properties
You can configure run-time properties for the Hadoop environment in the Data Integration Service, in the Hadoop connection, and in the mapping. A property configured at a higher level can be overridden at a lower level. For example, if you configure a property in the Data Integration Service custom properties, you can override it in the Hadoop connection or in the mapping. The Data Integration Service processes property overrides based on the following priorities, from highest to lowest (see the example after this list):
  1. Mapping custom properties set using infacmd ms runMapping with the -cp option
  2. Mapping run-time properties for the Hadoop environment
  3. Hadoop connection advanced properties for run-time engines
  4. Hadoop connection advanced general properties, environment variables, and classpaths
  5. Data Integration Service custom properties
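For example, here is a minimal sketch of an override at the highest priority level. The -cp option passes custom properties on the command line, as described in item 1 above; the domain, service, user, password, application, mapping, and property names below are placeholders, and the remaining option names follow standard infacmd conventions, so verify them against the command reference for your version:

  infacmd ms runMapping -dn MyDomain -sn MyDISService -un Administrator -pd MyPassword -a MyApplication -m MyHadoopMapping -cp "MyCustomProperty=MyValue"

A property set this way takes precedence over the same property configured in the mapping run-time properties, the Hadoop connection, or the Data Integration Service custom properties.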
Reject File Directory
The directory for reject files on HDFS when you run mappings in the Hadoop environment.
The Blaze engine can write reject files to the Hadoop environment for flat file, HDFS, and Hive targets. The Spark and Hive engines can write reject files to the Hadoop environment for flat file and HDFS targets.
Choose one of the following options:
  • On the Data Integration Service machine. The Data Integration Service stores the reject files based on the RejectDir system parameter.
  • On the Hadoop Cluster. The reject files are moved to the reject directory configured in the Hadoop connection. If the directory is not configured, the mapping fails.
  • Defer to the Hadoop Connection. The reject files are moved based on whether the reject directory is enabled in the Hadoop connection properties. If the reject directory is enabled, the reject files are moved to the reject directory configured in the Hadoop connection. Otherwise, the Data Integration Service stores the reject files based on the RejectDir system parameter.
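For example, if you choose to write reject files to the Hadoop cluster, you can confirm that the files were written by listing the reject directory on HDFS. The path below is a placeholder for the reject directory configured in the Hadoop connection:

  hadoop fs -ls /user/infa/reject_files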
You can configure the following pushdown configuration properties:
Pushdown type
Choose one of the following options:
  • None. Select no pushdown type for the mapping.
  • Source. The Data Integration Service tries to push down transformation logic to the source database.
  • Full. The Data Integration Service pushes the full transformation logic to the source database.
Pushdown Compatibility
Optionally, if you choose full pushdown optimization and the mapping contains an Update Strategy transformation, you can choose a pushdown compatibility option or assign a pushdown compatibility parameter.
Choose one of the following options:
  • Multiple rows do not have the same key. The target transformation connected to the Update Strategy transformation receives multiple rows without the same key. The Data Integration Service can push the transformation logic to the target.
  • Multiple rows with the same key can be reordered. The target transformation connected to the Update Strategy transformation receives multiple rows with the same key that can be reordered. The Data Integration Service can push the Update Strategy transformation to the Hadoop environment.
  • Multiple rows with the same key cannot be reordered. The target transformation connected to the Update Strategy transformation receives multiple rows with the same key that cannot be reordered. The Data Integration Service cannot push the Update Strategy transformation to the Hadoop environment.
You can configure the following source properties:
Maximum Rows Read
Reserved for future use.
Maximum Runtime Interval
Reserved for future use.
State Store
Reserved for future use.
