Table of Contents


  1. Preface
  2. Introduction to Informatica Big Data Management
  3. Mappings
  4. Sources
  5. Targets
  6. Transformations
  7. Data Preview
  8. Cluster Workflows
  9. Profiles
  10. Monitoring
  11. Hierarchical Data Processing
  12. Hierarchical Data Processing Configuration
  13. Hierarchical Data Processing with Schema Changes
  14. Intelligent Structure Models
  15. Stateful Computing
  16. Appendix A: Connections
  17. Appendix B: Data Type Reference
  18. Appendix C: Function Reference

Overview of Mappings

Overview of Mappings

Configure the run-time environment in the Developer tool to optimize mapping performance and process data that is greater than 10 terabytes. You can choose to run a mapping in the native environment or in a non-native environment. When you run mappings in the native environment, the Data Integration Service processes and runs the mapping. When you run mappings in a non-native environment, the Data Integration Service pushes the processing to a compute cluster, such as Hadoop or Databricks.
You can run standalone mappings and mappings that are a part of a workflow in the non-native environment.
When you select the Hadoop environment, you can also select the engine to push the mapping logic to the Hadoop cluster. Based on the mapping logic, the Data Integration Service can push the mapping logic to one of the following engines in the Hadoop environment:
  • Informatica Blaze engine. An Informatica proprietary engine for distributed processing on Hadoop.
  • Spark engine. A high performance engine for batch processing that can run on a Hadoop cluster or on a Spark standalone mode cluster.
When you select the Databricks environment, the Integration Service pushes the mapping logic to the Databricks Spark engine, the Apache Spark engine packaged for Databricks.
When you select multiple engines, the Data Integration Service determines the best engine to run the mapping during validation. You can also choose to select which engine the Data Integration Service uses. You might select an engine based on whether an engine supports a particular transformation or based on the format in which the engine returns data.
When you run a mapping in a non-native environment, you must configure a connection to access the environment. You can set the run-time properties for the environment and for the engine that runs the mapping.
You can view the execution plan for a mapping to run in the non-native environment. View the execution plan for the engine that the Data Integration Service selects to run the mapping.
You can monitor Hive queries and Hadoop jobs in the Monitoring tool. Monitor the jobs on a Hadoop cluster with the YARN Web User Interface or the Blaze Job Monitor web application.
The Data Integration Service logs messages from the Blaze, Spark and Databricks Spark engines and the DTM.


We’d like to hear from you!