Data Engineering Integration 10.4.0
User Guide
Sorter Transformation on the Spark Engine
Some processing rules for the Spark engine differ from the processing rules for the Data Integration Service.
Mapping Validation
Mapping validation fails when case sensitivity is disabled.
The Data Integration Service logs a warning and ignores the Sorter transformation in the following situations:
There is a data type mismatch between the Sorter transformation sort keys and the corresponding target columns.
The transformation contains sort keys that are not connected to the target.
The Write transformation is not configured to maintain row order.
The Sorter transformation is not directly upstream from the Write transformation.
Null Values
The Data Integration Service treats null values as low even if you configure the transformation to treat null values as high.
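This null handling can be sketched outside Informatica in plain Python. The comparison key below is hypothetical and only mirrors the documented behavior: rows with a null sort key always rank low in an ascending sort, regardless of how the transformation is configured.

```python
# Minimal sketch (not Informatica or Spark API code): emulate how the
# Spark engine ranks null sort keys low in an ascending sort.
rows = [("b", 2), (None, 1), ("a", 3)]

def null_low_key(row):
    key = row[0]
    # False sorts before True, so rows with a null key always come first.
    return (key is not None, key if key is not None else "")

print(sorted(rows, key=null_low_key))
# [(None, 1), ('a', 3), ('b', 2)]
```

Even if the mapping requests nulls-high ordering, the result on the Spark engine matches this nulls-first output.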
Data Cache Optimization
You cannot optimize the sorter cache to store data using variable length.
Parallel Sorting
The Data Integration Service enables parallel sorting with the following restrictions:
The mapping does not include another transformation between the Sorter transformation and the target.
The data type of the sort keys does not change between the Sorter transformation and the target.
Each sort key in the Sorter transformation is linked to a column in the target.
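The intent behind these restrictions can be illustrated with a hypothetical sketch in plain Python (not Informatica or Spark API code): partitions are sorted independently in parallel, and a globally ordered result survives only if nothing downstream reshuffles rows before the target writes them.

```python
import heapq

# Hypothetical sketch: each partition is sorted independently, as if in
# parallel on separate executors.
partitions = [[3, 1], [5, 2], [4, 0]]
sorted_partitions = [sorted(p) for p in partitions]

# A target that maintains row order can merge the sorted streams into a
# single globally ordered result.
globally_sorted = list(heapq.merge(*sorted_partitions))
print(globally_sorted)  # [0, 1, 2, 3, 4, 5]

# An intermediate transformation that reorders or reshuffles rows between
# the Sorter transformation and the target would break this merge step,
# which is why the restrictions above disallow it.
```

The sketch shows why the restrictions all protect the same invariant: once partition-level order is lost between the Sorter transformation and the target, parallel sorting cannot produce a correctly ordered output.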
Updated September 28, 2020