Rules and Guidelines for Processing Hierarchical Data on the Spark Engine

The Spark engine processes complex data types differently in some situations. Consider the following rules and guidelines when you use complex data types in a mapping that runs on the Spark engine:
  • You cannot read hierarchical data from or write hierarchical data to a Hive source in a dynamic mapping.
  • When you read hierarchical data from a Hive source, you cannot enable Hive LLAP for Hive queries.
  • When you read hierarchical data from a Hive source, the Spark engine converts float data to the double data type. Use the double data type when you read from and write to a Hive source to prevent precision errors. The sketch after this list shows why the conversion exposes these errors.
  • When you write date/time data within a complex data type to a Hive target using HDP 3.1, configure the time zone as UTC. In the Hadoop connection Spark advanced properties, append -Duser.timezone=UTC to the end of the value for the following properties, as shown in the example after this list:
    • spark.driver.extraJavaOptions
    • spark.executor.extraJavaOptions
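
Why the float-to-double conversion can cause precision errors: widening a float to a double preserves the float's binary value bit for bit, so the extra digits that a double can represent expose rounding that was already present in the float. The following minimal Scala sketch, which is not Informatica-specific, demonstrates the effect:

  object FloatWidening extends App {
    val f: Float  = 1.1f        // 1.1 has no exact binary representation
    val d: Double = f.toDouble  // widening keeps the float's exact binary value
    println(f)                  // prints 1.1
    println(d)                  // prints 1.100000023841858, not 1.1
  }

Using double end to end avoids the widening step, so values round-trip as expected.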
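
For example, after the change each property carries the option at the end of its value. The existing option shown here, -XX:+UseG1GC, is only a placeholder for whatever options the properties already contain; if a property is empty, the appended option becomes its entire value:

  spark.driver.extraJavaOptions=-XX:+UseG1GC -Duser.timezone=UTC
  spark.executor.extraJavaOptions=-XX:+UseG1GC -Duser.timezone=UTC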
