User Guide

10.5.2

Sqoop Connection-Level Arguments

In the JDBC connection, you can define the arguments that Sqoop must use to connect to the database. The Data Integration Service merges the arguments that you specify with the default command that it constructs based on the JDBC connection properties. The arguments that you specify take precedence over the JDBC connection properties.
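The precedence rule can be pictured as an overlay of one set of arguments on another. The following Python sketch is only a conceptual illustration, not the Data Integration Service's actual merge logic. The function name `merge_sqoop_arguments` and the dictionary representation of the default command are assumptions made for the example.

```python
import shlex


def merge_sqoop_arguments(defaults, user_arguments):
    """Overlay user-specified Sqoop arguments on default ones.

    defaults: dict of argument name -> value, standing in for the command
        derived from the JDBC connection properties (hypothetical model).
    user_arguments: the text typed into the Sqoop Arguments field.
    """
    merged = dict(defaults)  # copy so the defaults stay untouched
    tokens = shlex.split(user_arguments)
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("--"):
            # A flag followed by another flag (or by nothing) carries no value,
            # for example --direct.
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("-")
            merged[token] = tokens[i + 1] if has_value else None
            i += 2 if has_value else 1
        else:
            # Anything else (for example -Dname=value) is kept as a bare token.
            merged[token] = None
            i += 1
    return merged


if __name__ == "__main__":
    defaults = {
        "--driver": "com.mysql.jdbc.Driver",
        "--connect": "jdbc:mysql://dbhost:3306/sales",
    }
    # The user-specified driver replaces the default; --connect is unchanged.
    print(merge_sqoop_arguments(defaults, "--driver org.postgresql.Driver --direct"))
```

In this model, an argument that appears in both places keeps the user-specified value, which matches the precedence described above.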

If you want to use the same driver to import metadata and run the mapping, and do not want to specify any additional Sqoop arguments, select Sqoop v1.x from the Use Sqoop Version list and leave the Sqoop Arguments field empty in the JDBC connection. The Data Integration Service constructs the Sqoop command based on the JDBC connection properties that you specify.

However, if you want to use a different driver for run-time tasks or specify additional run-time Sqoop arguments, select Sqoop v1.x from the Use Sqoop Version list and specify the arguments in the Sqoop Arguments field.

A mapping that contains an Update Strategy transformation cannot use a Sqoop-enabled JDBC connection to write to a target. To run the mapping, disable the Sqoop connector in the Write transformation.

You can configure the following Sqoop arguments in the JDBC connection:

driver: Defines the JDBC driver class that Sqoop must use to connect to the database.

Use the following syntax:

--driver <JDBC driver class>

For example, use the following syntax depending on the database type that you want to connect to:

Aurora:
--driver com.mysql.jdbc.Driver

Greenplum:
--driver org.postgresql.Driver

IBM DB2:
--driver com.ibm.db2.jcc.DB2Driver

IBM DB2 z/OS:
--driver com.ibm.db2.jcc.DB2Driver

Microsoft SQL Server:
--driver com.microsoft.sqlserver.jdbc.SQLServerDriver

Netezza:
--driver org.netezza.Driver

Oracle:
--driver oracle.jdbc.driver.OracleDriver

Teradata:
--driver com.teradata.jdbc.TeraDriver

connect: Defines the JDBC connection string that Sqoop must use to connect to the database. The JDBC connection string must be based on the driver that you define in the driver argument.

Use the following syntax:

--connect <JDBC connection string>

For example, use the following syntax depending on the database type that you want to connect to:

Aurora:
--connect "jdbc:mysql://<host_name>:<port>/<schema_name>"

Greenplum:
--connect jdbc:postgresql://<host_name>:<port>/<database_name>

IBM DB2:
--connect jdbc:db2://<host_name>:<port>/<database_name>

IBM DB2 z/OS:
--connect jdbc:db2://<host_name>:<port>/<database_name>

Microsoft SQL Server:
--connect jdbc:sqlserver://<host_name>:<port or named_instance>;databaseName=<database_name>

Netezza:
--connect "jdbc:netezza://<database_server_name>:<port>/<database_name>;schema=<schema_name>"

Oracle:
--connect jdbc:oracle:thin:@<database_host_name>:<database_port>:<database_SID>

Teradata:
--connect jdbc:teradata://<host_name>/database=<database_name>
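Because the connection string must match the driver, it can help to keep each pair together. The following Python sketch builds both arguments from the templates listed above for three of the databases. The names `SQOOP_TEMPLATES` and `build_sqoop_arguments`, the short database keys, and the choice to always quote the connection string are assumptions made for this illustration.

```python
# Driver class and connection-string template per database type.
# The values are copied from the examples above; the dictionary keys are
# hypothetical short names chosen for this sketch.
SQOOP_TEMPLATES = {
    "greenplum": ("org.postgresql.Driver",
                  "jdbc:postgresql://{host}:{port}/{database}"),
    "db2": ("com.ibm.db2.jcc.DB2Driver",
            "jdbc:db2://{host}:{port}/{database}"),
    # For Oracle, the "database" placeholder holds the database SID.
    "oracle": ("oracle.jdbc.driver.OracleDriver",
               "jdbc:oracle:thin:@{host}:{port}:{database}"),
}


def build_sqoop_arguments(db_type, host, port, database):
    """Return matching --driver and --connect arguments as one string."""
    driver, url_template = SQOOP_TEMPLATES[db_type]
    url = url_template.format(host=host, port=port, database=database)
    return f'--driver {driver} --connect "{url}"'


if __name__ == "__main__":
    print(build_sqoop_arguments("oracle", "dbhost", 1521, "ORCL"))
```

The resulting string is the kind of text that you would enter in the Sqoop Arguments field. The host, port, and SID values in the usage line are placeholders.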

Use the following syntax to connect to an SSL-enabled database:

--connect <JDBC connection string>

For example, use the following syntax depending on the database type that you want to connect to:

Microsoft SQL Server:
--connect jdbc:sqlserver://<host_name>:<port>;databaseName=<database_name>;integratedSecurity=false;encrypt=true;trustServerCertificate=true;TrustStore=/<truststore_location>;TrustStorePassword=<truststore_password>;user=<user_name>;password=<password>

Oracle:
--connect "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCPS)(HOST=<host>)(PORT=<port_number>))(CONNECT_DATA=(SERVICE_NAME=<service_name>)))"

connection-param-file: Specifies a properties file that contains extra JDBC parameters that Sqoop must use to connect to the database. The contents of the file are parsed as standard Java properties and passed to the driver when the connection is created.

Use the following syntax:

--connection-param-file <parameter file name>

For example, use the following syntax to specify a parameter file named param_file when you connect to an Oracle database:
--connection-param-file param_file
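Standard Java properties are plain key=value lines. The following Python sketch formats such a file from a dictionary and returns the matching Sqoop argument. The property names in the usage example are hypothetical placeholders, not real driver parameters, and the helper does not escape special characters such as backslashes.

```python
from pathlib import Path


def format_properties(properties):
    """Render a dict as key=value lines in Java properties format."""
    return "".join(f"{key}={value}\n" for key, value in properties.items())


def write_param_file(path, properties):
    """Write the properties file and return the Sqoop argument that uses it."""
    Path(path).write_text(format_properties(properties), encoding="utf-8")
    return f"--connection-param-file {path}"


if __name__ == "__main__":
    # Hypothetical property names, for illustration only.
    example = {"example.property.one": "value1", "example.property.two": "value2"}
    print(format_properties(example), end="")
    print(write_param_file("param_file", example))
```

Check the driver documentation for the property names that the driver accepts.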

connection-manager: Defines the connection manager class name that Sqoop must use to connect to the database.

Use the following syntax:

--connection-manager <connection manager class name>

For example, use the following syntax to use the generic JDBC manager class name:

--connection-manager org.apache.sqoop.manager.GenericJdbcManager

direct: When you read data from or write data to Oracle, you can configure the direct argument to enable Sqoop to use OraOop. OraOop is a specialized Sqoop plug-in for Oracle that uses native protocols to connect to the Oracle database. Configuring OraOop improves performance.

You can configure OraOop when you run Sqoop mappings on the Spark engine.

Use the following syntax:

--direct

When you use OraOop, you must use the following syntax to specify multiple arguments:

-D<argument=value> -D<argument=value>

If you specify multiple arguments and include a space character between -D and the argument name-value pair, Sqoop considers only the first argument and ignores the remaining arguments.
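The spacing rule is easy to get wrong when arguments are assembled by hand. The following Python sketch formats the -D arguments without the space and flags a string that contains a detached -D. The function names and the property names `prop.one` and `prop.two` are hypothetical and stand in for whatever OraOop properties you need.

```python
def format_d_arguments(properties):
    """Join name/value pairs as -Dname=value, with no space after -D."""
    return " ".join(f"-D{name}={value}" for name, value in properties.items())


def find_detached_d_flags(argument_string):
    """Return the token positions where -D stands alone.

    A bare -D token means that a space separates it from its name=value
    pair, which is the mistake described above.
    """
    tokens = argument_string.split()
    return [index for index, token in enumerate(tokens) if token == "-D"]


if __name__ == "__main__":
    # prop.one and prop.two are placeholder property names.
    print(format_d_arguments({"prop.one": "a", "prop.two": "b"}))
    print(find_detached_d_flags("-D prop.one=a -Dprop.two=b"))
```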

If you do not direct the job to a specific queue, the Spark engine uses the default queue.

-Dsqoop.connection.factories: To run the mapping on the Blaze engine with the Teradata Connector for Hadoop (TDCH) specialized connectors for Sqoop, you must configure the -Dsqoop.connection.factories argument. Use the argument to define the TDCH connection factory class that Sqoop must use. The connection factory class varies based on the TDCH Sqoop Connector that you want to use.

To use Cloudera Connector Powered by Teradata, configure the -Dsqoop.connection.factories argument as follows:
-Dsqoop.connection.factories=com.cloudera.connector.teradata.TeradataManagerFactory

To use Hortonworks Connector for Teradata (powered by the Teradata Connector for Hadoop), configure the -Dsqoop.connection.factories argument as follows:
-Dsqoop.connection.factories=org.apache.sqoop.teradata.TeradataManagerFactory

To run the mapping on the Spark engine, you do not need to configure the -Dsqoop.connection.factories argument. The Data Integration Service invokes Cloudera Connector Powered by Teradata and Hortonworks Connector for Teradata (powered by the Teradata Connector for Hadoop) by default.
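The engine and connector choices above reduce to a small lookup. The following Python sketch returns the argument to configure, or nothing when the mapping runs on the Spark engine. The function name and the keys `cloudera` and `hortonworks` are hypothetical labels for this illustration. The factory class names are the ones given above.

```python
# Connection factory class per TDCH connector, as listed above.
# The dictionary keys are hypothetical short labels.
TDCH_FACTORIES = {
    "cloudera": "com.cloudera.connector.teradata.TeradataManagerFactory",
    "hortonworks": "org.apache.sqoop.teradata.TeradataManagerFactory",
}


def connection_factory_argument(engine, connector):
    """Return the -Dsqoop.connection.factories argument, or None if not needed."""
    if engine.lower() == "spark":
        # On the Spark engine the connectors are invoked by default.
        return None
    return f"-Dsqoop.connection.factories={TDCH_FACTORIES[connector]}"


if __name__ == "__main__":
    print(connection_factory_argument("blaze", "cloudera"))
    print(connection_factory_argument("spark", "hortonworks"))
```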

--infaoptimize: Use this argument to disable the performance optimization of Sqoop pass-through mappings on the Spark engine.

When you run a Sqoop pass-through mapping on the Spark engine, the Data Integration Service optimizes mapping performance in the following scenarios:
- You read data from a Sqoop source and write data to a Hive target that uses the Text format.
- You read data from a Sqoop source and write data to an HDFS target that uses the Flat, Avro, or Parquet format.

If you want to disable the performance optimization, set the --infaoptimize argument to false. For example, if you see data type issues after you run an optimized Sqoop mapping, you can disable the performance optimization.

Use the following syntax:

--infaoptimize false
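The two optimization scenarios and the effect of the argument can be summarized as a simple check. The following Python sketch only restates the conditions described above for a Sqoop pass-through mapping on the Spark engine. It is not product logic, and the function and parameter names are hypothetical.

```python
# Target formats for which the optimization applies, per target type,
# as described in the two scenarios above.
OPTIMIZED_TARGET_FORMATS = {
    "hive": {"text"},
    "hdfs": {"flat", "avro", "parquet"},
}


def is_optimization_applied(target_type, target_format, infaoptimize=True):
    """Return True if the pass-through optimization would apply.

    infaoptimize=False models setting --infaoptimize false.
    """
    if not infaoptimize:
        return False
    formats = OPTIMIZED_TARGET_FORMATS.get(target_type.lower(), set())
    return target_format.lower() in formats


if __name__ == "__main__":
    print(is_optimization_applied("Hive", "Text"))
    print(is_optimization_applied("HDFS", "Parquet", infaoptimize=False))
```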

For a complete list of the Sqoop arguments that you can configure, see the Sqoop documentation.
