Table of Contents

Search

  1. Preface
  2. Introduction to Data Engineering Administration
  3. Authentication
  4. Running Mappings on a Cluster with Kerberos Authentication
  5. Authorization
  6. Cluster Configuration
  7. Cloud Provisioning Configuration
  8. Data Integration Service Processing
  9. Appendix A: Connections Reference
  10. Appendix B: Monitoring REST API

Sqoop Connection-Level Arguments

Sqoop Connection-Level Arguments

In the JDBC connection, you can define the arguments that Sqoop must use to connect to the database. The Data Integration Service merges the arguments that you specify with the default command that it constructs based on the JDBC connection properties. The arguments that you specify take precedence over the JDBC connection properties.
If you want to use the same driver to import metadata and run the mapping, and do not want to specify any additional Sqoop arguments, select
Sqoop v1.x
from the
Use Sqoop Version
list and leave the
Sqoop Arguments
field empty in the JDBC connection. The Data Integration Service constructs the Sqoop command based on the JDBC connection properties that you specify.
However, if you want to use a different driver for run-time tasks or specify additional run-time Sqoop arguments, select
Sqoop v1.x
from the
Use Sqoop Version
list and specify the arguments in the
Sqoop Arguments
field.
A mapping that contains an Update Strategy transformation cannot use a Sqoop-enabled JDBC connection to write to a target. To run the mapping, disable the Sqoop connector in the Write transformation.
You can configure the following Sqoop arguments in the JDBC connection:
driver
Defines the JDBC driver class that Sqoop must use to connect to the database.
Use the following syntax:
--driver <JDBC driver class>
For example, use the following syntax depending on the database type that you want to connect to:
  • Aurora:
    --driver com.mysql.jdbc.Driver
  • Greenplum:
    --driver org.postgresql.Driver
  • IBM DB2:
    --driver com.ibm.db2.jcc.DB2Driver
  • IBM DB2 z/OS:
    --driver com.ibm.db2.jcc.DB2Driver
  • Microsoft SQL Server:
    --driver com.microsoft.sqlserver.jdbc.SQLServerDriver
  • Netezza:
    --driver org.netezza.Driver
  • Oracle:
    --driver oracle.jdbc.driver.OracleDriver
  • Teradata:
    --driver com.teradata.jdbc.TeraDriver
connect
Defines the JDBC connection string that Sqoop must use to connect to the database. The JDBC connection string must be based on the driver that you define in the driver argument.
Use the following syntax:
--connect <JDBC connection string>
For example, use the following syntax depending on the database type that you want to connect to:
  • Aurora:
    --connect "jdbc:mysql://<host_name>:<port>/<schema_name>"
  • Greenplum:
    --connect jdbc:postgresql://<host_name>:<port>/<database_name>
  • IBM DB2:
    --connect jdbc:db2://<host_name>:<port>/<database_name>
  • IBM DB2 z/OS:
    --connect jdbc:db2://<host_name>:<port>/<database_name>
  • Microsoft SQL Server:
    --connect jdbc:sqlserver://<host_name>:<port or named_instance>;databaseName=<database_name>
  • Netezza:
    --connect "jdbc:netezza://<database_server_name>:<port>/<database_name>;schema=<schema_name>"
  • Oracle:
    --connect jdbc:oracle:thin:@<database_host_name>:<database_port>:<database_SID>
  • Teradata:
    --connect jdbc:teradata://<host_name>/database=<database_name>
Use the following syntax to connect to an SSL-enabled database:
--connect <JDBC connection string>
For example, use the following syntax depending on the database type that you want to connect to:
  • Microsoft SQL Server:
    --connect jdbc:sqlserver://<host_name>:<port>;databaseName=<database_name>;integratedSecurity=false;encrypt=true;trustServerCertificate=true;TrustStore=/<truststore_location>;TrustStorePassword=<truststore_password>;user=<user_name>;password=<password>
  • Oracle:
    --connect jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCPS)(HOST=<host>)(PORT=<port_number>))(CONNECT_DATA=(SERVICE_NAME=<service_name>)))"
connection-param-file
Defines the extra JDBC parameters through a property file that Sqoop must use to connect to the database. The contents of this file are parsed as standard Java properties and passed into the driver when you create a connection.
Use the following syntax:
--connection-param-file <parameter file name>
For example, use the following syntax to use the parameter file when you connect to the Oracle database.
--connection-param-file param_file
connection-manager
Defines the connection manager class name that Sqoop must use to connect to the database.
Use the following syntax:
--connection-manager <connection manager class name>
For example, use the following syntax to use the generic JDBC manager class name:
--connection-manager org.apache.sqoop.manager.GenericJdbcManager
direct
When you read data from or write data to Oracle, you can configure the direct argument to enable Sqoop to use OraOop. OraOop is a specialized Sqoop plug-in for Oracle that uses native protocols to connect to the Oracle database. When you configure OraOop, the performance improves.
You can configure OraOop when you run Sqoop mappings on the Spark engine.
Use the following syntax:
--direct
When you use OraOop, you must use the following syntax to specify multiple arguments:
-D<argument=value> -D<argument=value>
If you specify multiple arguments and include a space character between -D and the argument name-value pair, Sqoop considers only the first argument and ignores the remaining arguments.
If you do not direct the job to a specific queue, the Spark engine uses the default queue.
-Dsqoop.connection.factories
To run the mapping on the Blaze engine with the Teradata Connector for Hadoop (TDCH) specialized connectors for Sqoop, you must configure the -Dsqoop.connection.factories argument. Use the argument to define the TDCH connection factory class that Sqoop must use. The connection factory class varies based on the TDCH Sqoop Connector that you want to use.
  • To use Cloudera Connector Powered by Teradata, configure the -Dsqoop.connection.factories argument as follows:
    -Dsqoop.connection.factories=com.cloudera.connector.teradata.TeradataManagerFactory
  • To use Hortonworks Connector for Teradata (powered by the Teradata Connector for Hadoop), configure the -Dsqoop.connection.factories argument as follows:
    -Dsqoop.connection.factories=org.apache.sqoop.teradata.TeradataManagerFactory
To run the mapping on the Spark engine, you do not need to configure the -Dsqoop.connection.factories argument. The Data Integration Service invokes Cloudera Connector Powered by Teradata and Hortonworks Connector for Teradata (powered by the Teradata Connector for Hadoop) by default.
--infaoptimize
Use this argument to disable the performance optimization of Sqoop pass-through mappings on the Spark engine.
When you run a Sqoop pass-through mapping on the Spark engine, the Data Integration Service optimizes mapping performance in the following scenarios:
  • You read data from a Sqoop source and write data to a Hive target that uses the Text format.
  • You read data from a Sqoop source and write data to an HDFS target that uses the Flat, Avro, or Parquet format.
If you want to disable the performance optimization, set the --infaoptimize argument to false. For example, if you see data type issues after you run an optimized Sqoop mapping, you can disable the performance optimization.
Use the following syntax:
--infaoptimize false
For a complete list of the Sqoop arguments that you can configure, see the Sqoop documentation.

0 COMMENTS

We’d like to hear from you!