Command Reference

Hive Connection Options

Use connection options to define a Hive connection.
Enter connection options in the following format:
... -o option_name='value' option_name='value' ...
To enter multiple options, separate them with a space.
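As a minimal sketch, assuming you assemble the command line from a script, the option string in the format above could be built as follows. The helper name format_options is hypothetical and not part of infacmd. The sample option names and values are taken from this topic.

```python
def format_options(options):
    """Render a dict as name='value' pairs separated by single spaces."""
    parts = []
    for name, value in options.items():
        if "'" in str(value):
            # This topic does not describe how to escape an embedded single
            # quote, so this sketch refuses such values instead of guessing.
            raise ValueError(f"value for {name} contains a single quote")
        parts.append(f"{name}='{value}'")
    return " ".join(parts)


# Option names and values as described in this topic.
hive_options = {
    "relationalSourceAndTarget": "true",
    "databaseName": "default",
}
print(format_options(hive_options))
# prints: relationalSourceAndTarget='true' databaseName='default'
```

Pass the resulting string as the argument that follows -o.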
The following table describes the Hive connection options for the infacmd isp CreateConnection and UpdateConnection commands:
Option Description
connectionType
Required. The type of connection. For a Hive connection, enter HIVE.
name
The name of the connection. The name is not case sensitive and must be unique within the domain. You can change this property after you create the connection. The name cannot exceed 128 characters, contain spaces, or contain the following special characters:
~ ` ! $ % ^ & * ( ) - + = { [ } ] | \ : ; " ' < , > . ? /
relationalSourceAndTarget
Hive connection mode. Set this option to true to use the connection to access the Hive data warehouse. To access a Hive target, you must also enable the same connection or another Hive connection to run the mapping in the Hadoop cluster.
If you enable relational source and target, you must provide the metadataDatabaseString option.
pushDownMode
Hive connection mode. Set this option to true to use the connection to run mappings in the Hadoop cluster.
If you enable the connection for pushdown mode, you must provide the options to run the Informatica mappings in the Hadoop cluster.
environmentSQL
SQL commands to set the Hadoop environment. In the native environment, the Data Integration Service executes the environment SQL each time it creates a connection to the Hive metastore. If the Hive connection is used to run mappings in the Hadoop cluster, the Data Integration Service executes the environment SQL at the beginning of each Hive session.
The following rules and guidelines apply to environment SQL in both connection modes:
  • Use the environment SQL to specify Hive queries.
  • Use the environment SQL to set the classpath for Hive user-defined functions and then use either environment SQL or PreSQL to specify the Hive user-defined functions. You cannot use PreSQL in the data object properties to specify the classpath. If you use Hive user-defined functions, you must copy the .jar files to the following directory:
    <Informatica installation directory>/services/shared/hadoop/<Hadoop distribution name>/extras/hive-auxjars
  • You can also use environment SQL to define Hadoop or Hive parameters that you intend to use in the PreSQL commands or in custom queries.
If the Hive connection is used to run mappings in the Hadoop cluster, only the environment SQL of the Hive connection is executed. The different environment SQL commands for the connections of the Hive source or target are not executed, even if the Hive sources and targets are on different clusters.
quoteChar
The type of character used to identify special characters and reserved SQL keywords, such as WHERE. The Data Integration Service places the selected character around special characters and reserved SQL keywords. The Data Integration Service also uses this character for the Support mixed-case identifiers property.
clusterConfigId
The cluster configuration ID associated with the Hadoop cluster. You must enter a configuration ID to set up a Hadoop connection.
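The naming rules for the name option can be checked before you run the command. The following is a sketch only: the function name connection_name_problems is hypothetical, and it covers just the length, space, and special-character rules stated above. Uniqueness within the domain is not checked.

```python
# Special characters that the name option does not allow, as listed above.
FORBIDDEN = set("~`!$%^&*()-+={[}]|\\:;\"'<,>.?/")
MAX_NAME_LENGTH = 128


def connection_name_problems(name):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append("name exceeds 128 characters")
    if " " in name:
        problems.append("name contains spaces")
    bad = sorted(set(name) & FORBIDDEN)
    if bad:
        problems.append("name contains special characters: " + " ".join(bad))
    return problems


print(connection_name_problems("Hive_Conn_01"))  # prints: []
print(connection_name_problems("hive-conn"))
# prints: ['name contains special characters: -']
```

Because connection names are not case sensitive, compare names with str.casefold() if you extend the sketch to check uniqueness against a list of existing names.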

Properties to Access Hive as Source or Target

The following table describes the mandatory options for infacmd isp CreateConnection and UpdateConnection commands that you configure when you want to use the Hive connection to access Hive data:
Property Description
hiveJdbcDriverClassName
Name of the JDBC driver class.
metadataConnString
The JDBC connection URI used to access the metadata from the Hadoop server.
The connection string uses the following format:
jdbc:hive://<hostname>:<port>/<db>
Where
  • hostname is the name or IP address of the machine on which the Hive server is running.
  • port is the port on which the Hive server is listening.
  • db is the database to which you want to connect. If you do not provide the database details, the Data Integration Service uses the default database details.
To connect to HiveServer2, use the connection string format that Apache Hive implements for that specific Hadoop distribution. For more information about Apache Hive connection string formats, see the Apache Hive documentation.
If the Hadoop cluster uses SSL or TLS authentication, you must add ssl=true to the JDBC connection URI. For example: jdbc:hive2://<hostname>:<port>/<db>;ssl=true
If you use a self-signed certificate for SSL or TLS authentication, ensure that the certificate file is available on the client machine and the Data Integration Service machine. For more information, see the Informatica Big Data Management Cluster Integration Guide.
bypassHiveJDBCServer
JDBC driver mode. Enable this option to use the embedded JDBC driver (embedded mode).
To use the JDBC embedded mode, perform the following tasks:
  • Verify that the Hive client and Informatica services are installed on the same machine.
  • Configure the Hive connection properties to run mappings in the Hadoop cluster.
If you choose the non-embedded mode, you must configure the Data Access Connection String.
The JDBC embedded mode is preferred to the non-embedded mode.
sqlAuthorized
When you select the option to observe fine-grained SQL authorization in a Hive source, the mapping observes row-level and column-level restrictions on data access. If you do not select the option, the Blaze run-time engine ignores the restrictions, and results include restricted data.
This option applies to Hadoop clusters where Sentry or Ranger security modes are enabled.
connectString
The connection string used to access data from the Hadoop data store. The non-embedded JDBC mode connection string must be in the following format:
jdbc:hive://<hostname>:<port>/<db>
Where
  • hostname is the name or IP address of the machine on which the Hive server is running.
  • port is the port on which the Hive server is listening. Default is 10000.
  • db is the database to which you want to connect. If you do not provide the database details, the Data Integration Service uses the default database details.
To connect to HiveServer2, use the connection string format that Apache Hive implements for that specific Hadoop distribution. For more information about Apache Hive connection string formats, see the Apache Hive documentation.
If the Hadoop cluster uses SSL or TLS authentication, you must add ssl=true to the JDBC connection URI. For example: jdbc:hive2://<hostname>:<port>/<db>;ssl=true
If you use a self-signed certificate for SSL or TLS authentication, ensure that the certificate file is available on the client machine and the Data Integration Service machine. For more information, see the Informatica Big Data Management Cluster Integration Guide.
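The connection string rules for metadataConnString and connectString can be sketched as a small helper. This is an illustration under stated assumptions: the function name hive2_jdbc_uri and the host hive.example.com are hypothetical, the jdbc:hive2 prefix is the one shown in the SSL example above, and the fallback database name default is an assumption. Your Hadoop distribution might require a different format.

```python
def hive2_jdbc_uri(hostname, port=10000, db="default", ssl=False):
    """Build a URI of the form jdbc:hive2://<hostname>:<port>/<db>[;ssl=true]."""
    # Port 10000 is the default listed for connectString in this topic.
    uri = f"jdbc:hive2://{hostname}:{port}/{db}"
    if ssl:
        # Required when the Hadoop cluster uses SSL or TLS authentication.
        uri += ";ssl=true"
    return uri


# hive.example.com is a placeholder host name.
print(hive2_jdbc_uri("hive.example.com", ssl=True))
# prints: jdbc:hive2://hive.example.com:10000/default;ssl=true
```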

Properties to Run Mappings in the Hadoop Cluster

The following table describes the mandatory options for infacmd isp CreateConnection and UpdateConnection commands that you configure when you want to use the Hive connection to run Informatica mappings in the Hadoop cluster:
Property Description
databaseName
Namespace for tables. Use the name default for tables that do not have a specified database name.
customProperties
Configures or overrides Hive or Hadoop cluster properties in the hive-site.xml configuration set on the machine on which the Data Integration Service runs. You can specify multiple properties.
Select Edit to specify the name and value for the property. The property appears in the following format:
<property1>=<value>
When you specify multiple properties, &: appears as the property separator.
The maximum length for the format is 1 MB.
If you enter a required property for a Hive connection, it overrides the property that you configure in the Advanced Hive/Hadoop Properties.
The Data Integration Service adds or sets these properties for each map-reduce job. You can verify these properties in the JobConf of each mapper and reducer job. Access the JobConf of each job from the Jobtracker URL under each map-reduce job.
The Data Integration Service writes messages for these properties to the Data Integration Service logs. The Data Integration Service must have the log tracing level set to log each row or have the log tracing level set to verbose initialization tracing.
For example, specify the following properties to control and limit the number of reducers to run a mapping job:
mapred.reduce.tasks=2&:hive.exec.reducers.max=10
stgDataCompressionCodecClass
Codec class name that enables data compression and improves performance on temporary staging tables. The codec class name corresponds to the codec type.
stgDataCompressionCodecType
Hadoop compression library for a compression codec class name.
You can choose None, Zlib, Gzip, Snappy, Bz2, LZO, or Custom.
Default is None.
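The customProperties format described above, with &: as the separator and a 1 MB limit, can be sketched as follows. The function name format_custom_properties is hypothetical, and treating 1 MB as 1,048,576 bytes of UTF-8 text is an assumption.

```python
SEPARATOR = "&:"
MAX_BYTES = 1024 * 1024  # assumption: "1 MB" means 1,048,576 bytes


def format_custom_properties(properties):
    """Join name=value pairs with the &: separator used by customProperties."""
    parts = []
    for name, value in properties.items():
        pair = f"{name}={value}"
        if SEPARATOR in pair:
            # A pair that contains the separator would be split incorrectly.
            raise ValueError(f"property {name} contains the separator {SEPARATOR}")
        parts.append(pair)
    joined = SEPARATOR.join(parts)
    if len(joined.encode("utf-8")) > MAX_BYTES:
        raise ValueError("custom properties exceed the 1 MB limit")
    return joined


# The reducer example from this topic.
print(format_custom_properties({
    "mapred.reduce.tasks": 2,
    "hive.exec.reducers.max": 10,
}))
# prints: mapred.reduce.tasks=2&:hive.exec.reducers.max=10
```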


Updated August 15, 2019