Data Integration Connections

Hadoop Files V2 connection properties

When you set up a Hadoop Files V2 connection, you must configure the connection properties.
The following table describes the Hadoop Files V2 connection properties:
Connection property
Description
Connection Name
Name of the Hadoop Files V2 connection.
Description
Description of the connection. The description cannot exceed 765 characters.
Type
Type of connection. Select Hadoop Files V2.
Runtime Environment
The name of the runtime environment where you want to run the tasks.
User Name
Required to read data from HDFS. Enter a user name that has access to the single-node HDFS location to read data from or write data to.
NameNode URI
The URI to access HDFS.
Use the following format to specify the name node URI in Cloudera, Amazon EMR, and Hortonworks distributions:
hdfs://<namenode>:<port>/
Where:
  • <namenode> is the host name or IP address of the name node.
  • <port> is the port on which the name node listens for remote procedure calls (RPC).
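As a minimal sketch of the format above, the following Python helper assembles a name node URI from a host and an RPC port. The function name build_name_node_uri and the sample host and port are hypothetical and used only for illustration. Substitute the values from your own cluster.

```python
def build_name_node_uri(namenode: str, port: int) -> str:
    """Return a URI of the form hdfs://<namenode>:<port>/."""
    if not namenode:
        raise ValueError("namenode must be a host name or IP address")
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    return f"hdfs://{namenode}:{port}/"


# Hypothetical host and port, for illustration only.
print(build_name_node_uri("namenode.example.com", 8020))
```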
If the Hadoop cluster is configured for high availability, you must copy the fs.defaultFS value in the core-site.xml file and append / to specify the name node URI.
For example, the following snippet shows the fs.defaultFS value in a sample core-site.xml file:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://nameservice1</value>
  <source>core-site.xml</source>
</property>
In the above snippet, the fs.defaultFS value is hdfs://nameservice1 and the corresponding name node URI is hdfs://nameservice1/.
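The high-availability step above can be sketched in Python as follows. This is an illustrative helper, not part of the product: the function name name_node_uri_from_core_site is hypothetical, and the sample wraps the property in a <configuration> root element, which is the usual layout of a Hadoop core-site.xml file.

```python
import xml.etree.ElementTree as ET


def name_node_uri_from_core_site(xml_text: str) -> str:
    """Read fs.defaultFS from core-site.xml content and append a trailing slash."""
    root = ET.fromstring(xml_text.strip())
    # iter() also matches the root element, so this works for a whole
    # <configuration> document and for a single <property> element.
    for prop in root.iter("property"):
        if prop.findtext("name") == "fs.defaultFS":
            value = (prop.findtext("value") or "").strip()
            if not value:
                raise ValueError("fs.defaultFS has no value")
            return value if value.endswith("/") else value + "/"
    raise KeyError("fs.defaultFS not found")


SAMPLE = """
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nameservice1</value>
    <source>core-site.xml</source>
  </property>
</configuration>
"""

print(name_node_uri_from_core_site(SAMPLE))  # hdfs://nameservice1/
```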
Specify either the name node URI or the local path. Do not specify the name node URI if you want to read data from or write data to a local file system path.
Local Path
A local file system path to read and write data. The following conditions apply when you specify the local path:
  • You must enter NA in the local path if you specify the name node URI. If the local path does not contain NA, the name node URI does not work.
  • If you specify both the name node URI and the local path, the local path takes precedence. The connection uses the local path to run all tasks.
  • If you leave the local path blank, the agent configures the root directory (/) in the connection. The connection uses the local path to run all tasks.
  • If the file or directory is in the local system, enter the fully qualified path of the file or directory.
    For example, /user/testdir specifies the location of a directory in the local system.
The default value for Local Path is NA.
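The interaction between the name node URI and the local path can be modeled with a short Python sketch. It only restates the conditions listed above and is not the agent's actual implementation. The function name effective_location is hypothetical, and two behaviors are assumptions: NA is matched exactly (case-sensitive), and an error is raised when the local path is NA but no name node URI is given.

```python
def effective_location(name_node_uri: str, local_path: str) -> str:
    """Return the location a connection would use under the documented rules."""
    path = local_path.strip()
    if path == "":
        # Blank local path: the agent configures the root directory.
        return "/"
    if path != "NA":
        # A real local path takes precedence over any name node URI.
        return path
    # Local path is NA, so the name node URI is used.
    uri = name_node_uri.strip()
    if not uri:
        # Assumption: treat this combination as a configuration error.
        raise ValueError("Local Path is NA but no name node URI is set")
    return uri


print(effective_location("hdfs://nameservice1/", "NA"))
print(effective_location("hdfs://nameservice1/", "/user/testdir"))
print(effective_location("", ""))
```

With these inputs, the first call resolves to the name node URI, the second to the local directory, and the third to the root directory.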
Configuration Files Path
The directory that contains the Hadoop configuration files.
Copy the core-site.xml, hdfs-site.xml, and hive-site.xml files from the Hadoop cluster and add them to a folder on the Linux machine.
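Before you enter the folder as the Configuration Files Path, it can help to confirm that all three files were copied. The following Python sketch does that check. The function name missing_config_files and the folder /opt/hadoop-conf are hypothetical examples, not values required by the connection.

```python
from pathlib import Path

REQUIRED_FILES = ("core-site.xml", "hdfs-site.xml", "hive-site.xml")


def missing_config_files(config_dir: str) -> list[str]:
    """Return the required Hadoop configuration files absent from config_dir."""
    folder = Path(config_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


# Hypothetical folder; an empty list means all three files are present.
print(missing_config_files("/opt/hadoop-conf"))
```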
Keytab File
The file that contains encrypted keys and Kerberos principals to authenticate the machine.
Principal Name
Users assigned to the superuser privilege can perform all the tasks that a user with the administrator privilege can perform.
Impersonation Username
You can enable different users to run mappings in a Hadoop cluster that uses Kerberos authentication or connect to sources and targets that use Kerberos authentication. To enable different users to run mappings or connect to big data sources and targets, you must configure user impersonation.
When you read from or write to remote files, the NameNode URI and Configuration Files Path fields are mandatory. When you read from or write to local files, only the Local Path field is required.