Hadoop Files V2 Connector

Creating a Hadoop Files V2 connection

To use Hadoop Files V2 Connector in a mapping task, you must create a connection in Data Integration.
Perform the following steps to create a Hadoop Files V2 connection in Data Integration:
  1. On the Connections page, click New Connection.
     The New Connection page appears.
  2. On the New Connection page, configure the following connection properties:
     Connection Name
        Name of the Hadoop Files V2 connection.
     Description
        Description of the connection. The description cannot exceed 765 characters.
     Type
        Type of connection. Select Hadoop Files V2.
     Runtime Environment
        The name of the runtime environment where you want to run the tasks.
     User Name
        Required to read data from HDFS. Enter a user name that has access to the single-node HDFS location to read data from or write data to.
     NameNode URI
        The URI to access HDFS. A connectivity sketch that uses this value follows this table.
        Use the following format to specify the name node URI in Cloudera, Amazon EMR, and Hortonworks distributions:
        hdfs://<namenode>:<port>/
        Where
        • <namenode> is the host name or IP address of the name node.
        • <port> is the port on which the name node listens for remote procedure calls (RPC).
        If the Hadoop cluster is configured for high availability, copy the fs.defaultFS value from the core-site.xml file and append / to it to form the name node URI. For example, the following snippet shows the fs.defaultFS value in a sample core-site.xml file:
        <property>
          <name>fs.defaultFS</name>
          <value>hdfs://nameservice1</value>
          <source>core-site.xml</source>
        </property>
        In this snippet, the fs.defaultFS value is hdfs://nameservice1, so the corresponding name node URI is hdfs://nameservice1/.
        Specify either the name node URI or the local path. Do not specify the name node URI if you want to read data from or write data to a local file system path.
     Local Path
        A local file system path to read data from or write data to. Do not specify a local path if you want to read data from or write data to HDFS. The following conditions apply to the local path:
        • If you specify the name node URI, you must enter NA as the local path. If the local path is not NA, the name node URI does not take effect.
        • If you specify both the name node URI and a local path, the local path takes precedence. The connection uses the local path to run all tasks.
        • If you leave the local path blank, the agent configures the root directory (/) in the connection. The connection uses the local path to run all tasks.
        Default value for Local Path is NA.
     Configuration Files Path
        The directory that contains the Hadoop configuration files for the client.
     Keytab File
        The file that contains encrypted keys and Kerberos principals to authenticate the machine.
     Principal Name
        Users assigned to the superuser privilege can perform all the tasks that a user with the administrator privilege can perform.
     Impersonation Username
        You can enable different users to run mappings in a Hadoop cluster that uses Kerberos authentication or to connect to sources and targets that use Kerberos authentication. To enable different users to run mappings or connect to big data sources and targets, you must configure user impersonation. See the sketch after this procedure for how these Kerberos properties map to Hadoop client calls.
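     To sanity-check the User Name and NameNode URI values outside of Data Integration, you can list the HDFS root with the standard Hadoop client API. The following is a minimal sketch, not part of the connector; the name node URI, configuration file path, and user name are assumptions that you would replace with your own values:

        import java.net.URI;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileStatus;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class HdfsAccessCheck {
            public static void main(String[] args) throws Exception {
                // Hypothetical values; use your own name node URI and user name.
                URI nameNodeUri = URI.create("hdfs://nameservice1/");
                String userName = "hdfsuser";

                Configuration conf = new Configuration();
                // Load the client configuration, mirroring the Configuration Files Path property.
                conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));

                // Connect as the given user, mirroring the User Name property.
                try (FileSystem fs = FileSystem.get(nameNodeUri, conf, userName)) {
                    for (FileStatus status : fs.listStatus(new Path("/"))) {
                        System.out.println(status.getPath());
                    }
                }
            }
        }

     If the listing succeeds, the same name node URI and user name should also work in the connection.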
  3. Click Test Connection to evaluate the connection.
     The following image shows the connection page details in a Kerberos distribution:
     [Image: connection properties in a Kerberos distribution]
     The following image shows the connection page details in a non-Kerberos distribution:
     [Image: connection properties in a non-Kerberos distribution]
  4. Click Save to save the connection.
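For clusters that use Kerberos authentication, the Keytab File, Principal Name, and Impersonation Username properties correspond to a keytab login followed by a proxy-user call in the Hadoop client API. The following is a minimal sketch under that assumption, not the connector's internal implementation; the principal, keytab path, configuration path, and impersonated user name are hypothetical:

    import java.security.PrivilegedExceptionAction;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosImpersonationSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical directory; mirrors the Configuration Files Path property.
            conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // Log in with the keytab, mirroring the Keytab File and
            // Principal Name connection properties.
            UserGroupInformation realUser =
                    UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                            "svc_infa@EXAMPLE.COM",
                            "/etc/security/keytabs/svc_infa.keytab");

            // Run as another user, mirroring the Impersonation Username property.
            UserGroupInformation proxyUser =
                    UserGroupInformation.createProxyUser("etl_user", realUser);

            proxyUser.doAs((PrivilegedExceptionAction<Void>) () -> {
                try (FileSystem fs = FileSystem.get(conf)) {
                    System.out.println("Home directory: " + fs.getHomeDirectory());
                }
                return null;
            });
        }
    }

Note that impersonation also requires the cluster to allow the principal to proxy the impersonated user through the hadoop.proxyuser.* settings in core-site.xml.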
