Table of Contents

  1. Preface
  2. Installation Overview
  3. Before You Begin
  4. Pre-Installation Tasks
  5. Data Integration Hub Installation
  6. Post-Installation Tasks
  7. Upgrading Data Integration Hub
  8. Starting and Stopping Data Integration Hub
  9. Optional Data Integration Hub Configuration
  10. Installing and Configuring the Data Integration Hub Accelerator for Data Archive
  11. Troubleshooting the Data Integration Hub Installation
  12. Uninstallation

Installation and Configuration Guide

Configuring the Environment for a Hadoop Publication Repository
If you installed the Data Integration Hub Hadoop Service component, configure the environment for a Hadoop publication repository. To complete the task, perform the following actions:

  1. Configure settings in the dx-configuration.properties file.
  2. Verify references in the dih-hadoop-service.xml file.
  3. Enable an option in Cloudera Manager.
  1. On the machine where the Data Integration Hub Hadoop Service is installed, use a text editor to open the dx-configuration.properties file from the following location:
     <DIHInstallationDir>/DataIntegrationHub/tomcat/shared/classes/
  2. Set the properties in the following sections of the file and then save the file:

     HIVE settings

     dih.hadoop.hive.username
       User name for connecting to the Apache Hive server.

     dih.hadoop.hive.password
       Password of the user that connects to the Apache Hive server.

     dih.hadoop.hive.url
       URL to the Apache Hive server in the following format:
       jdbc:hive2://<hostname>:<port>/<schema>
       Where:
       • hostname is the host name or IP address of the server.
       • port is the port number of the server. Default is 10000.
       • schema is the schema used with the Hive warehouse. Default is default. If the Hive warehouse uses a non-default schema, also set the dih.hadoop.service.warehouse.dir property.

     dih.hadoop.service.warehouse.dir
       Path to the Hive warehouse directory. Required if the Apache Hive server uses a non-default schema. If the Apache Hive server uses the default schema, do not enter a value for this property.

     For example:
     dih.hadoop.hive.username=hive
     dih.hadoop.hive.password=password
     dih.hadoop.hive.url=jdbc:hive2://hive_host:10000/myschema
     dih.hadoop.service.warehouse.dir=/user/hive/mydatawarehousedir
     SPARK settings

     dih.hadoop.service.spark.version
       Apache Spark version. Takes the following values:
       • 1.2. Apache Spark version 1.2.
       • 1.3. Apache Spark version 1.3 and higher.
       Default is 1.3.

     dih.hadoop.service.spark.url
       Apache Spark URL.
       • If Apache Spark is running in YARN mode, use the default value:
         dih.hadoop.service.spark.url=yarn
       • If Apache Spark is running in standalone mode, enter the URL in the following format:
         spark://master_host:<port_number>
         Where:
         • master_host is the host name of the Master daemon, which coordinates the operations of the Workers that run the executors.
         • <port_number> is the port number of the Master daemon. Default is 7077.
         For example:
         spark://Mymasterhost:7077
         The value that you enter must be identical to the value that appears in the Spark console. By default, the Spark console is located at http://<host_name>:18080.

     dih.hadoop.service.spark.additional.args
       Additional arguments for running jobs. For example:
       --executor-memory 20G --total-executor-cores 100
       For a complete list of arguments, see the Spark documentation.
     KERBEROS settings

     If the Hadoop cluster uses Kerberos authentication, configure the following settings:

     dih.hadoop.principal
       Kerberos principal name in the following format:
       <principal>/<domain>@<realm>

     dih.hadoop.keytab.path
       Location and name of the keytab file.

     For example:
     dih.hadoop.principal=infa/admin@informatica.com
     dih.hadoop.keytab.path=/etc/security/keytabs/infa.keytab
     The file name does not have to be infa.keytab.
  3. On the machine where the Data Integration Hub Hadoop Service is installed, use a text editor to open the dih-hadoop-service.xml file from the following location:
     <DIHInstallationDir>/DataIntegrationHub/tomcat/conf/Catalina/localhost
  4. Verify that the file contains the correct references to your Hadoop classpath configuration. By default, the file references a Cloudera VM configuration in the Cloudera VM - sample default configurations section. The file also contains references to Cloudera Manager and to Hortonworks VM in the following commented-out sections: Cloudera CDH - sample cloudera manager configurations and Hortonworks VM - sample default configurations. If required, comment out the Cloudera VM - sample default configurations section and uncomment the section that is appropriate for your configuration.
  5. In Cloudera Manager, enable the Bind NameNode to Wildcard Address option and then restart the HDFS service.
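Before restarting services, it can help to sanity-check the values you entered in dx-configuration.properties against the formats described in step 2. The following standalone Python sketch is a hypothetical helper, not part of Data Integration Hub; the property names come from this guide, and the format checks are assumptions based on the examples above.

```python
import re

def check_dih_hadoop_properties(text):
    """Parse simple key=value lines and flag values that do not match
    the formats described in this guide."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()

    problems = []
    # Hive URL: jdbc:hive2://<hostname>:<port>/<schema>
    url = props.get("dih.hadoop.hive.url", "")
    if not re.match(r"^jdbc:hive2://[^:/]+:\d+/\S+$", url):
        problems.append("dih.hadoop.hive.url should look like "
                        "jdbc:hive2://<hostname>:<port>/<schema>")
    # Spark URL: 'yarn' in YARN mode, spark://master_host:<port> in standalone mode
    spark = props.get("dih.hadoop.service.spark.url", "")
    if spark != "yarn" and not re.match(r"^spark://[^:/]+:\d+$", spark):
        problems.append("dih.hadoop.service.spark.url should be 'yarn' "
                        "or spark://master_host:<port_number>")
    # Kerberos principal (only if present): <principal>/<domain>@<realm>
    principal = props.get("dih.hadoop.principal")
    if principal is not None and not re.match(r"^[^/@]+/[^/@]+@\S+$", principal):
        problems.append("dih.hadoop.principal should look like "
                        "<principal>/<domain>@<realm>")
    return problems

sample = """\
dih.hadoop.hive.username=hive
dih.hadoop.hive.password=password
dih.hadoop.hive.url=jdbc:hive2://hive_host:10000/myschema
dih.hadoop.service.spark.url=yarn
dih.hadoop.principal=infa/admin@informatica.com
"""
print(check_dih_hadoop_properties(sample))  # prints []
```

Run the check against the file on the Hadoop Service machine before restarting; an empty list means the checked values match the documented formats, which does not guarantee that the servers they point to are reachable.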
