Table of Contents

  1. Preface
  2. Installation Overview
  3. Before You Begin
  4. Pre-Installation Tasks
  5. B2B Data Exchange Installation
  6. Post-Installation Tasks
  7. Installing the Partners Portal on Non-B2B Data Exchange Nodes
  8. Upgrading B2B Data Exchange
  9. Starting and Stopping B2B Data Exchange
  10. Optional B2B Data Exchange Configuration
  11. Installing and Configuring the B2B Data Exchange Accelerator for Data Archive
  12. Uninstallation

Installation and Configuration Guide

Configuring the Environment for a Hadoop Publication Repository

If you installed the B2B Data Exchange Hadoop Service component, configure the environment for a Hadoop publication repository.

To complete the task, you perform the following actions:
  1. Configure settings in the dx-configuration.properties file.
  2. Verify references in the dih-hadoop-service.xml file.
  3. Enable an option in Cloudera Manager.
  1. On the machine where the B2B Data Exchange Hadoop Service is installed, use a text editor to open the dx-configuration.properties file from the following location:
     <DXInstallationDir>/DataIntegrationHub/tomcat/shared/classes/
  2. Set the properties in the following sections of the file and then save the file:
     HIVE settings
     • dih.hadoop.hive.username
       User name for connecting to the Apache Hive server.
     • dih.hadoop.hive.password
       Password of the user that connects to the Apache Hive server.
     • dih.hadoop.hive.url
       URL of the Apache Hive server in the following format:
       jdbc:hive2://<hostname>:<port>/<schema>
       Where:
       • hostname is the host name or IP address of the server.
       • port is the port number of the server. Default is 10000.
       • schema is the schema used with the Hive warehouse. Default is default. If the Hive warehouse uses a non-default schema, also set the dih.hadoop.service.warehouse.dir property.
     • dih.hadoop.service.warehouse.dir
       Path to the Hive warehouse directory. Required if the Apache Hive server uses a non-default schema. If the Apache Hive server uses the default schema, do not enter a value for this property.
     For example:
     dih.hadoop.hive.username=hive
     dih.hadoop.hive.password=password
     dih.hadoop.hive.url=jdbc:hive2://hive_host:10000/myschema
     dih.hadoop.service.warehouse.dir=/user/hive/mydatawarehousedir
     SPARK settings
     • dih.hadoop.service.spark.version
       Apache Spark version. You can configure the version to be 1.3 or 1.2. Default is 1.3.
     • dih.hadoop.service.spark.url
       Apache Spark URL.
       • If Apache Spark is running in YARN mode, use the default value:
         dih.hadoop.service.spark.url=yarn
       • If Apache Spark is running in standalone mode, enter the URL in the following format:
         spark://master_host:<port_number>
         Where:
         • master_host is the Master daemon, which coordinates the operations of the Workers that run the executors.
         • <port_number> is the port number of the Master daemon. Default is 7077.
         For example:
         spark://Mymasterhost:7077
         The value you enter here must be identical to the value that is shown in the Spark console. By default, the Spark console is located at http://<host_name>:18080.
     • dih.hadoop.service.spark.additional.args
       Additional arguments for running jobs. For example:
       --executor-memory 20G --total-executor-cores 100
       For a complete list of arguments, see the Spark documentation.
     KERBEROS settings
     If the Hadoop cluster uses Kerberos authentication, configure the following settings:
     • dih.hadoop.principal
       Kerberos principal name in the following format:
       <principal>/<domain>@<realm>
     • dih.hadoop.keytab.path
       Location and name of the keytab file.
     For example:
     dih.hadoop.principal=infa/admin@informatica.com
     dih.hadoop.keytab.path=/etc/security/keytabs/infa.keytab
     The file name does not have to be infa.keytab.
  3. On the machine where the B2B Data Exchange Hadoop Service is installed, use a text editor to open the dih-hadoop-service.xml file from the following location:
     <DXInstallationDir>/DataIntegrationHub/tomcat/conf/Catalina/localhost
  4. Verify that the file contains the correct references to your Hadoop classpath configuration. By default, the file references a Cloudera VM configuration in the Cloudera VM - sample default configurations section. The file also contains references to Cloudera Manager and to Hortonworks VM in the following commented-out sections: Cloudera CDH - sample cloudera manager configurations and Hortonworks VM - sample default configurations. If required, comment out the Cloudera VM - sample default configurations section and uncomment the section that is appropriate to your configuration.
  5. In Cloudera Manager, enable the Bind NameNode to Wildcard Address option and then restart the HDFS service.
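A malformed value in dx-configuration.properties typically surfaces only when the Hadoop Service first tries to connect. The following sketch checks the three value formats described in step 2 (the Hive JDBC URL, the Spark URL, and the Kerberos principal) before you restart the service. It is not part of the product; the validate helper and its rules are hypothetical and only encode the formats stated above.

```python
import re

# Formats taken from the property descriptions in step 2:
HIVE_URL_RE = re.compile(r"^jdbc:hive2://[^:/]+:\d+/\S+$")   # jdbc:hive2://<hostname>:<port>/<schema>
SPARK_URL_RE = re.compile(r"^(yarn|spark://[^:/]+:\d+)$")    # yarn, or spark://master_host:<port_number>
PRINCIPAL_RE = re.compile(r"^[^/@]+/[^/@]+@[^/@]+$")         # <principal>/<domain>@<realm>

def validate(props):
    """Return a list of (property, problem) pairs for malformed values."""
    problems = []
    if not HIVE_URL_RE.match(props.get("dih.hadoop.hive.url", "")):
        problems.append(("dih.hadoop.hive.url",
                         "expected jdbc:hive2://<hostname>:<port>/<schema>"))
    if not SPARK_URL_RE.match(props.get("dih.hadoop.service.spark.url", "")):
        problems.append(("dih.hadoop.service.spark.url",
                         "expected yarn or spark://master_host:<port_number>"))
    # Kerberos settings are only required when the cluster uses Kerberos,
    # so the principal is checked only if it is present.
    principal = props.get("dih.hadoop.principal")
    if principal is not None and not PRINCIPAL_RE.match(principal):
        problems.append(("dih.hadoop.principal",
                         "expected <principal>/<domain>@<realm>"))
    return problems

if __name__ == "__main__":
    # Values from the examples in step 2; an empty list means all checks passed.
    sample = {
        "dih.hadoop.hive.url": "jdbc:hive2://hive_host:10000/myschema",
        "dih.hadoop.service.spark.url": "yarn",
        "dih.hadoop.principal": "infa/admin@informatica.com",
    }
    print(validate(sample))  # → []
```

The regular expressions are deliberately loose; they catch structural mistakes such as a missing port or schema, not unreachable hosts or wrong credentials.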
