Table of Contents

  1. Preface
  2. Installation Overview
  3. Before You Begin
  4. Pre-Installation Tasks
  5. Data Integration Hub Installation
  6. Post-Installation Tasks
  7. Upgrading Data Integration Hub
  8. Starting and Stopping Data Integration Hub
  9. Optional Data Integration Hub Configuration
  10. Installing and Configuring the Data Integration Hub Accelerator for Data Archive
  11. Troubleshooting the Data Integration Hub Installation
  12. Uninstallation

Installation and Configuration Guide

Configuring the Environment for a Hadoop Publication Repository
If you installed the Data Integration Hub Hadoop Service component, configure the environment for a Hadoop publication repository. To complete the task, perform the following actions:

  1. Configure settings in the dx-configuration.properties file.
  2. Verify references in the dih-hadoop-service.xml file.
  3. Enable an option in Cloudera Manager.
  1. On the machine where the Data Integration Hub Hadoop Service is installed, use a text editor to open the dx-configuration.properties file from the following location:
     <DIHInstallationDir>/DataIntegrationHub/tomcat/shared/classes/
  2. Set the properties in the following sections of the file and then save the file:

     HIVE settings

     dih.hadoop.hive.username
       User name for connecting to the Apache Hive server.

     dih.hadoop.hive.password
       Password of the user that connects to the Apache Hive server.

     dih.hadoop.hive.url
       URL to the Apache Hive server in the following format:
       jdbc:hive2://<hostname>:<port>/<schema>
       Where:
       • hostname is the host name or IP address of the server.
       • port is the port number of the server. Default is 10000.
       • schema is the schema used with the Hive warehouse. Default is default. If the Hive warehouse uses a non-default schema, also set the dih.hadoop.service.warehouse.dir property.

     dih.hadoop.service.warehouse.dir
       Path to the Hive warehouse directory. Required if the Apache Hive server uses a non-default schema. If the Apache Hive server uses the default schema, do not enter a value for this property.

     For example:
     dih.hadoop.hive.username=hive
     dih.hadoop.hive.password=password
     dih.hadoop.hive.url=jdbc:hive2://hive_host:10000/myschema
     dih.hadoop.service.warehouse.dir=/user/hive/mydatawarehousedir
     SPARK settings

     dih.hadoop.service.spark.version
       Apache Spark version. Takes the following values:
       • 1.2. Apache Spark version 1.2.
       • 1.3. Apache Spark version 1.3 and higher.
       Default is 1.3.

     dih.hadoop.service.spark.url
       Apache Spark URL.
       • If Apache Spark is running in YARN mode, use the default value:
         dih.hadoop.service.spark.url=yarn
       • If Apache Spark is running in standalone mode, enter the URL in the following format:
         spark://master_host:<port_number>
         Where:
         • master_host is the host name of the Master daemon, which coordinates the operations of the Workers that run the executors.
         • <port_number> is the port number of the Master daemon. Default is 7077.
         For example:
         spark://Mymasterhost:7077
         The value that you enter must be identical to the value that appears in the Spark console. By default, the Spark console is located at http://<host_name>:18080.

     dih.hadoop.service.spark.additional.args
       Additional arguments for running jobs. For example:
       --executor-memory 20G --total-executor-cores 100
       For a complete list of arguments, see the Spark documentation.
     KERBEROS settings

     If the Hadoop cluster uses Kerberos authentication, configure the following settings:

     dih.hadoop.principal
       Kerberos principal name in the following format:
       <principal>/<domain>@<realm>

     dih.hadoop.keytab.path
       Location and name of the keytab file.

     For example:
     dih.hadoop.principal=infa/admin@informatica.com
     dih.hadoop.keytab.path=/etc/security/keytabs/infa.keytab
     The file name does not have to be infa.keytab.
  3. On the machine where the Data Integration Hub Hadoop Service is installed, use a text editor to open the dih-hadoop-service.xml file from the following location:
     <DIHInstallationDir>/DataIntegrationHub/tomcat/conf/Catalina/localhost
  4. Verify that the file contains the correct references to your Hadoop classpath configuration. By default, the file references a Cloudera VM configuration in the Cloudera VM - sample default configurations section. The file also contains references to Cloudera Manager and to Hortonworks VM in the following commented-out sections: Cloudera CDH - sample cloudera manager configurations and Hortonworks VM - sample default configurations. If required, comment out the Cloudera VM - sample default configurations section and uncomment the section that is appropriate for your configuration.
  5. In Cloudera Manager, enable the Bind NameNode to Wildcard Address option and then restart the HDFS service.
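Before restarting services, it can help to sanity-check the values you entered in dx-configuration.properties against the formats described in step 2. The following standalone Python sketch is a hypothetical helper, not part of Data Integration Hub; the property names come from this guide, and the format checks are assumptions based on the examples above.

```python
import re

def check_dih_hadoop_properties(text):
    """Parse simple key=value lines and flag values that do not match
    the formats described in this guide."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()

    problems = []
    # Hive URL: jdbc:hive2://<hostname>:<port>/<schema>
    url = props.get("dih.hadoop.hive.url", "")
    if not re.match(r"^jdbc:hive2://[^:/]+:\d+/\S+$", url):
        problems.append("dih.hadoop.hive.url should look like "
                        "jdbc:hive2://<hostname>:<port>/<schema>")
    # Spark URL: 'yarn' in YARN mode, spark://master_host:<port> in standalone mode
    spark = props.get("dih.hadoop.service.spark.url", "")
    if spark != "yarn" and not re.match(r"^spark://[^:/]+:\d+$", spark):
        problems.append("dih.hadoop.service.spark.url should be 'yarn' "
                        "or spark://master_host:<port_number>")
    # Kerberos principal (only if present): <principal>/<domain>@<realm>
    principal = props.get("dih.hadoop.principal")
    if principal is not None and not re.match(r"^[^/@]+/[^/@]+@\S+$", principal):
        problems.append("dih.hadoop.principal should look like "
                        "<principal>/<domain>@<realm>")
    return problems

sample = """\
dih.hadoop.hive.username=hive
dih.hadoop.hive.password=password
dih.hadoop.hive.url=jdbc:hive2://hive_host:10000/myschema
dih.hadoop.service.spark.url=yarn
dih.hadoop.principal=infa/admin@informatica.com
"""
print(check_dih_hadoop_properties(sample))  # prints []
```

Run the check against the file on the Hadoop Service machine before restarting; an empty list means the checked values match the documented formats, which does not guarantee that the servers they point to are reachable.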
