Configuring YARN in Informatica Big Data Management®


Informatica Administrator Tasks

After the Hadoop administrator configures node labels on the Hadoop cluster, the Informatica administrator must enable node labeling in the domain environment.

The Informatica administrator must complete the following tasks:

Verify the scheduler in the domain environment.

Ensure that the domain environment uses a capacity scheduler to allocate resources to jobs that run on the cluster.
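
As a rough illustration, on a stock Apache Hadoop cluster the scheduler is selected through the yarn.resourcemanager.scheduler.class property in the yarn-site.xml file. The fragment below is a sketch under that assumption. Your distribution or cluster manager might set the scheduler elsewhere, so treat it as something to compare against rather than a required setting.

```xml
<!-- Sketch: assumes the scheduler is set in yarn-site.xml on a stock Apache Hadoop cluster. -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
```

If the value names a different scheduler class, node labels configured for the capacity scheduler might not take effect as described here.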

Enable node labeling in the domain environment.

To enable node labeling, configure the following property in the yarn-site.xml file:

yarn.node-labels.enabled:
<property>
  <name>yarn.node-labels.enabled</name>
  <value>TRUE</value>
</property>

Create a location to store node labels.

To store node labels in an HDFS directory, configure the following property in the yarn-site.xml file:

yarn.node-labels.fs-store.root-dir:
<property>
  <name>yarn.node-labels.fs-store.root-dir</name>
  <value>hdfs://[Node name]:[Port]/[Path to store]/[Node labels]/</value>
</property>

The ResourceManager must be able to access the directory.

To store node labels on the local file system of the ResourceManager instead of HDFS, set the property to a path similar to the following:
file:///home/yarn/node-label
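
Put together, a local-store configuration might look like the following sketch. It reuses the example path above and the same yarn.node-labels.fs-store.root-dir property. It assumes that the directory exists on the ResourceManager host and that the ResourceManager can access it.

```xml
<!-- Sketch: local file system store; the path is the example path, not a required location. -->
<property>
  <name>yarn.node-labels.fs-store.root-dir</name>
  <value>file:///home/yarn/node-label</value>
</property>
```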

Optionally, the Informatica administrator can complete the following task:

Start the Blaze engine using a node label.

Use node labels to start the Blaze engine on nodes that have a specific node label. To do so, configure the following Blaze configuration property in the Hadoop connection:

Property
Description

Blaze YARN Node Label

Node label that determines the node on the Hadoop cluster where the Blaze engine runs. If you do not specify a node label, the Blaze engine runs on the nodes in the default partition.

If the Hadoop cluster supports logical operators for node labels, you can specify a list of node labels. To list the node labels, use the operators && (AND), || (OR), and ! (NOT).

When the Blaze engine uses node labels, Blaze components might be redundant on the labeled nodes. If a node has multiple labels and you specify those labels in different Hadoop connections, multiple Grid Manager, Orchestrator, or Job Monitor instances might run on the same node.
