Scheduling, Queuing, and Node Labeling

You can use scheduling, YARN queues, and node labeling to optimize performance when you run a mapping in the Hadoop environment.
A scheduler assigns resources on the cluster to applications that need them, while honoring organizational policies on sharing resources. You can configure YARN to use the capacity scheduler or the fair scheduler. The capacity scheduler allows multiple organizations to share a large cluster and distributes resources based on capacity allocations. The fair scheduler shares resources evenly among all jobs running on the cluster.
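The difference between the two schedulers can be illustrated with simple arithmetic. The following Python sketch is not YARN code. It assumes a hypothetical cluster with 400 GB of memory, hypothetical queue and job names, and it ignores features of the real schedulers such as elasticity, weights, and preemption.

```python
def capacity_allocation(total_memory_gb, queue_capacities):
    """Split cluster memory by configured queue capacity percentages."""
    if abs(sum(queue_capacities.values()) - 100) > 1e-9:
        raise ValueError("queue capacities must sum to 100 percent")
    return {
        queue: total_memory_gb * percent / 100
        for queue, percent in queue_capacities.items()
    }


def fair_allocation(total_memory_gb, running_jobs):
    """Give every running job an equal share of cluster memory."""
    if not running_jobs:
        return {}
    share = total_memory_gb / len(running_jobs)
    return {job: share for job in running_jobs}


# Hypothetical queues and jobs, used only for illustration.
capacity_result = capacity_allocation(400, {"etl": 60, "adhoc": 30, "default": 10})
fair_result = fair_allocation(400, ["job_a", "job_b", "job_c", "job_d"])

print(capacity_result)
print(fair_result)
```

In this simplified model, the capacity split depends only on the configured percentages, while the fair split changes whenever a job starts or finishes.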
Queues are the organizing structure for YARN schedulers, allowing multiple tenants to share the cluster. The capacity of each queue specifies the percentage of cluster resources that are available for applications submitted to the queue. You can direct the Blaze and Spark engines to a YARN scheduler queue.
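As an illustration of how queue capacities are expressed, the following Python sketch builds the property names that the Apache Hadoop capacity scheduler reads from capacity-scheduler.xml. The queue names `blaze` and `spark` and the percentages are hypothetical. `spark.yarn.queue` is the standard Spark property for selecting a YARN queue. How you pass a queue name to the Blaze or Spark engine in your environment is not covered here and might differ.

```python
def capacity_queue_properties(queue_capacities):
    """Build capacity scheduler properties for queues directly under root."""
    total = sum(queue_capacities.values())
    if total != 100:
        raise ValueError(f"queue capacities sum to {total}, expected 100")
    # The root queue lists its child queues as a comma-separated string.
    props = {"yarn.scheduler.capacity.root.queues": ",".join(queue_capacities)}
    for name, percent in queue_capacities.items():
        # Each child queue declares its share of the parent as a percentage.
        props[f"yarn.scheduler.capacity.root.{name}.capacity"] = str(percent)
    return props


# Hypothetical queue layout: one queue per engine plus the default queue.
queue_props = capacity_queue_properties({"default": 40, "blaze": 30, "spark": 30})

# Standard Spark property that names the YARN queue for a Spark application.
spark_props = {"spark.yarn.queue": "spark"}

for key, value in {**queue_props, **spark_props}.items():
    print(f"{key}={value}")
```

The validation step reflects the rule that the capacities of sibling queues must add up to 100 percent.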
You can use node labels to run YARN applications on specific cluster nodes. Node labels partition a cluster into sub-clusters so that jobs can run on nodes with specific characteristics. You can then associate node labels with capacity scheduler queues.
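The relationship between node labels and queues can be modeled in a few lines. The following Python sketch is a simplified illustration, not the YARN placement algorithm. The node names, labels, and queue names are hypothetical, and the model assumes that each node carries at most one label and that unlabeled nodes are available to every queue.

```python
# Hypothetical cluster: each node has at most one label ("" means no label).
node_labels = {
    "node1": "gpu",
    "node2": "gpu",
    "node3": "highmem",
    "node4": "",
}

# Hypothetical queues and the node labels each queue is allowed to access.
queue_labels = {
    "analytics": {"gpu"},
    "etl": {"highmem"},
    "default": set(),
}


def eligible_nodes(queue, node_labels, queue_labels):
    """Return the nodes that a queue can reach in this simplified model."""
    accessible = queue_labels[queue]
    return sorted(
        node
        for node, label in node_labels.items()
        if label == "" or label in accessible
    )


for queue in queue_labels:
    print(queue, eligible_nodes(queue, node_labels, queue_labels))
```

In the capacity scheduler itself, the labels that a queue can access are set with the `accessible-node-labels` property of the queue.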
You must install and configure Big Data Management on every node in the cluster, even if the node is not part of the queue you are using.


Updated July 03, 2018