Table of Contents


  1. Preface
  2. Introduction to Informatica Big Data Management
  3. Connections
  4. Mappings in the Hadoop Environment
  5. Mapping Objects in the Hadoop Environment
  6. Processing Hierarchical Data on the Spark Engine
  7. Stateful Computing on the Spark Engine
  8. Monitoring Mappings in the Hadoop Environment
  9. Mappings in the Native Environment
  10. Profiles
  11. Native Environment Optimization
  12. Data Type Reference
  13. Complex File Data Object Properties
  14. Function Reference
  15. Parameter Reference

Scheduling, Queuing, and Node Labeling

Scheduling, Queuing, and Node Labeling

You can use scheduling, YARN queues, and node labeling to optimize performance when you run a mapping in the Hadoop environment.
A scheduler assigns resources on the cluster to applications that need them, while honoring organizational policies on sharing resources. You can configure YARN to use the capacity scheduler or the fair scheduler. The capacity scheduler allows multiple organizations to share a large cluster and distributes resources based on capacity allocations. The fair scheduler shares resources evenly among all jobs running on the cluster.
Queues are the organizing structure for YARN schedulers, allowing multiple tenants to share the cluster. The capacity of each queue specifies the percentage of cluster resources that are available for applications submitted to the queue. You can direct the Blaze and Spark engines to a YARN scheduler queue.
You can use node labels to run YARN applications on cluster nodes. Node labels partition a cluster into sub-clusters so that jobs can run on nodes with specific characteristics. You can then associate node labels with capacity scheduler queues.
You must install and configure Big Data Management for every node on the cluster, even if the cluster is not part of the queue you are using.

Updated December 13, 2018