Table of Contents

Search

  1. Preface
  2. Introduction to Informatica Big Data Management
  3. Connections
  4. Mappings in a Hadoop Environment
  5. Mappings in the Native Environment
  6. Profiles
  7. Native Environment Optimization
  8. POWERCENTERHELP
  9. Data Type Reference

Optimization for the Hadoop Environment

Optimization for the Hadoop Environment

You can optimize the Hadoop environment and the Hadoop cluster to increase performance.
You can optimize the Hadoop environment and the Hadoop cluster in the following ways:
Configure a highly available Hadoop cluster
You can configure the Data Integration Service and the Developer tool to read from and write to a highly available Hadoop cluster. The steps to configure a highly available Hadoop cluster depend on the type of Hadoop distribution. For more information about configuration steps for a Hadoop distribution, see the
Informatica Big Data Management Installation and Configuration Guide
.
Truncate partitions in a Hive target
You can truncate partitions in a Hive target to increase performance. To truncate partitions in a Hive target, you must choose to both truncate the partition in the Hive target and truncate the target table. You can enable data compression on temporary staging tables to optimize performance.
Compress data on temporary staging tables
You can enable data compression on temporary staging tables to increase mapping performance.
Parallel sort
When you use a Sorter transformation in a mapping, the Data Integration Service enables parallel sorting by default when it pushes the mapping logic to the Hadoop cluster. Parallel sorting improves mapping performance with some restrictions.


Updated July 03, 2018