Tuning the Hive Engine for Big Data Management®

HBase Read or Write

When the Data Integration Service reads from an HBase table, it creates one map task per region of the table, so the number of regions sets the read parallelism. For better performance, Informatica recommends splitting the table into a suitable number of regions.
Table splits also heavily affect writes to HBase tables. For better performance, pre-split the table so that its regions already exist before the write begins. Choose the split points so that no single region receives far more requests than the others. Distribute the data across the regions, and base the split strategy not only on the key range but also on the keys that are actually being written.
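The effect of the split strategy on write distribution can be illustrated with a small simulation. This is a minimal sketch, not Informatica or HBase code: the region count, the `order-NNNNNN` key format, and the hash-prefix salting scheme in `salted_key` are all assumptions chosen for illustration. It compares how 10,000 sequential keys land in four pre-split regions with and without a salt prefix.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

NUM_REGIONS = 4  # assumed region count, for illustration only


def split_points(num_regions):
    """Return the num_regions - 1 two-hex-digit boundaries that divide 00..ff evenly."""
    step = 256 // num_regions
    return [format(i * step, "02x") for i in range(1, num_regions)]


def salted_key(row_key):
    """Prefix the key with two hex digits of its SHA-256 digest (hypothetical salting scheme)."""
    prefix = hashlib.sha256(row_key.encode("utf-8")).hexdigest()[:2]
    return prefix + "|" + row_key


def region_for(key, splits):
    """Index of the region whose [start, end) key range contains the key."""
    return bisect_right(splits, key)


splits = split_points(NUM_REGIONS)
# Sequential keys, as a job writing ordered IDs might produce (assumed format).
keys = ["order-%06d" % i for i in range(10000)]

unsalted = Counter(region_for(k, splits) for k in keys)
salted = Counter(region_for(salted_key(k), splits) for k in keys)

print("split points:", splits)
print("rows per region, unsalted keys:", dict(unsalted))
print("rows per region, salted keys:  ", dict(sorted(salted.items())))
```

In this sketch every unsalted key sorts after the last split point, so a single region takes all the writes even though the table is pre-split. The salted keys spread across all four regions. The trade-off is that a salted key no longer sorts in its natural order, so range scans over the original key become more expensive.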
If you use PowerExchange® for HBase to write, you can improve performance by disabling auto flush for the writer.
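The reason disabling auto flush helps can be sketched with a toy model of a client-side write buffer. This is not the PowerExchange for HBase writer or the HBase client API: the `BufferedWriter` class, its row-count `buffer_limit`, and the round-trip counter are hypothetical stand-ins that only show how buffering reduces the number of requests sent to the server.

```python
class BufferedWriter:
    """Toy stand-in for a client-side write buffer; not the HBase client API."""

    def __init__(self, auto_flush, buffer_limit=1000):
        self.auto_flush = auto_flush
        self.buffer_limit = buffer_limit  # assumed limit, counted in rows
        self.pending = []
        self.round_trips = 0
        self.rows_sent = 0

    def put(self, row):
        """Queue a row; send immediately if auto flush is on or the buffer is full."""
        self.pending.append(row)
        if self.auto_flush or len(self.pending) >= self.buffer_limit:
            self.flush()

    def flush(self):
        """Send all pending rows in one simulated round trip."""
        if self.pending:
            self.round_trips += 1
            self.rows_sent += len(self.pending)
            self.pending = []

    def close(self):
        """Flush whatever is still buffered before the writer goes away."""
        self.flush()


def write_rows(writer, count):
    """Write count rows through the writer and return the round trips used."""
    for i in range(count):
        writer.put(("row-%d" % i, "value"))
    writer.close()
    return writer.round_trips


eager = write_rows(BufferedWriter(auto_flush=True), 2500)
batched = write_rows(BufferedWriter(auto_flush=False), 2500)
print("round trips with auto flush on: ", eager)
print("round trips with auto flush off:", batched)
```

With auto flush on, each row costs one round trip. With it off, rows travel in batches. The cost of batching is that rows still sitting in the buffer are lost if the client fails before the next flush, so the final flush on close matters.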
