Tuning the Hive Engine for Big Data Management®

HParser and Integrated Data Transformation

In internal tests comparing HParser with the integrated Data Transformation that comes with Big Data Management, HParser performed better. In some of these tests, integrated Data Transformation was about 2.57 times slower. Informatica recommends that you test and compare the performance of HParser and integrated Data Transformation before you decide which one to use.
If you use Data Transformation, Informatica recommends that you use a splittable input format if one is available. It also recommends that you design the streaming service of the Data Processor transformation to process records in batches, using the count property of the streamer.
For more information, see the following topics in the Informatica PowerExchange for HDFS User Guide:
  • HDFS Data Extraction
  • Complex Files Partitioning
