Tuning the Hive Engine for Big Data Management®

Case Study: Join Reordering

This example demonstrates the performance benefit of join reordering.

Joining Five Tables Before Tuning for Performance

The mapping uses the following files:
Number  File Name  Size
1       PART       23.1 GB
2       CUSTOMER   23.08 GB
3       LINEITEM   757.86 GB
4       ORDERS     168.5 GB
5       PARTSUPP   115.2 GB
The following image shows the mapping that is not optimized:
The largest source file that the mapping uses is LINEITEM at 757.86 GB, which is roughly 4.5 to 33 times the size of each of the other files. In the mapping, LINEITEM is joined with other files three times in succession.
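The size gap between LINEITEM and the other files can be checked directly from the listed sizes. The following is a minimal Python sketch of that arithmetic; the dictionary and function names are illustrative and are not part of the product.

```python
# File sizes in GB, copied from the list of source files above.
SIZES_GB = {
    "PART": 23.1,
    "CUSTOMER": 23.08,
    "LINEITEM": 757.86,
    "ORDERS": 168.5,
    "PARTSUPP": 115.2,
}


def size_ratios(sizes, largest="LINEITEM"):
    """Return how many times the size of each other file `largest` is."""
    return {
        name: round(sizes[largest] / gb, 1)
        for name, gb in sizes.items()
        if name != largest
    }


ratios = size_ratios(SIZES_GB)
# Print from the smallest gap to the largest gap.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"LINEITEM is about {ratio}x the size of {name}")
```

With these figures, LINEITEM is about 4.5 times the size of ORDERS, 6.6 times the size of PARTSUPP, and about 33 times the size of PART and CUSTOMER.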

Joining Five Tables After Tuning for Performance

The mapping is tuned by reordering the Joiner transformations so that the smaller tables are joined with each other before they are joined with LINEITEM.
The following image shows the mapping that is optimized for join reordering:
The mapping joins the smaller tables in pairs, ORDERS (168.5 GB) with CUSTOMER (23.08 GB) and PART (23.1 GB) with PARTSUPP (115.2 GB), before it joins the LINEITEM table.
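The effect of joining the small tables first can be illustrated with the file sizes listed earlier. The Python below is only a sketch: the tuple representation and the function name are hypothetical, not how the Hive engine or the mapping represents joins, and the totals are raw input sizes rather than measured join outputs.

```python
# File sizes in GB, copied from the list of source files.
SIZES_GB = {
    "PART": 23.1,
    "CUSTOMER": 23.08,
    "LINEITEM": 757.86,
    "ORDERS": 168.5,
    "PARTSUPP": 115.2,
}

# The two small-table joins that the optimized mapping runs before LINEITEM.
# Each tuple stands for one join of two source files (illustrative only).
PRE_JOINS = [("ORDERS", "CUSTOMER"), ("PART", "PARTSUPP")]


def raw_input_gb(join, sizes):
    """Sum the raw source sizes that feed one join, rounded to 2 decimals."""
    return round(sum(sizes[table] for table in join), 2)


pre_join_inputs = {join: raw_input_gb(join, SIZES_GB) for join in PRE_JOINS}
for join, total in pre_join_inputs.items():
    print(f"{' JOIN '.join(join)} reads {total} GB of raw input")
```

Each pre-join reads far less raw input (191.58 GB and 138.3 GB) than the 757.86 GB LINEITEM file alone, which is consistent with the choice to join the small tables first.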

Result

The mapping optimized for join reordering completed 40% faster than the mapping that was not optimized.
