Tuning the Hive Engine for Big Data Management®

Case Study: Join Reordering

This example demonstrates the performance benefit of join reordering.

Joining Five Tables Before Tuning for Performance

The mapping uses the following files:

Number	File Name	Size
1	PART	23.1 GB
2	CUSTOMER	23.08 GB
3	LINEITEM	757.86 GB
4	ORDERS	168.5 GB
5	PARTSUPP	115.2 GB

The following image shows the mapping that is not optimized:

The largest source file that the mapping uses is LINEITEM (757.86 GB), which is about 4.5 times the size of the next-largest file, ORDERS (168.5 GB), and larger than the other four files combined. In the mapping, LINEITEM is joined with other files three times in succession.
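The size gap can be checked with a short calculation. The following Python sketch uses only the file sizes listed in the table above; the helper name size_ratios is hypothetical and chosen for illustration.

```python
# File sizes in GB, copied from the table in this case study.
sizes_gb = {
    "PART": 23.1,
    "CUSTOMER": 23.08,
    "LINEITEM": 757.86,
    "ORDERS": 168.5,
    "PARTSUPP": 115.2,
}


def size_ratios(sizes, reference):
    """Return how many times larger `reference` is than each other file."""
    ref = sizes[reference]
    return {name: ref / gb for name, gb in sizes.items() if name != reference}


ratios = size_ratios(sizes_gb, "LINEITEM")

# Combined size of every file except LINEITEM.
others_total_gb = sum(gb for name, gb in sizes_gb.items() if name != "LINEITEM")

for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"LINEITEM is {ratio:.1f}x the size of {name}")
print(f"All other files combined: {others_total_gb:.2f} GB")
```

The ratios indicate why the position of LINEITEM in the join order is likely to dominate the cost of the mapping.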

Joining Five Tables After Tuning for Performance

The mapping is tuned by reordering the Joiner transformations so that the smaller tables are joined with each other before the results are joined with LINEITEM.

The following image shows the mapping that is optimized for join reordering:

The mapping joins ORDERS (168.5 GB) with CUSTOMER (23.08 GB) and PART (23.1 GB) with PARTSUPP (115.2 GB) before it joins the results with the LINEITEM table.
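The reordering changes only the order in which the joins run, not the rows that the mapping returns. The following Python sketch illustrates this on a few toy rows. It is not the Informatica mapping: the column names (orderkey, custkey, partkey, suppkey), the sample rows, the hash_join helper, and the LINEITEM-first join order are all assumptions made for illustration, because the source does not state the join keys or the exact order in the unoptimized mapping.

```python
def hash_join(left, right, keys):
    """Inner-join two lists of dicts on the shared columns named in `keys`."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    joined = []
    for l_row in left:
        for r_row in index.get(tuple(l_row[k] for k in keys), []):
            joined.append({**l_row, **r_row})
    return joined


def normalize(rows):
    """Make results comparable regardless of row order or column order."""
    return sorted(tuple(sorted(row.items())) for row in rows)


# Toy stand-ins for the five sources (hypothetical columns and values).
customer = [{"custkey": 1, "cname": "A"}, {"custkey": 2, "cname": "B"}]
orders = [{"orderkey": 10, "custkey": 1}, {"orderkey": 11, "custkey": 2}]
part = [{"partkey": 100, "pname": "bolt"}]
partsupp = [{"partkey": 100, "suppkey": 7}]
lineitem = [
    {"orderkey": 10, "partkey": 100, "suppkey": 7, "qty": 5},
    {"orderkey": 11, "partkey": 100, "suppkey": 7, "qty": 3},
]

# Assumed unoptimized order: LINEITEM rows pass through three successive joins.
step1 = hash_join(lineitem, orders, ["orderkey"])
step2 = hash_join(step1, customer, ["custkey"])
lineitem_first = hash_join(
    step2, hash_join(partsupp, part, ["partkey"]), ["partkey", "suppkey"]
)

# Reordered: join the smaller tables first, then bring in LINEITEM.
orders_customer = hash_join(orders, customer, ["custkey"])
part_partsupp = hash_join(partsupp, part, ["partkey"])
small_first = hash_join(
    hash_join(lineitem, orders_customer, ["orderkey"]),
    part_partsupp,
    ["partkey", "suppkey"],
)

print(normalize(lineitem_first) == normalize(small_first))
```

The sketch checks result equivalence only. It says nothing about run time, which depends on the engine and on the data volumes shown in the table.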