To implement the partition merge solution, you create three mappings. The first mapping has two branches. One branch writes inserts to the temporary base insert table. The other branch processes the update and delete operations separately, and merges with a Union transformation before writing to the temporary base update table. The second two mappings are pass-through mappings that insert the temporary tables into the historic base table.
The following image is a graphical representation of the partition merge solution:
Perform the following steps to implement the partition merge solution:
Define a Router transformation to separate insert operations from update and delete operations. Create one pipeline to process the inserts, and one pipeline to process the updates and deletes.
Define an Aggregator transformation to group the rows marked for update and delete based on the partitioned column and detect the impacted partitions.
Define a Joiner transformation to join the Aggregator output with the historic base table on the partitioned column and fetch rows only for those impacted partitions.
Define a Filter transformation to filter out rows marked for update. Join them with the impacted partitions to get all updatable records.
Use a detail outer join and filter NULL keys to perform a minus operation on the impacted partitions. The minus operation eliminates rows marked for delete and rows from the historic base table that were updated.
Define a Union transformation to merge the resulting rows from Step 4 and Step 5 and insert them into the temporary base update table. Enable the
Truncate Hive Target Partition
option on the temporary update base table.
Add the insert rows into the temporary update base table.
Use a workflow to ensure that Step 6 runs before Step 7. Hadoop mappings do not support target load order constraints.
Create a pass-through mapping to load the temporary insert base table into the historic base table.
Create a pass-through mapping to load the temporary update base table into the historic base table.