Strategies for Incremental Updates on Hive in Big Data Management 10.2

Create a Partition Merge

To implement the partition merge solution, you create three mappings. The first mapping has two branches. One branch writes inserts to the temporary base insert table. The other branch processes the update and delete operations separately, merges them with a Union transformation, and writes the result to the temporary base update table. The remaining two mappings are pass-through mappings that insert the temporary tables into the historic base table.
The following image shows the partition merge solution: a mapping with two branches. The first branch writes inserts to the temporary base insert table. The second branch processes the update and delete operations before loading them to the temporary base update table.
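The two-branch routing can be illustrated with a minimal sketch. This is not Informatica code; it assumes a hypothetical record layout in which each change record is a dict carrying an "op" field of "I", "U", or "D", mirroring what the Router transformation does with the row operation type.

```python
def route(changes):
    """Split change records the way the Router transformation does:
    inserts go to one pipeline, updates and deletes to the other."""
    inserts = [row for row in changes if row["op"] == "I"]
    updates_deletes = [row for row in changes if row["op"] in ("U", "D")]
    return inserts, updates_deletes

changes = [
    {"op": "I", "id": 1, "part": "2023", "value": "new"},
    {"op": "U", "id": 2, "part": "2023", "value": "changed"},
    {"op": "D", "id": 3, "part": "2024", "value": None},
]
inserts, updates_deletes = route(changes)
# inserts feed the temporary base insert table;
# updates_deletes feed the update/delete pipeline.
```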
Perform the following steps to implement the partition merge solution:
  1. Define a Router transformation to separate insert operations from update and delete operations. Create one pipeline to process the inserts, and one pipeline to process the updates and deletes.
  2. Define an Aggregator transformation to group the rows marked for update and delete based on the partitioned column and detect the impacted partitions.
  3. Define a Joiner transformation to join the Aggregator output with the historic base table on the partitioned column and fetch rows only for those impacted partitions.
  4. Define a Filter transformation to filter out rows marked for update. Join them with the impacted partitions to get all updatable records.
  5. Use a detail outer join and filter NULL keys to perform a minus operation on the impacted partitions. The minus operation eliminates rows marked for delete and rows from the historic base table that were updated.
  6. Define a Union transformation to merge the resulting rows from Step 4 and Step 5 and insert them into the temporary base update table. Enable the Truncate Hive Target Partition option on the temporary base update table.
  7. Add the insert rows into the temporary base update table.
    Use a workflow to ensure that Step 6 runs before Step 7, because Hadoop mappings do not support target load order constraints.
  8. Create a pass-through mapping to load the temporary base insert table into the historic base table.
  9. Create a pass-through mapping to load the temporary base update table into the historic base table.
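Steps 2 through 6 can be sketched as plain Python under the same assumed record layout: the historic base table is a list of dicts with a key column "id" and a partition column "part" (both hypothetical names). The minus operation of Step 5 is modeled as a key-set exclusion, which is what the detail outer join with a NULL-key filter achieves.

```python
def build_update_table(base, updates_deletes):
    """Rebuild the impacted partitions for the temporary base update table."""
    # Step 2 (Aggregator): detect the partitions touched by updates/deletes.
    impacted = {row["part"] for row in updates_deletes}
    # Step 3 (Joiner): fetch base rows only for those impacted partitions.
    base_impacted = [row for row in base if row["part"] in impacted]
    # Step 4 (Filter): keep only the rows marked for update.
    updated = [{k: v for k, v in row.items() if k != "op"}
               for row in updates_deletes if row["op"] == "U"]
    # Step 5 (minus via outer join + NULL-key filter): eliminate base rows
    # whose key appears in any update or delete record.
    changed_keys = {row["id"] for row in updates_deletes}
    survivors = [row for row in base_impacted if row["id"] not in changed_keys]
    # Step 6 (Union): the merged rows become the temporary base update table.
    return updated + survivors

base = [
    {"id": 1, "part": "2023", "value": "a"},
    {"id": 2, "part": "2023", "value": "b"},
    {"id": 3, "part": "2024", "value": "c"},  # partition 2024 is untouched
]
changes = [
    {"op": "U", "id": 1, "part": "2023", "value": "a2"},
    {"op": "D", "id": 2, "part": "2023", "value": None},
]
merged = build_update_table(base, changes)
# merged holds the updated row for id 1; id 2 is dropped, and id 3 is
# never rewritten because its partition was not impacted.
```

When the temporary table is loaded with Truncate Hive Target Partition enabled, only the impacted partitions of the historic base table are overwritten, which is why the untouched partitions can be skipped entirely.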
