Strategies for Incremental Updates on Hive in Big Data Management 10.2

Overview

Many organizations build data lakes and enterprise data warehouses on Hadoop clusters to run near real-time analytics that meet business requirements. Building a data lake on a Hadoop cluster requires a one-time initial load from legacy warehouse systems, followed by frequent incremental loads. In most cases, Hive is the preferred analytic store.
Although Hive versions 0.13 and later support transactions, transactional tables pose challenges for incremental loads: ACID compliance is limited, and the tables must use the ORC file format and be bucketed.
This article describes several strategies for updating Hive tables to support incremental loads and to keep targets in sync with source systems.
Informatica Big Data Management supports the following methods to perform incremental updates:
  • Update Strategy transformation
  • Update Strategy transformation using MERGE statement
  • Updates using the partition merge solution
  • Updates using key-value stores
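All four methods aim at the same result: after an incremental load, the target holds exactly the rows that the source holds. The following Python sketch illustrates only those upsert semantics on in-memory rows. It is not how Big Data Management or Hive implements any of the methods above. The function name `apply_incremental`, the `op` change flag ("I", "U", "D"), and the row layout are assumptions made for illustration.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def apply_incremental(target: Iterable[Row], changes: Iterable[Row], key: str = "id") -> List[Row]:
    """Return the target rows after applying a batch of change records.

    Each change row carries a hypothetical "op" field:
    "I" (insert), "U" (update), or "D" (delete).
    """
    # Index the current target rows by primary key.
    merged = {row[key]: dict(row) for row in target}
    for change in changes:
        # Drop the change flag so only data columns reach the target.
        data = {k: v for k, v in change.items() if k != "op"}
        if change["op"] == "D":
            merged.pop(data[key], None)   # delete if the key exists
        else:
            merged[data[key]] = data      # insert a new row or overwrite an existing one
    # Return rows in key order so the result is deterministic.
    return [merged[k] for k in sorted(merged)]


# Example: one update, one insert, and one delete applied to a two-row target.
target_rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
change_rows = [
    {"op": "U", "id": 2, "name": "B"},
    {"op": "I", "id": 3, "name": "c"},
    {"op": "D", "id": 1},
]
result = apply_incremental(target_rows, change_rows)
```

In this sketch an update replaces the whole row rather than individual columns, which keeps the logic short. A real load would push this reconciliation down to the cluster instead of holding rows in memory.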
