Many organizations want to build data lakes and enterprise data warehouses on Hadoop clusters to perform near-real-time analytics driven by business requirements. Building a data lake on a Hadoop cluster requires a one-time initial load from legacy warehouse systems, followed by frequent incremental loads. In most cases, Hive is the preferred analytic store.
Although Hive versions 0.13 and later support transactions, these transactions pose challenges for incremental loads: ACID compliance is limited, and transactional tables must be stored in the ORC file format and be bucketed.
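For illustration, the following is a minimal sketch of the DDL such a transactional table requires on those Hive versions; the table and column names (customer_target, customer_id, and so on) are hypothetical:

```sql
-- Hypothetical target table; Hive 0.13-2.x ACID requires ORC storage,
-- bucketing, and the transactional table property.
CREATE TABLE customer_target (
  customer_id INT,
  name        STRING,
  email       STRING
)
CLUSTERED BY (customer_id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Sessions that read or write the table typically also need:
-- SET hive.support.concurrency = true;
-- SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
```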
This article describes strategies for updating Hive tables to support incremental loads and keep targets in sync with source systems.
Informatica Big Data Management supports the following methods to perform incremental updates:
- Update Strategy transformation
- Update Strategy transformation using the MERGE statement (see the sketch after this list)
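As a rough illustration of the MERGE-based approach, the sketch below upserts change records from a staging table into the transactional target shown earlier. The table and column names (customer_target, customer_updates) are hypothetical, and Hive's MERGE statement requires Hive 2.2 or later with an ACID target table:

```sql
-- Hypothetical upsert: apply staged incremental records to the ACID target.
MERGE INTO customer_target AS t
USING customer_updates AS s          -- staging table with incremental records
ON t.customer_id = s.customer_id
WHEN MATCHED THEN
  UPDATE SET name = s.name, email = s.email
WHEN NOT MATCHED THEN
  INSERT VALUES (s.customer_id, s.name, s.email);
```

A single MERGE handles both updates and inserts in one pass over the staging data, which is why it is often preferred over separate UPDATE and INSERT statements for incremental loads.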