User Guide

10.2.2

Hive Targets on Hadoop

A mapping that is running in the Hadoop environment can write to a Hive target.

A Hive target can be an internal table or an external table. Internal Hive tables are managed by Hive and are also known as managed tables. External Hive tables are managed by an external source such as HDFS, Amazon S3, or Microsoft Azure Blob Storage.

Consider the following restrictions when you configure a Hive target in a mapping that runs in the Hadoop environment:

A mapping fails to run when you use Unicode characters in a Hive target definition.

The Hive target cannot reside on a remote cluster. A remote cluster is any cluster other than the one on the machine that the Hadoop connection references in the mapping.

When you set up a dynamic target for a partitioned Hive table, the partition value is taken from the final column in the table. If the table has a dynamic partition column, the final column of the table is used as the dynamic partition column. To partition on a different column, move that column to the last position in the table. If the table has multiple partition columns, the dynamic partition values are taken from the last columns of the upstream transformation. You can use an Expression transformation to reorder the columns if necessary.
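The column-ordering rule for dynamic partitions can be illustrated with a short Python sketch. It is not product code: the function name order_for_dynamic_partition and the sample column names are hypothetical, chosen only for illustration. The sketch moves the partition columns to the end of a column list and keeps the relative order of the remaining columns, which is the arrangement an Expression transformation would need to produce upstream of the target.

```python
from typing import List, Sequence


def order_for_dynamic_partition(columns: Sequence[str],
                                partition_columns: Sequence[str]) -> List[str]:
    """Return the columns with the partition columns moved to the end.

    Hypothetical helper for illustration only. Non-partition columns keep
    their original relative order; partition columns follow in the order
    given by partition_columns.
    """
    missing = [c for c in partition_columns if c not in columns]
    if missing:
        raise ValueError(f"partition columns not in column list: {missing}")
    partition_set = set(partition_columns)
    regular = [c for c in columns if c not in partition_set]
    return regular + list(partition_columns)


if __name__ == "__main__":
    # Sample column names are assumptions, not taken from the guide.
    print(order_for_dynamic_partition(
        ["order_date", "order_id", "region", "amount"],
        ["region", "order_date"],
    ))
```

With the sample input, the partition columns region and order_date end up as the last two columns, in that order.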

When a mapping creates or replaces a Hive table, the type of table that the mapping creates depends on the run-time engine that you use to run the mapping.

The following table shows the table type for each run-time engine:

Run-Time Engine	Resulting Table Type
Blaze	MANAGED_TABLE
Spark	EXTERNAL_TABLE

You can design a mapping to truncate an internal or external Hive table that is bucketed and partitioned.

In a mapping that runs on the Spark engine or the Blaze engine, you can create a custom DDL query that creates or replaces a Hive table at run time. However, with the Blaze engine, you cannot use a backtick (`) character in the DDL query. The backtick character is required in HiveQL when you include special characters or keywords in a query.
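As a rough illustration of the backtick restriction, the following Python sketch builds a DDL string with backtick-quoted identifiers and checks it before use. The helper names quote_identifier and check_ddl_for_engine are hypothetical and are not part of any product API. The sketch also assumes that an embedded backtick inside a quoted HiveQL identifier is escaped by doubling it.

```python
def quote_identifier(name: str) -> str:
    """Wrap a Hive identifier in backticks.

    Assumption: an embedded backtick is escaped by doubling it.
    """
    return "`" + name.replace("`", "``") + "`"


def check_ddl_for_engine(ddl: str, engine: str) -> None:
    """Raise ValueError if the DDL contains a backtick and the engine is Blaze.

    Hypothetical pre-flight check; it only mirrors the restriction stated
    in the text and does not validate the DDL in any other way.
    """
    if engine.strip().lower() == "blaze" and "`" in ddl:
        raise ValueError(
            "custom DDL for the Blaze engine cannot contain a backtick")


if __name__ == "__main__":
    # 'order' and 'date' are keywords, so they need quoting in HiveQL.
    ddl = (f"CREATE TABLE {quote_identifier('order')} "
           f"({quote_identifier('date')} STRING)")
    check_ddl_for_engine(ddl, "Spark")  # passes silently
    try:
        check_ddl_for_engine(ddl, "Blaze")
    except ValueError as err:
        print(err)
```

A check like this would flag a table or column name that needs quoting before the mapping runs on the Blaze engine, so that the name can be changed or the mapping can be run on the Spark engine instead.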

The Spark engine can write to bucketed Hive targets. Bucketing and partitioning of Hive tables can improve performance by reducing data shuffling and sorting.
