Hive Targets on Hadoop

A mapping run in the Hadoop environment can write to a Hive target. When you write to a Hive target, consider the processing differences for functionality such as table types, DDL queries, and bucketing.

Hive Table Types

A Hive target can be an internal table or an external table. Internal Hive tables, also known as managed tables, are managed by Hive. External Hive tables are managed by an external source such as HDFS, Amazon S3, or Microsoft Azure Blob Storage.
When a mapping creates or replaces a Hive table, the type of table that the mapping creates depends on the run-time engine that you use to run the mapping:
  • On the Blaze engine, mappings create managed tables.
  • On the Spark engine, mappings create external tables.
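The distinction can be illustrated with HiveQL DDL. The table and column names below are hypothetical, and the sketch assumes the default behavior of `CREATE TABLE` and `CREATE EXTERNAL TABLE` in HiveQL:

```sql
-- Managed (internal) table: Hive owns the data and deletes it
-- when the table is dropped.
CREATE TABLE sales_managed (
  order_id INT,
  amount   DOUBLE
)
STORED AS ORC;

-- External table: Hive tracks only the metadata. The data files at
-- the LOCATION are left in place when the table is dropped.
CREATE EXTERNAL TABLE sales_external (
  order_id INT,
  amount   DOUBLE
)
STORED AS ORC
LOCATION '/data/sales';
```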

DDL Queries

For mappings that run on the Spark engine or the Blaze engine, you can create a custom DDL query that creates or replaces a Hive table at run time. However, with the Blaze engine, you cannot use a backtick (`) character in the DDL query. The backtick character is required in HiveQL when you include special characters or keywords in a query.
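For example, a custom DDL query might escape a column name that collides with a HiveQL reserved keyword. The table and column names below are hypothetical. Because of the backtick restriction, a query like this works only on the Spark engine:

```sql
-- Backticks escape identifiers that are HiveQL reserved keywords.
-- The Blaze engine rejects the backtick character, so this DDL
-- can run only on the Spark engine.
CREATE TABLE order_events (
  `date`   STRING,
  `user`   STRING,
  event_id BIGINT
)
STORED AS ORC;
```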

Bucketing

The Spark engine can write to bucketed Hive targets. Bucketing and partitioning of Hive tables can improve performance by reducing data shuffling and sorting.
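A bucketed and partitioned Hive target might be defined as follows. The table, columns, and bucket count are hypothetical, and the sketch assumes standard HiveQL `PARTITIONED BY` and `CLUSTERED BY` syntax:

```sql
-- Partition by a low-cardinality column and bucket by a join key.
-- Bucketing co-locates rows that hash to the same customer_id bucket,
-- which can reduce shuffling and sorting during joins on that key.
CREATE TABLE orders_bucketed (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DOUBLE
)
PARTITIONED BY (order_date STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC;
```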

Hortonworks HDP 3.1

Hortonworks HDP 3.1 creates ACID-enabled ORC tables by default and requires ACID for all managed tables. If you do not want to use ACID-enabled tables, use external tables instead.
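As a sketch of that workaround, declaring the target table as external keeps it outside ACID management on HDP 3.1. The table, columns, and location below are hypothetical:

```sql
-- On HDP 3.1, a plain CREATE TABLE produces a managed, ACID-enabled
-- ORC table. Declaring the table EXTERNAL avoids ACID management.
CREATE EXTERNAL TABLE clickstream (
  session_id STRING,
  url        STRING
)
STORED AS ORC
LOCATION '/data/clickstream';
```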


Updated January 20, 2020