Hive Targets on Hadoop

A mapping that is running in the Hadoop environment can write to a Hive target.
A Hive target can be an internal table or an external table. Internal Hive tables are managed by Hive and are also known as managed tables. External Hive tables are managed by an external source such as HDFS, Amazon S3, or Microsoft Azure Blob Storage.
Consider the following restrictions when you configure a Hive target in a mapping that runs in the Hadoop environment:
  • A mapping fails to run when you use Unicode characters in a Hive target definition.
  • The Hive target cannot reside on a remote cluster. A remote cluster is a cluster that is remote from the machine that the Hadoop connection references in the mapping.
  • When you set up a dynamic target for a partitioned Hive table, the partition value comes from the final column in the table. If the table has a dynamic partition column, the final column of the table is treated as the dynamic partition column. To partition by a different column, move that column to the last position in the table. If the table has multiple partition columns, the dynamic partition values are taken from the last columns of the upstream transformation. You can use an Expression transformation to reorder the columns if necessary.
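The column-ordering rule for dynamic partitions can be illustrated with a small Python sketch. It is not Informatica code and does not call any Informatica API. The function name and the sample column names are hypothetical, and the sketch only shows the ordering that an Expression transformation would need to produce so that the partition columns arrive last.

```python
def move_partition_columns_last(columns, partition_columns):
    """Return the columns reordered so the partition columns come last.

    Hypothetical helper for illustration only. Non-partition columns keep
    their original relative order; partition columns are appended in the
    order given in partition_columns.
    """
    missing = [c for c in partition_columns if c not in columns]
    if missing:
        raise ValueError(f"partition columns not in column list: {missing}")
    regular = [c for c in columns if c not in partition_columns]
    return regular + list(partition_columns)


# Assumed example: the upstream transformation emits the partition
# columns (region, order_date) in the middle and at the front of the row.
upstream = ["order_date", "order_id", "region", "amount"]
reordered = move_partition_columns_last(upstream, ["region", "order_date"])
print(reordered)  # ['order_id', 'amount', 'region', 'order_date']
```

In this sketch, region and order_date end up as the last two columns, which is the position from which the dynamic partition values are read.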
When a mapping creates or replaces a Hive table, the type of table that the mapping creates depends on the run-time engine that you use to run the mapping.
The following table shows the table type for each run-time engine:
Run-Time Engine    Resulting Table Type
Blaze              MANAGED_TABLE
Spark              EXTERNAL_TABLE
You can design a mapping to truncate an internal or external Hive table that is bucketed and partitioned.
In a mapping that runs on the Spark engine or the Blaze engine, you can create a custom DDL query that creates or replaces a Hive table at run time. However, with the Blaze engine, you cannot use a backtick (`) character in the DDL query. The backtick character is required in HiveQL when you include special characters or keywords in a query.
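As a rough illustration of the backtick restriction, the following Python sketch builds a CREATE TABLE statement and checks whether it contains a backtick before it would be used with the Blaze engine. Everything here is an assumption made for the example: the helper names, the table and column names, and the keyword set, which is a small illustrative subset and not the full HiveQL reserved-word list. The sketch only manipulates strings and does not reflect how the Developer tool validates queries.

```python
import re

# Names made only of letters, digits, and underscores need no quoting
# unless they collide with a keyword.
SIMPLE_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

# Illustrative subset only; not the complete HiveQL keyword list.
ILLUSTRATIVE_KEYWORDS = {"date", "order", "table", "select", "timestamp"}


def needs_backticks(name):
    """Return True if the name has special characters or is a listed keyword."""
    return (SIMPLE_IDENTIFIER.match(name) is None
            or name.lower() in ILLUSTRATIVE_KEYWORDS)


def quote_identifier(name):
    """Wrap the name in backticks only when quoting is needed."""
    if "`" in name:
        raise ValueError("names that contain a backtick are not handled here")
    return f"`{name}`" if needs_backticks(name) else name


def build_create_table_ddl(table, columns):
    """Build a CREATE TABLE statement from (column name, type) pairs."""
    column_list = ", ".join(
        f"{quote_identifier(name)} {col_type}" for name, col_type in columns
    )
    return f"CREATE TABLE IF NOT EXISTS {quote_identifier(table)} ({column_list})"


def contains_backtick(ddl):
    """Return True if the query uses a backtick, which Blaze does not accept."""
    return "`" in ddl


plain = build_create_table_ddl("sales", [("order_id", "INT"), ("amount", "DOUBLE")])
quoted = build_create_table_ddl("sales", [("order_id", "INT"), ("date", "STRING")])
print(plain, contains_backtick(plain))
print(quoted, contains_backtick(quoted))
```

The second statement quotes the column named date, so under the restriction above it would only be suitable for the Spark engine. Renaming the column to something that needs no quoting is one way to keep a query usable on both engines.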
The Spark engine can write to bucketed Hive targets. Bucketing and partitioning of Hive tables can improve performance by reducing data shuffling and sorting.
