Hive Sources

You can include Hive sources in an Informatica mapping that runs in the Hadoop environment.
Consider the following limitations when you configure a Hive source in a mapping that runs in the Hadoop environment:
  • The Data Integration Service can run pre-mapping SQL commands against the source database before it reads from a Hive source. When you create a SQL override on a Hive source, you must enclose keywords or special characters in backtick (`) characters, as shown in the SQL override example after this list. When you run a mapping with a Hive source in the Hadoop environment, references to a local path in pre-mapping SQL commands are relative to the Data Integration Service node. When you run the mapping in the native environment, references to a local path in pre-mapping SQL commands are relative to the Hive server node.
  • A mapping fails to validate when you configure post-mapping SQL commands. The Data Integration Service does not run post-mapping SQL commands against a Hive source.
  • A mapping fails to run when the Hive source definition contains Unicode characters.
  • The third-party Hive JDBC driver does not return the correct precision and scale values for the Decimal data type. As a result, when you import Hive tables with a Decimal data type into the Developer tool, the Decimal data type precision is set to 38 and the scale is set to 0. Consider the following configuration rules and guidelines based on the version of Hive (see the Decimal example after this list):
    • Hive 0.11 and Hive 0.12. Accept the default precision and scale for the Decimal data type in the Developer tool.
    • Hive 0.12 with Cloudera CDH 5.0. You can configure the precision and scale fields for source columns with the Decimal data type in the Developer tool.
    • Hive 0.13 and above. You can configure the precision and scale fields for source columns with the Decimal data type in the Developer tool.
    • Hive 0.14 and above. The precision and scale used for the Decimal data type in the Hive database also appear in the Developer tool.
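For example, the following hypothetical SQL override selects columns whose names are reserved keywords in Hive, so each keyword is enclosed in backtick characters. The table and column names are illustrative, not taken from the product:
SELECT `date`, `order`, amount FROM sales_table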
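For the Decimal guidelines, the following sketch shows how an explicit precision and scale can be declared in a Hive table definition in Hive 0.13 and above. The table and column names are hypothetical:
CREATE TABLE prices (item_id INT, amount DECIMAL(20,6));
With Hive 0.11 and 0.12, the imported column keeps the default precision of 38 and scale of 0 regardless of the declaration.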
A mapping that runs on the Spark engine can have partitioned Hive source tables and bucketed sources.
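For example, a partitioned and bucketed Hive source table might be created with a statement such as the following. The table definition is an illustrative sketch, not a requirement of the Spark engine:
CREATE TABLE sales (id INT, amount DECIMAL(10,2)) PARTITIONED BY (sale_date STRING) CLUSTERED BY (id) INTO 8 BUCKETS STORED AS ORC;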
You can use an RCFile as a source in a mapping that runs on the Blaze engine. However, the Blaze engine supports only the ColumnarSerDe SerDe. In Hortonworks, the default SerDe for an RCFile is LazyBinaryColumnarSerDe. To read and write to an RCFile table, you must create the table by specifying the SerDe as org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.
For example:
CREATE TABLE TEST_RCFILE (id int, name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe' STORED AS RCFILE;
You can also set the default RCFile SerDe from Ambari or Cloudera Manager. Set the property hive.default.rcfile.serde to org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.
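If your cluster allows this property to be overridden at the session level, you might also set it in a Hive session before you create the table. A minimal sketch:
SET hive.default.rcfile.serde=org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe;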

