Table of Contents

  1. Preface
  2. Introduction to Informatica Big Data Management
  3. Mappings in the Hadoop Environment
  4. Mapping Sources in the Hadoop Environment
  5. Mapping Targets in the Hadoop Environment
  6. Mapping Transformations in the Hadoop Environment
  7. Processing Hierarchical Data on the Spark Engine
  8. Configuring Transformations to Process Hierarchical Data
  9. Processing Unstructured and Semi-structured Data with an Intelligent Structure Model
  10. Stateful Computing on the Spark Engine
  11. Monitoring Mappings in the Hadoop Environment
  12. Mappings in the Native Environment
  13. Profiles
  14. Native Environment Optimization
  15. Cluster Workflows
  16. Connections
  17. Data Type Reference
  18. Function Reference
  19. Parameter Reference

Big Data Management User Guide

Rules and Guidelines for Hive Sources on the Blaze Engine

You can include Hive sources in an Informatica mapping that runs on the Blaze engine.
Consider the following rules and guidelines when you configure a Hive source in a mapping that runs on the Blaze engine:
  • Hive sources for a Blaze mapping include the TEXT, Sequence, Avro, RCFile, ORC, and Parquet storage formats.
  • A mapping that runs on the Blaze engine can have bucketed Hive sources and Hive ACID tables.
  • Hive ACID tables must be bucketed.
  • The Blaze engine supports Hive tables that are enabled for locking.
  • Hive sources can contain quoted identifiers in Hive table names, column names, and schema names, as shown in the sketch after this list.
  • The TEXT storage format in a Hive source for a Blaze mapping supports ASCII characters as column delimiters and the newline character as the row separator. You cannot use the hex values of ASCII characters. For example, use a semicolon (;) instead of its hex value 3B.
  • You can define an SQL override in the Hive source for a Blaze mapping.
  • The Blaze engine can read from an RCFile table as a Hive source. To read from an RCFile table, you must create the table with the SerDe clause.
  • The Blaze engine can read from Hive tables that are compressed. To read from a compressed Hive table, you must set the TBLPROPERTIES clause.
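For example, backtick quotes let a Hive source use reserved words as identifiers. The following is a minimal HiveQL sketch; the table and column names are hypothetical:
    -- Backticks quote identifiers that would otherwise clash with reserved words.
    CREATE TABLE `order` (`date` STRING, `user` STRING) STORED AS TEXTFILE;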

RCFile as Hive Tables

The Blaze engine can read from and write to RCFile Hive tables. However, the Blaze engine supports only the ColumnarSerDe SerDe. In Hortonworks, the default SerDe for an RCFile table is LazyBinaryColumnarSerDe. To read from and write to an RCFile table, you must create the table with the SerDe specified as org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.
For example:
CREATE TABLE TEST_RCFILE (id int, name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe' STORED AS RCFILE;
You can also set the default RCFile SerDe from Ambari or Cloudera Manager. Set the property hive.default.rcfile.serde to org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.
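Hive also lets you change the SerDe of an existing table rather than recreating it. The following is a minimal sketch, assuming the TEST_RCFILE table from the example above; note that data already written with a different SerDe might not remain readable:
    ALTER TABLE TEST_RCFILE SET SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe';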

Compressed Hive Tables

The Blaze engine can read from and write to Hive tables that are compressed. However, to read from a compressed Hive table or write to a Hive table in compressed format, you must set the TBLPROPERTIES clause as follows:
  • When you create the table, set the table properties:
    TBLPROPERTIES ('property_name'='property_value')
  • If the table already exists, alter the table to set the table properties:
    ALTER TABLE table_name SET TBLPROPERTIES ('property_name' = 'property_value');
The property name and value are not case sensitive. Depending on the file format, the table property can take different values.
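After you set a table property, you can verify it with the SHOW TBLPROPERTIES statement. The following is a minimal sketch, assuming a hypothetical ORC table named test_orc:
    -- List all table properties, or query a single property by name.
    SHOW TBLPROPERTIES test_orc;
    SHOW TBLPROPERTIES test_orc('orc.compress');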
The following table lists the property names and values for different file formats:
  File Format   Table Property Name        Table Property Values
  Avro          avro.compression           BZIP2, deflate, Snappy
  ORC           orc.compress               Snappy, ZLIB
  Parquet       parquet.compression        GZIP, Snappy
  RCFile        rcfile.compression         Snappy, ZLIB
  Sequence      sequencefile.compression   BZIP2, GZIP, LZ4, Snappy
  Text          text.compression           BZIP2, GZIP, LZ4, Snappy
Unlike the Hive engine, the Blaze engine does not write data in the default ZLIB compressed format when it writes to a Hive target stored as ORC format. To write in a compressed format, alter the table to set the TBLPROPERTIES clause to use ZLIB or Snappy compression for the ORC file format.
The following examples show sample commands to create a table and alter a table:
  • Create table:
    CREATE TABLE CBO_3T_JOINS_CUSTOMER_HIVE_SEQ_GZIP
      (C_CUSTKEY DECIMAL(38,0), C_NAME STRING, C_ADDRESS STRING, C_PHONE STRING,
       C_ACCTBAL DECIMAL(10,2), C_MKTSEGMENT VARCHAR(10), C_COMMENT VARCHAR(117))
      PARTITIONED BY (C_NATIONKEY DECIMAL(38,0))
      STORED AS SEQUENCEFILE
      TBLPROPERTIES ('sequencefile.compression'='gzip');
  • Alter table:
    ALTER TABLE table_name SET TBLPROPERTIES ('avro.compression'='BZIP2');
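For example, to enable ZLIB compression on an existing ORC target table, you can alter its table properties. The following is a minimal sketch with a hypothetical table named test_orc:
    ALTER TABLE test_orc SET TBLPROPERTIES ('orc.compress'='ZLIB');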
