Consider the following rules and guidelines for Hive objects in mappings and mapping tasks:
General guidelines
If you select a connection in the mapping, leave it unused for a long duration, and then select an object, an error occurs. Reselect the connection in the mapping and then select the object.
Create target at runtime
When you create a target at runtime, consider the following guidelines:
When you rerun a mapping task, the task fails if the table was dropped at the back end. Edit the mapping or mapping task and then run the mapping task again.
If you create a Hive target and the data is of the RC file format, you cannot insert null values into a binary data type column.
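For example, inserting a null value into the payload column of the following target fails because the table uses the RC file format. The table definition is a hypothetical sketch:
CREATE TABLE sales_raw (
  id INT,
  payload BINARY
)
STORED AS RCFILE;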
You cannot write float and double data types that contain exponential data to a Hive target on the Cloudera EMR cluster.
When you use the Create New at Runtime option to create a Hive target, you can parameterize only the target connection and the object name using a parameter file. You cannot parameterize the other properties such as the path, location, file format, and additional properties.
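For example, a parameter file for the mapping task might assign only the target connection and the object name. The parameter names and values in this sketch are hypothetical:
$$TgtConnectionParam=Hive_Target_Conn
$$TgtObjectParam=sales_target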
When you configure a mapping to create a new Hive target at runtime to write data from a Kafka source, where the selected objects in the mapping are parameterized, the mapping fails with the following error:
com.informatica.powercenter.sdk.SDKException: Invalid operation field name
If you enable the Alter and Apply Changes schema handling option for a mapping that creates a new target at runtime and the source column names contain uppercase or mixed-case characters, the alter command is not triggered and the mapping fails.
Data types
When you run a mapping to write data to a Hive table whose columns use complex data types, the mapping fails.
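For example, a mapping that writes to a table with columns such as the following fails because the columns use complex data types. The table definition is a hypothetical sketch:
CREATE TABLE customer_profile (
  tags ARRAY<STRING>,
  attributes MAP<STRING, INT>,
  address STRUCT<street:STRING, city:STRING>
);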
You cannot preview data for a column that uses the binary data type.
If the data is of the Avro file format and contains timestamp data types, Data Integration writes the data only up to millisecond precision.
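For example, for a hypothetical Avro target with a timestamp column, fractional seconds beyond milliseconds are truncated:
source value : 2085-11-06 10:15:30.123456789
written value: 2085-11-06 10:15:30.123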
You cannot read or write hierarchical data with decimal data types. The mapping fails with the following error:
java.lang.RuntimeException
Parameterization
You can enter a value up to milliseconds in a filter that is not parameterized. To enter a date without specifying the time in a filter that is not parameterized, use the format specified in the following example:
hive_date_only.c.date > 2085-11-06 00:00:00.00
You can use an advanced filter to enter a value in nanoseconds.
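For example, an advanced filter value that includes nanoseconds might look like the following; the field reference and value are illustrative:
hive_date_only.c.date > 2085-11-06 00:00:00.000000001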
If the target object is parameterized and the field mapping is not set to automap, the option to create a target does not appear in the task properties. You must set the field mapping to automap when you use a parameterized object.
When you read from multiple databases and the connection does not specify the database name, you cannot use a parameter file in the mapping task.
Avoid using an existing connection to read from multiple databases in a new mapping. If you edit an existing connection to remove the database name so that you can read from multiple databases, do not use that connection in the existing mapping; if you do, the existing mapping fails.
If you parameterize the source or target connection and object, specify an override in the advanced properties with in-out parameters, and also use a parameter file in the mapping task, the data is read from the table imported at design time and not from the overridden table.
Special characters
You cannot import data from Hive that contains Unicode characters in the table or column names. The mapping fails to run.
When you read from or write data to a Hive table, ensure that the column name in the Hive table does not exceed 74 characters.
When you import data from Hive tables, special characters in the table or column names appear as '_' in the source object and in the data preview in the Source transformation. For example, a column named order/id appears as order_id.
When you import a Hive source or target table where the table or column name contains special characters such as a slash '/', the staging files are not deleted from the staging location.
SQL override
When you specify an SQL override to read data from a Hive source, ensure that the data type specified in the query is compatible with the data type in the selected source.
Specify the query without a semicolon ';' at the end of the SQL statement. If you insert a semicolon at the end of the SQL statement, the task fails.
The SQL override in the source can contain only SELECT queries with JOIN or UNION clauses. If the SQL override contains any other clause, such as WHERE or WITH, the mapping fails.
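For example, the following override is valid because it is a SELECT query with a join and has no trailing semicolon. The table and column names are hypothetical:
SELECT o.order_id, o.amount, c.customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id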
Multiple objects
When the source type selected in a mapping is multiple objects, consider the following restrictions:
You cannot override the table name.
When you specify a schema override, ensure that all the objects selected in the Source transformation belong to the same schema.
You cannot join the tables from different databases using the advanced relationship.
Session logs
Ignore log4j initialization warnings in the session logs.