Rules and guidelines for Hive objects in mappings configured in advanced mode
Consider the following rules and guidelines for Hive objects in mappings configured in advanced mode:
Data types
When you write data with the Timestamp data type from a Hive table stored in the ORC or Parquet file format, the Timestamp data type is written only up to microseconds and the remaining precision is omitted.
When you use a simple source filter to import a Hive source object that contains the Date or Timestamp data types, you can use the following formats in the filter condition:
Date = YYYY-MM-DD HH24:MM:SS.US
Timestamp = YYYY-MM-DD HH24:MM:SS.US
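For example, a simple filter condition that follows this format might look similar to the following, where the column name order_ts and the value are illustrative:
order_ts = 2024-03-15 10:45:30.000000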
Unicode data from a Hive table is written incorrectly to the target.
You cannot read or write Array or Struct complex data types that contain nested data types, such as Map, that Hive Connector cannot read. Data Integration does not invalidate the mapping at design time or at run time, and the mapping fails with the following error:
java.sql.SQLException
If the data is of the Avro file format and contains the Timestamp data type, Data Integration writes the data only up to milliseconds.
When you override the source or target object in a mapping from a task with objects that contain complex data type columns, data preview fails. The error message that appears does not contain helpful information.
If you use a parameterized source or target object in the mapping to run a dynamic mapping task, and the objects contain complex data type columns, the dynamic mapping task fails.
When you specify a simple filter to import data that contains the Decimal data type with a precision above 28, the mapping fails with the following error: java.lang.RuntimeException
Multiple objects
When the source type selected in a mapping is multiple objects, consider the following guidelines:
You cannot override the table name.
When you specify an override to the schema, ensure that all the objects selected in the Source transformation belong to the same schema.
When you read from multiple Hive databases, you cannot join the tables from different databases using the advanced relationship.
If the connection does not have the database name, you cannot specify the database name through the parameter file from the mapping task.
If you parameterize the source or target connection and object and then specify an override in the advanced properties with in-out parameters, and also use a parameter file in the mapping task, the data is read from the table imported at design time and not from the overridden table.
When you parameterize the connection and source object and select "Allow parameter to be overridden at runtime" in the Source transformation, you cannot select the multiple object source type in the mapping task.
Avoid using an existing connection to read from multiple objects in a new mapping. If you edit an existing connection to remove the database name so that you can read from multiple databases, do not use that connection in the existing mapping. If you use the edited connection in the existing mapping, the mapping fails.
When you configure a Joiner transformation in a mapping to join two Hive tables, ensure that the field names for the complex data type column in both the tables are different.
Special characters, column types, and column names
When you read data from or write data to a Hive table that contains INFA_META_ROW_TYPE as one of its columns, the data gets corrupted.
When you read from or write data to a Hive table, ensure that the column name in the Hive table does not exceed 74 characters.
When you import data from Hive tables, special characters in the table or column names appear as '_' in the source object and in the data preview in the Source transformation, and the mapping fails to run.
When you import a Hive source or target table where the table or column name contains special characters such as a slash '/', the staging files do not get deleted from the staging location.
Mappings that read from or write data to more than 500 columns fail with the following error:
HTTP POST request failed due to IO error
SQL override
When you specify an SQL override to read data from a Hive source, ensure that the data type specified in the query is compatible with the data type in the selected source. Specify the query without a semicolon ';' at the end of the SQL statement.
The SQL override in the source must contain only SELECT queries. If the SQL override contains any other clause, the mapping fails.
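For example, a valid SQL override might look similar to the following query, where the table and column names are illustrative:
SELECT emp_id, emp_name FROM employees WHERE emp_dept = 'SALES'
The query contains only a SELECT statement and does not end with a semicolon.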
Target operations
When you configure an upsert operation, ensure that you select all the columns in the field mapping. If there are unconnected columns, null values are inserted into the target table.
You cannot upsert data to an external or non-transactional Hive table.
A mapping configured with a data driven operation might encounter errors in the following scenarios:
When the length of the update column name exceeds 128 characters, the name is truncated. The agent fails to recognize the column name as a connected field and the mapping fails with the following error:
com.informatica.sdk.dtm.InvalidMappingException
When the column name in a data driven operation exceeds 74 characters, the name is truncated.
When you configure an upsert operation to write data that contains non-unique columns to a Hive target, set the following property in the Pre-SQL field in the Target transformation:
set hive.merge.cardinality.check=false
Do not configure an override to the update strategy from the task properties. The agent does not honor the order of precedence and considers the update strategy value specified in the mapping.
When you configure the Upsert (Update or Insert) operation, keep the Update as Upsert option selected for the upsert operation to work. When the Update as Upsert option is selected, the data is upserted. If you deselect the Update as Upsert option and perform an update, the update does not work.
When you configure a data driven expression to update the rows and do not specify a third argument in the IIF condition, the agent treats the operation as an insert for all non-matching rows.
For example, if you specify the condition IIF(Update_column=2,DD_UPDATE) without including a third argument, such as DD_DELETE in the expression IIF(Update_column=2,DD_UPDATE,DD_DELETE), the agent performs an insert by default for any row where the value of Update_column is not equal to 2.
In the expression IIF(COL_INTEGER = 2, DD_UPDATE), COL_INTEGER=2 resolves to 1 and COL_INTEGER!=2 resolves to 0, where 1 is the internal ID for the update operation and 0 is the internal ID for the insert operation. When the third argument is not specified, its value defaults to 0.
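For example, to make the behavior for non-matching rows explicit, you can include a third argument in the expression. The following sketch uses an illustrative column name:
IIF(COL_INTEGER = 2, DD_UPDATE, DD_INSERT)
With this expression, rows where COL_INTEGER is not equal to 2 are explicitly flagged for insert instead of relying on the default.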
When you use a parameterized Hive connection in the Target transformation, the Update Column field does not display in the target properties. In the Target transformation, select a valid connection and object that display in the list and then select the operation.
When you include an unassigned data field from an ISD source to write to a Hive target, the mapping fails with the following error:
[UnassignedData] in the transformation [Target] because the type configurations do not match
When you configure the update, upsert, or data driven update strategy in mappings and the following conditions are true, the output can be unpredictable:
More than one row in the source matches the data in the target.
You have set the flag hive.merge.cardinality.check to false in the Pre-SQL field in the source or target transformation properties.
If you set the flag to true, the mapping fails and an error displays in the logs.
Truncate
When you configure a mapping to write data to a Hive external table, the truncate target option is not applicable.
When you configure an insert operation in a mapping and choose to truncate the table before inserting new rows, the Secure Agent truncates only the partitions in the Hive table for which the transformation received the input data. As a workaround, when you want to truncate the entire Hive table, specify the following pre-SQL statement in the Target transformation:
truncate table <table_name>;
Truncate table is ignored for update and delete operations.
Dynamic mapping task
When the Hive target object is parameterized and the selected operation is data driven or upsert in the mapping, the Update Column field does not display in the dynamic mapping task target properties.
Data preview for transformations
When you create a mapping, you cannot preview data for individual transformations to test the mapping logic.
Parameterization
If the target object is parameterized and the field mapping is not set to automap, the create target option does not display in the task properties. You must set the field mapping to automap when you use a parameterized object.
Create target
If the target table is dropped at the backend and you rerun the mapping task, the task fails. You must edit the mapping or the mapping task and then run the mapping task again.
If you create a Hive target and the data is of the RC file format, you cannot insert null values to the binary data type column.
You cannot write float and double data types that contain exponential data to a Hive target on the Cloudera EMR cluster.
When you create a new Hive target at runtime, you can parameterize only the target connection and the object name. You cannot parameterize the other properties such as the partitioning, bucketing, path, location, file format, and additional properties.
If the source data contains complex fields, the metadata fails to appear on the Target Fields tab. If the selected operation in the Hive Target transformation is update, upsert, delete, or data driven, the metadata does not appear on the Update Columns tab.
When you run a task to create a new Hive target at runtime, the mapping cannot fetch records from the source and the mapping fails when both of the following conditions are true:
Source or target contains complex data type columns.
Source or target selected in the mapping is parameterized.
A mapping in which 15 or more Hive targets are selected to be created at runtime fails with a translation error.
If the target table column name contains special characters and the update strategy condition is parameterized, the mapping fails.