Table of Contents


  1. Abstract
  2. Installation and Upgrade
  3. 10.1.1 Fixed Limitations and Closed Enhancements
  4. 10.1.1 Known Limitations
  5. Informatica Global Customer Support

Big Data Known Limitations

The following known limitations apply:
You cannot run a mapping in the native environment when the following conditions are true:
  • You select a native validation environment and a Hive or Blaze validation environment for the mapping.
  • The mapping contains a Match transformation.
When you use Sqoop and define a join condition in the custom query, the mapping fails. (457397)
When you use Sqoop and join two tables that contain a column with the same name, the mapping fails. (457072)
When you use Sqoop and the first mapper task fails, the subsequent mapper tasks fail with the following error message:
File already exists
The Developer tool allows you to change an Avro data type in a complex file object to one that Avro does not support. As a result, mapping errors occur at run time.
Workaround: If you change an Avro data type, verify that it is a supported type. (456866)
When you use Sqoop to import data from an Aurora database by using the MariaDB JDBC driver, the mapping stops responding. (456704)
When you export data through Sqoop and there are primary key violations, the mapping fails and bad records are not written to the bad file. (456616)
When you export data to a Netezza database through Sqoop and the database contains a column of the float data type, the mapping fails. (456285)
Sqoop does not read the OraOop arguments that you configure in the
Workaround: Specify the OraOop arguments as part of the Sqoop arguments in the mapping. (455750)
When you use Sqoop for a data object and update its properties in the associated Read or Write transformation, the mapping terminates with an IVector error message.
Workaround: Create a new data object and mapping. (453097)
When you enable Sqoop for a data object and a table or column name contains Unicode characters, the mapping fails. (452114)
Mappings that read from one of the following sources fail to run in the native environment when the Data Integration Service is configured to run jobs in separate remote processes:
  • Flat file or complex file in the Hadoop Distributed File System (HDFS)
  • Hive table
  • HBase table
Workaround: On the Compute view for the Data Integration Service, configure the INFA_HADOOP_DIST_DIR environment variable for each node with the compute role. Set the environment variable to the same value configured for the Data Integration Service Hadoop Distribution Directory execution option for the Data Integration Service. (443164)
If you configure MapR 5.1 on SUSE 11 and run a Sqoop mapping on a Hadoop cluster, the mapping fails with the following error:
;Ljava/lang/String;Ljava/lang/String;)Isqoop
When you run a Sqoop mapping on the Blaze engine to import data from multiple sources and the join condition contains an OR clause, the mapping fails.
In a Sqoop mapping, if you add a Filter transformation to filter timestamp data from a Teradata source and export the data to a Teradata target, the mapping runs successfully on the Blaze engine. However, the Sqoop program does not write the timestamp data to the Teradata target.
When you use a JDBC connection in a mapping to connect to a Netezza source that contains the Time data type, the mapping fails to run on the Blaze engine.
The Union transformation produces incorrect results for Sqoop mappings that you run on the Hortonworks distribution by using the TEZ engine. (460889)
The path of the resource file in a complex file object appears as a recursive path of directories starting with the root directory and ending with a string. (437196)
When you export data to an IBM DB2 z/OS database through Sqoop and do not configure the batch argument, the mapping fails.
Workaround: Configure the batch argument in the mapping and run the mapping again. (459671)
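In a mapping, the batch argument goes in the Sqoop arguments of the JDBC connection or data object. A sketch of the equivalent standalone Sqoop export command, in which the connection URL, table, and export directory are placeholders:

```shell
# --batch makes Sqoop send inserts as JDBC batches, which the DB2 z/OS
# export path requires here. All identifiers below are placeholders.
SQOOP_ARGS="--connect jdbc:db2://db2zos-host:446/DB2LOC \
  --table MYSCHEMA.MYTABLE --export-dir /staging/mytable --batch"
# Guard so the sketch is harmless on a machine without a Sqoop client.
if command -v sqoop >/dev/null; then
  sqoop export $SQOOP_ARGS
fi
```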
When you use an ODBC connection to write time data to a Netezza database, the mapping fails. This issue occurs when you run the mapping on Cloudera 5u4. (440423)
When you enable Sqoop for a logical data object and export data to an IBM DB2 database, the Sqoop export command fails. However, the mapping runs successfully without any error. (456455)
Mappings and profiles that use snappy compression fail in HiveServer2 mode on HDP and CDH SUSE clusters.
On the Informatica domain, edit the property that contains the location of the cluster native library:
  1. Back up the following file, then open it for editing:
    <Informatica Installation Directory>/services/shared/hadoop/<Hadoop_distribution_name>_<version_number>/infaConf/
  2. Find the $HADOOP_NODE_HADOOP_DIST/lib/native property, and replace the value with the location of the cluster native library.
    Hortonworks example:
    Cloudera example:
On the Hadoop cluster:
  1. Open the
    file for editing.
  2. Change the value of
    <Informatica distribution home>/services/shared/hadoop/<Hadoop_distribution>/lib/native
    to the location of the cluster native library.
  3. Copy the contents of the
  4. Open the
    file for editing, and paste the entire contents of the
Sqoop mappings fail with a null pointer exception on the Spark engine if you do not configure the Spark HDFS staging directory in the Hadoop connection.
If the Data Integration Service becomes unavailable while running mappings with Hive sources and targets on the Blaze engine, the lock acquired on a Hive target table may fail to be released.
Workaround: Connect to Hive using a Hive client such as the Apache Hive CLI or Beeline, and then run the UNLOCK TABLE <table_name> command to release the lock.
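A sketch of the workaround with Beeline; the HiveServer2 URL and the table name are placeholders for this example:

```shell
# Build the HiveQL statement for the locked target table (name is a placeholder).
TABLE="my_target_table"
HQL="UNLOCK TABLE ${TABLE};"
# Run it through Beeline against HiveServer2 (URL is an assumption);
# the guard keeps the sketch harmless on hosts without a Hive client.
if command -v beeline >/dev/null; then
  beeline -u "jdbc:hive2://hiveserver2-host:10000/default" -e "$HQL"
fi
```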
The Data Integration Service fails with out of memory errors when you run a large number of concurrent mappings on the Spark engine.
Workaround: Increase the heap memory settings on the machine where the Data Integration Service runs.
In a Hortonworks HDP or an Azure HDInsight environment, a mapping that runs on the Hive engine enabled for Tez loads only the first data table to the target if the mapping contains a Union transformation.
Workaround: Run the mapping on the Hive engine enabled for MapReduce.
If an SQL override in the Hive source contains a DISTINCT or LIMIT clause, the mapping fails on the Spark engine.
If the Blaze Job Monitor starts on a node different from the node that it last ran on, the Administrator tool displays the Monitoring URL of the previous node.
Workaround: Correct the URL with the current job monitor host name from the log. Or restart the Grid Manager to correct the URL for the new jobs that start.
If a Sqoop source or target contains a column name with double quotes, the mapping fails on the Blaze engine. However, the Blaze Job Monitor incorrectly indicates that the mapping ran successfully and that rows were written into the target.
If a mapping or workflow contains a parameter, the mapping does not return system-defined mapping outputs when run in the Hadoop environment.
Blaze mappings fail with the error "The Integration Service failed to generate the grid execution plan for the mapping" when any of the following conditions are true:
  • The Apache Ranger KMS is not configured correctly on a Hortonworks HDP cluster.
  • The Hadoop KMS is not configured correctly for HDFS transparent encryption on a Cloudera CDH cluster.
  • The properties hadoop.kms.proxyuser.<SPN_user>.groups and hadoop.kms.proxyuser.<SPN_USER>.hosts for the Kerberos SPN are not set on the Hadoop cluster.
When you run a Sqoop mapping on the Blaze engine to export Netezza numeric data, the scale part of the data is truncated.
When the Blaze engine runs a mapping that uses source or target files in the WASB location on a cluster, the mapping fails with an error like:
java.lang.RuntimeException: [<error_code>] The Integration Service failed to run Hive query [exec0_query_6] for task [exec0] due to following error: <error_code> message [FAILED: ... Cannot run program "/usr/lib/python2.7/dist-packages/hdinsight_common/": error=2, No such file or directory], ...
The mapping fails because the cluster attempts to decrypt the data but cannot find a file needed to perform the decryption operation.
Workaround: Find the following files on the cluster and copy them to the directory on the machine that runs the Data Integration Service:
  • key_decryption_cert.prv
Sqoop mappings fail on the Blaze engine if there are unconnected ports in a target. This issue occurs when you run the Sqoop mapping on any cluster other than a Cloudera 5.8 cluster.
Workaround: Before you run the mapping, create a table in the target database with columns corresponding to the connected ports.
When a Hadoop cluster is restarted without stopping the components of the Blaze engine, stale Blaze processes remain on the cluster.
Workaround: Kill the stale processes using the pkill command.
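The exact process names vary by distribution. A minimal sketch, assuming the stale components can be matched by a command-line pattern such as "blaze"; verify the matches with pgrep before killing anything:

```shell
# Show which processes match before killing anything (pattern is an assumption).
PATTERN="blaze"
pgrep -f -l "$PATTERN" || echo "no matching processes"
# Send SIGTERM to every process whose command line matches the pattern;
# pkill exits non-zero when nothing matches, so ignore that case.
pkill -f "$PATTERN" || true
```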
When you run a Sqoop mapping on the Spark engine, the Sqoop map-reduce jobs run in the default yarn queue instead of the yarn queue that you configure.
Workaround: To run a map-reduce job in a particular yarn queue, configure the following property in the Sqoop Arguments field of the JDBC connection:
To run a Spark job in a particular yarn queue, configure the following property in the Hadoop connection:
When you run a Sqoop mapping and abort the mapping from the Developer tool, the Sqoop map-reduce jobs continue to run.
Workaround: On the Sqoop data node, run the following command to kill the Sqoop map-reduce jobs:
yarn application -kill <application_ID>
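When several Sqoop jobs are orphaned, the kill can be scripted. A sketch that assumes the orphaned applications are identifiable by "sqoop" in the application name, and that relies on the tab-separated columns of yarn application -list:

```shell
# List RUNNING applications, keep the IDs of those whose name mentions sqoop,
# and kill each one. The "sqoop" filter is an assumption; review the list first.
for app in $(yarn application -list -appStates RUNNING 2>/dev/null \
               | awk -F'\t' 'tolower($2) ~ /sqoop/ {print $1}'); do
  yarn application -kill "$app"
done
```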
When the proxy user setting is not correctly configured in core-site.xml, a mapping that you run with the Spark engine hangs with no error message.
Workaround: Set the value of the following properties in core-site.xml to "*" (an asterisk):
  • hadoop.proxyuser.<Data Integration Service user name>.groups
  • hadoop.proxyuser.<Data Integration Service user name>.hosts
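For example, if the Data Integration Service runs as the user infauser (an example name), the core-site.xml entries look like this. A "*" value permits impersonation of any group from any host; narrow it in production:

```xml
<!-- core-site.xml: allow the Data Integration Service user ("infauser" is
     an example name) to impersonate any group and connect from any host -->
<property>
  <name>hadoop.proxyuser.infauser.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.infauser.hosts</name>
  <value>*</value>
</property>
```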
When you run a mapping on a cluster where Ranger KMS authorization is configured, the mapping fails with an "UndeclaredThrowableException" error.
To address this issue, choose one of the following workarounds:
  • If the cluster uses Ranger KMS for authorization and the mapping accesses an encryption zone, verify that the dfs.encryption.key.provider.uri property is correctly configured in hive-site.xml or hdfs-site.xml.
  • If the cluster does not use Ranger KMS, and you still encounter this issue, remove the dfs.encryption.key.provider.uri property from hive-site.xml and hdfs-site.xml.
When you run a Sqoop mapping on the Blaze engine and the columns contain Unicode characters, the Sqoop program reads them as null values.
On a Blaze engine, when an unconnected Lookup expression is referenced in a join condition, the mapping fails if the master source is branched and the Joiner transformation is optimized with a map-side join. The mapping fails with the following error: [TE_7017] Internal error. Failed to initialize transformation [producer0]. Contact Informatica Global Customer Support.
A user who is not in the Administrator group, but who has the privileges and permissions to access the domain and its services, does not have access to the REST application properties in the Administrator tool when the applications are deployed by another user.
When mappings fail, the Spark engine does not drop temporary Hive tables used to store data during mapping execution. You can manually remove the tables. (450507)
The Spark engine does not run the footer row command configured for a flat file target. (459942)
The summary and detail statistics are empty for mappings run on Tez. (452224)
A mapping with a Hive source and target fails in the Hadoop environment when it uses an ABS function with an IIF function. (424789)
A mapping in the Hadoop environment fails when it contains a Hive source and a filter condition that uses the default table name prefixed to the column name.
Workaround: Edit the filter condition to remove the table name prefixed to the column name and run the mapping again. (422627)
A mapping in the Hadoop environment fails when the Hadoop connection name contains 128 characters. (421834)
Sqoop mappings that import data from or export data to an SSL-enabled database fail on the Blaze engine.
If you define an SQL override in the Hive source and choose to update the output ports based on the custom query, the mapping fails on the Blaze engine.
Mappings with an HDFS connection fail with a permission error on the Spark and Hive engines when all the following conditions are true:
  • The HDFS connection user is different from the Data Integration Service user.
  • The Hadoop connection does not have an impersonation user defined.
  • The Data Integration Service user does not have write access to the HDFS target folder.
Workaround: In the Hadoop connection, define an impersonation user with write permission to access the HDFS target folder.