Table of Contents


  1. Abstract
  2. Informatica Installation
  3. Informatica Upgrade
  4. 10.1 Fixed Limitations and Closed Enhancements
  5. 10.1 Known Limitations
  6. Informatica Global Customer Support

Big Data Known Limitations

The following known limitations apply:
If you configure MapR 5.1 on SUSE 11 and run a Sqoop mapping on a Hadoop cluster, the mapping fails with the following error:
;Ljava/lang/String;Ljava/lang/String;)Isqoop
If you run a mapping on HiveServer2 on a SUSE 11 Hortonworks cluster that is enabled with Kerberos authentication, a MySQL connection leak occurs and the mapping fails with the following error:
[HiveServer2-Handler-Pool: Thread-3439]: transport.TSaslTransport ( - SASL negotiation failure GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
Workaround: You can reduce the number of MySQL connection leak occurrences. To do this, after you run the Big Data Management Configuration Utility, update the value of the following property in the hive-site.xml and hdfs-site.xml files:
<property> <name>hive.server2.authentication</name> <value>KERBEROS</value> </property>
When you run a mapping with the Spark engine on a Hadoop cluster, the mapping fails with an error like:
[ Permission denied: user=<USER>, access=WRITE, …
Workaround: The cluster requires additional configuration to properly authenticate users when Spark runs the mapping.
Non-Kerberos-enabled clusters:
If the cluster does not use Kerberos for authentication, contact the Hadoop administrator to perform the following steps:
  1. Browse to
    /<Big Data Management installation home>/services/shared/hadoop/<Hadoop_distribution><version>/conf
    and add the following property to the file yarn-site.xml:
    <property> <name>fs.permissions.umask-mode</name> <value>002</value> </property>
  2. Browse to the HDFS service configuration page in the cluster management web interface, or open the file hdfs-site.xml.
    Locate the superuser group property, named either dfs.permissions.superusergroup or dfs.permissions.supergroup, depending on the distribution.
  3. Add the following users to the superusergroup property:
    • Data Integration Service process user or impersonation user
    • YARN user
    • HDFS connection user, if the mapping uses an HDFS target.
      If the mapping uses a Hive target, it is not necessary to add the HDFS connection user.
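Step 1 above can be sketched as a shell edit. The property name and value come from the step itself; the local stand-in file and the use of sed are illustrative assumptions, not the only way to apply the change:

```shell
# Sketch of step 1: add the umask property to yarn-site.xml.
# A local stand-in file is used here; on a real cluster, edit the file under
# <Big Data Management installation home>/services/shared/hadoop/<distribution>/conf.
CONF=./yarn-site.xml
cat > "$CONF" <<'EOF'
<?xml version="1.0"?>
<configuration>
</configuration>
EOF
# Insert the property before the closing </configuration> tag.
sed -i 's|</configuration>|  <property>\n    <name>fs.permissions.umask-mode</name>\n    <value>002</value>\n  </property>\n</configuration>|' "$CONF"
```

The umask value 002 makes files that the mapping writes group-writable, which is why the relevant service users must also be in the superuser group (step 3).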
Kerberos-enabled clusters:
Administrators of Kerberos-enabled clusters commonly provision the impersonation user as a user on all nodes of the cluster. No further workaround is necessary.
A mapping fails to run in the Blaze environment if multiple transformation strategies in the mapping identify the same probabilistic model file or classifier model file.
Column profile with data domain discovery fails when the data source is a Hive source, you choose the All rows sampling option, and you run the profile on the Blaze engine.
Workaround: Choose the Sample first, Random sample, or Random sample (auto) sampling option and run the profile.
When you run mappings on the Spark engine within a very short time span, such as 20 seconds, the mappings fail with OSGi errors.
Workaround: Append the text "&:eclipse.stateSaveDelayInterval:2000" to the value of the following OSGi properties in the file:



When you run mappings on the Spark engine, the mapping run fails with a compilation error.
Cause: The cluster uses an instance of Java other than the Java that ships with Informatica Big Data Management.
Workaround: Set the cluster to use the instance of Java that is installed with Big Data Management.
  1. Set the JAVA_HOME variable to
    <Big Data Management Installation Home>/java/jre
  2. Add ${JAVA_HOME}/bin to the PATH variable:
    PATH=${INFA_HOME_BIN}:${JAVA_HOME}/bin:${PATH}
    export PATH
Then restart the Informatica domain and services.
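Assuming a hypothetical installation root of /opt/Informatica (substitute the real Big Data Management installation home), the two steps amount to:

```shell
# Hypothetical install root; substitute the real
# <Big Data Management Installation Home>.
INFA_HOME=/opt/Informatica
export JAVA_HOME="$INFA_HOME/java/jre"              # step 1: bundled JRE
export PATH="$INFA_HOME/bin:$JAVA_HOME/bin:$PATH"   # step 2: bundled JRE first on PATH
```

Placing the bundled JRE ahead of any system Java on the PATH is what ensures the cluster-side compile uses the Informatica-shipped instance.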
If you configure user impersonation and run a Sqoop mapping on a Hadoop cluster that uses Kerberos authentication, the mapping fails.
Workaround: Use the Hadoop service principal name in the Hadoop connection and run the mapping.
The performance of a Data Masking mapping that includes a non-partitioned relational source is slow in a Blaze environment.
The Union transformation produces incorrect results for Sqoop mappings that you run on the Hortonworks distribution by using the Tez engine.
Big Data Management supports Hortonworks Hadoop clusters that use Java 1.8. When the cluster uses Java 1.7, mappings that you execute using the Hive engine fail. You see an error like:
Unrecognized VM option 'MaxMetaspaceSize=256M'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Workaround: Perform the following steps to edit Hadoop properties on the VM that hosts the Data Integration Service:
  1. On the VM in the Informatica Big Data Management implementation that runs the Data Integration Service, open the following file for editing:
    <Informatica installation directory>/services/shared/hadoop/<Hadoop_distribution_name>_<version_number>/infaConf/
  2. Find the following line and comment it out by placing a # character at the beginning:
    $HADOOP_NODE_INFA_HOME/services/shared/bin:$HADOOP_NODE_HADOOP_DIST/lib/native:$HADOOP_NODE_HADOOP_DIST/lib/*:$HADOOP_NODE_HADOOP_DIST/lib/native -Xms512m -Xmx512m -XX:MaxMetaspaceSize=256M
  3. Find the following line and uncomment it by removing the # character at the beginning:
    $HADOOP_NODE_INFA_HOME/services/shared/bin:$HADOOP_NODE_HADOOP_DIST/lib/native:$HADOOP_NODE_HADOOP_DIST/lib/*:$HADOOP_NODE_HADOOP_DIST/lib/native -Xms512m -Xmx512m -XX:MaxPermSize=512m
```
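The comment/uncomment swap in steps 2 and 3 can be scripted. The file name and line contents below are shortened stand-ins for the real entries under infaConf/, so treat this as a sketch rather than a drop-in command:

```shell
# Sketch: swap which JVM-options line is active in the properties file.
# "env.properties" and the "opts=" lines are stand-ins for the real file
# and entries under infaConf/ described in the steps above.
F=./env.properties
printf '%s\n' \
  'opts=-Xms512m -Xmx512m -XX:MaxMetaspaceSize=256M' \
  '# opts=-Xms512m -Xmx512m -XX:MaxPermSize=512m' > "$F"
# Comment out the MaxMetaspaceSize line (Java 8 flag),
# uncomment the MaxPermSize line (Java 7 flag).
sed -i -e 's/^\(opts=.*MaxMetaspaceSize.*\)$/# \1/' \
       -e 's/^# \(opts=.*MaxPermSize.*\)$/\1/' "$F"
```

MaxMetaspaceSize exists only on Java 8 VMs, which is why a Java 7 cluster rejects it and needs the MaxPermSize line instead.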
When you export data to an Oracle database through Sqoop, the mapping fails in certain situations. This issue occurs when all of the following conditions are true:
  • You configure the direct argument to use OraOop.
  • The data contains a column of the float data type.
The Spark engine does not run the footer row command configured for a flat file target.
When you export data to an IBM DB2 z/OS database through Sqoop and do not configure the batch argument, the mapping fails.
Workaround: Configure the batch argument in the mapping and run the mapping again.
Lookup performance on the Spark engine is very slow when the lookup data contains null values.
When you use Sqoop and define a join condition in the custom query, the mapping fails.
When you use Sqoop and join two tables that contain a column with the same name, the mapping fails.
When you generate and execute a DDL script to create or replace a Hive target table in the Blaze run-time environment, the mapping fails.
When you use Sqoop and the first mapper task fails, the subsequent mapper tasks fail with the following error message:
File already exists
The Developer tool allows you to change an Avro data type in a complex file object to one that Avro does not support. As a result, mapping errors occur at run time.
Workaround: If you change an Avro data type, verify that it is a supported type.
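The Avro specification fixes the set of primitive type names, so a quick generic validity check is possible. This is a sketch, not an Informatica API, and Informatica's supported subset may differ:

```shell
# Avro primitive type names, per the Avro specification.
# Complex and logical types (record, enum, decimal, ...) are excluded.
is_avro_primitive() {
  case "$1" in
    null|boolean|int|long|float|double|bytes|string) return 0 ;;
    *) return 1 ;;
  esac
}
is_avro_primitive string  && echo "string: primitive type"
is_avro_primitive decimal || echo "decimal: not a primitive type"
```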
When you use Sqoop to import data from an Aurora database by using the MariaDB JDBC driver, the mapping stops responding.
When you export data through Sqoop and there are primary key violations, the mapping fails and bad records are not written to the bad file.
If you update a data object that uses Sqoop and synchronize the data object, the updates are not included in the Sqoop import command.
When you enable Sqoop for a logical data object and export data to an IBM DB2 database, the Sqoop export command fails. However, the mapping runs successfully without any error.
When you export data to a Netezza database through Sqoop and the database contains a column of the float data type, the mapping fails.
Sqoop does not read the OraOop arguments that you configure in the
Workaround: Specify the OraOop arguments as part of the Sqoop arguments in the mapping.
If you run multiple concurrent mappings on the Spark engine, performance might be slow and the log messages indicate that resources are not available. The Data Integration Service indicates that the mapping failed even though it is still running in the cluster.
When you use Sqoop for a data object and update its properties in the associated Read or Write transformation, the mapping terminates with an IVector error message.
Workaround: Create a new data object and mapping.
Mappings and profiles that use snappy compression fail in HiveServer2 mode on HDP and CDH SUSE clusters.
On the Informatica domain, edit the property that contains the location of the cluster native library:
  1. Back up the following file, then open it for editing:
    <Informatica Installation Directory>/services/shared/hadoop/<Hadoop_distribution_name>_<version_number>/infaConf/
  2. Find the $HADOOP_NODE_HADOOP_DIST/lib/native property, and replace the value with the location of the cluster native library.
    Hortonworks example:
    Cloudera example:
On the Hadoop cluster:
  1. Open the
    file for editing.
  2. Change the value of
    <Informatica distribution home>/services/shared/hadoop/<Hadoop_distribution>/lib/native
    to the location of the cluster native library.
  3. Copy the contents of the
  4. Open the
    file for editing, and paste the entire contents of the
The summary and detail statistics are empty for mappings run on Tez.
When you enable Sqoop for a data object and a table or column name contains Unicode characters, the mapping fails.
When mappings fail, the Spark engine does not drop temporary Hive tables used to store data during mapping execution. You can manually remove the tables.
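As a sketch, assuming the leftover tables share a recognizable prefix (the tmp_infa prefix and table names below are hypothetical), you could generate the cleanup statements for review before running them in Hive:

```shell
# Hypothetical leftover table names; in practice, list them first with a
# Hive query such as: SHOW TABLES LIKE 'tmp_infa*';
TABLES="tmp_infa_1 tmp_infa_2"
for t in $TABLES; do
  echo "DROP TABLE IF EXISTS $t;"
done
```

Reviewing the generated statements before executing them avoids dropping tables that merely share the prefix.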
Mappings that read from one of the following sources fail to run in the native environment when the Data Integration Service is configured to run jobs in separate remote processes:
  • Flat file or complex file in the Hadoop Distributed File System (HDFS)
  • HIVE table
  • HBase table
Workaround: On the Compute view for the Data Integration Service, configure the INFA_HADOOP_DIST_DIR environment variable for each node with the compute role. Set the environment variable to the same value configured for the Data Integration Service Hadoop Distribution Directory execution option for the Data Integration Service.
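The value itself is just the path of the Hadoop distribution directory on that node. As an illustration with a hypothetical path (you set this on the Compute view in the Administrator tool, not in a shell profile):

```shell
# Hypothetical value; it must match the Hadoop Distribution Directory
# execution option configured for the Data Integration Service.
export INFA_HADOOP_DIST_DIR=/opt/Informatica/services/shared/hadoop/hortonworks_2.3
```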
When you use an ODBC connection to write time data to a Netezza database, the mapping fails. This issue occurs when you run the mapping on Cloudera 5u4.
The path of the resource file in a complex file object appears as a recursive path of directories starting with the root directory and ending with a string.
Mapping with a Hive source and target that uses an ABS function with an IIF function fails in the Hadoop environment.
Mapping in the Hadoop environment fails when it contains a Hive source and a filter condition that uses the default table name prefixed to the column name.
Workaround: Edit the filter condition to remove the table name prefixed to the column name and run the mapping again.
Mapping in the Hadoop environment fails because the Hadoop connection uses 128 characters in its name.