Table of Contents

  1. Abstract
  2. Informatica Installation
  3. Informatica Upgrade
  4. 10.1 Fixed Limitations and Closed Enhancements
  5. 10.1 Known Limitations
  6. Informatica Global Customer Support

Big Data Known Limitations

The following entries describe known limitations. Each entry lists the CR number followed by a description:
PLAT-8729
If you configure MapR 5.1 on SUSE 11 and run a Sqoop mapping on a Hadoop cluster, the mapping fails with the following error:
com.mapr.security.JNISecurity.SetClusterOption(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)Isqoop
PLAT-8714
If you run a mapping on HiveServer2 on a SUSE 11 Hortonworks cluster that is enabled with Kerberos authentication, a MySQL connection leak occurs and the mapping fails with the following error:
[HiveServer2-Handler-Pool: Thread-3439]: transport.TSaslTransport (TSaslTransport.java:open(315)) - SASL negotiation failure javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
Workaround: You can reduce the number of MySQL connection leak occurrences. To do this, after you run the Big Data Management Configuration Utility, change the hive.server2.authentication property value from NONE to KERBEROS in the hive-site.xml/hdfs-site.xml files:
<property>
  <name>hive.server2.authentication</name>
  <value>KERBEROS</value>
</property>
BDM-741
BDM-692
BDM-691
When you run a mapping with the Spark engine on a Hadoop cluster, the mapping fails with an error like:
[org.apache.hadoop.security.AccessControlException: Permission denied: user=<USER>, access=WRITE, …
Workaround: The cluster requires additional configuration to properly authenticate users when Spark runs the mapping.
Non-Kerberos-enabled clusters:
If the cluster does not use Kerberos for authentication, contact the Hadoop administrator to perform the following steps:
  1. Browse to /<Big Data Management installation home>/services/shared/hadoop/<Hadoop_distribution><version>/conf and add the following property to the yarn-site.xml file:
    <property>
      <name>fs.permissions.umask-mode</name>
      <value>002</value>
    </property>
  2. Browse to the HDFS service configuration page in the cluster management web interface, or open the hdfs-site.xml file. Locate the superusergroup property, which is named either dfs.permissions.superusergroup or dfs.permissions.supergroup.
  3. Add the following users to the superusergroup property (a sketch of one way to do this follows this list):
    • Data Integration Service process user or impersonation user
    • YARN user
    • HDFS connection user, if the mapping uses an HDFS target.
      If the mapping uses a Hive target, it is not necessary to add the HDFS connection user.
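The following is a minimal sketch of one way a Hadoop administrator might carry out step 3 on a Linux cluster, assuming the superusergroup property names a group called supergroup and assuming hypothetical account names infadis (Data Integration Service process or impersonation user) and yarn; the actual group and account names depend on the cluster:
    # Hypothetical names: the group must match the value of the superusergroup property
    groupadd supergroup
    usermod -a -G supergroup infadis
    usermod -a -G supergroup yarn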
Kerberos-enabled clusters:
Administrators of Kerberos-enabled clusters commonly provision the impersonation user as a user on all nodes of the cluster. No further workaround is necessary.
461622
A mapping fails to run in the Blaze environment if multiple transformation strategies in the mapping identify the same probabilistic model file or classifier model file.
461610 
Column profile with data domain discovery fails when the data source is a Hive source, you choose the All rows sampling option, and you run the profile on the Blaze engine.
Workaround: Choose the Sample first, Random sample, or Random sample (auto) sampling option and run the profile.
461286
When you run mappings on the Spark engine within a very short time span, such as 20 seconds, the mappings fail with OSGI errors.
Workaround: Append the text "&:eclipse.stateSaveDelayInterval:2000" to the value of the following OSGI properties in the hadoopEnv.properties file (a sketch of the resulting entries follows the list):
    infaspark.executor.osgi.config
    infaspark.driver.cluster.mode.osgi.config
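A minimal sketch of the resulting entries in hadoopEnv.properties; <existing value> stands for whatever value each property already contains, and only the appended "&:eclipse.stateSaveDelayInterval:2000" suffix comes from this workaround:
    infaspark.executor.osgi.config=<existing value>&:eclipse.stateSaveDelayInterval:2000
    infaspark.driver.cluster.mode.osgi.config=<existing value>&:eclipse.stateSaveDelayInterval:2000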

461044
When you run mappings on the Spark engine, the mapping run fails with a compilation error.
Cause: The cluster uses an instance of Java other than the Java that ships with Informatica Big Data Management.
Workaround: Set the cluster to use the instance of Java that is installed with Big Data Management.
  1. Set the JAVA_HOME environment variable to <Big Data Management Installation Home>/java/jre.
  2. Add ${JAVA_HOME}/bin to the PATH environment variable (a consolidated shell sketch follows these steps):
    PATH=${INFA_HOME_BIN}:${JAVA_HOME}/bin:${PATH}
    export PATH
  3. Restart the Informatica domain and services.
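A minimal shell sketch of both settings together, assuming a Bourne-style shell and the INFA_HOME_BIN variable from step 2; adjust the paths to match the actual installation:
    # Point JAVA_HOME at the JRE that ships with Big Data Management
    JAVA_HOME=<Big Data Management Installation Home>/java/jre
    export JAVA_HOME
    # Put the bundled JRE ahead of any other Java instance on the PATH
    PATH=${INFA_HOME_BIN}:${JAVA_HOME}/bin:${PATH}
    export PATH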
460997
If you configure user impersonation and run a Sqoop mapping on a Hadoop cluster that uses Kerberos authentication, the mapping fails.
Workaround: Use the Hadoop service principal name in the Hadoop connection and run the mapping.
460915
The performance of a Data Masking mapping that includes a non-partitioned relational source is slow in a Blaze environment.
460889
The Union transformation produces incorrect results for Sqoop mappings that you run on the Hortonworks distribution by using the TEZ engine.
460640
Big Data Management supports Hortonworks Hadoop clusters that use Java 1.8. When the cluster uses Java 1.7, mappings that you execute using the Hive engine fail. You see an error like:
Unrecognized VM option 'MaxMetaspaceSize=256M'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Workaround: Perform the following steps to edit Hadoop properties on the VM that hosts the Data Integration Service:
  1. On the VM in the Informatica Big Data Management implementation that runs the Data Integration Service, open the following file for editing:
    <Informatica installation directory>/services/shared/hadoop/<Hadoop_distribution_name>_<version_number>/infaConf/hadoopEnv.properties
  2. Find the following line and comment it out by placing a # character at the beginning:
    infapdo.java.opts=-Djava.library.path=$HADOOP_NODE_INFA_HOME/services/shared/bin:$HADOOP_NODE_HADOOP_DIST/lib/native:$HADOOP_NODE_HADOOP_DIST/lib/*:$HADOOP_NODE_HADOOP_DIST/lib/native -Djava.security.egd=file:/dev/./urandom -Xms512m -Xmx512m -XX:MaxMetaspaceSize=256M
  3. Find the following line and uncomment it by removing the # character at the beginning.
    infapdo.java.opts=-Djava.library.path=$HADOOP_NODE_INFA_HOME/services/shared/bin:$HADOOP_NODE_HADOOP_DIST/lib/native:$HADOOP_NODE_HADOOP_DIST/lib/*:$HADOOP_NODE_HADOOP_DIST/lib/native -Djava.security.egd=file:/dev/./urandom -Xms512m -Xmx512m -XX:MaxPermSize=512m
460412
When you export data to an Oracle database through Sqoop, the mapping fails in certain situations. This issue occurs when all of the following conditions are true:
  • You configure the direct argument to use OraOop.
  • The data contains a column of the float data type.
459942
The Spark engine does not run the footer row command configured for a flat file target.
459671
When you export data to an IBM DB2 z/OS database through Sqoop and do not configure the batch argument, the mapping fails.
Workaround: Configure the batch argument in the mapping and run the mapping again.
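For illustration, assuming the standard Sqoop export batch flag, the argument appears among the Sqoop arguments of the mapping as follows:
    --batch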
458238
Lookup performance on the Spark engine is very slow when the lookup data contains null values.
457397
When you use Sqoop and define a join condition in the custom query, the mapping fails.
457072
When you use Sqoop and join two tables that contain a column with the same name, the mapping fails.
456892
When you generate and execute a DDL script to create or replace a Hive target table in the Blaze run-time environment, the mapping fails.
456884
When you use Sqoop and the first mapper task fails, the subsequent mapper tasks fail with the following error message:
File already exists
456866
The Developer tool allows you to change an Avro data type in a complex file object to one that Avro does not support. As a result, mapping errors occur at run time.
Workaround: If you change an Avro data type, verify that it is a supported type.
456704
When you use Sqoop to import data from an Aurora database by using the MariaDB JDBC driver, the mapping stops responding.
456616
When you export data through Sqoop and there are primary key violations, the mapping fails and bad records are not written to the bad file.
456608
If you update a data object that uses Sqoop and synchronize the data object, the updates are not included in the Sqoop import command.
456455
When you enable Sqoop for a logical data object and export data to an IBM DB2 database, the Sqoop export command fails. However, the mapping runs successfully without any error.
456285
When you export data to a Netezza database through Sqoop and the database contains a column of the float data type, the mapping fails.
455750
Sqoop does not read the OraOop arguments that you configure in the oraoop-site.xml file.
Workaround: Specify the OraOop arguments as part of the Sqoop arguments in the mapping.
453313
If you run multiple concurrent mappings on the Spark engine, performance might be slow and the log messages indicate that resources are not available. The Data Integration Service indicates that the mapping failed even though it is still running in the cluster.
453097
When you use Sqoop for a data object and update its properties in the associated Read or Write transformation, the mapping terminates with an IVector error message.
Workaround: Create a new data object and mapping.
452819
Mappings and profiles that use snappy compression fail in HiveServer2 mode on HDP and CDH SUSE clusters.
Workaround:
On the Informatica domain, edit the property that contains the location of the cluster native library:
  1. Back up the following file, then open it for editing:
    <Informatica Installation Directory>/services/shared/hadoop/<Hadoop_distribution_name>_<version_number>/infaConf/hadoopEnv.properties
  2. Find the $HADOOP_NODE_HADOOP_DIST/lib/native property, and replace the value with the location of the cluster native library (a sketch of the change follows the examples below).
    Hortonworks example:
    /usr/hdp/2.4.2.0-258/hadoop/lib/native
    Cloudera example:
    /opt/cloudera/parcels/CDH/lib/hadoop/lib/native
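A minimal sketch of the change, assuming the native library path appears inside the java.library.path setting of the infapdo.java.opts value (as in the default shown for CR 460640) and using the Hortonworks path from the example above; the elided parts of the value stay unchanged:
    Before: ...-Djava.library.path=$HADOOP_NODE_INFA_HOME/services/shared/bin:$HADOOP_NODE_HADOOP_DIST/lib/native:...
    After:  ...-Djava.library.path=$HADOOP_NODE_INFA_HOME/services/shared/bin:/usr/hdp/2.4.2.0-258/hadoop/lib/native:...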
On the Hadoop cluster:
  1. Open the HiveServer2_EnvInfa.txt file for editing.
  2. Change the value of <Informatica distribution home>/services/shared/hadoop/<Hadoop_distribution>/lib/native to the location of the cluster native library.
  3. Copy the contents of the HiveServer2_EnvInfa.txt file.
  4. Open the hive-env.sh file for editing, and paste the entire contents of the HiveServer2_EnvInfa.txt file.
452224
The summary and detail statistics are empty for mappings run on Tez.
452114
When you enable Sqoop for a data object and a table or column name contains Unicode characters, the mapping fails.
450507
When mappings fail, the Spark engine does not drop temporary Hive tables used to store data during mapping execution. You can manually remove the tables.
443164
Mappings that read from one of the following sources fail to run in the native environment when the Data Integration Service is configured to run jobs in separate remote processes:
  • Flat file or complex file in the Hadoop Distributed File System (HDFS)
  • Hive table
  • HBase table
Workaround: On the Compute view for the Data Integration Service, configure the INFA_HADOOP_DIST_DIR environment variable for each node with the compute role. Set the environment variable to the same value configured for the Data Integration Service Hadoop Distribution Directory execution option for the Data Integration Service.
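For illustration, a sketch of the environment variable on a node with the compute role, assuming the distribution directory layout used elsewhere in these notes; the value must match the Data Integration Service Hadoop Distribution Directory execution option:
    INFA_HADOOP_DIST_DIR=<Informatica installation directory>/services/shared/hadoop/<Hadoop_distribution_name>_<version_number>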
440423
When you use an ODBC connection to write time data to a Netezza database, the mapping fails. This issue occurs when you run the mapping on Cloudera 5u4.
437196
The path of the resource file in a complex file object appears as a recursive path of directories starting with the root directory and ending with a string.
424789
Mapping with a Hive source and target that uses an ABS function with an IIF function fails in the Hadoop environment.
422627
Mapping in the Hadoop environment fails when it contains a Hive source and a filter condition that uses the default table name prefixed to the column name.
Workaround: Edit the filter condition to remove the table name prefixed to the column name and run the mapping again.
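For example, with hypothetical table and column names, a failing condition and its edited form might look like this:
    Before: customers.customer_id > 100
    After:  customer_id > 100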
421834
Mapping in the Hadoop environment fails because the Hadoop connection uses 128 characters in its name.