Troubleshooting Streaming Mappings

When I run a streaming mapping, the mapping fails, and I see the following errors in the application logs of the Hadoop cluster:
User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, localhost): java.lang.Exception: Retry Failed: Total 3 attempts made at interval 10000ms
    at com.informatica.adapter.streaming.hdfs.common.RetryHandler.errorOccured(RetryHandler.java:74)
    at com.informatica.adapter.streaming.hdfs.HDFSMessageSender.sendMessages(HDFSMessageSender.java:55)
    at com.informatica.bootstrap.InfaStreaming$$anonfun$writeToHdfsPathRealtime$1$$anonfun$apply$5.apply(InfaStreaming.scala:144)
    at com.informatica.bootstrap.InfaStreaming$$anonfun$writeToHdfsPathRealtime$1$$anonfun$apply$5.apply(InfaStreaming.scala:132)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:902)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:902)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
This error occurs if the HDFS NameNode is configured incorrectly.
To resolve this error, ensure that you specify the NameNode URI correctly in the HDFS connection and that the NameNode is up and running.
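For example, the NameNode URI in the HDFS connection typically has the following form, where the host name is illustrative and 8020 is a common default NameNode RPC port:
    hdfs://namenode.example.com:8020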
When I try to run streaming mappings concurrently, a few of the mappings fail and I get the following error in the Data Integration Service logs:
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
    at java.lang.StringBuilder.append(StringBuilder.java:136)
This error occurs when the Data Integration Service does not have sufficient memory to run concurrent mappings. The Data Integration Service logs are located at
<INFA_HOME>/logs/<node name>/services/DataIntegrationService/disLogs/
To resolve this error, configure the following advanced properties of the Data Integration Service, as shown in the example below:
  • Maximum Heap Size. Specify a minimum value of 2048M. Default is 640M.
  • JVM Command Line Options. Specify a minimum value of 1024M for the -XX:MaxMetaspaceSize attribute. Default is 192M.
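For example, the advanced properties might be set to the following values. The metaspace size is passed as a standard JVM flag; the values shown are the recommended minimums:
    Maximum Heap Size: 2048M
    JVM Command Line Options: -XX:MaxMetaspaceSize=1024M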
The streaming mapping execution fails with the following error in the application logs of the Hadoop cluster:
Cleaning up the staging area /tmp/hadoop-yarn/staging/cloudqa/.staging/job_1475754687186_0406
PriviledgedActionException as:cloudqa (auth:PROXY) via yarn (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudqa, access=EXECUTE, inode="/tmp/hadoop-yarn/staging":yarn:supergroup:drwx------
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:281)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:262)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:206)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:158)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6621)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6603)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOwner(FSNamesystem.java:6522)
This error occurs when the YARN user, Spark engine user, or mapping impersonation user does not have sufficient permission on the /tmp/hadoop-yarn/staging directory. Assign the required permissions and run the mapping again.
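For example, a Hadoop administrator might grant access with a command like the following, run as the HDFS superuser. The mode shown is illustrative; choose permissions that match your security policy:
    hdfs dfs -chmod -R 777 /tmp/hadoop-yarn/staging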
The streaming mapping execution fails with the following error in the application logs of the Hadoop cluster:
Mapping execution fails with error "Error: : Unsupported major.minor version 52.0"
This error occurs if the JDK version that the Cloudera processes run on does not match the JDK version specified by the jdk_home property. Class file version 52.0 corresponds to Java 8.
To resolve this error, ensure that both versions are set to the JDK version that Informatica supports.
To configure the jdk_home property, perform the following steps:
  1. Find the hadoopEnv.properties file in the following directory:
    <INFA_HOME>/services/shared/hadoop/<Hadoop distribution>/infaConf
  2. Set the jdk_home property correctly, as shown in the example below.
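For example, the entry in hadoopEnv.properties might look like the following, where the path is illustrative and must point to the JDK installation that matches the cluster:
    jdk_home=/usr/java/jdk1.8.0_131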
When I run a Streaming mapping that contains Kafka data objects, the mapping execution does not fail, but the target does not receive any data.
This error occurs if the Kafka server version configured in the hadoopRes.properties file does not match the version configured in the hadoopEnv.properties file. These files are located in the following directory:
<InformaticaInstallationDir>/services/shared/hadoop/<Hadoop_distribution_name>/infaConf
To resolve this error, ensure that the Kafka version is the same in both files.
For example, if the Kafka version is 0.10.1.1, perform the following steps:
  1. Configure the following property in the hadoopRes.properties file:
    kafka.version=0.10.1.1
  2. Configure the following property in the hadoopEnv.properties file:
    infapdo.substvar.kafka.version=kafka.version=0.10.1.1
When I run a Streaming mapping that contains an HBase data object, I get the following error:
HBaseDataAdapter : java.lang.NullPointerException
    at com.informatica.products.extensions.adapter.hadoop.hive.storagehandler.utils.PwxWriter.close(PwxWriter.java:165)
    at com.informatica.products.extensions.adapter.hadoop.hive.storagehandler.PwxHiveRecordWriter.close(PwxHiveRecordWriter.java:119)
    at com.informatica.platform.dtm.executor.hive.boot.storagehandler.INFAOutputFormat$INFAHiveRecordWriter.close(INFAOutputFormat.java:145)
    at org.apache.spark.sql.hive.SparkHiveWriterContainer.close(hiveWriterContainers.scala:109)
    at org.apache.spark.sql.hive.SparkHiveWriterContainer.writeToFile(hiveWriterContainers.scala:194)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:131)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:131)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
This error occurs when you try to write a null value to a ROW column of an HBase table.
Ensure that you do not write a null value to a ROW column of an HBase table.
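For example, you can replace null values upstream of the HBase target in an Expression transformation. The port name ROW_KEY_COL and the replacement value are illustrative:
    IIF(ISNULL(ROW_KEY_COL), 'DEFAULT_VALUE', ROW_KEY_COL)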
When I test a MapRStreams connection, the Developer tool crashes.
This error occurs if you have not completed the required prerequisites.
Ensure that you copy the conf files to the following directory:
<INFA_HOME>\clients\DeveloperClient\hadoop\mapr_5.2.0\conf
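For example, you might copy the client configuration files from a MapR cluster node to the Developer tool machine. The source path assumes a default MapR installation, and the host name is illustrative:
    scp mapr@cluster-node:/opt/mapr/conf/* "<INFA_HOME>\clients\DeveloperClient\hadoop\mapr_5.2.0\conf"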
For more information about the prerequisite tasks, see the Informatica Big Data Management Cluster Integration Guide.