Big Data Streaming User Guide

Troubleshooting Streaming Mappings

When I run a streaming mapping, the mapping fails, and I see the following errors in the application logs of the Hadoop cluster:
User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, localhost): java.lang.Exception: Retry Failed: Total 3 attempts made at interval 10000ms at com.informatica.adapter.streaming.hdfs.common.RetryHandler.errorOccured(RetryHandler.java:74) at com.informatica.adapter.streaming.hdfs.HDFSMessageSender.sendMessages(HDFSMessageSender.java:55) at com.informatica.bootstrap.InfaStreaming$$anonfun$writeToHdfsPathRealtime$1$$anonfun$apply$5.apply(InfaStreaming.scala:144) at com.informatica.bootstrap.InfaStreaming$$anonfun$writeToHdfsPathRealtime$1$$anonfun$apply$5.apply(InfaStreaming.scala:132) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:902) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:902) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) at org.apache.spark.scheduler.Task.run(Task.scala:86) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)
This error occurs if the HDFS NameNode is configured incorrectly.
To resolve this error, ensure that you specify the NameNode URI correctly in the HDFS connection and that the NameNode is up and running.
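For example, the NameNode URI in the HDFS connection typically takes one of the following forms, where the host name, port, and nameservice name are placeholders that you replace with the values for your cluster:
  hdfs://<namenode host>:8020
  hdfs://<nameservice name>     (when NameNode high availability is enabled)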
When I try to run streaming mappings concurrently, a few of the mappings fail and I get the following error in the Data Integration Service logs:
Caused by: java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:3332) at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448) at java.lang.StringBuilder.append(StringBuilder.java:136)
This error occurs when the Data Integration Service does not have sufficient memory to run concurrent mappings. The Data Integration Service logs are located at
<INFA_HOME>/logs/<node name>/services/DataIntegrationService/disLogs/
To resolve this error, configure the following advanced properties of the Data Integration Service:
  • Maximum Heap Size. Specify a minimum value of 2048M. Default is 640M.
  • JVM Command Line Options. Specify a minimum value of 1024M for the -XX:MaxMetaspaceSize attribute. Default is 192M.
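For example, with the minimum values above, the two properties might look like the following. Treat these values as a starting point, because the heap size that you need depends on how many mappings run concurrently, and keep any other options that are already set in the JVM Command Line Options value:
  Maximum Heap Size: 2048M
  JVM Command Line Options: -XX:MaxMetaspaceSize=1024M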
The streaming mapping execution fails with the following error in the application logs of the Hadoop cluster:
Cleaning up the staging area /tmp/hadoop-yarn/staging/cloudqa/.staging/job_1475754687186_0406 PriviledgedActionException as:cloudqa (auth:PROXY) via yarn (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudqa, access=EXECUTE, inode="/tmp/hadoop-yarn/staging":yarn:supergroup:drwx------ at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:281) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:262) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:206) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:158) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6621) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6603) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOwner(FSNamesystem.java:6522)
This error occurs when a YARN user, Spark engine user, or mapping impersonation user does not have sufficient permission on the /tmp/hadoop-yarn/staging directory. Assign the required permissions and run the mapping again.
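As a sketch, a cluster administrator could check and correct the permissions with commands similar to the following, run as a user with HDFS superuser privileges. The mode shown is an assumption; use the most restrictive mode that your security policy allows:
  hdfs dfs -ls /tmp/hadoop-yarn
  hdfs dfs -chmod -R 777 /tmp/hadoop-yarn/staging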
The streaming mapping execution fails with the following error in the application logs of the Hadoop cluster:
Mapping execution fails with error "Error: : Unsupported major.minor version 52.0"
This error occurs if there is a mismatch between the JDK version that the Cloudera processes run on and the JDK version specified in the cluster environment variables.
To resolve this error, ensure that both versions are set to a JDK version that Informatica supports.
To configure the jdk_home directory, perform the following steps:
  1. Edit the Hadoop connection in the Developer tool or the Administrator tool.
  2. On the Common Attributes tab, edit the Cluster Environment Variables property.
  3. Set the HADOOP_NODE_JDK_HOME property correctly. See the example that follows these steps.
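For example, assuming that the supported JDK is installed under /usr/java/jdk1.8.0_144 on every cluster node, the cluster environment variable entry might look like the following. The path is only illustrative; use the JDK location on your cluster nodes:
  HADOOP_NODE_JDK_HOME=/usr/java/jdk1.8.0_144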
When I run a Streaming mapping that contains an HBase data object, I get the following error:
HBaseDataAdapter : java.lang.NullPointerException at com.informatica.products.extensions.adapter.hadoop.hive.storagehandler.utils.PwxWriter.close(PwxWriter.java:165) at com.informatica.products.extensions.adapter.hadoop.hive.storagehandler.PwxHiveRecordWriter.close(PwxHiveRecordWriter.java:119) at com.informatica.platform.dtm.executor.hive.boot.storagehandler.INFAOutputFormat$INFAHiveRecordWriter.close(INFAOutputFormat.java:145) at org.apache.spark.sql.hive.SparkHiveWriterContainer.close(hiveWriterContainers.scala:109) at org.apache.spark.sql.hive.SparkHiveWriterContainer.writeToFile(hiveWriterContainers.scala:194) at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:131) at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:131) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) at org.apache.spark.scheduler.Task.run(Task.scala:86) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
This error occurs when you try to write a null value to a ROW column of an HBase table.
Ensure that you do not write a null value to a ROW column of an HBase table.
When I test a MapRStreams connection, the Developer tool crashes.
This error occurs if you have not completed the required prerequisites.
Ensure that you copy the conf files to the following directory:
<INFA_HOME>\clients\DeveloperClient\hadoop\mapr_5.2.0\conf
For more information about the prerequisite tasks, see the Informatica Big Data Management Cluster Integration Guide.
I use Sqoop as a Lookup transformation in the streaming mapping. The mapping fails, and I see the following error in the application logs of the CDH cluster:
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
This error occurs if the MapReduce directory is configured incorrectly.
To resolve this error, perform the following steps:
  1. In the YARN configuration, find the NodeManager Advanced Configuration Snippet (Safety Valve) for mapred-site.xml property.
  2. Add the following XML snippet:
     <property>
       <name>mapreduce.application.classpath</name>
       <value>$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH</value>
     </property>
  3. Restart the affected services as indicated by Cloudera Manager and run the mapping again.
I use Sqoop as a Lookup transformation in the streaming mapping. The mapping validation fails, and I see the following errors in the Developer tool:
Mapping1 Mapping The transformation [output] contains a binary data type which you cannot use in a Streaming mapping. Use a valid data type. Mapping1 Mapping The transformation [output] contains a binary data type which you cannot use in a Streaming mapping. Use a valid data type. [ID:BINARY_FIELD_NOT_SUPPORTED_STREAMING] Lookup_ORACLE_TEST_CHAR MRS/Sqoop_test/Mapping1 ORACLE_TEST_CHAR Relational Data Object In relational column [TEST_NUMBER] with native datatype [decimal], the scale [-127] is not valid. [ID:INVALID_SCALE] TEST_NUMBER MRS/Sqoop_test/ORACLE_TEST_CHAR
The errors occur if the Lookup transformation has a data type, such as binary, that Spark Streaming or Sqoop import does not support.
To resolve this error, delete the columns of the unsupported data type in the Lookup transformation and then validate the mapping.
For more information about data type support, see the Informatica Big Data Management User Guide.
I use Sqoop as a Lookup transformation in the streaming mapping. The mapping fails, and the following error appears in the application logs of the Hadoop cluster:
User class threw exception: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: /opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/bin/../lib/hadoop-yarn/bin/yarn: line 318: /usr/java/default1/bin/java: No such file or directory /opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/bin/../lib/hadoop-yarn/bin/yarn: line 318: exec: /usr/java/default1/bin/java: cannot execute: No such file or directory
This error occurs when the jdk_home directory of the Hadoop distribution is configured incorrectly.
To configure the jdk_home directory, perform the following steps:
  1. Edit the Hadoop connection in the Developer tool or the Administrator tool.
  2. On the Common Attributes tab, edit the Cluster Environment Variables property.
  3. Set the HADOOP_NODE_JDK_HOME property correctly.
I use Sqoop as a Lookup transformation in a streaming mapping. The mapping fails, and I see the following error in the mapping logs:
<INFA_HOME>/logs/node_automation/services/DataIntegrationService/disLogs/ms : Caused by: java.io.IOException: Cannot run program "<INFA_HOME>/services/shared/hadoop/<Hadoop distribution>/scripts/ HadoopFsRmRf" (in directory "."): error=13, Permission denied at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) at java.lang.Runtime.exec(Runtime.java:620)
This error occurs when you do not have sufficient permissions on the <Informatica installation directory>\externaljdbcjars directory in the Informatica domain. Get the required permissions and then run the mapping again.
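As a sketch, on a Linux node of the domain an administrator might grant read and execute access with a command similar to the following. The mode is an assumption; apply the permissions that your security policy requires:
  chmod -R 755 <Informatica installation directory>/externaljdbcjars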
For more information about the JDBC driver JAR files for Sqoop connectivity, see the Informatica Big Data Management Cluster Integration Guide.
When I import a data object with an Avro schema in a streaming mapping, the mapping fails with the following error:
com.informatica.adapter.sdkadapter.exceptions.AdapterSDKException: [SDK_APP_COM_20000] error [getSchemaConfig()::java.io.IOException: Not a data file.]
This occurs when the sample schema file is invalid. When you specify Avro format, provide a valid Avro schema in a .avsc file.
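For reference, a minimal valid .avsc file contains a single JSON object that defines a record, similar to the following sketch. The record and field names are only illustrative:
  {
    "type": "record",
    "name": "SensorReading",
    "fields": [
      { "name": "id", "type": "string" },
      { "name": "value", "type": "double" }
    ]
  }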
When I run a streaming mapping with an Event Hub source or target, the mapping fails with the following error:
com.microsoft.azure.eventhubs.AuthorizationFailedException: Put token failed. status-code: 401, status-description: InvalidSignature: The token has an invalid signature.
This occurs when the Shared Access Policy Name or Shared Access Policy Primary Key is configured incorrectly in the Azure EventHub read data operation, Azure EventHub write data operation, or Azure EventHub connection. Configure valid values for these properties. When you configure the properties in the Azure EventHub connection, ensure that the policy applies to all data objects that are associated with the connection.
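For context, the policy name and primary key correspond to the SharedAccessKeyName and SharedAccessKey values in the Event Hubs connection string, which typically has the following form. The namespace, policy name, and key are placeholders:
  Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<policy name>;SharedAccessKey=<primary key>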
A streaming mapping created in 10.2.0 might not run after upgrading to 10.2.1 when you use a JSON schema for column projection for the data that you are writing.
This might occur if the payload has malformed data.
Remove the malformed data and run the mapping again.
When I run a streaming mapping on a Cloudera CDH cluster that contains a Kafka source or target, the mapping fails.
This error occurs because the default value of the Offset Commit Topic Replication Factor property for the Kafka broker is 3 and you are running the mapping on a Cloudera CDH cluster with one or two nodes.
Perform the following steps to resolve this error:
  1. Delete all Kafka topics.
  2. Stop the Kafka broker and clear the Kafka log directory specified in the log.dirs property of the Kafka broker.
  3. Stop ZooKeeper and clear the log directory specified in the dataDir property of ZooKeeper.
  4. Configure the Offset Commit Topic Replication Factor property for the Kafka broker. Specify a value of 1 or 2 for the offsets.topic.replication.factor property, depending on the number of nodes in the cluster. See the example that follows these steps.
  5. Start ZooKeeper and the Kafka broker.
  6. Create the Kafka topics and run the mapping again.
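A minimal sketch of the broker setting for a single-node cluster, assuming the broker is configured through a server.properties file (in Cloudera Manager, set the equivalent Offset Commit Topic Replication Factor property instead):
  # server.properties on the Kafka broker
  offsets.topic.replication.factor=1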
When I run a streaming mapping that contains a JMS target, I get the following error:
WebSphere MQ call failed with compcode '2' ('MQCC_FAILED') reason '2053' ('MQRC_Q_FULL'). at com.ibm.msg.client.wmq.common.internal.Reason.createException
This error occurs when the JMS queue that you are writing to is full.
Increase the queue depth of the JMS server and run the mapping again.
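If the JMS provider is IBM MQ, as the error above indicates, an administrator could raise the maximum queue depth with an MQSC command along these lines. The queue name and depth are assumptions; choose values that suit your environment:
  ALTER QLOCAL(<queue name>) MAXDEPTH(50000)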
In a streaming mapping, when I perform a self-join on source data that has metadata of complex data type, the mapping validation fails at design time.
This error might occur if you have selected the Sorted Input property in the advanced properties of the Joiner transformation.
To resolve this error, deselect the Sorted Input property and run the mapping again.
When I run a streaming mapping that contains a JMS source, it fails with an Unexpected JMSMessage payload type error.
This error might occur in the following situations:
  • There is a mismatch between the data type that you write to the queue and the data present in the queue. Clear the JMS queue and then run the mapping.
  • There is a mismatch in the data type that you configured for the JMS source and the data type of the streamed data. Verify that the data type that you configure for the source is the same as the data type of the streamed data.
When I edit the schema of data that is of a complex data type, the schema type does not change.
This error occurs because the Project Column as Complex Data Type option is not selected.
To resolve this error, when you edit the schema, select the Project Column as Complex Data Type option in the column projection properties of the data object read or write operation.
