Troubleshooting Streaming Mappings

When I run a streaming mapping, the mapping fails, and I see the following errors in the application logs of the Hadoop cluster:
User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, localhost): java.lang.Exception: Retry Failed: Total 3 attempts made at interval 10000ms at com.informatica.adapter.streaming.hdfs.common.RetryHandler.errorOccured(RetryHandler.java:74) at com.informatica.adapter.streaming.hdfs.HDFSMessageSender.sendMessages(HDFSMessageSender.java:55) at com.informatica.bootstrap.InfaStreaming$$anonfun$writeToHdfsPathRealtime$1$$anonfun$apply$5.apply(InfaStreaming.scala:144) at com.informatica.bootstrap.InfaStreaming$$anonfun$writeToHdfsPathRealtime$1$$anonfun$apply$5.apply(InfaStreaming.scala:132) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:902) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:902) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) at org.apache.spark.scheduler.Task.run(Task.scala:86) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)
This error occurs if the HDFS NameNode is configured incorrectly.
To resolve this error, ensure that you specify the NameNode URI correctly in the HDFS connection and that the NameNode is up and running.
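As a quick check before you run the mapping again, the following Python sketch parses a NameNode URI and tests whether the host accepts TCP connections on the port. It is an illustration only and not part of the product. The default port 8020 and the function names are assumptions, and a successful connection shows only that something is listening on that port, not that the NameNode is healthy.

```python
import socket
from urllib.parse import urlparse

# Assumption: 8020 is a common NameNode RPC port. Confirm the value
# against fs.defaultFS on your own cluster.
DEFAULT_NAMENODE_PORT = 8020


def parse_namenode_uri(uri):
    """Return (host, port) from an hdfs:// URI, or raise ValueError."""
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got scheme {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("the URI has no host name")
    return parsed.hostname, parsed.port or DEFAULT_NAMENODE_PORT


def namenode_is_reachable(uri, timeout=5.0):
    """Return True if a TCP connection to the NameNode host and port succeeds."""
    host, port = parse_namenode_uri(uri)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, call `namenode_is_reachable("hdfs://<host>:<port>")` with the URI from the HDFS connection. A result of `False` suggests that the URI is wrong or that the NameNode is down.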
When I try to run streaming mappings concurrently, a few of the mappings fail and I get the following error in the Data Integration Service logs:
Caused by: java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:3332) at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448) at java.lang.StringBuilder.append(StringBuilder.java:136)
This error occurs when the Data Integration Service does not have sufficient memory to run concurrent mappings. The Data Integration Service logs are located in the following directory:
<INFA_HOME>/logs/<node name>/services/DataIntegrationService/disLogs/
To resolve this error, configure the following advanced properties of the Data Integration Service:
  • Maximum Heap Size. Specify a minimum value of 2048M. Default is 640M.
  • JVM Command Line Options. Specify a minimum value of 1024M for the -XX:MaxMetaspaceSize attribute. Default is 192M.
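To compare current settings against these minimums, a small helper along the following lines can be used. This is a sketch under assumptions: the function names are hypothetical, the values are passed in as plain strings copied from the service properties, and sizes must carry a K, M, or G suffix.

```python
import re

# Conversion factors to megabytes for JVM-style size suffixes.
_UNITS = {"K": 1 / 1024, "M": 1, "G": 1024}


def size_in_mb(value):
    """Convert a JVM-style size such as '640M' or '2G' to megabytes."""
    match = re.fullmatch(r"(\d+)([KkMmGg])", value.strip())
    if not match:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.upper()]


def max_metaspace_mb(jvm_options):
    """Return the -XX:MaxMetaspaceSize value in MB, or None if the flag is absent."""
    match = re.search(r"-XX:MaxMetaspaceSize=(\S+)", jvm_options)
    return size_in_mb(match.group(1)) if match else None


def check_dis_memory(max_heap, jvm_options):
    """Return findings for values below the minimums given in this section."""
    findings = []
    if size_in_mb(max_heap) < 2048:
        findings.append("Maximum Heap Size is below 2048M")
    metaspace = max_metaspace_mb(jvm_options)
    if metaspace is None or metaspace < 1024:
        findings.append("-XX:MaxMetaspaceSize is missing or below 1024M")
    return findings
```

An empty list means that both values meet the minimums. With the default values of 640M and 192M, the helper reports both properties.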
The streaming mapping execution fails with the following error in the application logs of the Hadoop cluster:
Cleaning up the staging area /tmp/hadoop-yarn/staging/cloudqa/.staging/job_1475754687186_0406 PriviledgedActionException as:cloudqa (auth:PROXY) via yarn (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudqa, access=EXECUTE, inode="/tmp/hadoop-yarn/staging":yarn:supergroup:drwx------ at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:281) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:262) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:206) at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:158) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6621) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6603) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOwner(FSNamesystem.java:6522)
This error occurs when a YARN user, Spark engine user, or mapping impersonation user does not have sufficient permission on the /tmp/hadoop-yarn/staging directory. Assign the required permissions and run the mapping again.
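The log line shows the directory mode drwx------ with owner yarn and group supergroup, and the user cloudqa requesting EXECUTE access. The following Python sketch evaluates a mode string the same way a basic owner, group, and other permission check does, which can help you read such log lines. It is a simplified model: the function name is hypothetical, and ACLs and the HDFS superuser are ignored.

```python
def has_access(mode, owner, group, user, user_groups, access):
    """Evaluate a mode string such as 'drwx------' for one user.

    access is 'r', 'w', or 'x'. Simplified model: ACLs and the
    HDFS superuser are not considered.
    """
    if len(mode) != 10:
        raise ValueError(f"expected a 10-character mode string, got {mode!r}")
    offset = "rwx".index(access)
    if user == owner:
        triplet = mode[1:4]      # owner permissions apply
    elif group in user_groups:
        triplet = mode[4:7]      # group permissions apply
    else:
        triplet = mode[7:10]     # permissions for all other users apply
    char = triplet[offset]
    # A lowercase 't' in the last position means execute plus sticky bit.
    return char == access or (access == "x" and char == "t")
```

With the values from the log, `has_access("drwx------", "yarn", "supergroup", "cloudqa", ["cloudqa"], "x")` returns `False`, which matches the Permission denied error.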
When I run a streaming mapping that contains an HBase data object, I get the following error:
HBaseDataAdapter : java.lang.NullPointerException at com.informatica.products.extensions.adapter.hadoop.hive.storagehandler.utils.PwxWriter.close(PwxWriter.java:165) at com.informatica.products.extensions.adapter.hadoop.hive.storagehandler.PwxHiveRecordWriter.close(PwxHiveRecordWriter.java:119) at com.informatica.platform.dtm.executor.hive.boot.storagehandler.INFAOutputFormat$INFAHiveRecordWriter.close(INFAOutputFormat.java:145) at org.apache.spark.sql.hive.SparkHiveWriterContainer.close(hiveWriterContainers.scala:109) at org.apache.spark.sql.hive.SparkHiveWriterContainer.writeToFile(hiveWriterContainers.scala:194) at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:131) at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:131) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) at org.apache.spark.scheduler.Task.run(Task.scala:86) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
This error occurs when you try to write a null value to a ROW column of an HBase table.
Ensure that you do not write a null value to a ROW column of an HBase table.
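The logic of that check can be illustrated outside the mapping with a few lines of Python. This is only a sketch: the function name and the dictionary-based records are hypothetical, and in a mapping you would apply the equivalent condition upstream of the HBase target.

```python
def split_by_row_key(records, row_key_field):
    """Separate records that have a row key value from those where it is null."""
    writable, rejected = [], []
    for record in records:
        if record.get(row_key_field) is None:
            rejected.append(record)   # a null row key would trigger the error
        else:
            writable.append(record)
    return writable, rejected
```

Only the records in the first list are safe to write. The second list can be logged or routed elsewhere for inspection.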
When I test a MapR Streams connection, the Developer tool crashes.
This error occurs if you have not completed the required prerequisites.
Ensure that you copy the conf files to the following directory:
<INFA_HOME>\clients\DeveloperClient\hadoop\mapr_5.2.0\conf
For more information about the prerequisite tasks, see the Data Engineering Integration Guide.
When I import a data object with an Avro schema in a streaming mapping, the mapping fails with the following error:
com.informatica.adapter.sdkadapter.exceptions.AdapterSDKException: [SDK_APP_COM_20000] error [getSchemaConfig()::java.io.IOException: Not a data file.]
This error occurs when the sample schema file is invalid. When you specify the Avro format, provide a valid Avro schema in a .avsc file.
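An Avro schema file contains JSON text, so a basic sanity check is possible before you import the data object. The following Python sketch checks only the outer structure of a schema. It is not a full validation against the Avro specification, and the function name is hypothetical.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def check_avsc(text):
    """Return a list of problems found in the text of a .avsc file (empty if none)."""
    try:
        schema = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if isinstance(schema, str):
        return [] if schema in PRIMITIVES else [f"unknown primitive type {schema!r}"]
    if isinstance(schema, list):
        return []  # a union; member types are not checked in this sketch
    if not isinstance(schema, dict) or "type" not in schema:
        return ["schema must be a JSON object with a 'type' attribute"]
    problems = []
    if schema["type"] == "record":
        if "name" not in schema:
            problems.append("record schema has no 'name'")
        fields = schema.get("fields")
        if not isinstance(fields, list):
            problems.append("record schema has no 'fields' array")
        else:
            for i, field in enumerate(fields):
                if not isinstance(field, dict) or "name" not in field or "type" not in field:
                    problems.append(f"field {i} needs both 'name' and 'type'")
    return problems
```

Pass the contents of the file as a string, for example `check_avsc(open(path, encoding="utf-8").read())`, where `path` is the location of your schema file. Content that is not JSON, such as a binary file, is reported as not valid JSON.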
When I run a streaming mapping with an Event Hub source or target, the mapping fails with the following error:
com.microsoft.azure.eventhubs.AuthorizationFailedException: Put token failed. status-code: 401, status-description: InvalidSignature: The token has an invalid signature.
This error occurs when the Shared Access Policy Name or Shared Access Policy Primary Key is configured incorrectly in the Azure EventHub read data operation, Azure EventHub write data operation, or Azure EventHub connection. Configure valid values for these properties. When you configure the properties in the Azure EventHub connection, ensure that the policy applies to all data objects that are associated with the connection.
A streaming mapping created in 10.2.0 might not run after upgrading to 10.2.1 when you use a JSON schema for column projection for the data that you are writing.
This might occur if the payload has malformed data.
Remove the malformed data and run the mapping again.
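To locate malformed messages in a sample of the payload, a syntax check along the following lines may help. This Python sketch is an illustration only: the function name is hypothetical, it assumes one JSON object per message, and it checks JSON syntax but not conformance to the projection schema.

```python
import json


def split_payloads(lines):
    """Split raw messages into parsed JSON objects and malformed originals."""
    valid, malformed = [], []
    for line in lines:
        try:
            parsed = json.loads(line)
        except (json.JSONDecodeError, TypeError):
            malformed.append(line)
            continue
        if isinstance(parsed, dict):
            valid.append(parsed)
        else:
            # Assumption: each message is expected to be a JSON object.
            malformed.append(line)
    return valid, malformed
```

The second list returned by `split_payloads` holds the original text of every message that failed the check, so you can find and remove those messages at the source.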
When I run a streaming mapping on a Cloudera CDH cluster that contains a Kafka source or target, the application does not process the data.
This error occurs when the offset commit topic replication factor configured for the Kafka broker does not match the number of nodes running on the Cloudera CDH cluster.
Perform the following steps to resolve this error:
  1. Delete all Kafka topics.
  2. Stop the Kafka broker and clear the Kafka log directory specified in the log.dirs property of the Kafka broker.
  3. Stop ZooKeeper and clear the log directory specified in the dataDir property of ZooKeeper.
  4. Configure the Offset Commit Topic Replication Factor property for the Kafka broker. Specify a value of 1 or 2 for the offsets.topic.replication.factor property, depending on the number of nodes in the cluster.
  5. Start ZooKeeper and the Kafka broker.
  6. Create the Kafka topics and run the mapping again.
When I run a streaming mapping that contains a JMS target, I get the following error:
WebSphere MQ call failed with compcode '2' ('MQCC_FAILED') reason '2053' ('MQRC_Q_FULL'). at com.ibm.msg.client.wmq.common.internal.Reason.createException
This error occurs when the JMS queue that you are writing to is full.
Increase the queue depth of the JMS server and run the mapping again.
In a streaming mapping, when I perform a self-join on source data that has metadata of complex data type, the mapping validation fails at design time.
This error might occur if you have selected the Sorted Input property in the advanced properties of the Joiner transformation.
To resolve this error, deselect the Sorted Input property and run the mapping again.
When I run a streaming mapping that contains a JMS source, it fails with an Unexpected JMSMessage payload type error.
This error might occur in the following situations:
  • There is a mismatch between the data type that you write to the queue and the data present in the queue. Clear the JMS queue and then run the mapping.
  • There is a mismatch in the data type that you configured for the JMS source and the data type of the streamed data. Verify that the data type that you configure for the source is the same as the data type of the streamed data.
When I edit the schema of data that is of a complex data type, the schema type does not change.
This error occurs because the Project Column as Complex Data Type option is not selected.
To resolve this error, when you edit the schema, select the Project Column as Complex Data Type option in the column projection properties of the data object read or write operation.
When I run a streaming mapping with a Kafka source and a Hive target, the mapping fails with the following error message:
java.io.IOException: Mkdirs failed to create file
To resolve this error, set 777 permissions on the hive.exec.scratchdir directory for the user on all the nodes of the cluster, and then run the mapping.
When I run a streaming mapping in AWS Databricks service with a Kafka source and an Amazon S3 target, the mapping returns malformed data and becomes unresponsive. The mapping then stops processing new data.
This error occurs when some of the Databricks nodes do not have access to the Kafka cluster. Configure the Kafka cluster correctly so that it can be accessed by the Databricks cluster.
On AWS Databricks, a streaming mapping fails if the mapping does all of the following:
  • Reads from an Amazon Kinesis source
  • Writes to an Amazon S3 target
  • Sets the Amazon S3 connection as the state store connection
The mapping fails with the following error message:
ERROR Uncaught throwable from user code: com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [BasicAWSCredentialsProvider: Access key or secret key is null, com.amazonaws.auth.InstanceProfileCredentialsProvider@51634e29: The requested metadata is not found at http://169.254.169.254/latest/meta-data/iam/security-credentials/]
The AWS access key ID and AWS secret access key are case-sensitive. For the state store functionality to work, you must add the following access key ID and secret access key properties to the Spark Configuration tab of the Databricks cluster:
spark.hadoop.fs.s3n.awsAccessKeyId
spark.hadoop.fs.s3n.awsSecretAccessKey
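The following Python sketch renders the two properties as text that you can paste into the Spark configuration. It assumes that the configuration accepts one property per line with the key and value separated by a space. The function name is hypothetical, and the values shown in the usage note are placeholders.

```python
def spark_conf_lines(access_key_id, secret_access_key):
    """Render the two state store properties as 'key value' lines."""
    checks = (("access key ID", access_key_id), ("secret access key", secret_access_key))
    for name, value in checks:
        # Reject empty values and stray whitespace, which are easy to
        # introduce when copying keys. The case of the keys is left unchanged.
        if not value or value != value.strip():
            raise ValueError(f"{name} is empty or has surrounding whitespace")
    return [
        f"spark.hadoop.fs.s3n.awsAccessKeyId {access_key_id}",
        f"spark.hadoop.fs.s3n.awsSecretAccessKey {secret_access_key}",
    ]
```

For example, `print("\n".join(spark_conf_lines("<access key ID>", "<secret access key>")))` prints the two lines with your own values substituted. Because the keys are case-sensitive, copy them exactly as issued.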