Troubleshooting Mappings in a Non-native Environment
Consider the following troubleshooting tips when you run mappings in a non-native environment.
Hadoop Environment
When I run a mapping with a Hive source or a Hive target on a different cluster, the Data Integration Service fails to push the mapping to Hadoop with the following error:
Failed to execute query [exec0_query_6] with error code [10], error message [FAILED: Error in semantic analysis: Line 1:181 Table not found customer_eur], and SQL state [42000].
When you run a mapping in a Hadoop environment, the Hive connection that you select for the Hive source or Hive target and the mapping must be on the same Hive metastore. If the connection points to a different metastore, the Hive engine cannot resolve the table name.
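As a quick diagnostic, you can connect to the metastore that the Hive connection points to and verify that the table is visible there. The JDBC URL below is a placeholder; the table name comes from the error message above:

    # Placeholders: substitute the JDBC URL from your Hive connection.
    beeline -u "jdbc:hive2://your-hive-server:10000/default" \
            -e "SHOW TABLES LIKE 'customer_eur';"
    # An empty result means the table is registered in a different
    # metastore than the one the Hive connection points to.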
When I run a mapping with SQL overrides concurrently, the mapping hangs.
There are not enough available resources because the cluster is being shared across different engines.
Configure YARN to use the capacity scheduler and use different YARN scheduler queues for Blaze and Spark.
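For example, a minimal capacity-scheduler.xml sketch that dedicates a separate queue to each engine might look like the following. The queue names and capacity percentages are illustrative only; size them for your workload:

    <!-- capacity-scheduler.xml: illustrative queues for Blaze and Spark -->
    <property>
      <name>yarn.scheduler.capacity.root.queues</name>
      <value>blaze,spark</value>
    </property>
    <property>
      <!-- Example: 60% of cluster capacity for the Blaze engine queue -->
      <name>yarn.scheduler.capacity.root.blaze.capacity</name>
      <value>60</value>
    </property>
    <property>
      <!-- Example: 40% of cluster capacity for the Spark engine queue -->
      <name>yarn.scheduler.capacity.root.spark.capacity</name>
      <value>40</value>
    </property>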
When I configure a mapping to create a partitioned Hive table, the mapping fails with the error "Need to specify partition columns because the destination table is partitioned."
This issue happens because of internal Informatica requirements for a query that is designed to create a Hive partitioned table. For details and a workaround, see Knowledge Base article 516266.
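For background, Hive itself raises this error when an INSERT into a partitioned table does not name the partition columns. The following is a generic Hive illustration; the table and column names are hypothetical, and this is not the internal Informatica query:

    -- Hypothetical partitioned table.
    CREATE TABLE sales (id INT, amount DOUBLE)
    PARTITIONED BY (country STRING);

    -- Fails: "Need to specify partition columns because the
    -- destination table is partitioned."
    -- INSERT INTO TABLE sales SELECT id, amount, country FROM staging;

    -- Works: name the partition column and allow dynamic partitioning.
    SET hive.exec.dynamic.partition.mode=nonstrict;
    INSERT INTO TABLE sales PARTITION (country)
    SELECT id, amount, country FROM staging;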
Databricks Environment
Mappings fail with the following error:
SEVERE: Run with ID [1857] failed with state [INTERNAL_ERROR] and error message [Library installation timed out after 1800 seconds. Libraries that are not yet installed: jar: "dbfs:/tmp/DATABRICKS/sess6250142538173973565/staticCode.jar"]
This might happen when you run concurrent jobs. When Databricks does not have resources to process a job, it queues the job for a maximum of 1,800 seconds (30 minutes). If resources are not available in 30 minutes, the job fails. Consider the following actions to avoid timeouts:
- Configure preemption environment variables on the Databricks cluster to control the amount of resources that get allocated to each job. For more information about preemption, see the Big Data Management Integration Guide, and see the configuration sketch below.
- Run cluster workflows to create ephemeral clusters. You can configure the workflow to create a cluster, run the job, and then delete the cluster. For more information about ephemeral clusters, see Cluster Workflows.
Informatica integrates with Databricks standard concurrency clusters. Standard concurrency clusters have a maximum queue time of 30 minutes, jobs fail when the timeout is reached, and the maximum queue time cannot be extended. Setting the preemption threshold allows more jobs to run concurrently, but because each job receives a smaller share of resources, jobs can take longer to run. Configuring the environment for preemption also does not ensure that all jobs will run. In addition to configuring preemption, you can run cluster workflows to create ephemeral clusters, as described above. For more information about Databricks concurrency, contact Azure Databricks.
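The preemption settings and their recommended values are described in the Big Data Management Integration Guide. As an illustrative sketch only, Databricks preemption is typically controlled through Spark configuration properties set on the cluster; the values below are examples, not recommendations:

    # Illustrative Databricks preemption settings; confirm the property
    # names and values against the Big Data Management Integration Guide.
    spark.databricks.preemption.enabled true
    # Fraction of its fair share that each job is guaranteed
    spark.databricks.preemption.threshold 0.5
    # How long a starved job waits before preemption is triggered
    spark.databricks.preemption.timeout.seconds 30
    # How often the scheduler checks whether preemption is needed
    spark.databricks.preemption.interval.seconds 5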