When I run a mapping with a Hive source or a Hive target on a different cluster, the Data Integration Service fails to push the mapping to Hadoop with the following error:
Failed to execute query [exec0_query_6] with error code , error message [FAILED: Error in semantic analysis: Line 1:181 Table not found customer_eur], and SQL state ].
When you run a mapping in a Hadoop environment, the Hive connection selected for the Hive source or Hive target and the mapping itself must use the same Hive metastore.
When I run a mapping with SQL overrides concurrently, the mapping hangs.
There are not enough available resources because the cluster is being shared across different engines.
Use different YARN scheduler queues for the Blaze and Spark engines to allow HiveServer2 to run SQL overrides through these engines.
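As an illustration, separate queues for each engine might be defined in capacity-scheduler.xml along these lines. The queue names blaze and spark and the capacity percentages are assumptions; adjust them to match your cluster:

```xml
<!-- Sketch of capacity-scheduler.xml entries; queue names and
     capacities are hypothetical and must match your cluster setup. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,blaze,spark</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.blaze.capacity</name>
  <value>40</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.spark.capacity</name>
  <value>40</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>20</value>
</property>
```

Each engine is then pointed at its own queue through the queue name settings of the Hadoop connection, so that Blaze and Spark workloads no longer contend for the same resources.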
Mappings run on the Blaze engine fail with the following preemption error messages:
2018-09-27 11:05:27.208 INFO: Container completion status: id
[container_e135_1537815195064_4755_01_000012]; state [COMPLETE];
diagnostics [Container preempted by scheduler]; exit status [-102]..
2018-09-27 11:05:27.208 SEVERE: Service
[OOP_Container_Manager_Service_2] has stopped running..
The Blaze engine does not support YARN preemption on either the Capacity Scheduler or the Fair Scheduler. Ask the Hadoop administrator to disable preemption on the queue allocated to the Blaze engine. For more information, see Mappings Fail with Preemption Errors.
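With the Capacity Scheduler, for instance, preemption can be switched off for a single queue. The queue path root.blaze below is an assumption for the queue allocated to the Blaze engine:

```xml
<!-- Sketch for capacity-scheduler.xml; the queue path root.blaze
     is hypothetical. Refresh the queues after changing it. -->
<property>
  <name>yarn.scheduler.capacity.root.blaze.disable_preemption</name>
  <value>true</value>
</property>
```

This leaves preemption enabled for the rest of the cluster while protecting the containers that the Blaze engine holds.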
When I configure a mapping to create a partitioned Hive table, the mapping fails with the error "Need to specify partition columns because the destination table is partitioned."
This issue happens because of internal Informatica requirements for a query that is designed to create a Hive partitioned table. For details and a workaround, see Knowledge Base article 516266.
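The error itself comes from Hive: an insert into a partitioned table must name the partition columns. A minimal HiveQL sketch, with hypothetical table and column names:

```sql
-- Hypothetical tables; the PARTITION clause names the partition column.
CREATE TABLE sales_part (id INT, amount DOUBLE)
PARTITIONED BY (region STRING);

-- Dynamic-partition insert: the partition column is listed explicitly,
-- and its values come from the last column of the SELECT list.
-- May also require: SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE sales_part PARTITION (region)
SELECT id, amount, region FROM sales_staging;
```

Without the PARTITION clause, Hive raises the same "Need to specify partition columns" semantic analysis error.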
When Spark runs a mapping with a Hive source and target, and uses the Hive Warehouse Connector, the mapping fails with the following error:
[[SPARK_1003] Spark task [<task name>] failed with the following error: [User class threw exception: java.lang.reflect.InvocationTargetException ... java.sql.SQLException: Cannot create PoolableConnectionFactory (Could not open client transport for any of the Server URI's in ZooKeeper: Could not establish connection...)
The issue occurs because the Data Integration Service fails to fetch the Hive delegation token.
Workaround: Add the URL for HiveServer2 Interactive to the advanced properties of the Hadoop connection:
1. In the Ambari web console, locate the property hive.server2.authentication.kerberos.principal and copy its value.
2. Edit the Advanced Properties of the Hadoop connection to add the property spark.sql.hive.hiveserver2.jdbc.url.principal.
3. Paste the value that you copied in step 1 as the value of spark.sql.hive.hiveserver2.jdbc.url.principal.
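The resulting advanced property is a name-value pair. The Kerberos principal below is a placeholder for whatever value you copied from Ambari:

```
spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@EXAMPLE.COM
```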
Time stamp data that is precise to the nanosecond is truncated when a mapping runs on the Spark engine.
Spark stores time stamp data to a precision of 1 microsecond (1 µs) and does not support nanosecond precision. When a mapping that runs on the Spark engine reads datetime data that has nanosecond precision, the data is truncated to the microsecond. For example, a time stamp with nine fractional-second digits is truncated to six fractional-second digits. The Blaze engine supports nanosecond precision.
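The truncation can be illustrated outside Spark with a small Python sketch; the function name and the sample value are illustrative, not part of any Informatica or Spark API:

```python
def truncate_to_micros(ts: str) -> str:
    """Drop fractional-second digits beyond microsecond (6-digit)
    precision, mirroring how a nanosecond value loses its last
    three digits when the mapping runs on the Spark engine."""
    base, frac = ts.split(".")
    return f"{base}.{frac[:6]}"

# A 9-digit (nanosecond) fraction keeps only its first 6 digits.
print(truncate_to_micros("2018-09-27 11:05:27.123456789"))
# 2018-09-27 11:05:27.123456
```

Note that the trailing digits are dropped, not rounded, which is why downstream comparisons against nanosecond-precision sources can show mismatches.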