Enable the Hive Warehouse Connector and Hive LLAP for faster execution of Hive queries when you read from and write to Hive tables. You can use the Hive Warehouse Connector and Hive LLAP with Hortonworks HDP 3.x and Microsoft Azure HDInsight 4.x clusters on the Spark engine.
The Hive Warehouse Connector reads from and writes to Hive tables without using temporary staging tables that require additional storage overhead. Use the Hive Warehouse Connector on the Spark engine to allow Spark code to interact with Hive targets and to use ACID-enabled Hive tables. When you enable the Hive Warehouse Connector, mappings use Hive LLAP to run Hive queries rather than HiveServer2.
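As a minimal sketch of how Spark code interacts with Hive through the connector on an HDP 3.x or HDInsight 4.x cluster: the `pyspark_llap` module ships with the Hortonworks connector distribution, and the database and table names below are placeholders. This is an illustration of the connector's documented usage pattern, not a definitive implementation, and it requires a live cluster with LLAP enabled to run.

```python
# Sketch: reading from and writing to Hive tables through the
# Hive Warehouse Connector (requires an HDP 3.x / HDInsight 4.x
# cluster with Hive LLAP enabled; table names are placeholders).
from pyspark.sql import SparkSession
from pyspark_llap import HiveWarehouseSession  # ships with the connector

spark = SparkSession.builder.appName("hwc-example").getOrCreate()
hive = HiveWarehouseSession.session(spark).build()

# Read: the query runs through Hive LLAP rather than HiveServer2.
df = hive.executeQuery("SELECT * FROM sales_db.orders")

# Write: data lands in the Hive table without a temporary staging table.
df.write \
    .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector") \
    .option("table", "sales_db.orders_copy") \
    .mode("append") \
    .save()
```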
Consider the following limitations when you use the Hive Warehouse Connector and Hive LLAP:
You can use the Hive Warehouse Connector and Hive LLAP to run insert queries only on ACID-enabled tables that are not bucketed.
You cannot use the Hive Warehouse Connector and Hive LLAP when you read hierarchical data from a source.
When you use the Hive Warehouse Connector on Hortonworks HDP clusters, you must use an ORC format target. Data corruption might occur if the target does not use ORC format.
For more information, see the Hortonworks documentation on supported target tables: Apache Hive 3 tables.
When you use an external table that has compression properties set, the mapping runs with Spark SQL instead of HiveServer2. The mapping fails if the compression property value is not one of the following: LZO, NONE, SNAPPY, ZLIB. The value is case sensitive and must be uppercase.
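As an illustration, and assuming the compression setting this limitation refers to is the ORC `orc.compress` table property (an assumption; your tables may configure compression differently), an external table that satisfies the constraint might look like this:

```sql
-- Hypothetical external table; 'orc.compress' must be one of the
-- uppercase values LZO, NONE, SNAPPY, or ZLIB for the mapping to run.
CREATE EXTERNAL TABLE sales_db.orders_ext (
  order_id INT,
  amount   DECIMAL(10,2)
)
STORED AS ORC
LOCATION '/data/orders_ext'
TBLPROPERTIES ('orc.compress'='SNAPPY');
```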
When you choose RETAIN as the target schema strategy, configure the hive.llap.daemon.num.enabled.executors property on the Hadoop cluster. Set its value equal to the value of hive.llap.daemon.num.executors.
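On the cluster, the two properties can be aligned in hive-site.xml. This is a sketch; the executor count of 4 is only an example value and should match your LLAP daemon sizing.

```xml
<!-- hive-site.xml: keep both executor properties at the same value -->
<property>
  <name>hive.llap.daemon.num.executors</name>
  <value>4</value> <!-- example value; match your LLAP daemon sizing -->
</property>
<property>
  <name>hive.llap.daemon.num.enabled.executors</name>
  <value>4</value> <!-- must equal hive.llap.daemon.num.executors -->
</property>
```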
When you import a mapping with an ACID-enabled source and target, the Summary Statistics view does not reflect any throughput statistics for the mapping job.