Configure the Hive Warehouse Connector and Hive LLAP


Optionally, you can configure the Hive Warehouse Connector and Hive LLAP to improve performance when you read from and write to Hive targets.
Perform this task in the following situations:
  • You are integrating for the first time.
  • You upgraded from any previous version.
The Hive Warehouse Connector reads from and writes to Hive tables without using temporary staging tables that require additional storage overhead. Use the Hive Warehouse Connector on the Spark engine to allow Spark code to interact with Hive targets and to use ACID-enabled Hive tables. When you enable the Hive Warehouse Connector, mappings use Hive LLAP to run Hive queries rather than HiveServer2.
Before you enable the Hive Warehouse Connector, enable Hive LLAP on the Hadoop cluster. To enable the connector, configure the following properties in the Spark advanced properties for the Hadoop connection:
infaspark.useHiveWarehouseAPI
Enables the Hive Warehouse Connector. Set to TRUE.
For example: infaspark.useHiveWarehouseAPI=true
spark.datasource.hive.warehouse.load.staging.dir
Directory for the temporary HDFS files used for batch writes to Hive. Required when you enable the Hive Warehouse Connector.
For example, set the value to /tmp
spark.datasource.hive.warehouse.metastoreUri
URI for the Hive metastore. Required when you enable the Hive Warehouse Connector. Use the value for hive.metastore.uris from the hive_site_xml cluster configuration properties.
For example, set the value to thrift://mycluster-1.com:9083
spark.hadoop.hive.llap.daemon.service.hosts
Application name for the LLAP service. Required when you enable the Hive Warehouse Connector. Use the value for hive.llap.daemon.service.hosts from the hive_site_xml cluster configuration properties.
spark.hadoop.hive.zookeeper.quorum
ZooKeeper hosts used by Hive LLAP. Required when you enable the Hive Warehouse Connector. Use the value for hive.zookeeper.quorum from the hive_site_xml cluster configuration properties.
spark.sql.hive.hiveserver2.jdbc.url
URL for HiveServer2 Interactive. Required when you enable the Hive Warehouse Connector. Use the value that Ambari shows for HiveServer2 JDBC URL.
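
The mapping from cluster configuration values to the Spark advanced properties above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the product: the helper names build_hwc_properties and format_properties are hypothetical, and it assumes the hive-site values have already been read into a dictionary. It shows which three properties are copied from hive-site values and which three you set directly.

```python
# Hypothetical helper, not a product or Hive API: derives the Spark advanced
# properties described above from values found in the hive-site configuration.

# hive-site property name -> Spark advanced property that takes its value
HIVE_SITE_TO_SPARK = {
    "hive.metastore.uris": "spark.datasource.hive.warehouse.metastoreUri",
    "hive.llap.daemon.service.hosts": "spark.hadoop.hive.llap.daemon.service.hosts",
    "hive.zookeeper.quorum": "spark.hadoop.hive.zookeeper.quorum",
}


def build_hwc_properties(hive_site, jdbc_url, staging_dir="/tmp"):
    """Return the six properties needed to enable the Hive Warehouse Connector.

    hive_site   -- dict of hive-site property names to values (assumed input)
    jdbc_url    -- HiveServer2 JDBC URL as shown in Ambari
    staging_dir -- HDFS directory for temporary files used by batch writes
    """
    # Fail early if a value that must be copied from hive-site is absent or empty.
    missing = [name for name in HIVE_SITE_TO_SPARK if not hive_site.get(name)]
    if missing:
        raise ValueError("hive-site is missing: " + ", ".join(missing))

    # Properties that are set directly rather than copied from hive-site.
    props = {
        "infaspark.useHiveWarehouseAPI": "true",
        "spark.datasource.hive.warehouse.load.staging.dir": staging_dir,
        "spark.sql.hive.hiveserver2.jdbc.url": jdbc_url,
    }
    # Properties whose values come straight from hive-site.
    for hive_name, spark_name in HIVE_SITE_TO_SPARK.items():
        props[spark_name] = hive_site[hive_name]
    return props


def format_properties(props):
    """Render the properties as name=value lines, one per line, sorted by name."""
    return "\n".join(f"{name}={value}" for name, value in sorted(props.items()))
```

To use the sketch, pass the three hive-site values and the JDBC URL from your own cluster, then paste the output of format_properties into the Spark advanced properties for the Hadoop connection. The function raises ValueError if any required hive-site value is missing, which mirrors the "Required" note on each property.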
