Configure Glue as the Hive Metastore

You can configure Amazon Glue as the Hive metastore with an Amazon EMR 5.29 or 6.1 cluster.
Consider the following rules and guidelines:
  • Glue does not support Hive transactions.
  • Amazon supports Glue only when the EMR cluster is not Kerberos enabled.
To enable integration with a Glue-enabled EMR cluster, copy .jar files from the cluster to the domain, and then configure the Hive metastore property in the hive-site.xml file before you create the cluster configuration.
  1. Copy .jar files from the cluster to the domain.
    1. Depending on the cluster version, copy the Hive .jar file from the cluster to the domain.
      Copy the file from the following directory of the Glue-enabled EMR cluster:
      /usr/lib/spark/jars/
      Paste the file in the following domain directory on the domain machine:
      <Informatica installation directory>/services/shared/spark/lib_spark_2.4.3_hadoop_2.7.0
      Overwrite the existing Hive .jar file in the directory.
      • For EMR 5.29, copy the following file: hive-exec-1.2.1-spark2-amzn-1.jar
      • For EMR 6.1, copy the following file: hive-exec-3.1.2-amzn-2.jar
    2. Depending on the cluster version, copy the Glue datacatalog file from the cluster to the domain.
      Copy the file from the following directory of the Glue-enabled EMR cluster:
      /usr/share/aws/hmclient/lib
      Paste the file in the following domain directory on the domain machine:
      <Informatica installation directory>/services/shared/spark/lib_spark_2.4.3_hadoop_2.7.0
      • For EMR 5.29, copy the following file: aws-glue-datacatalog-spark-client-1.11.0.jar
      • For EMR 6.1, copy the following file: aws-glue-datacatalog-spark-client-3.0.0.jar
  2. If hive-site.xml does not contain the hive.metastore.uris property, add the property with the following value:
    thrift://<Hive host name>:<port>
    Edit the hive-site.xml file in the cluster configuration .zip archive:
    1. Locate the cluster configuration .zip archive file. For more information about preparing for cluster configuration import, see the Amazon EMR chapter in the Data Engineering Integration Guide.
    2. Edit the hive-site.xml file in the archive to add the hive.metastore.uris property-value pair.
    After you save the changes to the hive-site.xml file, use the cluster configuration .zip archive to create the cluster configuration and Hadoop connection.
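If you repeat these steps across environments, they can be scripted. The following Python sketch is illustrative only and is not an Informatica or Amazon tool. The file names and directories come from the steps above; everything else is an assumption. The function names copy_glue_jars and add_metastore_uri are hypothetical, and copy_glue_jars assumes that you have already mirrored the cluster directories to a local directory on the domain machine, for example with scp. add_metastore_uri writes a modified copy of the cluster configuration .zip archive and adds hive.metastore.uris to hive-site.xml only when the property is absent.

```python
import shutil
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Jar file names and cluster directories from the steps above, per EMR release.
JARS_BY_EMR_RELEASE = {
    "5.29": {
        "/usr/lib/spark/jars": "hive-exec-1.2.1-spark2-amzn-1.jar",
        "/usr/share/aws/hmclient/lib": "aws-glue-datacatalog-spark-client-1.11.0.jar",
    },
    "6.1": {
        "/usr/lib/spark/jars": "hive-exec-3.1.2-amzn-2.jar",
        "/usr/share/aws/hmclient/lib": "aws-glue-datacatalog-spark-client-3.0.0.jar",
    },
}
# Domain directory, relative to the Informatica installation directory.
DOMAIN_JAR_DIR = Path("services/shared/spark/lib_spark_2.4.3_hadoop_2.7.0")


def copy_glue_jars(emr_release: str, cluster_mirror: Path, infa_home: Path) -> list[Path]:
    """Copy the release-specific jars into the domain directory.

    cluster_mirror is a HYPOTHETICAL local directory that mirrors the
    cluster file system (for example, populated with scp beforehand).
    """
    dest_dir = infa_home / DOMAIN_JAR_DIR
    copied = []
    for cluster_dir, jar_name in JARS_BY_EMR_RELEASE[emr_release].items():
        src = cluster_mirror / cluster_dir.lstrip("/") / jar_name
        # copy2 replaces a file with the same name in dest_dir. An existing
        # Hive jar with a different name is NOT removed by this sketch.
        copied.append(Path(shutil.copy2(src, dest_dir)))
    return copied


def add_metastore_uri(zip_in: Path, zip_out: Path, uri: str) -> bool:
    """Write a copy of zip_in to zip_out, adding hive.metastore.uris to
    hive-site.xml when the property is absent. Returns True if it was added."""
    added = False
    with zipfile.ZipFile(zip_in) as src, zipfile.ZipFile(zip_out, "w") as dst:
        for item in src.infolist():
            data = src.read(item)
            if Path(item.filename).name == "hive-site.xml":
                # Comments inside <configuration> are kept. Comments and
                # processing instructions outside the root element, and the
                # original formatting, are not preserved if the file is rewritten.
                parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
                root = ET.fromstring(data, parser=parser)
                names = {p.findtext("name") for p in root.findall("property")}
                if "hive.metastore.uris" not in names:
                    prop = ET.SubElement(root, "property")
                    ET.SubElement(prop, "name").text = "hive.metastore.uris"
                    ET.SubElement(prop, "value").text = uri
                    data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
                    added = True
            # Reuse each entry's metadata (name, timestamp, compression method).
            dst.writestr(item, data)
    return added
```

For example, you might call copy_glue_jars("6.1", Path("/tmp/emr-mirror"), Path("/opt/informatica")) and then add_metastore_uri(Path("cluster_config.zip"), Path("cluster_config_glue.zip"), "thrift://<Hive host name>:<port>"). All paths in this example are hypothetical, and you must replace the placeholders with your own values. The sketch writes a new archive rather than editing the original because Python's zipfile module cannot replace a member of an existing archive. Writing a new archive also leaves the original .zip file intact.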
For more information about Amazon Glue, see the following Amazon documentation:
For information about Informatica support for Amazon Glue, see the Product Availability Matrix at https://network.informatica.com/community/informatica-network/product-availability-matrices.