Table of Contents

Search

  1. Preface
  2. Understanding Domains
  3. Managing Your Account
  4. Using Informatica Administrator
  5. Using the Domain View
  6. Domain Management
  7. Nodes
  8. High Availability
  9. Connections
  10. Connection Properties
  11. Schedules
  12. Domain Object Export and Import
  13. License Management
  14. Monitoring
  15. Log Management
  16. Domain Reports
  17. Node Diagnostics
  18. Understanding Globalization
  19. Informatica Cloud Administration
  20. Appendix A: Code Pages
  21. Appendix B: Custom Roles
  22. Appendix C: Informatica Platform Connectivity
  23. Appendix D: Configure the Web Browser

Hive Pushdown Configuration

Hive Pushdown Configuration

The following table describes the connection properties that you configure for the Hive engine:
Property
Description
Environment SQL
SQL commands to set the Hadoop environment. The Data Integration Service executes the environment SQL at the beginning of each Hive script generated in a Hive execution plan.
The following rules and guidelines apply to the usage of environment SQL:
  • Use the environment SQL to specify Hive queries.
  • Use the environment SQL to set the classpath for Hive user-defined functions and then use environment SQL or PreSQL to specify the Hive user-defined functions. You cannot use PreSQL in the data object properties to specify the classpath. If you use Hive user-defined functions, you must copy the .jar files to the following directory:
    <Informatica installation directory>/services/shared/hadoop/<Hadoop distribution name>/extras/hive-auxjars
  • You can use environment SQL to define Hadoop or Hive parameters that you want to use in the PreSQL commands or in custom queries.
  • If you use multiple values for the environment SQL, ensure that there is no space between the values.
Hive Warehouse Directory
Optional. The absolute HDFS file path of the default database for the warehouse that is local to the cluster.
If you do not configure the Hive warehouse directory, the Hive engine first tries to write to the directory specified in the cluster configuration property
hive.metastore.warehouse.dir
. If the cluster configuration does not have the property, the Hive engine writes to the default directory
/user/hive/warehouse
.
Engine Type
The engine that the Hadoop environment uses to run a mapping on the Hadoop cluster. You can choose MRv2 or Tez. You can select Tez if it is configured for Amazon EMR, Azure HDInsight, or Hortonworks HDP.
Advanced Properties
List of advanced properties that are unique to the Hive engine. The advanced properties include a list of default properties.
You can configure run-time properties for the Hadoop environment in the Data Integration Service, the Hadoop connection, and in the mapping. You can override a property configured at a high level by setting the value at a lower level. For example, if you configure a property in the Data Integration Service custom properties, you can override it in the Hadoop connection or in the mapping. The Data Integration Service processes property overrides based on the following priorities:
  1. Mapping custom properties set using
    infacmd ms runMapping
    with the
    -cp
    option
  2. Mapping run-time properties for the Hadoop environment
  3. Hadoop connection advanced properties for run-time engines
  4. Hadoop connection advanced general properties, environment variables, and classpaths
  5. Data Integration Service custom properties
Informatica does not recommend changing these property values before you consult with third-party documentation, Informatica documentation, or Informatica Global Customer Support. If you change a value without knowledge of the property, you might experience performance degradation or other unexpected results.

0 COMMENTS

We’d like to hear from you!