Administrator Guide

Spark Configuration

The following table describes the connection properties that you configure for the Spark engine:
Property
    Description

Spark Staging Directory
    The HDFS file path of the directory that the Spark engine uses to store temporary files for running jobs. The YARN user, Data Integration Service user, and mapping impersonation user must have write permission on this directory.
    If you do not specify a file path, by default, the temporary files are written to the Hadoop staging directory /tmp/SPARK_<user name>.
    When you run Sqoop jobs on the Spark engine, the Data Integration Service creates a Sqoop staging directory within the Spark staging directory to store temporary files: <Spark staging directory>/sqoop_staging

Spark Event Log Directory
    Optional. The HDFS file path of the directory that the Spark engine uses to log events.

YARN Queue Name
    The YARN scheduler queue name used by the Spark engine that specifies available resources on a cluster. The name is case sensitive.

Advanced Properties
    List of advanced properties that are unique to the Spark engine. The advanced properties include a list of default properties.
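The defaulting rules for the Spark Staging Directory property fit in a few lines of code. The following Python sketch is illustrative only: `resolve_staging_dirs` is a hypothetical helper, not part of any Informatica API. It assumes that an empty value means the property is not set and that `<user name>` is substituted verbatim into the default path.

```python
import posixpath


def resolve_staging_dirs(spark_staging_dir, user_name):
    """Hypothetical helper that models the documented staging-directory rules.

    Returns a tuple (spark_staging_dir, sqoop_staging_dir) of HDFS paths.
    Assumption: None or "" means the Spark Staging Directory property is not set.
    """
    if not spark_staging_dir:
        # Documented default: the Hadoop staging directory /tmp/SPARK_<user name>.
        spark_staging_dir = f"/tmp/SPARK_{user_name}"
    # Sqoop jobs on the Spark engine stage files in <Spark staging directory>/sqoop_staging.
    # posixpath is used because HDFS paths always use forward slashes.
    sqoop_staging_dir = posixpath.join(spark_staging_dir, "sqoop_staging")
    return spark_staging_dir, sqoop_staging_dir


if __name__ == "__main__":
    # "infa_user" and "/user/etl/spark_staging" are hypothetical example values.
    print(resolve_staging_dirs(None, "infa_user"))
    print(resolve_staging_dirs("/user/etl/spark_staging", "infa_user"))
```

Whichever directory results, the YARN user, Data Integration Service user, and mapping impersonation user still need write permission on it.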
You can configure run-time properties for the Hadoop environment in the Data Integration Service, in the Hadoop connection, and in the mapping. You can override a property configured at a high level by setting the value at a lower level. For example, if you configure a property in the Data Integration Service custom properties, you can override it in the Hadoop connection or in the mapping. The Data Integration Service processes property overrides based on the following priorities:
  1. Mapping custom properties set using infacmd ms runMapping with the -cp option
  2. Mapping run-time properties for the Hadoop environment
  3. Hadoop connection advanced properties for run-time engines
  4. Hadoop connection advanced general properties, environment variables, and classpaths
  5. Data Integration Service custom properties
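The priority list above behaves like a layered merge in which a value from a higher-priority level replaces the same property from any lower-priority level. The following Python sketch models that behavior under stated assumptions: the level names in `PRIORITY` and the `effective_properties` function are hypothetical labels for the five levels, not Informatica identifiers, and the Spark property values are made up for illustration.

```python
# Hypothetical level names; index 0 has the highest priority, matching the list above.
PRIORITY = (
    "mapping_cp_option",            # 1. infacmd ms runMapping -cp
    "mapping_runtime_properties",   # 2. Mapping run-time properties
    "connection_engine_advanced",   # 3. Hadoop connection advanced properties for run-time engines
    "connection_general_advanced",  # 4. Hadoop connection advanced general properties, env vars, classpaths
    "dis_custom_properties",        # 5. Data Integration Service custom properties
)


def effective_properties(levels):
    """Merge per-level property dicts so that higher-priority levels win.

    levels maps a name from PRIORITY to a dict of property name -> value.
    """
    unknown = set(levels) - set(PRIORITY)
    if unknown:
        raise ValueError(f"unknown levels: {sorted(unknown)}")
    merged = {}
    # Apply the lowest-priority level first so that each later update overrides it.
    for level in reversed(PRIORITY):
        merged.update(levels.get(level, {}))
    return merged


if __name__ == "__main__":
    # Example values are made up for illustration.
    levels = {
        "dis_custom_properties": {"spark.executor.memory": "4G", "spark.executor.cores": "2"},
        "connection_engine_advanced": {"spark.executor.memory": "6G"},
        "mapping_cp_option": {"spark.executor.memory": "8G"},
    }
    print(effective_properties(levels))
```

In this example, spark.executor.memory resolves to 8G from the -cp option. spark.executor.cores keeps the value 2 from the Data Integration Service because no higher-priority level sets it.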
When a mapping uses HiveServer2 to run a job or parts of a job, you cannot override properties that are configured at the cluster level in pre-SQL or post-SQL queries or in SQL override statements.
Workaround: Instead of attempting to use the cluster configuration on the domain to override cluster properties, pass the override settings to the JDBC URL. For example:
beeline -u "jdbc:hive2://<domain host>:<port_number>/tpch_text_100" --hiveconf hive.execution.engine=tez
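You can also embed the override settings directly in the JDBC URL. The HiveServer2 JDBC URL layout is jdbc:hive2://<host>:<port>/<db>;<session vars>?<hive conf list>#<hive vars>, where the Hive configuration list is a semicolon-separated list of key=value pairs. The following Python sketch builds such a URL. The function name `hive2_jdbc_url` and the host name are hypothetical, session and Hive variables are omitted, and the sketch assumes that keys and values contain no `;`, `?`, or `#`, because it performs no escaping.

```python
def hive2_jdbc_url(host, port, database, hive_conf=None):
    """Build a HiveServer2 JDBC URL that carries Hive configuration overrides.

    Uses the layout jdbc:hive2://<host>:<port>/<db>?<hive conf list>, where
    the hive conf list is a semicolon-separated list of key=value pairs.
    No escaping is applied to keys or values.
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if hive_conf:
        url += "?" + ";".join(f"{key}={value}" for key, value in hive_conf.items())
    return url


if __name__ == "__main__":
    # Host name is hypothetical; 10000 is the usual HiveServer2 default port.
    print(hive2_jdbc_url("hs2.example.com", 10000, "tpch_text_100",
                         {"hive.execution.engine": "tez"}))
    # prints jdbc:hive2://hs2.example.com:10000/tpch_text_100?hive.execution.engine=tez
```

When you pass the resulting URL to beeline -u in a shell, keep it in quotes, because the shell interprets `;` and `?` specially.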
Before you change these property values, Informatica recommends that you consult third-party documentation, Informatica documentation, or Informatica Global Customer Support. If you change a value without understanding the property, you might experience performance degradation or other unexpected results.
