Create a Spark Staging Directory

When the Spark engine runs a job, it stores temporary files in a staging directory.
Optionally, create a staging directory on HDFS for the Spark engine. For example:
hadoop fs -mkdir -p /spark/staging
If you want to write the logs to the Informatica Hadoop staging directory, you do not need to create a Spark staging directory. By default, the Data Integration Service uses the HDFS directory /tmp/SPARK_<user name> as the Spark staging directory.
Grant permissions on the staging directory to the following users:
  • Hadoop impersonation user
  • SPN of the Data Integration Service
  • Mapping impersonation users
Optionally, you can assign 777 permissions on the directory.
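The following shell function is a minimal sketch of the steps above. It creates the staging directory and then grants each listed user access through HDFS ACL entries instead of opening the directory with 777 permissions. It rests on several assumptions: the hadoop CLI is on the PATH, ACLs are enabled on the NameNode (dfs.namenode.acls.enabled), the users need read, write, and execute access, and you pass the short user names that HDFS sees for each account. The function name setup_spark_staging and the DRY_RUN switch are hypothetical helpers and are not part of Informatica.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: create an HDFS staging directory and grant each user
# read, write, and execute access on it through HDFS ACL entries.
# Usage: setup_spark_staging <hdfs_dir> <user>...
# Assumes ACLs are enabled on the NameNode (dfs.namenode.acls.enabled).
setup_spark_staging() {
  if [ "$#" -lt 1 ]; then
    echo "usage: setup_spark_staging <hdfs_dir> <user>..." >&2
    return 2
  fi
  dir="$1"
  shift

  # Create the directory and any missing parent directories.
  run_cmd hadoop fs -mkdir -p "$dir" || return 1

  for user in "$@"; do
    # The access entry covers the directory itself; the default entry is
    # inherited by files and subdirectories created in it later.
    run_cmd hadoop fs -setfacl -m "user:${user}:rwx,default:user:${user}:rwx" "$dir" || return 1
  done
}
```

For example, with hypothetical account names for the Hadoop impersonation user, the Data Integration Service SPN user, and a mapping impersonation user, you can preview the commands first with DRY_RUN=1 setup_spark_staging /spark/staging infa_impersonation infa_dis mapping_user and then run the same call without DRY_RUN. Per-user ACL entries keep access limited to the listed users; the 777 alternative replaces the loop with a single hadoop fs -chmod 777 /spark/staging.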