Table of Contents

Search

  1. Preface
  2. Part 1: Hadoop Integration
  3. Part 2: Databricks Integration
  4. Appendix A: Connections Reference

Create a Spark Staging Directory

Create a Spark Staging Directory

When the Spark engine runs a job, it stores temporary files in a staging directory.
Optionally, create a staging directory on HDFS for the Spark engine. For example:
hadoop fs -mkdir -p /spark/staging
If you want to write the logs to the Informatica Hadoop staging directory, you do not need to create a Spark staging directory. By default, the Data Integration Service uses the HDFS directory
/tmp/SPARK_<user name>
.
Grant permission to the following users:
  • Hadoop impersonation user
  • SPN of the Data Integration Service
  • Mapping impersonation users
Optionally, you can assign -777 permissions on the directory.
If you create a staging directory on a CDP Data Hub cluster, grant Access Control List (ACL) permissions for the staging directory to the Hive user and the impersonation user. To grant ACL permissions, run the following command on the CDP Data Hub cluster:
hadoop fs -setfacl -m user:user:rwx <staging directory>