Create a Spark Staging Directory

When the Spark engine runs a job, it stores temporary files in a staging directory.
Optionally, create a staging directory on HDFS for the Spark engine. For example:
hadoop fs -mkdir -p /spark/staging
If you want to write the logs to the Informatica Hadoop staging directory, you do not need to create a Spark staging directory. By default, the Data Integration Service uses the HDFS directory /tmp/spark_<user name>.
Grant permissions on the directory to the following users:
  • Hadoop impersonation user
  • SPN of the Data Integration Service
  • Mapping impersonation users
Optionally, you can assign 777 permissions on the directory instead.
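
For example, the following is a minimal sketch of the commands involved, assuming the /spark/staging path used above; <impersonation user> and <group> are placeholders for the user and group configured in your environment:
# Create the staging directory if it does not already exist.
hadoop fs -mkdir -p /spark/staging
# Give ownership of the directory to the Hadoop impersonation user.
hadoop fs -chown <impersonation user>:<group> /spark/staging
# Alternatively, open the directory to all users.
hadoop fs -chmod 777 /spark/staging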
