Create a Sqoop Staging Directory

When you run Sqoop jobs on the Spark engine, the Data Integration Service creates a Sqoop staging directory named sqoop_staging within the Spark staging directory by default. You can configure the Spark staging directory that you want to use in the Hadoop connection.
However, based on your processing requirements, you might need to create the directory manually and give write permissions to the Hive super user. When you create the sqoop_staging directory manually, the Data Integration Service uses this directory instead of creating another one.
Create a Sqoop staging directory named sqoop_staging manually in the following situations:
  • You run a Sqoop pass-through mapping on the Spark engine to read data from a Sqoop source and write data to a Hive target that uses the Text format.
  • You use a Cloudera CDH cluster with Sentry authorization, a Cloudera CDP cluster with Ranger authorization, or a Hortonworks HDP cluster with Ranger authorization.
After you create the sqoop_staging directory, you must add an Access Control List (ACL) for the directory and grant write permissions to the Hive super user. To do so, run the following command on the Cloudera CDH cluster or the Hortonworks HDP cluster:
hdfs dfs -setfacl -m default:user:hive:rwx /<Spark staging directory>/sqoop_staging/
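The manual steps can be sketched as a small dry-run script. This is an illustration under stated assumptions, not a documented procedure: the SPARK_STAGING_DIR variable, its default value, and the two helper functions are hypothetical names, and the use of hdfs dfs -mkdir -p to create the directory and hdfs dfs -getfacl to check the result are assumptions, because the text above specifies only the -setfacl command. The script prints the commands instead of running them, so you can review them first.

```shell
#!/bin/sh
# Hypothetical placeholder: set this to the Spark staging directory
# that is configured in the Hadoop connection.
SPARK_STAGING_DIR="${SPARK_STAGING_DIR:-/tmp/spark_staging_example}"

# Build the sqoop_staging path under the given Spark staging directory.
# A single trailing slash on the input is dropped to avoid a double slash.
sqoop_staging_path() {
    printf '%s/sqoop_staging/\n' "${1%/}"
}

# Print (do not execute) the commands for the manual setup.
staging_commands() {
    dir=$(sqoop_staging_path "$1")
    # Assumption: create the directory with mkdir -p.
    printf 'hdfs dfs -mkdir -p %s\n' "$dir"
    # From the documentation: default ACL that gives the hive user rwx.
    printf 'hdfs dfs -setfacl -m default:user:hive:rwx %s\n' "$dir"
    # Assumption: list the ACL entries to confirm the change.
    printf 'hdfs dfs -getfacl %s\n' "$dir"
}

staging_commands "$SPARK_STAGING_DIR"
```

Run the printed commands on a cluster node as a user that is allowed to modify the Spark staging directory. The sketch does not quote paths, so it assumes a staging directory path without spaces.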
For information about Sentry authorization, see the Cloudera documentation. For information about Ranger authorization, see the Hortonworks documentation.