Big Data Release Notes

Sqoop Mappings on the Spark Engine

When you run Sqoop jobs on the Spark engine, the Data Integration Service creates a Sqoop staging directory named sqoop_staging within the Spark staging directory by default. You configure the Spark staging directory in the Hadoop connection. Based on your processing requirements, you might need to create the sqoop_staging directory manually. When you create the directory manually, the Data Integration Service uses the directory that you create and does not create another one.

Create a Sqoop staging directory named sqoop_staging manually when the following cases apply:
  • You run a Sqoop pass-through mapping on the Spark engine to read data from a Sqoop source and write data to a Hive target that uses the Text format.
  • You use a Cloudera CDH cluster with Sentry authorization or a Hortonworks HDP cluster with Ranger authorization.
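For example, if the Spark staging directory configured on the Hadoop connection is /user/spark/staging (a hypothetical path used here for illustration), you might create the directory with a command similar to the following:
hdfs dfs -mkdir -p /user/spark/staging/sqoop_staging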
After you create the sqoop_staging directory, you must add an Access Control List (ACL) for the sqoop_staging directory and grant write permissions on the directory to the Hive super user.
If you use a Cloudera CDH cluster or a Hortonworks HDP cluster, run the following command on the cluster to add an ACL for the sqoop_staging directory and grant write permissions to the Hive super user:
hdfs dfs -setfacl -m default:user:hive:rwx /<Spark staging directory>/sqoop_staging/
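To verify that the ACL was applied, you can inspect the directory. The following command shows the check against the same hypothetical path used above:
hdfs dfs -getfacl /user/spark/staging/sqoop_staging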
For information about Sentry authorization, see the Cloudera documentation. For information about Ranger authorization, see the Hortonworks documentation.
If you do not define a Spark staging directory on the Hadoop connection, create the Spark staging directory at the following location:
/tmp/SPARK_<impersonation_user_name>/sqoop_staging
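For example, for an impersonation user named ispuser (a hypothetical user name), the commands to create the directory and add the ACL might look like the following:
hdfs dfs -mkdir -p /tmp/SPARK_ispuser/sqoop_staging
hdfs dfs -setfacl -m default:user:hive:rwx /tmp/SPARK_ispuser/sqoop_staging/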
