Big Data Release Notes

Sqoop Mappings on the Spark Engine

When you run Sqoop jobs on the Spark engine, the Data Integration Service creates a Sqoop staging directory named sqoop_staging within the Spark staging directory by default. You configure the Spark staging directory in the Hadoop connection. Based on your processing requirements, you might need to create the sqoop_staging directory manually. When you create the directory manually, the Data Integration Service uses the directory that you create and does not create another one.

Create a Sqoop staging directory named sqoop_staging manually when the following cases apply:
  • You run a Sqoop pass-through mapping on the Spark engine to read data from a Sqoop source and write data to a Hive target that uses the Text format.
  • You use a Cloudera CDH cluster with Sentry authorization or a Hortonworks HDP cluster with Ranger authorization.
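For example, if the Spark staging directory configured on the Hadoop connection is /user/spark/staging (a hypothetical path used here for illustration), you might create the directory with a command similar to the following:
hdfs dfs -mkdir -p /user/spark/staging/sqoop_staging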
After you create the sqoop_staging directory, you must add an Access Control List (ACL) for the sqoop_staging directory and grant write permissions on the directory to the Hive super user.
If you use a Cloudera CDH cluster or a Hortonworks HDP cluster, run the following command on the cluster to add an ACL for the sqoop_staging directory and grant write permissions to the Hive super user:
hdfs dfs -setfacl -m default:user:hive:rwx /<Spark staging directory>/sqoop_staging/
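To verify that the ACL was applied, you can inspect the directory. The following command shows the check against the same hypothetical path used above:
hdfs dfs -getfacl /user/spark/staging/sqoop_staging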
For information about Sentry authorization, see the Cloudera documentation. For information about Ranger authorization, see the Hortonworks documentation.
If you do not define a Spark staging directory on the Hadoop connection, create the Spark staging directory at the following location:
/tmp/SPARK_<impersonation_user_name>/sqoop_staging
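For example, for an impersonation user named ispuser (a hypothetical user name), the commands to create the directory and add the ACL might look like the following:
hdfs dfs -mkdir -p /tmp/SPARK_ispuser/sqoop_staging
hdfs dfs -setfacl -m default:user:hive:rwx /tmp/SPARK_ispuser/sqoop_staging/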
