Create a Connection to Apache Spark
You can use Apache Spark as a data source with Privitar Data Security Platform.
To connect to Apache Spark, you must:
Meet the Apache Spark Connection Prerequisites
Note
Most of the settings for the Spark Thrift server are the same as those for HiveServer2. To learn more, see https://spark.apache.org/docs/latest/sql-distributed-sql-engine.html
Before you connect to Apache Spark, you must:
- Have a system user that is able to authenticate to Apache Spark using a username and password and has read access to the relevant databases and tables 
- Have access to the SSL certificate used to encrypt the connection (or the relevant certificate authority certificates) 
If your Secure Sockets Layer (SSL) source uses privately-signed server certificates, you must modify the truststore of your data plane in order to trust the server certificates as follows:
- Obtain the SSL certificate from the data source. 
- Convert the SSL certificate to a JKS truststore. 
- Copy the truststore into the shared/truststores/ location of your data plane configuration mounted volume (the volume used to store JDBC drivers).
  Note
  You will need to refer to this truststore when configuring the SSL JDBC properties. By default, the truststore is mounted on /config/shared/truststores/truststore.jks.
  The mounted volume's directory structure should look similar to the following:
      ├── shared/
      │   ├── jdbc-drivers/
      │   │   └── hive-42.2.23.jar
      │   └── truststores/
      │       └── truststore.jks
      ├── data-agent
      │   └── EMPTY
      ├── data-proxy
      │   └── EMPTY
- Download the JDBC JAR driver that you will use to connect to the data source. 
- Place the JDBC JAR driver into the shared/jdbc-drivers/ location of your data plane configuration mounted volume (the volume used to store JDBC drivers).
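The first two steps above (obtaining the certificate and converting it to a JKS truststore) can be sketched in Python as follows. This is a minimal illustration, not a Privitar-provided tool: the function names, the `spark-thrift` alias, and the file names are hypothetical, and it assumes that the JDK `keytool` utility is available on the machine where you run it. Note that `ssl.get_server_certificate` returns only the server's own certificate; if your truststore needs the certificate authority certificates, obtain them from your administrator instead.

```python
import ssl
from pathlib import Path


def fetch_server_certificate(host, port, pem_path):
    """Download the server certificate in PEM format and save it to pem_path."""
    # Performs a TLS handshake with the server and returns its certificate as PEM text.
    pem = ssl.get_server_certificate((host, port))
    Path(pem_path).write_text(pem)
    return pem


def keytool_import_command(pem_path, truststore_path, store_password,
                           alias="spark-thrift"):
    """Build the keytool command that imports a PEM certificate into a JKS truststore."""
    # The alias is an arbitrary label for the certificate entry (hypothetical value).
    return [
        "keytool", "-importcert",
        "-noprompt",                    # do not ask for interactive confirmation
        "-alias", alias,
        "-file", pem_path,              # certificate to import
        "-keystore", truststore_path,   # truststore to create or update
        "-storetype", "JKS",
        "-storepass", store_password,
    ]
```

To run the conversion, pass the resulting list to `subprocess.run(command, check=True)`, then copy the generated truststore.jks file into the mounted volume as described in the steps above.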
For example, the SSL settings for Spark might look like the following:
jdbc:hive2://ip-172-31-26-172.eu-west-2.compute.internal:10000/default;ssl=true;sslTrustStore=/config/shared/truststores/truststore.jks;trustStorePassword=changeit
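Before you reference the truststore in a connection string, it can help to confirm that the mounted volume contains the expected files. The following is a minimal sketch of such a check, assuming that you can read the volume from the machine where the script runs. The function name `check_volume_layout` is hypothetical, and the check covers only the shared/jdbc-drivers/ and shared/truststores/ locations described above.

```python
from pathlib import Path


def check_volume_layout(volume_root, truststore_name="truststore.jks"):
    """Return a list of problems; an empty list means the expected files are present."""
    root = Path(volume_root)
    problems = []

    # At least one JDBC driver JAR must be present in shared/jdbc-drivers/.
    drivers = root / "shared" / "jdbc-drivers"
    if not (drivers.is_dir() and any(drivers.glob("*.jar"))):
        problems.append(f"no JDBC driver JAR found in {drivers}")

    # The truststore must be present in shared/truststores/.
    truststore = root / "shared" / "truststores" / truststore_name
    if not truststore.is_file():
        problems.append(f"truststore not found at {truststore}")

    return problems
```

For example, `check_volume_layout("/path/to/volume")` returns an empty list when both the driver JAR and the truststore are in place, and one message for each missing item otherwise.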
Build an Apache Spark Connection String
The following is an example of a complete Apache Spark connection string:
jdbc:hive2://localhost:10000/database1
Note
The Spark Thrift server uses the same JDBC driver as HiveServer2.
To build an Apache Spark connection string, combine the following segments:
jdbc:hive2://<host>:<port>/<dbName>;<sessionConfs>?<hiveConfs>#<hiveVars>
If you configured SSL as described in the prerequisites, the connection string for Spark might look like the following:
jdbc:hive2://ip-172-31-26-172.eu-west-2.compute.internal:10000/default;ssl=true;sslTrustStore=/config/shared/truststores/truststore.jks;trustStorePassword=changeit
The following table includes a description of each segment.
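| String Segment | Description |
|---|---|
| `<host>` | The Spark server hosting node. Required. |
| `<port>` | The port that the Spark server listens to. Required. |
| `<dbName>` | The name of the Hive database. Required. |
| `<sessionConfs>` | Key-value pairs for the JDBC driver in the format `<key>=<value>`, separated by semicolons. |
| `<hiveConfs>` | Key-value pairs for Hive in the format `<key>=<value>`, separated by semicolons. |
| `<hiveVars>` | Key-value pairs for Hive variables in the format `<key>=<value>`, separated by semicolons. |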
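To illustrate how the segments fit together, the following Python sketch assembles a connection string from its parts. The function `build_hive2_url` and its parameter names are hypothetical helpers for illustration only. The sketch assumes that the key-value pairs in each optional segment are separated by semicolons, and it does not escape special characters in keys or values.

```python
def build_hive2_url(host, port, db_name,
                    session_confs=None, hive_confs=None, hive_vars=None):
    """Assemble jdbc:hive2://<host>:<port>/<dbName>;<sessionConfs>?<hiveConfs>#<hiveVars>."""

    def join_pairs(pairs):
        # Render a dict as key1=value1;key2=value2 (no escaping is applied).
        return ";".join(f"{key}={value}" for key, value in pairs.items())

    url = f"jdbc:hive2://{host}:{port}/{db_name}"
    if session_confs:
        url += ";" + join_pairs(session_confs)   # JDBC driver settings, such as SSL
    if hive_confs:
        url += "?" + join_pairs(hive_confs)      # Hive configuration properties
    if hive_vars:
        url += "#" + join_pairs(hive_vars)       # Hive variables
    return url


# Reproduces the SSL example shown earlier in this section.
ssl_url = build_hive2_url(
    "ip-172-31-26-172.eu-west-2.compute.internal", 10000, "default",
    session_confs={
        "ssl": "true",
        "sslTrustStore": "/config/shared/truststores/truststore.jks",
        "trustStorePassword": "changeit",
    },
)
```

Calling `build_hive2_url("localhost", 10000, "database1")` with no optional segments yields the basic connection string from the start of this section.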
Authenticate to Apache Spark
The Privitar Data Security Platform currently supports username/password authentication for Apache Spark.
Enter the system user's Apache Spark credentials in the Username and Password fields on the platform's Connections page.