Table of Contents

  1. Preface
  2. Part 1: Introduction to Data Discovery
  3. Part 2: Data Discovery with Informatica Analyst
  4. Part 3: Data Discovery with Informatica Developer
  5. Appendix A: Function Support Based on Profiling Warehouse Connection

Data Discovery Guide

Data Domain Discovery on the Databricks Cluster

Use the Databricks cluster to perform data discovery on the Spark engine. A Databricks cluster is an environment for running Spark jobs. You can run a profile to perform data discovery on Azure sources using the Databricks cluster.
Perform the following steps to connect to the Azure sources on the Databricks cluster:

Prerequisite

Add the following advanced Spark configuration parameters for the Databricks cluster, and then restart the cluster (see the example after this list):
  • fs.azure.account.auth.type OAuth
  • fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
  • fs.azure.account.oauth2.client.id <your-service-client-id>
  • fs.azure.account.oauth2.client.secret <your-service-client-secret-key>
  • fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/<directory-ID-of-Azure-AD>/oauth2/token
  • spark.hadoop.fs.azure.account.key.<ACCOUNT_NAME>.dfs.core.windows.net <VALUE>
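
For example, the complete Spark config entry on the cluster is a set of space-separated key-value pairs, one per line. All of the IDs, secrets, and account values below are hypothetical placeholders; substitute your own:
    fs.azure.account.auth.type OAuth
    fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
    fs.azure.account.oauth2.client.id 11111111-2222-3333-4444-555555555555
    fs.azure.account.oauth2.client.secret exampleClientSecretValue
    fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/aaaabbbb-cccc-dddd-eeee-ffff00001111/oauth2/token
    spark.hadoop.fs.azure.account.key.examplestorageacct.dfs.core.windows.net exampleAccountKeyValue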

Download and Copy the JAR Files for the Profiling Warehouse

  1. Get the Oracle DataDirect JDBC driver JAR files for the profiling warehouse. You can copy the files from the following location:
    <INFA_HOME>/services/shared/jars/thirdparty/com.informatica.datadirect-dworacle-6.0.0_F.jar
  2. Place the Oracle DataDirect JDBC driver JAR files in the following locations (a scripted example follows this list):
    • <INFA_HOME>/connectors/thirdparty/informatica.jdbc_v2/spark
    • <INFA_HOME>/connectors/thirdparty/informatica.jdbc_v2/common
    • <INFA_HOME>/services/shared/hadoop/<DataBricksversion>/runtimeLib
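
Copying the driver JAR can be scripted. The following is a minimal Python sketch, assuming a Linux installation; the INFA_HOME path and the Databricks version directory name are hypothetical and must be adjusted to your environment:

    # Minimal sketch: copy the Oracle DataDirect JDBC driver JAR into the
    # three locations listed above. The paths below are assumptions.
    import shutil
    from pathlib import Path

    INFA_HOME = Path("/opt/informatica")  # assumption: your <INFA_HOME>
    DATABRICKS_DIR = "databricks"         # assumption: your <DataBricksversion> directory

    jar = INFA_HOME / "services/shared/jars/thirdparty/com.informatica.datadirect-dworacle-6.0.0_F.jar"
    targets = [
        INFA_HOME / "connectors/thirdparty/informatica.jdbc_v2/spark",
        INFA_HOME / "connectors/thirdparty/informatica.jdbc_v2/common",
        INFA_HOME / "services/shared/hadoop" / DATABRICKS_DIR / "runtimeLib",
    ]

    for target in targets:
        # Copy the JAR into each location; the directories exist in a
        # standard installation, so no mkdir is attempted here.
        shutil.copy2(jar, target / jar.name)
        print(f"copied {jar.name} -> {target}")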

Download and Copy the JAR Files for the JDBC Delta Objects

  1. Get the JDBC .jar files for the JDBC Delta objects. You can download the files from the database vendor website.
  2. Place the .jar files in the following Developer tool location to access the metadata (see the example after these steps):
    \clients\externaljdbcjars
  3. Restart the Developer tool.
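
For example, with a hypothetical client installation directory of C:\Informatica and a hypothetical vendor driver named delta-jdbc.jar, the file would be placed at:
    C:\Informatica\clients\externaljdbcjars\delta-jdbc.jar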

Configure Custom Properties on the Data Integration Service

  1. Launch Informatica Administrator, and then select the Data Integration Service in the Domain Navigator.
  2. Click the Custom Properties option on the Properties tab.
  3. Set the following custom property to perform automatic installation of the Informatica libraries into the Databricks cluster (the equivalent name-value form is shown after these steps):
    ExecutionContextOptions.databricks.enable.infa.libs.autoinstall:true
  4. Recycle the Data Integration Service.
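
In the Administrator tool, a custom property is entered as a name-value pair, so the property above corresponds to:
    Name:  ExecutionContextOptions.databricks.enable.infa.libs.autoinstall
    Value: true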

Supported Sources for Data Domain Discovery on the Databricks Cluster

  • JDBC Delta
  • Azure Data Lake Storage Gen2
