Databricks Delta connection properties


When you set up a Databricks Delta connection, configure the connection properties.
The following list describes the Databricks Delta connection properties:
Connection Name
Name of the connection.
Each connection name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + -
Maximum length is 255 characters.
Description
Description of the connection. Maximum length is 4000 characters.
Type
The Databricks Delta connection type.
Runtime Environment
Name of the runtime environment where you want to run the tasks.
You can specify a Secure Agent, Hosted Agent, or serverless runtime environment.
Hosted Agent is not applicable for mappings in advanced mode.
You cannot run an application ingestion, database ingestion, or streaming ingestion task on a Hosted Agent or serverless runtime environment.
Databricks Host
The host name of the endpoint that the Databricks account belongs to.
Use the following syntax:
jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<Org Id>/<Cluster ID>;AuthMech=3;UID=token;PWD=<personal-access-token>
You can get the URL from the JDBC/ODBC tab in the Advanced Options of the Databricks Delta analytics cluster or all-purpose cluster.
The value of PWD in the Databricks Host, Org Id, and Cluster ID URLs is always <personal-access-token>.
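For example, in the following hypothetical URL, the Databricks Host value is dbc-12a345b6-c789.cloud.databricks.com:
jdbc:spark://dbc-12a345b6-c789.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/1234567890123456/0123-456789-abc123;AuthMech=3;UID=token;PWD=<personal-access-token>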
Cluster ID
The ID of the Databricks analytics cluster.
You can get the cluster ID from the JDBC URL.
Use the following syntax:
jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<Org Id>/<Cluster ID>;AuthMech=3;UID=token;PWD=<personal-access-token>
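In the hypothetical URL shown for the Databricks Host property, the Cluster ID is 0123-456789-abc123, the segment of httpPath that follows the organization ID.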
Organization Id
The unique organization ID for the workspace in Databricks.
Use the following syntax:
jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<Org Id>/<Cluster ID>;AuthMech=3;UID=token;PWD=<personal-access-token>
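In the same hypothetical URL, the Org Id is 1234567890123456, the segment of httpPath between sql/protocolv1/o/ and the cluster ID.
Because the host, organization ID, and cluster ID all come from the same JDBC URL, you can extract them programmatically. The following is a minimal Python sketch, assuming a URL that follows the syntax shown above; the sample values are hypothetical:

import re

# Hypothetical JDBC URL copied from the cluster's JDBC/ODBC advanced options.
jdbc_url = (
    "jdbc:spark://dbc-12a345b6-c789.cloud.databricks.com:443/default;"
    "transportMode=http;ssl=1;"
    "httpPath=sql/protocolv1/o/1234567890123456/0123-456789-abc123;"
    "AuthMech=3;UID=token;PWD=<personal-access-token>"
)

# Databricks Host: the authority between jdbc:spark:// and :443.
host = re.search(r"jdbc:spark://([^:/;]+)", jdbc_url).group(1)

# Org Id and Cluster ID: the two path segments after sql/protocolv1/o/.
org_id, cluster_id = re.search(
    r"httpPath=sql/protocolv1/o/([^/;]+)/([^/;]+)", jdbc_url
).groups()

print(host)        # dbc-12a345b6-c789.cloud.databricks.com
print(org_id)      # 1234567890123456
print(cluster_id)  # 0123-456789-abc123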
Databricks Token
Personal access token to access Databricks.
Ensure that you have permissions to attach to the cluster identified in the Cluster ID property.
For mappings, you must have additional permissions to create data engineering clusters.
SQL Endpoint JDBC URL
Databricks SQL endpoint JDBC connection URL.
Use the following syntax:
jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/<SQL endpoint cluster ID>;
For application ingestion and database ingestion tasks, begin the URL with the prefix jdbc:databricks://, as follows:
jdbc:databricks://<Databricks Host>:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/<SQL endpoint cluster ID>;
This field is required to connect to the Databricks SQL endpoint. For Data Integration, ensure that you also set the required environment variables in the Secure Agent.
If you configure the SQL Endpoint JDBC URL property, the Databricks Host, Organization ID, and Cluster ID properties are ignored.
For more information on the Databricks Delta SQL endpoint, contact Informatica Global Customer Support.
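For example, a hypothetical Data Integration SQL endpoint URL:
jdbc:spark://dbc-12a345b6-c789.cloud.databricks.com:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/0a1b2c3d4e5f6789;
For an application ingestion or database ingestion task, the same hypothetical endpoint URL would begin with jdbc:databricks:// instead of jdbc:spark://.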
Database
The database in Databricks Delta that you want to connect to.
For Data Integration, by default, all databases available in the workspace are listed.
JDBC Driver Class Name
The name of the JDBC driver class.
Specify the driver class name as com.simba.spark.jdbc.Driver.
For application ingestion and database ingestion tasks, specify the driver class name as com.databricks.client.jdbc.Driver.
Cluster Environment
The cloud provider where the Databricks cluster is deployed.
Choose from the following options:
  • AWS
  • Azure
Default is AWS.
The connection attributes depend on the cluster environment you select. For more information, see the AWS cluster properties and Azure cluster properties sections.
Min Workers ¹
The minimum number of worker nodes to be used for the Spark job.
Mandatory for mappings. The minimum value is 1.
Max Workers ¹
The maximum number of worker nodes to be used for the Spark job.
If you don't want to autoscale, set Max Workers equal to Min Workers or leave Max Workers unset.
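For example, if you set Min Workers to 2 and Max Workers to 8, Databricks can autoscale the job cluster between 2 and 8 worker nodes based on load.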
DB Runtime Version ¹
The Databricks runtime version.
Select 7.3 LTS from the list.
Worker Node Type ¹
The worker node instance type that is used to run the Spark job.
For example, the worker node type for AWS can be i3.2xlarge. The worker node type for Azure can be Standard_DS3_v2.
Driver Node Type ¹
The driver node instance type that is used to collect data from the Spark workers.
For example, the driver node type for AWS can be i3.2xlarge. The driver node type for Azure can be Standard_DS3_v2.
If you don't specify the driver node type, Databricks uses the value you specify in the worker node type field.
Instance Pool ID ¹
The instance pool ID used for the Spark cluster.
If you specify the Instance Pool ID to run mappings, the following connection properties are ignored:
  • Driver Node Type
  • EBS Volume Count
  • EBS Volume Type
  • EBS Volume Size
  • Enable Elastic Disk
  • Worker Node Type
  • Zone ID
Enable Elastic Disk ¹
Enables the cluster to get additional disk space.
Enable this option if the Spark workers are running low on disk space.
Spark Configuration ¹
The Spark configuration to use in the Databricks cluster.
The configuration must be in the following format:
"key1"="value1";"key2"="value2";....
For example:
"spark.executor.userClassPathFirst"="False"
Spark Environment Variables ¹
The environment variables to export before launching the Spark driver and workers.
The variables must be in the following format:
"key1"="value1";"key2"="value2";....
For example:
"MY_ENVIRONMENT_VARIABLE"="true"
¹ Doesn't apply to mappings in advanced mode.
The following properties are required to launch the job cluster at run time for a mapping task:
  • Min Workers
  • Max Workers
  • DB Runtime Version
  • Worker Node Type
  • Driver Node Type
  • Enable Elastic Disk
  • Spark Configuration
  • Spark Environment Variables
  • Zone ID
  • EBS Volume Type
  • EBS Volume Count
  • EBS Volume Size
