Connectors and Connections

Advanced settings

The following table describes the advanced connection properties:
Property
Description
Database
The database name that you want to connect to in Databricks Delta.
Optional for SQL warehouse and Databricks cluster.
For Data Integration, if you do not provide a database name, all databases available in the workspace are listed. The value you provide here overrides the database name provided in the SQL Warehouse JDBC URL connection property.
JDBC Driver Class Name
The name of the JDBC driver class.
Optional for SQL warehouse and Databricks cluster.
For JDBC URL versions 2.6.22 or earlier, specify the driver class name as com.simba.spark.jdbc.Driver.
For JDBC URL versions 2.6.25 or later, specify the driver class name as com.databricks.client.jdbc.Driver.
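As an illustrative aside, the following minimal Java sketch shows how a JDBC client would register one of these driver classes and open a connection. It is not part of the connection configuration: the URL uses the same placeholder form as the cluster URL examples later in this topic, and it pairs that URL with the com.simba.spark.jdbc.Driver class on the assumption that the older jdbc:spark URL form goes with the 2.6.22-or-earlier driver.

import java.sql.Connection;
import java.sql.DriverManager;

public class DatabricksConnectionSketch {
    public static void main(String[] args) throws Exception {
        // Driver class name depends on the JDBC driver version:
        //   2.6.22 or earlier -> com.simba.spark.jdbc.Driver
        //   2.6.25 or later   -> com.databricks.client.jdbc.Driver
        Class.forName("com.simba.spark.jdbc.Driver");

        // Placeholder all-purpose cluster URL in the same form as the examples
        // in this topic. Replace <Databricks Host>, <Org Id>, <Cluster ID>, and
        // <personal-access-token> with values from your workspace.
        String url = "jdbc:spark://<Databricks Host>:443/default;transportMode=http;"
                + "ssl=1;httpPath=sql/protocolv1/o/<Org Id>/<Cluster ID>;"
                + "AuthMech=3;UID=token;PWD=<personal-access-token>";

        try (Connection connection = DriverManager.getConnection(url)) {
            System.out.println("Connected: " + !connection.isClosed());
        }
    }
}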
Staging Environment
The cloud provider where the Databricks cluster is deployed.
Required for SQL warehouse and Databricks cluster.
Select one of the following options:
  • AWS
  • Azure
  • Personal Staging Location
Default is Personal Staging Location.
You can select Personal Staging Location as the staging environment, instead of the Azure or AWS staging environments, to stage data locally for mappings and tasks.
If you select Personal Staging Location for a connection that Mass Ingestion uses, the Parquet data files for application ingestion or database ingestion jobs can be staged to a local personal storage location, which has a data retention period of 7 days. You must also specify a Database Host value. If you use Unity Catalog, note that a personal storage location is automatically provisioned.
Personal staging location doesn't apply to Databricks cluster.
You cannot use personal staging location with Databricks Delta unmanaged tables.
You cannot switch between clusters once you establish a connection.
Databricks Host
The host name of the endpoint the Databricks account belongs to.
Required for Databricks cluster. Doesn't apply to SQL warehouse.
You can get the Databricks Host from the JDBC URL. The URL is available in the Advanced Options of JDBC or ODBC in the Databricks Delta all-purpose cluster.
The following example shows the Databricks Host in the JDBC URL:
jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<Org Id>/<Cluster ID>;AuthMech=3;UID=token;PWD=<personal-access-token>
In the JDBC URL examples for Databricks Host, Organization ID, and Cluster ID, the value of PWD is always <personal-access-token>.
Cluster ID
The ID of the cluster.
Required for Databricks cluster. Doesn't apply to SQL warehouse.
You can get the cluster ID from the JDBC URL. The URL is available in the Advanced Options of JDBC or ODBC in the Databricks Delta all-purpose cluster.
The following example shows the Cluster ID in the JDBC URL:
jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<Org Id>/<Cluster ID>;AuthMech=3;UID=token;PWD=<personal-access-token>
Organization ID
The unique organization ID for the workspace in Databricks.
Required for Databricks cluster. Doesn't apply to SQL warehouse.
You can get the Organization ID from the JDBC URL. The URL is available in the Advanced Options of JDBC or ODBC in the Databricks Delta all-purpose cluster.
The following example shows the Organization ID in the JDBC URL:
jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<Organization ID>/<Cluster ID>;AuthMech=3;UID=token;PWD=<personal-access-token>
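As a hedged illustration (not part of the connection configuration), the sketch below extracts the Databricks Host, Organization ID, and Cluster ID from a JDBC URL of the form shown in the examples above. The regular expression and the sample host, organization ID, and cluster ID values are made up for this example.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JdbcUrlParts {
    // Matches the host, organization ID, and cluster ID in an all-purpose
    // cluster JDBC URL of the form shown in the examples above.
    private static final Pattern URL_PATTERN = Pattern.compile(
            "jdbc:spark://([^:/;]+):443/.*?httpPath=sql/protocolv1/o/([^/;]+)/([^/;]+);");

    public static void main(String[] args) {
        // Sample URL with made-up host, organization ID, and cluster ID values.
        String url = "jdbc:spark://dbc-a1b2c3d4-e5f6.cloud.databricks.com:443/default;"
                + "transportMode=http;ssl=1;"
                + "httpPath=sql/protocolv1/o/1234567890123456/0123-456789-abcde123;"
                + "AuthMech=3;UID=token;PWD=<personal-access-token>";

        Matcher matcher = URL_PATTERN.matcher(url);
        if (matcher.find()) {
            System.out.println("Databricks Host: " + matcher.group(1));  // dbc-a1b2c3d4-e5f6.cloud.databricks.com
            System.out.println("Organization ID: " + matcher.group(2));  // 1234567890123456
            System.out.println("Cluster ID:      " + matcher.group(3));  // 0123-456789-abcde123
        }
    }
}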
Min Workers
The minimum number of worker nodes to be used for the Spark job. Minimum value is 1.
Required for Databricks cluster. Doesn't apply to SQL warehouse.
Max Workers
The maximum number of worker nodes to be used for the Spark job. If you don't want the cluster to autoscale, set Max Workers equal to Min Workers or leave Max Workers unset.
Optional for Databricks cluster. Doesn't apply to SQL warehouse.
DB Runtime Version
The version of the Databricks cluster to spawn when you connect to the Databricks cluster to process mappings.
Required for Databricks cluster. Doesn't apply to SQL warehouse.
Select the Databricks runtime version 9.1 LTS or 13.3 LTS.
Worker Node Type
The worker node instance type that is used to run the Spark job.
Required for Databricks cluster. Doesn't apply to SQL warehouse.
For example, the worker node type for AWS can be i3.2xlarge. The worker node type for Azure can be Standard_DS3_v2.
Driver Node Type
The driver node instance type that is used to collect data from the Spark workers.
Optional for Databricks cluster. Doesn't apply to SQL warehouse.
For example, the driver node type for AWS can be i3.2xlarge. The driver node type for Azure can be Standard_DS3_v2.
If you don't specify the driver node type, Databricks uses the value you specify in the worker node type field.
Instance Pool ID
The instance pool ID used for the Spark cluster.
Optional for Databricks cluster. Doesn't apply to SQL warehouse.
If you specify the Instance Pool ID to run mappings, the following connection properties are ignored:
  • Driver Node Type
  • EBS Volume Count
  • EBS Volume Type
  • EBS Volume Size
  • Enable Elastic Disk
  • Worker Node Type
  • Zone ID
Elastic Disk
Enables the cluster to get additional disk space.
Optional for Databricks cluster. Doesn't apply to SQL warehouse.
Enable this option if the Spark workers are running low on disk space.
Spark Configuration
Doesn't apply to a data loader task or to Mass Ingestion tasks.
Spark Environment Variables
Doesn't apply to a data loader task or to Mass Ingestion tasks.
