Databricks Delta Connector

Source properties for Databricks Delta

In a mapping, you can configure a Source transformation to represent a Databricks Delta object.
The following table describes the Databricks Delta source properties that you can configure in a Source transformation:
Property
Description
Connection
Name of the source connection. Select a source connection or click New Parameter to define a new parameter for the source connection.
You can completely parameterize a source connection through a parameter file only when the source type is Single Object.
Parameterization doesn't apply to mappings in advanced mode.
Source Type
Type of the source object. Select any of the following source objects:
  • Single Object
  • Multiple Objects¹. You can only use advanced relationships with multiple objects.
  • Query
  • Parameter¹. Select Parameter to define the source type when you configure the task.
The Multiple Objects and Query source types don't apply to a Databricks cluster.
For a multiple-object source, the database override applies to all imported objects, while the table override applies only to the first table.
Object
Name of the source object.
You cannot use the data preview option if the source fields contain hierarchical data types.
Query¹
Click Define Query and enter a valid custom query.
The Query property appears only if you select Query as the source type.
You can parameterize a custom query object at runtime in a mapping.
You can also enable Unity Catalog settings in a custom query to access a table within a particular catalog.
¹ Doesn't apply to mappings in advanced mode.
The following table describes the Databricks Delta query options that you can configure in a Source transformation:
Property
Description
Query Options
Filters the source data based on the conditions you specify. Click Configure to configure a filter option.
The Filter option filters records and reduces the number of rows that the Secure Agent reads from the source. Add conditions in a read operation to filter records from the source. You can specify the following filter conditions:
  • Not parameterized. Use a basic filter to specify the object, field, operator, and value to select specific records.
  • Completely parameterized*. Use a parameter to specify the filter query.
  • Advanced. Use an advanced filter to define a complex filter condition.
You can use Contains, Ends With, and Starts With operators to filter records only on SQL endpoints.
Filter
Filters records based on the filter condition.
You can specify a simple filter or an advanced filter.
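As a rough illustration of what the Contains, Starts With, and Ends With operators express, the following Python sketch turns one basic filter row (field, operator, value) into a SQL condition that uses LIKE. The mapping to LIKE patterns, the helper name basic_filter_to_sql, and the field name are assumptions made for illustration. The SQL that the connector actually generates is not described here.

```python
# Hypothetical helper: not part of any Informatica or Databricks API.
# Assumed mapping from the three string operators to SQL LIKE patterns.
LIKE_PATTERNS = {
    "Contains": "%{v}%",
    "Starts With": "{v}%",
    "Ends With": "%{v}",
}


def basic_filter_to_sql(field: str, operator: str, value: str) -> str:
    """Build a SQL condition for one basic filter row (illustrative only)."""
    if operator not in LIKE_PATTERNS:
        raise ValueError(f"unsupported operator: {operator}")
    # Double any single quotes so the value stays inside the string literal.
    # LIKE wildcards (% and _) inside the value are not escaped in this sketch.
    escaped = value.replace("'", "''")
    pattern = LIKE_PATTERNS[operator].format(v=escaped)
    return f"{field} LIKE '{pattern}'"


# customer_name is a hypothetical source field.
print(basic_filter_to_sql("customer_name", "Starts With", "Acme"))
```

A condition of this shape is the kind of expression you might also type directly into an advanced filter.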
The following table describes the Databricks Delta source advanced properties that you can configure in a Source transformation:
Property
Description
Database Name
Overrides the database name provided in the connection and the database name provided during metadata import.
To read from multiple objects, ensure that you have specified the database name in the connection properties.
Table Name
Overrides the table name used in the metadata import with the table name that you specify.
Pre SQL
The pre-SQL command to run on the Databricks Delta source table before the agent reads the data.
Doesn't apply to Databricks cluster.
For example, if you want to update records in the database before you read the records from the table, specify a pre-SQL statement.
The query must include a fully qualified table name. You can specify multiple pre-SQL commands, each separated with a semicolon.
Post SQL
The post-SQL command to run on the Databricks Delta table after the agent completes the read operation.
Doesn't apply to Databricks cluster.
For example, if you want to delete some records after the latest records are loaded, specify a post-SQL statement.
The query must include a fully qualified table name. You can specify multiple post-SQL commands, each separated with a semicolon.
SQL Override
Overrides the default SQL query used to read data from a Databricks Delta custom query source.
The column names in the SQL override query must match the column names in the custom query in a SQL transformation.
For the override to take effect, the metadata of the source must be the same as the metadata of the SQL override query.
Staging Location
Relative directory path to store the staging files.
  • If the Databricks cluster is deployed on AWS, use the path relative to the Amazon S3 staging bucket.
  • If the Databricks cluster is deployed on Azure, use the path relative to the Azure Data Lake Storage Gen2 staging filesystem name.
When you use Unity Catalog, you must provide a pre-existing location on the user's cloud storage in the Staging Location property.
Job Timeout
Maximum time, in seconds, that the Spark job can take to complete processing.
Doesn't apply to SQL warehouse.
If the job is not completed within the time specified, the Databricks cluster terminates the job and the mapping fails.
If the job timeout is not specified, the mapping shows success or failure based on the job completion.
Job Status Poll Interval
Poll interval in seconds at which the Secure Agent checks the status of the job completion.
Doesn't apply to SQL warehouse.
Default is 30 seconds.
DB REST API Timeout
Maximum time, in seconds, for which the Secure Agent retries REST API calls to Databricks when a network connection error occurs or the REST endpoint returns a 5xx HTTP error code.
Doesn't apply to SQL warehouse.
Default is 10 minutes.
DB REST API Retry Interval
Time interval, in seconds, at which the Secure Agent retries the REST API call when a network connection error occurs or the REST endpoint returns a 5xx HTTP error code.
Doesn't apply to SQL warehouse.
This value does not apply to the job status REST API. Use the Job Status Poll Interval value for the job status REST API.
Default is 30 seconds.
Tracing Level
Sets the amount of detail that appears in the log file. You can choose terse, normal, verbose initialization, or verbose data.
Default is normal.
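The interaction between Job Timeout, Job Status Poll Interval, DB REST API Timeout, and DB REST API Retry Interval can be pictured with a small Python sketch. This is only a model of the behavior described in the table, not the Secure Agent's implementation: the function names, the status strings, and the TransientError class are hypothetical. The defaults of 30 seconds and 10 minutes come from the property descriptions.

```python
import time
from typing import Callable, Optional


class TransientError(Exception):
    """Hypothetical stand-in for a network failure or a 5xx HTTP response."""


def call_with_retry(call: Callable[[], str],
                    retry_timeout_s: float = 600.0,  # DB REST API Timeout (10 minutes)
                    retry_interval_s: float = 30.0,  # DB REST API Retry Interval
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> str:
    """Retry `call` on TransientError until it succeeds or the retry window closes."""
    deadline = clock() + retry_timeout_s
    while True:
        try:
            return call()
        except TransientError:
            if clock() + retry_interval_s > deadline:
                raise  # retry window exhausted, surface the last error
            sleep(retry_interval_s)


def wait_for_job(get_status: Callable[[], str],
                 job_timeout_s: Optional[float] = None,  # Job Timeout (unset: no limit)
                 poll_interval_s: float = 30.0,          # Job Status Poll Interval
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> str:
    """Poll the job status until it finishes or the job timeout elapses."""
    start = clock()
    while True:
        status = get_status()  # status strings are assumed for this sketch
        if status in ("SUCCESS", "FAILED"):
            return status
        if job_timeout_s is not None and clock() - start >= job_timeout_s:
            return "TIMED_OUT"  # the mapping would fail in this case
        sleep(poll_interval_s)


# A job that is already finished returns on the first poll, without sleeping.
print(wait_for_job(lambda: "SUCCESS"))
```

The two loops are kept separate on purpose: the retry interval governs failed REST calls, while status checks follow the poll interval.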
Advanced source properties are not applicable to mappings in advanced mode.
Only pre-SQL and post-SQL advanced properties are applicable for custom queries.
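To make the pre-SQL and post-SQL format concrete, the sketch below shows a pre-SQL value that holds two commands separated by a semicolon, each with a fully qualified table name, and a small Python helper that splits such a value into individual commands. The database and table names (sales_db.orders, sales_db.orders_audit), the database.table form of qualification, and the helper split_sql_commands are hypothetical and used only for illustration.

```python
from typing import List


def split_sql_commands(sql_text: str) -> List[str]:
    """Split a pre-SQL or post-SQL value into individual commands.

    Naive on purpose: it splits on every semicolon, so it does not handle
    semicolons that appear inside string literals.
    """
    return [part.strip() for part in sql_text.split(";") if part.strip()]


# Hypothetical pre-SQL value: two commands, each naming its table as database.table.
pre_sql = (
    "UPDATE sales_db.orders SET status = 'READY' WHERE status IS NULL; "
    "DELETE FROM sales_db.orders_audit WHERE load_date < '2024-01-01'"
)

for command in split_sql_commands(pre_sql):
    print(command)
```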
