To configure a streaming mapping to run on Databricks, configure the validation and execution environments and the connection for the mapping.
The following image shows the validation and execution environments for a Databricks mapping:
When you configure the mapping, configure the following properties:
Validation Environment. The environment that validates the mapping. Select Databricks as the validation environment and select the Databricks engine. The Data Integration Service pushes the mapping logic to the Databricks engine.
Execution Environment. The environment that runs the mapping. Select Databricks as the execution environment.
Connection. The connection to the Databricks Spark engine used for pushdown of processing. Select and browse for a connection, or select a connection parameter.
In the case of a mapping failure, to enable the mapping to start reading data from the time of failure, configure the checkpoint directory property. The directory that you specify is created within the directory that you specify in the State Store property.
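The failure-recovery behavior described above rests on a standard streaming idea: the engine persists the read position of each source to a checkpoint directory, so a restarted job resumes where the failed run stopped. The following sketch is illustrative pure Python, not the Informatica or Spark API; the function and file names are hypothetical.

```python
import json
import os

def process_stream(records, checkpoint_path, fail_at=None):
    """Process records, committing the last completed offset to a checkpoint
    file so that a restarted run resumes from the point of failure."""
    # Resume from the checkpointed offset, or start at the beginning.
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["offset"]
    processed = []
    for offset in range(start, len(records)):
        if fail_at is not None and offset == fail_at:
            raise RuntimeError("simulated mapping failure")
        processed.append(records[offset])
        # Commit the offset only after the record is fully processed.
        with open(checkpoint_path, "w") as f:
            json.dump({"offset": offset + 1}, f)
    return processed
```

A run that fails at offset 3 leaves offsets 0 through 2 committed in the checkpoint file; the next run with the same checkpoint path starts at offset 3 instead of re-reading from the beginning.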
Specify the following properties to configure how the data is processed:
Maximum Rows Read. Specify the maximum number of rows that are read before the mapping stops running. Default is Read All Rows.
Maximum Runtime Interval. Specify the maximum time to run the mapping before it stops. If you set values for both this property and the Maximum Rows Read property, the mapping stops running after one of the criteria is met. By default, the mapping runs without stopping.
State Store. Specify the DBFS location on the cluster to store information about the state of the Databricks job. You can configure the state store as part of the configuration of execution options for the Data Integration Service.
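The interaction between the row limit and the runtime limit, where the run stops as soon as either criterion is met, can be sketched as follows. This is illustrative Python, not the product's implementation; the clock is injected so the behavior is easy to trace.

```python
import time

def read_with_limits(source, max_rows=None, max_runtime=None, clock=time.monotonic):
    """Read rows from an iterable source, stopping when either the row
    limit or the runtime limit is reached, whichever comes first."""
    rows = []
    deadline = None if max_runtime is None else clock() + max_runtime
    for row in source:
        if max_rows is not None and len(rows) >= max_rows:
            break  # row limit reached
        if deadline is not None and clock() >= deadline:
            break  # runtime limit reached
        rows.append(row)
    return rows
```

With both parameters unset, the reader consumes the source without stopping, mirroring the default behavior described above.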
You can use the Maximum Rows Read and Maximum Runtime Interval properties to test the mapping.
Specify the batch interval streaming properties. The batch interval is the number of seconds after which a batch is submitted for processing. Based on the batch interval, the Spark engine processes the streaming data from sources and publishes the data in batches.
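The micro-batching behavior described above, accumulating streaming records and submitting a batch every N seconds, can be sketched as below. This is a pure-Python illustration of the concept, not the Spark engine; the (timestamp, value) record format is a hypothetical stand-in for a timed stream.

```python
def micro_batches(timed_records, batch_interval):
    """Group (timestamp, value) records into batches, closing the current
    batch each time batch_interval seconds elapse, in the style of a
    micro-batch streaming engine."""
    batches = []
    current = []
    batch_end = None
    for ts, value in timed_records:
        if batch_end is None:
            batch_end = ts + batch_interval
        # Submit the accumulated batch once the interval elapses.
        while ts >= batch_end:
            batches.append(current)
            current = []
            batch_end += batch_interval
        current.append(value)
    if current:
        batches.append(current)
    return batches
```

A shorter batch interval lowers latency but submits more, smaller batches; a longer interval amortizes per-batch overhead at the cost of freshness.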