Mapping Configurations for Databricks


To configure a streaming mapping for Databricks, configure the connection for the mapping.
The following image shows the validation and execution environments for a Databricks mapping:
[Image: Databricks validation environment and execution environment]
When you configure the mapping, configure the following properties:
Validation environment
The environment that validates the mapping. Select Databricks in the validation environment and select the Databricks engine. The Data Integration Service pushes the mapping logic to the Databricks engine.
Execution environment
The environment that runs the mapping. Select Databricks as the execution environment.
Databricks
Connection: The connection to the Databricks Spark engine used for pushdown of processing. Select Connection and browse for a connection or select a connection parameter.
If a mapping fails and you want it to resume reading data from the time of failure, configure the infaspark.checkpoint.directory property. For example: infaspark.checkpoint.directory <directory>. The directory that you specify is created within the directory that you specify in the State Store property.
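The nesting of the checkpoint directory inside the state store location can be pictured with a short Python sketch. This is an illustration only and not part of the product: the function name resolve_checkpoint_dir and the example paths are hypothetical, and the exact path handling on the Databricks cluster is an assumption.

```python
import posixpath


def resolve_checkpoint_dir(state_store: str, checkpoint_dir: str) -> str:
    """Return the checkpoint location nested under the state store path.

    Hypothetical helper: it only illustrates that the checkpoint
    directory is created inside the State Store directory.
    """
    # Drop leading slashes so the result stays inside the state store path.
    return posixpath.join(state_store, checkpoint_dir.lstrip("/"))


# Example with made-up paths (not product defaults).
print(resolve_checkpoint_dir("/example/stateStore", "checkpoints"))
```

With these example values, the sketch prints /example/stateStore/checkpoints.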
Source configuration
Specify the following properties to configure how the data is processed:
  • Maximum Rows Read. Specify the maximum number of rows that are read before the mapping stops running. Default is Read All Rows.
  • Maximum Runtime Interval. Specify the maximum time to run the mapping before it stops. If you set values for this property and the Maximum Rows Read property, the mapping stops running after one of the criteria is met. Default is Run Indefinitely. A value of Run Indefinitely enables the mapping to run without stopping.
  • State Store. Specify the DBFS location on the cluster to store information about the state of the Databricks Job. Default is <Home Directory>/stateStore. You can configure the state store as part of the configuration of execution options for the Data Integration Service.
You can use these properties to test the mapping.
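The interaction between Maximum Rows Read and Maximum Runtime Interval, where the mapping stops as soon as either limit is reached, can be modeled with a small Python sketch. It is a simplified illustration, not the engine's implementation: the function read_until_limit and its parameters are hypothetical, and a value of None stands in for Read All Rows or Run Indefinitely.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def read_until_limit(
    rows: Iterable[T],
    max_rows: Optional[int] = None,               # None models "Read All Rows"
    max_runtime_seconds: Optional[float] = None,  # None models "Run Indefinitely"
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[T]:
    """Yield rows until the row limit or the runtime limit is reached."""
    start = clock()
    count = 0
    source = iter(rows)
    while True:
        # Stop when the row limit has been reached.
        if max_rows is not None and count >= max_rows:
            return
        # Stop when the runtime limit has been reached.
        if max_runtime_seconds is not None and clock() - start >= max_runtime_seconds:
            return
        try:
            row = next(source)
        except StopIteration:
            return  # the source ran out of rows
        count += 1
        yield row


# Only the row limit is set, so reading stops after three rows.
print(list(read_until_limit(range(10), max_rows=3)))
```

When both limits are left at None, the loop ends only when the source is exhausted, which mirrors a mapping that runs without stopping on an unbounded stream.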
Streaming properties
Specify the batch interval streaming properties. The batch interval is the number of seconds after which a batch is submitted for processing. Based on the batch interval, the Spark engine processes the streaming data from sources and publishes the data in batches.
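As a rough mental model of the batch interval, the following Python sketch groups events into batches by arrival time. It is a simplification under stated assumptions and does not reflect how the Spark engine schedules batches internally: the function group_into_batches, the event tuples, and the five-second interval are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Iterable, List, Tuple


def group_into_batches(
    events: Iterable[Tuple[float, Any]],
    batch_interval_seconds: float,
) -> List[List[Any]]:
    """Group (arrival_time_seconds, payload) events into interval-sized batches."""
    batches = defaultdict(list)
    for arrival_time, payload in events:
        # Events that arrive within the same interval share a batch index.
        batch_index = int(arrival_time // batch_interval_seconds)
        batches[batch_index].append(payload)
    # Return the batches in the order in which they would be submitted.
    return [batches[index] for index in sorted(batches)]


# Made-up events as (arrival time in seconds, payload) pairs.
events = [(0.5, "a"), (1.2, "b"), (5.1, "c"), (9.9, "d"), (10.0, "e")]
print(group_into_batches(events, batch_interval_seconds=5))
```

With a five-second interval, the sample events fall into three batches: a and b, then c and d, then e.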