Databricks Connector

Rules and guidelines for SQL ELT optimization

Use the following rules and guidelines when you enable a mapping for SQL ELT optimization to a Databricks database:

General rules and guidelines

Use the following rules and guidelines when you configure SQL ELT optimization in a mapping:
  • When you use the IN() function with the OR, AND, or NOT operator in an IIF() function, the IIF() function returns incorrect results. See the sketch after this list.
  • When you configure a mapping to read from an Amazon S3 or a Microsoft Azure Data Lake Storage Gen2 source, you cannot use a Databricks connection in the Lookup transformation in the mapping.
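
The following sketch illustrates the IN() guideline above. It is a minimal, hypothetical Expression transformation expression; the port names STATUS and REGION and the values are illustrative assumptions, and the rewrite with explicit comparisons is one possible workaround, not a documented fix:

  -- Can return incorrect results when pushed down: IN() combined with OR inside IIF()
  IIF(IN(STATUS, 'A', 'B') OR REGION != 'EU', 'match', 'nomatch')
  -- Equivalent form that avoids IN() by using explicit comparisons
  IIF(STATUS = 'A' OR STATUS = 'B' OR REGION != 'EU', 'match', 'nomatch')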

Mapping with Databricks source and target

Use the following rules and guidelines when you configure SQL ELT optimization in a mapping that reads from and writes to Databricks:
  • When you provide the database name in the connection, the following rules and guidelines apply:
    • For full SQL ELT optimization, temporary staging tables or views are created in the database name provided in the Database attribute of the target connection. If the attribute is empty, the database name is taken from the JDBC URL of the target connection.
    • For source SQL ELT optimization, temporary staging tables or views are created in the database name provided in the Database attribute of the source connection. If the attribute is empty, the database name is taken from the JDBC URL of the source connection.
    • The database provided in the connection must allow read and write access. If the Database Name field is empty, the database name provided in the JDBC URL takes precedence and must also allow read and write access.
    These guidelines apply only to custom query support, SQL override, and SQL ELT optimization in Databricks using flat file formats.
  • You cannot use the Data Driven target operation with the Update Mode advanced target property.
  • The LAST() function is non-deterministic. It can return different results each time it is called, even when you provide the same input values.
  • When you configure a Filter transformation or specify a filter condition, do not specify special characters.
  • When you configure a mapping enabled for full SQL ELT optimization to read from multiple sources and you override the database name and table name from the advanced properties, the mapping fails.
  • To configure a Filter transformation or specify a filter condition on columns of the Date or Timestamp data type in a Databricks table, you must pass the data through the TO_DATE() function as an expression in the filter condition, as illustrated in the filter sketch after this list.
  • When you configure a mapping enabled for full SQL ELT optimization on the Databricks SQL engine and push the data to the Databricks target, ensure that you map all the fields in the target. Otherwise, the mapping fails.
  • When you create a new target at runtime, you must not specify a database name and table name in the Target Database Name and Target Table Name fields in the target advanced properties.
  • When you read data from a column of the Date data type and write it to a column of the Date data type, the SQL ELT query pushes the Date column and casts it to the Timestamp data type.
  • You cannot completely parameterize a multi-line custom query using a parameter file. If you specify a multi-line custom query in a parameter file, the mapping considers only the first line of the multi-line query.
  • When you push the CRC32() function to Databricks, the data type of the return value is either Bigint or String.
  • When you push the DATE_DIFF() function to Databricks, the function returns the integral part of the value and not the fractional part.
  • When you push the GREATEST() function to Databricks and configure input value arguments of String data type, you must not specify the caseFlag argument.
  • To push the TO_CHAR(DATE) function to Databricks, use the following string and format arguments (see the format-string sketch after this list):
    • YYYY
    • YY
    • MM
    • MON
    • MONTH
    • DD
    • DDD
    • DY
    • DAY
    • HH12
    • HH24
    • MI
    • Q
    • SS
    • SS.MS
    • SS.US
    • SS.NS
  • To push the TO_DATE(string, format) function to Databricks, you must use the following format arguments:
    • YYYY
    • YY
    • MM
    • MON
    • MONTH
    • DD
    • DDD
    • HH12
    • HH24
    • MI
    • SS
    • SS.MS
    • SS.US
    • SS.NS
  • When you enable full SQL ELT optimization in a mapping and use the IIF() condition in an Expression transformation, the mapping fails for the following functions:
    • IS_SPACES
    • IS_NUMBER
    • IS_DATE
  • A mapping enabled with full SQL ELT optimization that contains an SQL transformation fails when the column names in the SQL override query don't match the column names in the custom query.
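
The following filter sketch illustrates the Date and Timestamp filter guideline. It is a minimal, hypothetical filter condition; the port name ORDER_DATE, the literal, and the separator characters are illustrative assumptions, and the format tokens come from the supported lists above:

  -- Filter condition on a Date column; the comparison value passes through TO_DATE()
  ORDER_DATE >= TO_DATE('01/01/2024', 'MM/DD/YYYY')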
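
The following format-string sketch combines the supported TO_CHAR(DATE) and TO_DATE(string, format) tokens into complete format strings. The port names and separator characters are illustrative assumptions; only the format tokens come from the lists above:

  -- TO_CHAR(DATE): formats a Date port as a string using supported tokens
  TO_CHAR(ORDER_DATE, 'YYYY-MM-DD HH24:MI:SS')
  -- TO_DATE(string, format): parses a string port using supported tokens
  TO_DATE(ORDER_DATE_STR, 'MM/DD/YYYY HH24:MI:SS')
  -- A token outside the lists, such as RR, is not supported for pushdown:
  -- TO_DATE(ORDER_DATE_STR, 'MM/DD/RR')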

Mapping with Amazon S3 source and Databricks target

Use the following rules and guidelines when you configure SQL ELT optimization in a mapping that reads from an Amazon S3 source and writes to a Databricks target:
  • When you use an unconnected lookup and run a Databricks mapping enabled for SQL ELT optimization, the mapping fails if the IIF() function returns a Float value in the lookup query.
  • When you select the source type as directory in the advanced source properties, ensure that all the files in the directory contain the same schema.
  • When you select query as the source type in lookup, you cannot override the database name and table name in the advanced source properties.
  • When you include a source transformation in a mapping enabled with SQL ELT optimization, exclude the FileName field from the source. The FileName field is not applicable.
  • When you parameterize a lookup object in a mapping enabled with SQL ELT optimization, the mapping fails because you cannot exclude the FileName port at runtime.
  • When you parameterize the source object in a mapping task, ensure that you pass the source object parameter value with the fully qualified path in the parameter file, as in the parameter file sketch after this list.
  • You cannot use wildcard characters for the source file name and directory name in the source transformation.
  • You cannot use wildcard characters for the folder path or file name in the advanced source properties.
  • You cannot configure dynamic lookup cache.
  • When you use a Joiner transformation in a mapping enabled with SQL ELT optimization and create a new target at runtime, ensure that the fields do not have a not null constraint.
  • Ensure that the field names in Parquet, ORC, AVRO, or JSON files do not contain Unicode characters.
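
The following parameter file sketch shows one way to supply a fully qualified source object path. It is a minimal sketch: the section header, the parameter name $$srcObject, and the bucket, folder, and file names are all illustrative assumptions, not values from this documentation:

  [Global]
  $$srcObject=my-bucket/sales/orders.parquet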

Mapping with Azure Data Lake Storage Gen2 source and Databricks target

Use the following rules and guidelines when you configure SQL ELT optimization in a mapping that reads from an Azure Data Lake Storage Gen2 source and writes to a Databricks target:
  • Mappings fail if the lookup object contains unsupported data types.
  • When you select the source type as directory in the advanced source property, ensure that all the files in the directory contain the same schema.
  • When you select query as the source type in lookup, you cannot override the database name and table name in the advanced source properties.
  • When you include a source transformation in a mapping enabled with SQL ELT optimization, exclude the FileName field from the source. The FileName field is not applicable.
  • When you parameterize a lookup object in a mapping enabled with SQL ELT optimization, the mapping fails because you cannot exclude the FileName port at runtime.
  • When you parameterize the source object in a mapping task, ensure that you pass the source object parameter value with the fully qualified path in the parameter file.
  • You cannot use wildcard characters for the source file name and directory name in the source transformation.
  • You cannot configure dynamic lookup cache.
  • When you use a Joiner transformation in a mapping enabled with SQL ELT optimization and create a new target at runtime, ensure that the fields do not have a not null constraint.
  • Ensure that the field names in Parquet, ORC, AVRO, or JSON files do not contain Unicode characters.

Cross workspace mappings

When you set up a mapping enabled with full SQL ELT optimization to access data from a Databricks workspace, and the associated metastore resides in a separate workspace, the mapping runs without SQL ELT optimization.
