Consider the following general rules and guidelines for file sources:
The flat file source must reside on one of the following storage resources:
For the Azure platform: Microsoft Azure Blob Storage, Microsoft Azure Data Lake Store (ADLS) Gen1 or Gen2
For the AWS platform: S3
The row delimiter must be \n.
The file cannot be fixed width.
Multiple column delimiters are not supported.
To read multiline data, set the text qualifier to single or double quotes and enclose the data in that qualifier.
Only empty values are treated as null values.
When the path to the physical data object source includes spaces in the name of a file or directory, the path that is rendered in the source column of the output has the characters
for spaces. For example,
is rendered as
. This behavior applies to both flat files and complex files.
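The delimiter, text qualifier, and null rules above can be illustrated outside the engine. The following is a minimal sketch that uses Python's standard csv module, not the Databricks Spark engine itself. The sample data, the comma column delimiter, and the read_flat_file helper are assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample: "\n" row delimiter, a single "," column delimiter,
# and a double-quote text qualifier around the multiline value.
SAMPLE = 'id,name,notes\n1,Ana,"line one\nline two"\n2,,plain\n'

def read_flat_file(text):
    """Parse delimited text and map empty values to None (null)."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=",", quotechar='"')
    header = next(reader)
    rows = []
    for record in reader:
        # Only empty strings become nulls; every other value is kept as read.
        rows.append({col: (val if val != "" else None)
                     for col, val in zip(header, record)})
    return rows

rows = read_flat_file(SAMPLE)
```

In this sketch the quoted value keeps its embedded line break as part of one field, and the empty name in the second row is read as a null.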
Delta Lake Sources
Consider the following guidelines for using Delta Lake sources:
Mappings that access Delta Lake tables must use the Databricks run-time engine. If you run a Delta Lake mapping in the native environment with the JDBC connection, the mapping succeeds, but no data is written to the target.
Null Processing
Consider the following rules and guidelines for null processing:
Unexpected values converted to nulls
The Databricks Spark engine generates null values for all fields in a record if any field in that record contains an unexpected value. This occurs in the following scenarios:
Any type mismatch occurs, such as passing string data to a numeric column.
Data is out of bounds, such as with bigint or int data types.
Consider using a Filter transformation to filter out null rows.
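The record-level behavior described above can be sketched in plain Python. This is an illustration of the rule only, not the engine's implementation. The schema, the 32-bit int bounds, and the helper names read_record and drop_null_rows are assumptions.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # assumed bounds of a 32-bit int column

def parse_int(value):
    number = int(value)  # raises ValueError on a type mismatch such as "many"
    if not INT_MIN <= number <= INT_MAX:
        raise ValueError("out of bounds for int")
    return number

# Hypothetical schema: column name -> parser for that column.
SCHEMA = {"id": parse_int, "name": str, "qty": parse_int}

def read_record(raw):
    """Return the parsed record, or all nulls if any single field is invalid."""
    try:
        return {col: parse(raw[col]) for col, parse in SCHEMA.items()}
    except ValueError:
        return {col: None for col in SCHEMA}

def drop_null_rows(records):
    """Mimic a filter that removes rows in which every field is null."""
    return [r for r in records if any(v is not None for v in r.values())]

raw_rows = [
    {"id": "1", "name": "bolt", "qty": "40"},
    {"id": "2", "name": "nut", "qty": "many"},         # type mismatch
    {"id": "3", "name": "gear", "qty": "9999999999"},  # exceeds the int range
]
parsed = [read_record(r) for r in raw_rows]
clean = drop_null_rows(parsed)
```

Only the first row survives the filter. The other two rows lose their valid id and name values as well, which is why filtering out null rows is suggested.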
Date/time values converted to nulls
When the Databricks Spark engine reads date/time values, it expects the format YYYY-MM-DD HH24:MI:SS.US. If a date/time value read from the source does not match this format, the Databricks Spark engine converts the value to null.
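To check source values before a run, the expected format can be approximated with Python's datetime module. This is a rough sketch: the strptime pattern below is an assumed equivalent of the documented format, with US taken to mean microseconds, and read_datetime is a hypothetical helper.

```python
from datetime import datetime

# Assumed Python equivalent of:
# year-month-day hour(24):minute:second.microsecond
EXPECTED_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

def read_datetime(text):
    """Return a datetime if the text matches the expected format, else None."""
    try:
        return datetime.strptime(text, EXPECTED_FORMAT)
    except ValueError:
        return None

ok = read_datetime("2021-03-15 13:45:30.123456")
bad = read_datetime("03/15/2021 1:45 PM")  # different layout, becomes null
```

Values that return None in this check are candidates for reformatting upstream so that they are not read as nulls.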
Double and Decimal Conversions
When the Databricks Spark engine reads from an Azure or AWS source, it converts double and decimal data types to scientific notation. When it converts that data back to a double or decimal to write to the target, it retains only 15 digits of precision and drops any digits beyond that.
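The effect of a 15-digit round trip through scientific notation can be approximated as follows. This is a sketch under stated assumptions: the round_trip helper is hypothetical, and it rounds to 15 significant digits, whereas the engine may truncate instead.

```python
from decimal import Decimal

def round_trip(value, digits=15):
    """Convert to scientific notation with `digits` significant digits, then back."""
    # One digit before the decimal point plus (digits - 1) after it.
    sci = format(float(value), f".{digits - 1}e")
    return sci, Decimal(sci)

original = Decimal("1234567890.123456789")  # 19 significant digits
sci, restored = round_trip(original)
# sci      -> "1.23456789012346e+09"
# restored -> Decimal("1234567890.12346")
```

The restored value carries 15 significant digits, so the trailing digits of the original are lost. Columns that need higher precision may call for validation against the source after the mapping runs.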