Rules and Guidelines for Databricks Targets

File Targets

Consider the following general rules and guidelines for file targets (see the sketch after this list):
  • File targets must reside on Microsoft Azure Blob Storage or Microsoft Azure Data Lake Store.
  • The row delimiter must be \n.
  • The file cannot be fixed-width.
  • Multiple column delimiters are not supported.
  • Only empty values are treated as null values.
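For reference, a minimal PySpark sketch of a write that conforms to these rules might look like the following. This is plain Spark code, not an Informatica mapping; the abfss:// path, column names, and data are hypothetical.

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()

  # Example data; "token" holds a null that is written as an empty value.
  df = spark.createDataFrame(
      [(12, "AB", None), (34, "CD", "23p09udj")],
      ["id", "code", "token"],
  )

  (df.write
     .mode("overwrite")
     .option("sep", ",")        # a single column delimiter
     .option("lineSep", "\n")   # rows delimited by \n (Spark 3.0+)
     .option("nullValue", "")   # nulls written as empty values
     .csv("abfss://container@account.dfs.core.windows.net/target/"))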

Delta Lake Targets

Consider the following guidelines for using Delta Lake targets:
  • Mappings that access Delta Lake tables must use the Databricks run-time engine. If you run a Delta Lake mapping in the native environment with the JDBC connection, the mapping succeeds, but no data is written to the target.

Null Processing

Consider the following rules and guidelines for null processing:
Null value conversions
When the Databricks Spark engine writes to a target, it converts null values to empty strings (""). For example, in the record 12,AB,"",23p09udj, the third field is a null value written as an empty string.
Consider the null processing behavior based on the following Databricks targets:
  • File target. The Databricks Spark engine writes all null values as empty strings.
  • Azure SQL Data Warehouse. The Databricks Spark engine can write the empty strings to string columns, but when it tries to write an empty string to a non-string column, the mapping fails with a type mismatch.
To allow the Databricks Spark engine to convert the empty strings back to null values and write to the target, configure the following advanced property in the Databricks Spark connection: infaspark.flatfile.writer.nullValue=true
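To see the underlying behavior in plain Spark terms, consider the following hypothetical sketch, which writes nulls as empty strings and reads them back as nulls. It illustrates the conversion itself, not the Informatica property.

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()
  df = spark.createDataFrame([("AB", None)], ["code", "token"])

  # On write, null values are emitted as empty strings.
  df.write.mode("overwrite").option("nullValue", "").csv("/tmp/null_demo")

  # On read, empty strings are converted back to nulls, which is the
  # effect that infaspark.flatfile.writer.nullValue=true enables.
  back = spark.read.option("nullValue", "").csv("/tmp/null_demo")
  back.show()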
Unexpected values converted to nulls
The Databricks Spark engine generates null values for all fields in the same record if any field contains an unexpected value, as in the following scenarios:
  • A type mismatch occurs, such as passing string data to a numeric column.
  • Data is out of bounds, such as a value that exceeds the range of the bigint or int data type.
Consider using a Filter transformation to filter out the null rows, as in the sketch below.
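In plain PySpark terms, the equivalent of that Filter transformation is a sketch like the following; the column names and data are examples only.

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()
  df = spark.createDataFrame(
      [(1, "ok"), (None, None)],   # the second record was nulled out
      "id: int, code: string",
  )

  # Keep only rows in which no field is null, analogous to a Filter
  # transformation with a condition such as NOT ISNULL(id).
  clean = df.na.drop(how="any")
  clean.show()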
Date/time values converted to nulls
When the Databricks Spark engine writes date/time values, it uses the format YYYY-MM-DD HH24:MM:SS.US. If the date format that passes to the target does not match this format, the Databricks Spark engine writes null values.
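One way to avoid the null conversion is to format the value before it reaches the target. The following hypothetical PySpark sketch, assuming Spark 3 datetime patterns and an example column name, produces that layout:

  from pyspark.sql import SparkSession, functions as F

  spark = SparkSession.builder.getOrCreate()
  df = spark.createDataFrame([("2020-09-28 08:15:30.123456",)], ["order_ts"])

  # Format the timestamp as YYYY-MM-DD HH24:MM:SS.US, that is,
  # a 24-hour clock with a 6-digit microsecond fraction.
  out = df.withColumn(
      "order_ts",
      F.date_format(F.col("order_ts").cast("timestamp"),
                    "yyyy-MM-dd HH:mm:ss.SSSSSS"),
  )
  out.show(truncate=False)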

Double and Decimal Conversions

When the Databricks Spark engine reads from an Azure or AWS source, it converts double and decimal data types to scientific notation. When it converts that data back to a double or decimal to write to the target, it retains at most 15 digits of precision and drops any digits beyond that.
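As a small illustration of what a 15-digit limit means in practice, the following Python sketch (not Informatica code) rounds a 17-digit value to 15 significant digits:

  from decimal import Context, Decimal

  original = Decimal("1234567890.1234567")          # 17 significant digits
  kept = Context(prec=15).create_decimal(original)  # keep 15 digits
  print(kept)  # 1234567890.12346; digits beyond 15 are dropped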


Updated September 28, 2020