User Guide

Rules and Guidelines for Databricks Sources

File Sources

Consider the following general rules and guidelines for file sources:
  • The flat file source must reside on Microsoft Azure Blob Storage or Microsoft Azure Data Lake Store.
  • The row delimiter must be \n (newline).
  • The file cannot be fixed width.
  • Multiple column delimiters are not supported.
  • To read multiline data, set the text qualifier to single or double quotes and enclose the data in the selected qualifier.
  • Only empty values are treated as null values.
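
The multiline and empty-value rules above can be illustrated with a minimal sketch. It uses Python's standard csv module rather than the Databricks Spark engine, so it only approximates the behavior described here. The sample data and the read_records helper are hypothetical.

```python
import csv
import io

# Hypothetical sample: rows end with \n, the column delimiter is a comma,
# and the multiline value is enclosed in double quotes (the text qualifier).
SAMPLE = 'id,comment\n1,"first line\nsecond line"\n2,\n'

def read_records(text, delimiter=",", qualifier='"'):
    """Parse delimited text and map empty fields to None (null)."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=delimiter,
        quotechar=qualifier,
    )
    return [[field if field != "" else None for field in row] for row in reader]

records = read_records(SAMPLE)
for record in records:
    print(record)
```

Because the second field of the first data row is quoted, the embedded newline stays inside one record instead of starting a new row.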

Null Processing

Consider the following rules and guidelines for null processing:
Unexpected values converted to nulls
If any field in a record contains an unexpected value, the Databricks Spark engine generates null values for all fields in that record. This occurs in the following scenarios:
  • A type mismatch occurs, such as passing string data to a numeric column.
  • A value is out of bounds for its data type, such as bigint or int.
Consider using a Filter transformation to filter out null rows.
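
The record-level behavior described above can be sketched in plain Python. This is an illustration under stated assumptions, not the engine's implementation: the 32-bit bounds for int, the parse_int and read_record helpers, and the sample rows are all hypothetical.

```python
# Assumed bounds for a 32-bit int column.
INT_MIN, INT_MAX = -2**31, 2**31 - 1

def parse_int(text):
    """Parse an int column value; raise ValueError on mismatch or overflow."""
    value = int(text)  # raises ValueError for non-numeric strings
    if not INT_MIN <= value <= INT_MAX:
        raise ValueError("value is out of bounds for int")
    return value

def read_record(raw, parsers):
    """Return the parsed record, or all nulls if any single field is invalid."""
    try:
        return [parse(field) for parse, field in zip(parsers, raw)]
    except ValueError:
        return [None] * len(raw)

def is_null_row(record):
    """True when every field in the record is null."""
    return all(value is None for value in record)

rows = [["1", "alice"], ["abc", "bob"], ["3000000000", "carol"]]
parsed = [read_record(row, [parse_int, str]) for row in rows]

# Equivalent in spirit to a Filter transformation that drops null rows.
kept = [record for record in parsed if not is_null_row(record)]
print(kept)
```

Note that the valid name fields in the second and third rows are lost along with the invalid id, which is why filtering out null rows downstream is worth considering.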
Date/time values converted to nulls
When the Databricks Spark engine reads date/time values, it uses the format configured in the Mapping properties for the run-time preferences of the Developer tool. If the date format read from the source does not match the format configured in the Developer tool, the Databricks Spark engine converts the date values to nulls. The default value in the Developer tool is MM/DD/YYYY HH24:MI:SS.
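
As a rough illustration of the format-mismatch rule, the following sketch uses Python's datetime.strptime. The pattern "%m/%d/%Y %H:%M:%S" is an assumed equivalent of the default MM/DD/YYYY HH24:MI:SS, and read_datetime is a hypothetical helper, not part of the Developer tool.

```python
from datetime import datetime

# Assumed strptime equivalent of the default format MM/DD/YYYY HH24:MI:SS.
DEFAULT_FORMAT = "%m/%d/%Y %H:%M:%S"

def read_datetime(text, fmt=DEFAULT_FORMAT):
    """Return the parsed date/time, or None when the text does not match fmt."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None

matching = read_datetime("03/15/2021 14:30:00")
mismatched = read_datetime("2021-03-15 14:30:00")
print(matching, mismatched)
```

The second value is a valid timestamp, but because it does not follow the configured format it is read as null.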

Double and Decimal Conversions

When the Databricks Spark engine reads from an Azure source, it converts double and decimal data types to scientific notation. When it converts that data back to a double or decimal to write to the target, it retains 15 digits of precision and drops any digits beyond that.
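
The effect of the 15-digit limit can be sketched with Python's decimal module. This assumes that the extra digits are truncated rather than rounded, which the text above does not specify. The keep_15_digits helper and the sample value are hypothetical.

```python
from decimal import Context, Decimal, ROUND_DOWN

# Assumption: digits beyond the 15th significant digit are truncated.
FIFTEEN_DIGITS = Context(prec=15, rounding=ROUND_DOWN)

def keep_15_digits(value):
    """Return value limited to 15 significant digits."""
    return FIFTEEN_DIGITS.create_decimal(Decimal(value))

source_value = "1234567890.123456789"   # 19 significant digits
target_value = keep_15_digits(source_value)
print(target_value)
```

Values that already fit within 15 significant digits pass through unchanged, so the loss affects only high-precision decimal columns.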
