PowerExchange for Microsoft Azure Blob Storage User Guide

Rules and Guidelines for Data Types

Some data types for complex files might apply only when you use specific Hadoop and Databricks distributions. Consider the following rules and guidelines before you read from or write to complex files.

Decimal data type

To process Decimal data types with precision up to 38 digits on the Data Integration Service, set the EnableSDKDecimal38 custom property to true for the Data Integration Service.
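
The effect of 38-digit precision can be illustrated outside the product. The following sketch uses Python's decimal module as an analogy only, not as how the Data Integration Service is configured: a 38-digit value keeps all of its digits only after the working precision is raised to 38.

    from decimal import Decimal, getcontext

    # Python's default context keeps 28 significant digits, so a
    # 38-digit value is rounded as soon as arithmetic touches it.
    value = Decimal("12345678901234567890123456789012345678")  # 38 digits
    print(+value)  # 1.234567890123456789012345679E+37

    # Raising the working precision to 38 preserves every digit,
    # analogous to enabling 38-digit Decimal support on the service.
    getcontext().prec = 38
    print(+value)  # 12345678901234567890123456789012345678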

Avro

To process Date, Decimal, and Timestamp data types from Avro files in mappings that run on the Data Integration Service or on the Spark engine in the Cloudera CDP distribution, ensure that both the Hadoop Distribution Directory property in the developerCore.ini file and the INFA_PARSER_HOME environment variable for the Data Integration Service are set to the same Cloudera CDP distribution.
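
In the Avro format itself, Date, Decimal, and Timestamp values are carried as logical types layered on primitive types. The following sketch shows that mapping; it assumes the third-party fastavro package and a hypothetical output file, and is not part of the product:

    from datetime import date, datetime, timezone
    from decimal import Decimal

    import fastavro  # third-party package, assumed to be installed

    # Avro logical types: int/date, bytes/decimal, long/timestamp-micros.
    schema = fastavro.parse_schema({
        "type": "record",
        "name": "Order",
        "fields": [
            {"name": "order_date",
             "type": {"type": "int", "logicalType": "date"}},
            {"name": "amount",
             "type": {"type": "bytes", "logicalType": "decimal",
                      "precision": 38, "scale": 2}},
            {"name": "created",
             "type": {"type": "long", "logicalType": "timestamp-micros"}},
        ],
    })

    records = [{
        "order_date": date(1980, 1, 9),
        "amount": Decimal("12345.67"),
        "created": datetime(1980, 1, 9, 6, 56, 1, 365235,
                            tzinfo=timezone.utc),
    }]

    with open("orders.avro", "wb") as out:  # hypothetical path
        fastavro.writer(out, schema, records)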

JSON

You can read and write complex file objects in JSON format only on the Spark engine.

Parquet

The following rules apply to Parquet files:
  • When you import a Parquet file, the format of the schema for the String data type differs based on the distribution. For Cloudera CDP, the schema for String appears as UTF8, while for other distributions, it appears as STRING.
    For example, when you use Cloudera CDP, the schema for String appears as:
    optional binary c_name (UTF8);
    In other distributions, String appears as:
    optional binary c_name (STRING);
    To resolve this issue, ensure that both the Hadoop Distribution Directory property in the developerCore.ini file and the INFA_PARSER_HOME environment variable for the Data Integration Service are set to the same distribution. A sketch for inspecting the schema of a Parquet file appears after this list.
  • Consider the following guidelines for Date and Time data types:
    • The Data Integration Service and the Spark engine in the Azure HDInsight HDI, Databricks, and Cloudera CDP distributions can process Date, Time, and Timestamp data types with precision up to microseconds.
    • When the Data Integration Service reads a Date value that has no time component, it adds a time value, derived from the time zone, to the date in the target. For example, if the source contains the Date value 1980-01-09 00:00:00, the incorrect Time value 1980-01-09 05:30:00 is generated in the target. A sketch of how the offset can arise appears after this list.
    • When the Data Integration Service reads the Time data type, it writes incorrect date values to the target. For example, if the source contains the Time value 1980-01-09 06:56:01.365235000, the incorrect Date value 1899-12-31 06:56:01.365235000 is generated in the target.
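
To check which annotation a particular Parquet file carries, you can inspect its file-level schema. The following sketch assumes the third-party pyarrow package and a hypothetical file name; the annotation printed for the string column (UTF8 or String) depends on the writer that produced the file:

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Write a one-column Parquet file with a String column.
    table = pa.table({"c_name": ["Azure", "Blob", "Storage"]})
    pq.write_table(table, "customers.parquet")  # hypothetical path

    # ParquetFile.schema exposes the Parquet-level schema, in which the
    # string column appears as, for example:
    #   optional binary c_name (String);
    # The annotation varies with the writer and format version.
    print(pq.ParquetFile("customers.parquet").schema)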
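
One plausible way the offset in the Date example can arise, shown here as an illustration only and not as the documented internals of the Data Integration Service: a date with no time component is materialized as midnight UTC and then rendered in a session time zone such as UTC+05:30.

    from datetime import datetime, timedelta, timezone

    # A Date value such as 1980-01-09 carries no time component.
    # Materialize it as midnight UTC...
    midnight_utc = datetime(1980, 1, 9, tzinfo=timezone.utc)

    # ...then render it in a session time zone of UTC+05:30: the
    # offset leaks into the target as a spurious time value.
    ist = timezone(timedelta(hours=5, minutes=30))
    print(midnight_utc.astimezone(ist))  # 1980-01-09 05:30:00+05:30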
