Microsoft Azure Data Lake Storage Gen2 Connector

Rules and guidelines for mappings
Consider the following rules and guidelines for mappings:
Mappings
  • When you configure a mapping, ensure that the directory name, file name, or object path contains only the special characters that Microsoft Azure Data Lake Storage Gen2 allows.
    For more information about the special characters allowed in Microsoft Azure Data Lake Storage Gen2, see the Microsoft Azure documentation.
  • You cannot read and write primitive data types in nested and multi-line indented JSON files in mappings.
  • When a column name in the source starts with a number and you create a target at runtime, the corresponding target column is prefixed with an underscore character (_).
  • When you create a Microsoft Azure Data Lake Storage Gen2 target at runtime to write an Avro, ORC, or Parquet file, you cannot write null values with primitive data types.
  • When you use the JVM options to configure the HTTPS proxy server and read and write Avro, Parquet, and ORC files, the mapping fails with the following error:
    AzureADAuthenticator.getTokenCall threw java.io.IOException : Unable to tunnel through proxy. Proxy returns "HTTP/1.1 407 Proxy Authentication Required"
  • When you append data to a Microsoft Azure Data Lake Storage Gen2 target, you must map all the incoming fields to target fields.
  • When you create a Microsoft Azure Data Lake Storage Gen2 target at runtime and specify the path in the object name field, ensure that you specify the complete path including the file system name.
    For example,
    <FileSystem_Name>/<Directory1>/.../<DirectoryN>/<FileName>
  • When the data has more than one escape character and you do not select the Disable escape char when a qualifier is set option, not all the escape characters are written to the flat file target in Microsoft Azure Data Lake Storage Gen2.
    For example, "Ga\\lit",124 "Ga\\\"l",19 is written as Ga\lit,124 "Ga\"l",19.
  • You cannot read hierarchical data types in a mapping. You can use the Hierarchy Parser transformation to convert the hierarchical input into relational output.
    For more information about configuring a Hierarchy Parser transformation, see Transformations in the Data Integration documentation.
  • When you run a mapping to write a JSON file to a Microsoft Azure Data Lake Storage Gen2 target, the mapping writes the values of double data type in exponential format in the target.
  • When you run a mapping configured with a fixed partition of 8 to read from a parquet file of size 20 GB or more and write to a Microsoft Azure Data Lake Storage Gen2 target, the mapping fails with the following error:
    [ERROR] java.lang.OutOfMemoryError: Java heap space
  • If a mapping includes a parameterized source and target, the Allow parameter to be overridden at run time check box is selected, and the selected source object resides in a folder during mapping task creation, the mapping fails with the following error:
    [ERROR] Exception: Exception occured in read phase, error: Exception while downloading file to local staging
  • When you use the Snappy compression format to write data to Microsoft Azure Data Lake Storage Gen2, the mapping retains a snappy-1.1.8****-libsnappyjava.so file in the temp directory on the agent machine after it runs successfully.
  • When you create a new target and select Handle Special Characters to append a time stamp to the file name, the file name override is not honored in a mapping.
  • When you run a mapping to create a Microsoft Azure Data Lake Storage Gen2 target, use the append strategy, and enable Handle Special Characters, the file is either created or appended to based on the timestamp that you include in the file name.
  • When you parameterize the target connection and object and create a new target at runtime, the target is created in the root directory specified in the connection directory path even if you specify the complete path in the create target object name.
    To resolve this, specify an absolute or relative path in Directory Override in the advanced target properties to create the target in the overridden path.
  • When you read and write complex files, set the JVM options for type DTM to increase the -Xms and -Xmx values in the system configuration details of the Secure Agent to avoid a Java heap space error. The recommended -Xms and -Xmx values are 512 MB and 1024 MB, respectively. See the configuration sketch after this list.
  • When you read and write complex files, ensure that the file name does not contain a percentage (%) or hash (#) character. Otherwise, the data preview and the mapping fail at runtime.
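
For the JVM heap settings mentioned in the list above, the following is a minimal configuration sketch. It assumes that your Secure Agent exposes numbered JVMOption properties for the DTM type in the system configuration details; the slot numbers shown here are placeholders, so set the values in any free JVMOption slot on your agent:

    JVMOption1=-Xms512M
    JVMOption2=-Xmx1024M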
Mappings in advanced mode
  • When you run a mapping in advanced mode to read data from a Microsoft Azure Data Lake Storage Gen2 source, use a parameter file to parameterize the source connection and object, and specify the directory and file override in the advanced properties, the mapping considers the values specified in the parameter file.
  • When you read data from and write data to Microsoft Azure Data Lake Storage Gen2, use the same storage account for both the source and target connections.
    If you want to use different storage accounts, use shared key authentication for one account and service principal authentication for the other account. You cannot use the same authentication type for both the storage accounts.
  • When you read data from and write data to Microsoft Azure Data Lake Storage Gen2 and use the shared key authentication, ensure that you use the same access key for both the source and target connections.
    For example, if you use Key 1 as the access key for the source connection and Key 2 for the target connection, the mapping fails.
  • When you use managed identity authentication, you cannot use a system-assigned identity.
  • When you use managed identity authentication, ensure that the storage account specified in the connection is not the same as the storage account specified in the staging location and log location for the Azure cluster.
  • You can read and write hierarchical data types for Avro, JSON, and Parquet files. You can also read hierarchical data types for ORC files.
  • When you set the qualifier mode to Minimal and use an escape character, the escape characters are not escaped and quoted in the target. To resolve this issue, set the qualifier mode to All.
  • When you set the qualifier mode to All and do not specify a value for the qualifier, \00 (Null) is considered as the qualifier.
  • You cannot add multiple pipelines in a mapping.
  • When you read from a complex file source of size 128 MB or more, the Secure Agent writes incorrect data and creates multiple target files without overriding the existing target.
  • You cannot read zero-byte files when you run mappings in advanced mode.
  • When you upload a schema file for the source and create a Microsoft Azure Data Lake Storage Gen2 target at runtime, ensure that the source file is not empty.
  • When you append data to an existing target, you must configure any overrides in the advanced target properties; otherwise, the mapping fails.
  • When you append data to a target created at runtime and if a file with the same name exists in the target directory, the mapping fails.
    In this case, you must first overwrite the existing file and then append the data.
  • When a JSON file has a field with empty struct data, the Secure Agent ignores the field and reads the remaining fields during metadata read.
    For example, if the JSON file has {"id":123,"address":{}} in the first row, the address field is ignored and does not appear in the Fields tab. If the JSON file has values for the address field in subsequent rows, you can use the Data elements to sample property to fetch this field. See the JSON sample after this list.
  • When you run a mapping in advanced mode and map the source fields of double or float data type to the target fields of string data type, the format of the values changes in the target.
    The following table describes the change in the format of the values in the target:

    Value in the source         Value in the target
    1.7976931348623157e+308     1.79769313486232e+308
    -9999999999999.99           -10000000000000
    4.94065645841247e-324       4.9e-324
    7956318123.99392483         7956318123.99392
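
To illustrate the empty struct behavior described in the list above, the following hypothetical JSON source has an address field that is empty in the first row and populated in a later row. The second row is an invented example; increase the Data elements to sample property so that sampling reaches a row where the field has values and the Secure Agent can detect it:

    {"id":123,"address":{}}
    {"id":124,"address":{"city":"Redmond","zip":"98052"}}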
