Microsoft Azure Data Lake Storage Gen2 Connector

File formatting options

Select the format of the Microsoft Azure Data Lake Storage Gen2 file and configure the formatting options.
The following table describes the formatting options for Avro, Parquet, JSON, ORC, and delimited flat files:
Property
Description
Schema Source
The schema of the source or target file.
Select one of the following options to specify a schema:
  • Read from data file. Imports the schema from a file in Microsoft Azure Data Lake Storage Gen2.
  • Import from schema file. Imports the schema from a schema definition file on the agent machine.
Schema File
The schema definition file on the agent machine from which you want to upload the schema.
You cannot upload a schema file when you create a target at runtime.
The following table describes the formatting options for flat files:
Property
Description
Flat File Type
The type of flat file.
Select one of the following options:
  • Delimited. Reads a flat file that contains column delimiters.
  • Fixed Width. Reads a flat file with fields that have a fixed length.
    You must select the file format in the Fixed Width File Format option.
    If you do not have a fixed-width file format, click New > Components > Fixed Width File Format to create one.
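As an illustration of what reading a fixed-width file involves, the following minimal Python sketch slices one record by column widths. The column names and widths in `LAYOUT` are hypothetical; in practice the layout comes from the Fixed Width File Format component, and this is not the connector's own parser.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (column name, width in characters).
LAYOUT: List[Tuple[str, int]] = [("id", 5), ("name", 10), ("amount", 8)]


def parse_fixed_width(line: str, layout: List[Tuple[str, int]]) -> Dict[str, str]:
    """Slice one record into fields using consecutive column widths."""
    record: Dict[str, str] = {}
    offset = 0
    for name, width in layout:
        # Each field occupies exactly `width` characters; padding is stripped.
        record[name] = line[offset:offset + width].strip()
        offset += width
    return record


# Build a sample record whose fields are padded to the widths in LAYOUT.
sample = "00042" + "Alice".ljust(10) + "19.99".rjust(8)
row = parse_fixed_width(sample, LAYOUT)
```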
Delimiter
Character used to separate columns of data in a delimited flat file. You can set the delimiter to a comma, tab, colon, semicolon, or another character.
You cannot set a tab as a delimiter directly in the Delimiter field. To set a tab as a delimiter, type the tab character in any text editor, and then copy and paste it into the Delimiter field.
EscapeChar
Character that immediately precedes a column delimiter character embedded in an unquoted string, or that immediately precedes the quote character in a quoted string, in a delimited flat file.
When you write data to Microsoft Azure Data Lake Storage Gen2 and specify a qualifier, the qualifier is used as the escape character by default. Otherwise, the character that you specify as the escape character is used.
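The two escaping behaviors described above can be approximated with Python's standard `csv` module. This is only an analogy under assumed settings (a comma delimiter, a backslash escape character, and a double-quote qualifier); it does not reproduce the connector's writer.

```python
import csv
import io


def write_row(row, **fmt) -> str:
    """Write one row with the given csv formatting options and return the text."""
    buf = io.StringIO(newline="")
    csv.writer(buf, lineterminator="\n", **fmt).writerow(row)
    return buf.getvalue()


# No qualifier: the escape character precedes the embedded delimiter.
unquoted = write_row(["1", "Smith, John"], quoting=csv.QUOTE_NONE, escapechar="\\")

# Qualifier set: the qualifier itself escapes an embedded qualifier by doubling it.
quoted = write_row(["1", 'said "hi"'], quotechar='"', doublequote=True)

# Reading the unquoted line back with the same escape character restores the data.
restored = next(csv.reader(io.StringIO(unquoted), quoting=csv.QUOTE_NONE, escapechar="\\"))
```

Here `unquoted` holds `1,Smith\, John` and `quoted` holds `1,"said ""hi"""`, each followed by a newline.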
Qualifier
Quote character that defines the boundaries of data in a delimited flat file. You can set the qualifier to a single quote or a double quote.
Qualifier Mode
Specify the qualifier behavior when you write data to a delimited flat file.
You can select one of the following options:
  • Minimal. Default mode. Applies the qualifier only to data that contains a delimiter value or a special character.
  • All. Applies qualifier to all data.
  • Non_Numeric. Not applicable.
  • All_Non_Null. Not applicable.
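To make the difference between the Minimal and All modes concrete, the sketch below uses the `csv.QUOTE_MINIMAL` and `csv.QUOTE_ALL` constants from Python's standard library as a stand-in. Treating these constants as equivalent to the connector's modes is an assumption made for illustration only.

```python
import csv
import io


def write_with_mode(row, quoting) -> str:
    """Write one row with a double-quote qualifier and the given quoting mode."""
    buf = io.StringIO(newline="")
    csv.writer(buf, quotechar='"', quoting=quoting, lineterminator="\n").writerow(row)
    return buf.getvalue()


row = ["1", "Smith, John", "NY"]

# Minimal: only the field that contains the delimiter is qualified.
minimal = write_with_mode(row, csv.QUOTE_MINIMAL)

# All: every field is qualified.
all_fields = write_with_mode(row, csv.QUOTE_ALL)
```

With this input, `minimal` is `1,"Smith, John",NY` and `all_fields` is `"1","Smith, John","NY"`, each followed by a newline.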
Disable escape character when a qualifier is set
Applicable to a Microsoft Azure Data Lake Storage Gen2 target.
Select to disable the escape character when a qualifier is set.
When you disable the escape character, the special characters are not escaped and are considered part of the data written to the target.
Code Page
Select the code page that the Secure Agent must use to read data from or write data to a delimited flat file.
Select UTF-8 for mappings.
Select one of the following options for mappings in advanced mode:
  • UTF-8
  • MS Windows Latin 1
  • Shift-JIS
  • ISO 8859-15 Latin 9 (Western European)
  • ISO 8859-3 Southeast European
  • ISO 8859-5 Cyrillic
  • ISO 8859-9 Latin 5 (Turkish)
  • IBM EBCDIC International Latin-1
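When you check a file outside the product, it can help to decode a few bytes with the code page you plan to select. The sketch below maps the code page names listed above to Python codec names. The mapping is an assumption based on the code page names (for example, `cp500` for IBM EBCDIC International Latin-1) and is not taken from the connector.

```python
import codecs

# Assumed mapping from the listed code page names to Python codec names.
CODE_PAGE_TO_CODEC = {
    "UTF-8": "utf-8",
    "MS Windows Latin 1": "cp1252",
    "Shift-JIS": "shift_jis",
    "ISO 8859-15 Latin 9 (Western European)": "iso8859_15",
    "ISO 8859-3 Southeast European": "iso8859_3",
    "ISO 8859-5 Cyrillic": "iso8859_5",
    "ISO 8859-9 Latin 5 (Turkish)": "iso8859_9",
    "IBM EBCDIC International Latin-1": "cp500",
}


def decode_with_code_page(raw: bytes, code_page: str) -> str:
    """Decode raw file bytes using the codec assumed for the given code page."""
    return raw.decode(CODE_PAGE_TO_CODEC[code_page])


# Matching code page: the text round-trips unchanged.
matched = decode_with_code_page("café".encode("cp1252"), "MS Windows Latin 1")

# Mismatched code page: UTF-8 bytes read as Windows Latin 1 produce garbled text.
mismatched = decode_with_code_page("café".encode("utf-8"), "MS Windows Latin 1")
```

A garbled result such as the value of `mismatched` is a typical sign that the selected code page does not match the file's actual encoding.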

Header Line Number
Specify the line number that you want to use as the header when you read data from a delimited flat file.
Specify the value as 0 or 1.
To read data from a file with no header, specify the value as 0.
First Data Row¹
Specify the line number from which you want the Secure Agent to start reading data in a delimited flat file. You must enter a value that is greater than or equal to 1.
To read data from the header, the value of the Header Line Number and the First Data Row fields should be the same. Default is 1.
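The interaction between Header Line Number and First Data Row can be sketched in a few lines of Python. The function `read_delimited` and its parameter names are hypothetical, and the logic follows a literal reading of the descriptions above (1-based line numbers, with 0 meaning no header), so treat it as an illustration rather than the Secure Agent's behavior.

```python
import csv
import io
from typing import List, Optional, Tuple


def read_delimited(
    text: str,
    header_line_number: int = 1,
    first_data_row: int = 1,
    delimiter: str = ",",
) -> Tuple[Optional[List[str]], List[List[str]]]:
    """Return (header, rows); line numbers are 1-based and 0 means no header."""
    lines = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header = lines[header_line_number - 1] if header_line_number > 0 else None
    rows = lines[first_data_row - 1:]
    return header, rows


sample = "id,name\n1,Ann\n2,Bob\n"
headerless = "1,Ann\n2,Bob\n"

# Header on line 1, data from line 2.
with_header = read_delimited(sample, header_line_number=1, first_data_row=2)

# File with no header: Header Line Number 0, First Data Row 1.
no_header = read_delimited(headerless, header_line_number=0, first_data_row=1)

# Same value in both fields: the header line is also read as data.
header_as_data = read_delimited(sample, header_line_number=1, first_data_row=1)
```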
Target Header
Select whether to write data to the target delimited flat file with or without a header. You can select the With Header or Without Header option.
This property is not applicable when you read data from a Microsoft Azure Data Lake Storage Gen2 source.
Distribution Column
Not applicable.
Max Rows To Preview
Not applicable.
Row Delimiter
Character used to separate rows of data. You can set the value to \r, \n, or \r\n.
This property is not applicable when you read data from a Microsoft Azure Data Lake Storage Gen2 source.
¹ Doesn't apply to mappings in advanced mode.
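The effect of the Target Header and Row Delimiter options on the written file can be pictured with the following Python sketch. The function `write_target` and its parameters are hypothetical names chosen for this example, and the output only approximates what the connector writes.

```python
import csv
import io
from typing import List


def write_target(
    header: List[str],
    rows: List[List[str]],
    with_header: bool = True,
    row_delimiter: str = "\n",
) -> str:
    """Write rows as delimited text, optionally preceded by a header line."""
    # newline="" keeps the row delimiter exactly as written.
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, lineterminator=row_delimiter)
    if with_header:
        writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


header = ["id", "name"]
rows = [["1", "Ann"]]

# With Header, rows separated by \r\n.
crlf = write_target(header, rows, with_header=True, row_delimiter="\r\n")

# Without Header, rows separated by \n.
bare = write_target(header, rows, with_header=False, row_delimiter="\n")
```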
The following table describes the formatting options for JSON files:
Property
Description
Data elements to sample¹
Specify the number of rows to read to find the best match to populate the metadata.
Memory available to process data¹
The memory that the parser uses to read the JSON sample schema and process it.
The default value is 2 MB.
If the file size is more than 2 MB, you might encounter an error. Set the value to the file size that you want to read.
Read multiple-line JSON files
Not applicable.
¹ Applies only to mappings in advanced mode.
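To see why the number of sampled data elements matters, consider this simplified Python sketch of sample-based schema inference. The function `infer_schema` is hypothetical and far simpler than a real JSON parser; it only shows that a field appearing after the sampled elements is missing from the inferred metadata.

```python
import json
from typing import Any, Dict, List


def infer_schema(elements: List[Dict[str, Any]], sample_size: int) -> Dict[str, str]:
    """Infer field names and Python type names from the first sample_size elements."""
    schema: Dict[str, str] = {}
    for element in elements[:sample_size]:
        for key, value in element.items():
            # Keep the first type seen for each field.
            schema.setdefault(key, type(value).__name__)
    return schema


docs = json.loads('[{"id": 1}, {"id": 2, "name": "Ann"}]')

# Sampling one element misses the "name" field.
small_sample = infer_schema(docs, sample_size=1)

# Sampling both elements picks it up.
full_sample = infer_schema(docs, sample_size=2)
```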

Sam Greene - June 15, 2023

I am attempting to move data from CSV files stored in ADLS to Parquet files in ADLS. The CSV doesn't have a header line. 

When defining the source connection and using a flat file type, I am unsure what the schema file should be. Is there documentation concerning this file?  I cannot find any information in the knowledge base.  Thanks!

    Informatica Documentation Team - June 16, 2023

    Hi Sam Greene,

    To read data from a flat file that has no header, you need to set the Header Line Number to “0” and First Data Row as “1”. You can find this information in the “File formatting options” topic.  See instructions in the field properties “Header Line Number” and “First Data Row” in the table that lists the formatting options for flat files.

    Thanks,

    Informatica Documentation Team