Table of Contents

  1. Preface
  2. Introduction to Microsoft Azure Data Lake Storage Gen2 Connector
  3. Connections for Microsoft Azure Data Lake Storage Gen2
  4. Mappings for Microsoft Azure Data Lake Storage Gen2
  5. Migrating a mapping
  6. Data type reference
  7. Troubleshooting

Microsoft Azure Data Lake Storage Gen2 Connector

Microsoft Azure Data Lake Storage Gen2 sources in mappings

In a mapping, you can configure a source transformation to represent a single Microsoft Azure Data Lake Storage Gen2 object.
The following table describes the Microsoft Azure Data Lake Storage Gen2 source properties that you can configure in a source transformation:
Connection
  Name of the source connection. Select a source connection or click New Parameter to define a new parameter for the source connection.
  If you want to overwrite the parameter at run time, select the Allow parameter to be overridden at run time option when you create the parameter. When the task runs, the agent uses the parameters from the file that you specify in the task advanced session properties. Ensure that the parameter file is in the correct format.
  When you switch between a non-parameterized and a parameterized Microsoft Azure Data Lake Storage Gen2 connection, the advanced property values are retained.
Source Type
  Select Single Object or Parameter.
Object
  Name of the source object.
  Ensure that the headers and the file data do not contain special characters.
Parameter
  Select an existing parameter for the source object or click New Parameter to define a new parameter for the source object. The Parameter property appears only if you select Parameter as the source type.
  When you parameterize the source object, specify the complete object path, including the file system, in the default value of the parameter.
  If you want to overwrite the parameter at run time, select the Allow parameter to be overridden at run time option when you create the parameter. When the task runs, the agent uses the parameters from the file that you specify in the task advanced session properties. Ensure that the parameter file is in the correct format.
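As a sketch only, a parameter file entry for a parameterized source object could look like the following. The section header, parameter name, and object path are placeholder examples, not values from this guide; the key point from the property description is that the value includes the file system at the start of the object path:

```
[Global]
$$SrcObject=myfilesystem/dir1/customers.csv
```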
Format
  Specifies the file format that Microsoft Azure Data Lake Storage Gen2 Connector uses to read data from Microsoft Azure Data Lake Storage Gen2.
  You can select the following file format types:
    • Flat
    • Avro
    • Parquet
    • JSON
    • ORC
    • Discover Structure (1)
  Default is None. If you select None as the format type, Microsoft Azure Data Lake Storage Gen2 Connector reads data from Microsoft Azure Data Lake Storage Gen2 files in binary format.
  You cannot read a JSON file that exceeds 1 GB.
  Ensure that the source file is not empty.
  For more information, see File formatting options.
Intelligent Structure Model (1)
  Applies to the Discover Structure format type. Determines the underlying patterns in a sample file and auto-generates a model for files with the same data and structure.
  Select one of the following options to associate a model with the transformation:
    • Select. Select an existing model.
    • New. Create a new model. Select Design New to create the model. Select Auto-generate from sample file for Intelligent Structure Discovery to generate a model based on sample input that you select.
  Select one of the following options to validate the XML source object against an XML-based hierarchical schema:
    • Source object doesn't require validation.
    • Source object requires validation against a hierarchical schema. Select to validate the XML source object against an existing or a new hierarchical schema.
  When you create a mapping task, you configure on the Runtime Options tab how Data Integration handles a schema mismatch. You can choose to skip the mismatched files and continue to run the task, or to stop the task when it encounters the first file that does not match.
  For more information, see Components.
(1) Applies only to mappings in advanced mode.
The following table describes the Microsoft Azure Data Lake Storage Gen2 source advanced properties:
Concurrent Threads (1)
  Number of concurrent connections used to extract data from Microsoft Azure Data Lake Storage Gen2. When you read a large file or object, you can spawn multiple threads to process the data. Configure Block Size to divide a large file into smaller parts.
  Default is 4. Maximum is 10.
Filesystem Name Override
  Overrides the default file system name.
Source Type
  Type of source from which you want to read data. You can select the following source types:
    • File
    • Directory
  Default is File.
Allow Wildcard Characters
  Indicates whether you want to use wildcard characters for the directory source type.
  For more information, see Wildcard characters.
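The connector's exact wildcard rules are in the Wildcard characters topic; as a general illustration of how typical glob-style wildcards select files (the file names here are made up), the standard behavior of * and ? looks like this:

```python
# Illustration of typical glob-style wildcard matching:
# * matches any run of characters, ? matches exactly one character.
# File names are hypothetical examples.
from fnmatch import fnmatch

files = ["sales_2023.csv", "sales_2024.csv", "inventory.csv"]

# Select only the sales files.
matched = [name for name in files if fnmatch(name, "sales_*.csv")]
print(matched)  # ['sales_2023.csv', 'sales_2024.csv']
```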
Directory Override
  Microsoft Azure Data Lake Storage Gen2 directory from which you read data. Default is the root directory. The directory path specified at run time overrides the path specified when you created the connection.
  You can specify an absolute or a relative directory path:
    • Absolute path. The Secure Agent searches this directory path in the specified file system. Example: Dir1/Dir2
    • Relative path. The Secure Agent searches this directory path in the native directory path of the object. Example: /Dir1/Dir2
      When you use a relative path, the imported object path is appended to the file path used during the metadata fetch at run time.
  Do not specify the root directory (/) to override the directory.
File Name Override
  Source object. Select the file from which you want to read data. The file specified at run time overrides the file specified in Object.
Block Size (1)
  Applicable to the flat file format. Divides a large file into smaller parts of the specified block size. When you read a large file, divide the file into smaller parts and configure concurrent connections to spawn the required number of threads to process the data in parallel.
  Specify an integer value for the block size. Default is 8388608 bytes.
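To see how Block Size and Concurrent Threads interact, the arithmetic for a hypothetical 100 MiB file with the default settings works out as follows (the file size is an example; the defaults are from the table above):

```python
# Worked example: splitting a 100 MiB file with the default
# Block Size (8388608 bytes, i.e. 8 MiB) and the default
# Concurrent Threads (4). The file size is a made-up example.
import math

file_size = 100 * 1024 * 1024   # example source file: 100 MiB
block_size = 8388608            # default Block Size in bytes
threads = 4                     # default Concurrent Threads

blocks = math.ceil(file_size / block_size)
print(blocks)                   # 13 blocks to read

waves = math.ceil(blocks / threads)
print(waves)                    # 4 rounds of up to 4 parallel reads
```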
Timeout Interval
  Not applicable.
Recursive Directory Read
  Indicates whether you want to read objects stored in subdirectories in mappings.
  For more information, see Reading files from subdirectories.
Incremental File Load (2)
  Indicates whether you want to incrementally load files when you use a directory as the source for mappings in advanced mode.
  When you incrementally load files, the mapping task reads and processes only those files in the directory that have changed since the mapping task last ran.
  For more information, see Incrementally loading files.
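The idea behind incremental file load can be sketched as "pick up only files whose modification time is newer than the previous run". The helper below is a hypothetical illustration of that idea, not the connector's implementation; the mapping task tracks the last-run state itself:

```python
# Conceptual sketch of incremental file loading: return only the
# files in a directory that changed after a given point in time.
# Hypothetical helper for illustration, not the connector's code.
import os

def changed_since(directory, last_run_epoch):
    """Return names of files modified after last_run_epoch (seconds)."""
    changed = []
    for entry in os.scandir(directory):
        if entry.is_file() and entry.stat().st_mtime > last_run_epoch:
            changed.append(entry.name)
    return sorted(changed)
```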
Compression Format
  Reads compressed data from the source. Select one of the following options:
    • None. Select to read Avro, ORC, and Parquet files that use Snappy compression. The compressed files must have the .snappy extension. You cannot read compressed JSON files.
    • Gzip. Select to read flat files and Parquet files that use Gzip compression. The compressed files must have the .gz extension.
  You cannot preview data for a compressed flat file.
Interim Directory (1)
  Optional. Applicable to flat files and JSON files.
  Path to the staging directory on the Secure Agent machine. Specify the staging directory where you want to stage the files when you read data from Microsoft Azure Data Lake Storage Gen2. Ensure that the directory has sufficient space and that you have write permission on it.
  Default staging directory is /tmp.
  You cannot specify an interim directory when you use the Hosted Agent.
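Because the staging directory needs free space and write permission, it can be worth checking a candidate path before setting Interim Directory. A hypothetical pre-flight check (the 1 GiB threshold is an arbitrary example) might look like:

```python
# Hypothetical pre-flight check for a staging directory: confirm the
# path is writable and has at least min_free_bytes of free space.
# The 1 GiB default threshold is an arbitrary example value.
import os
import shutil

def staging_dir_ok(path, min_free_bytes=1 << 30):
    usage = shutil.disk_usage(path)
    return os.access(path, os.W_OK) and usage.free >= min_free_bytes
```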
Tracing Level
  Sets the amount of detail that appears in the log file. You can choose terse, normal, verbose initialization, or verbose data. Default is normal.
(1) Doesn't apply to mappings in advanced mode.
(2) Applies only to mappings in advanced mode.
