Data Profiling
| Property | Description |
|---|---|
| Retain Athena Query Result On S3 File | Specifies whether to retain the Amazon Athena query result on the Amazon S3 file. Select the check box to retain the query result. The Amazon Athena query result is stored in the CSV file format. By default, the Retain Athena Query Result on S3 File check box is not selected. |
| S3OutputLocation | Specifies the location of the Amazon S3 file that stores the result of the Amazon Athena query. You can also specify the Amazon S3 file location in the S3OutputLocation parameter in the JDBC URL connection property. If you specify the Amazon S3 output location in both the connection and the advanced source properties, the Secure Agent uses the Amazon S3 output location specified in the advanced source properties. |
| Fetch Size | Determines the number of rows to read in one result set from Amazon Athena. Default is 10000. |
| Encryption Type | Encrypts the data in the Amazon S3 staging directory. You can select from the available encryption types. Default is None. |
| Schema Name | Overrides the schema name of the source object. |
| Source Table Name | Overrides the table name used in the metadata import with the table name that you specify. |
| SQL Query | Overrides the default SQL query. Enclose column names in double quotes. The SQL query is case sensitive. Specify an SQL statement supported by the Amazon Athena database. When you specify the columns in the SQL query, ensure that the column names in the query match the source column names in the mapping. |
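The precedence rule for the S3 output location above can be reduced to a one-line decision: a value in the advanced source properties wins over one in the JDBC URL. A minimal sketch, assuming hypothetical parameter names (this is not an Informatica API):

```python
from typing import Optional

def resolve_output_location(url_param: Optional[str],
                            advanced_param: Optional[str]) -> Optional[str]:
    """Hypothetical helper: return the effective Amazon S3 output location.

    A value set in the advanced source properties takes precedence over
    one set in the S3OutputLocation parameter of the JDBC URL.
    """
    return advanced_param if advanced_param else url_param
```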
| Option | Description |
|---|---|
| Source Type | Type of the source from which you want to read data. You can select File or Directory. For more information about the source type, see Source types in Amazon S3 V2 sources. |
| Folder Path | Optional. Overwrites the bucket name or folder path of the Amazon S3 source file. If applicable, include the folder name that contains the source file in the <bucket_name>/<folder_name> format. If you do not provide the bucket name and specify the folder path starting with a slash (/) in the /<folder_name> format, the folder path is appended to the folder path that you specified in the connection properties. For example, if you specify the /<dir2> folder path in this property and the <my_bucket1>/<dir1> folder path in the connection property, the resulting folder path is <my_bucket1>/<dir1>/<dir2>. |
| File Name | Optional. Overwrites the Amazon S3 source file name. |
| Allow Wildcard Characters | Use the ? and * wildcard characters to specify the folder path or file name if you run a mapping in advanced mode to read data from an Avro, flat, JSON, ORC, or Parquet file. |
| Enable Recursive Read | Use the recursive read option for flat, Avro, JSON, ORC, and Parquet files. The files that you read using recursive read must have the same metadata. Enable recursive read when you specify wildcard characters in a folder path or file name. To enable recursive read, select the source type as Directory. |
| Incremental File Load | Incrementally load source files in a directory to read and process only the files that have changed since the last time the mapping task ran. |
| Staging Directory | Optional. Path of the local staging directory. Ensure that the user has write permissions on the directory and that there is sufficient space to stage the entire file. The default staging directory is the /temp directory on the machine that hosts the Secure Agent. When you specify the directory path, the Secure Agent creates folders depending on the number of partitions that you specify, in the format InfaS3Staging<00/11><timestamp>_<partition number>, where 00 represents a read operation and 11 represents a write operation. For example, InfaS3Staging000703115851268912800_0. The temporary files are created within the new directory. The staging directory in the source property does not apply to an advanced cluster. However, you must specify a staging directory on Amazon S3 in the advanced configuration. For more information, see Administrator. |
| Hadoop Performance Tuning Options | Optional. This property is not applicable for Amazon S3 V2 Connector. |
| Compression Format | Decompresses data when you read data from Amazon S3. Amazon S3 V2 Connector does not support the Lzo and Bzip2 compression formats even though the options appear in this property. For more information about the compression format, see Data compression in Amazon S3 V2 sources and targets. |
| Download Part Size | Part size, in bytes, in which an Amazon S3 object is downloaded. |
| Multipart Download Threshold | Minimum threshold size to download an Amazon S3 object in multiple parts. |
| Temporary Credential Duration | The time duration during which an IAM user can use the dynamically generated temporary credentials to access the AWS resource. Enter the time duration in seconds. |
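The Folder Path append rule above can be illustrated with a short sketch: an override that starts with a slash is appended to the connection folder path, while an override that carries its own bucket name replaces it. The function name is hypothetical; this is an illustration of the documented behavior, not connector code:

```python
def resolve_folder_path(connection_path: str, override: str) -> str:
    """Hypothetical sketch of the Folder Path override rule.

    An override that starts with '/' is appended to the connection
    folder path; otherwise the override replaces the connection value.
    """
    if override.startswith("/"):
        return connection_path.rstrip("/") + override
    return override
```

For the example in the table, `resolve_folder_path("my_bucket1/dir1", "/dir2")` yields `my_bucket1/dir1/dir2`.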
| Option | Description |
|---|---|
| Concurrent Threads | Optional. Number of concurrent connections to load data from Microsoft Azure Data Lake Storage Gen2. When reading a large file, you can spawn multiple threads to process data. Configure Block Size to divide a large file into smaller parts. Default is 4. Maximum is 10. |
| Filesystem Name Override | Optional. Overrides the default file system name. |
| Source Type | Type of the source from which you want to read data. You can select File or Directory. For more information about the source type, see Directory Source in Microsoft Azure Data Lake Storage Gen2 Sources. |
| Allow Wildcard Characters | Use the ? and * wildcard characters to specify the folder path or file name if you run a mapping in advanced mode to read data from an Avro, flat, JSON, ORC, or Parquet file. |
| Directory Override | Optional. Microsoft Azure Data Lake Storage Gen2 directory that you use to read data. Default is the root directory. The Secure Agent creates the directory if it does not exist. The directory path specified at run time overrides the path specified while creating a connection. |
| File Name Override | Optional. Source object. Select the file from which you want to read data. The file specified at run time overrides the file specified in Object. |
| Block Size | Optional. Divides a large file or object into smaller parts, each of the specified block size. When reading a large file, consider dividing the file into smaller parts and configure concurrent connections to spawn the required number of threads to process data in parallel. Default is 8 MB. |
| Compression Format | Optional. Compresses and writes data to the target. Select Gzip to write flat files. |
| Timeout Interval | Optional. The number of seconds to wait when attempting to connect to the server. A timeout occurs if the connection cannot be established in the specified amount of time. |
| Interim Directory | Optional. Path to the staging directory on the Secure Agent machine. Specify the staging directory where you want to stage the files when you read data from Microsoft Azure Data Lake Store. Ensure that the directory has sufficient space and that you have write permissions to the directory. The default staging directory is /tmp. You cannot specify an interim directory for an advanced cluster. |
| Incremental File Load | Incrementally load source files in a directory to read and process only the files that have changed since the last time the mapping task ran. |
| Enable Recursive Read | Use the recursive read option for flat, Avro, JSON, ORC, and Parquet files. The files that you read using recursive read must have the same metadata. Enable recursive read when you specify wildcard characters in a folder path or file name. To enable recursive read, select the source type as Directory. |
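The interaction between Block Size (default 8 MB) and Concurrent Threads (default 4, maximum 10) amounts to simple arithmetic: the block size determines how many parts a file is split into, and the thread count caps how many parts are processed at once. A sketch of that arithmetic only, not the connector's implementation:

```python
import math

def plan_parts(file_size: int,
               block_size: int = 8 * 1024 * 1024,
               concurrent_threads: int = 4) -> tuple:
    """Illustrative sketch: how many parts a file splits into, and how
    many of them can be processed in parallel, given the documented
    Block Size and Concurrent Threads defaults and the maximum of 10.
    Returns (total_parts, parts_in_flight)."""
    threads = min(concurrent_threads, 10)          # documented maximum is 10
    parts = max(1, math.ceil(file_size / block_size))
    return parts, min(parts, threads)
```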
| Option | Description |
|---|---|
| Google Cloud Storage Path | Overrides the Google Cloud Storage path that you specified in the connection. This property is required when the source is not a flat file. Use the following format: gs://<bucket name> or gs://<bucket name>/<folder name> |
| Source File Name | Optional. Overrides the Google Cloud Storage source file name that you specified in the Source transformation. Does not apply when you configure the Is Directory option to read multiple files from a directory. |
| Is Directory | Select this property to read all the files available in the folder specified in the Google Cloud Storage Path property. |
| Encryption Type | Method to decrypt data. You can select from the available encryption types. |
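A quick way to sanity-check the `gs://<bucket name>` or `gs://<bucket name>/<folder name>` format above is a small pattern match. The pattern below is a loose sketch of that format only; Google's actual bucket-naming rules are stricter than this expression:

```python
import re

# Loose sketch of the documented path format; real GCS bucket-name
# rules (length limits, no trailing dash, etc.) are stricter.
_GCS_PATH = re.compile(r"^gs://[a-z0-9][a-z0-9._-]*(/.+)?$")

def is_valid_gcs_path(path: str) -> bool:
    """Return True if path looks like gs://<bucket> or gs://<bucket>/<folder>."""
    return bool(_GCS_PATH.match(path))
```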
| Option | Description |
|---|---|
| Database | Overrides the database specified in the connection. |
| Schema | Overrides the schema specified in the connection. |
| Warehouse | Overrides the Snowflake warehouse name specified in the connection. |
| Role | Overrides the Snowflake role assigned to the user specified in the connection. |
| Table Name | Overrides the table name of the imported Snowflake Data Cloud source table. |
| Option | Description |
|---|---|
| S3 Bucket Name | Amazon S3 bucket name for staging the data. You can also specify the bucket name with the folder path. |
| Enable Compression | Compresses the staging files in the Amazon S3 staging directory. The task performance improves when the Secure Agent compresses the staging files. Default is selected. |
| Staging Directory Location | Location of the local staging directory. When you run a task in a Secure Agent runtime environment, specify a directory path that is available on the corresponding Secure Agent machine in the runtime environment. Specify the directory path in the <staging directory> format, for example, C:\Temp. Ensure that you have write permissions on the directory. Does not apply to an advanced cluster. |
| Temporary Credential Duration | The time duration during which an IAM user can use the dynamically generated temporary credentials to access the AWS resource. Enter the time duration in seconds. Default is 900 seconds. If you require more than 900 seconds, you can set the time duration up to a maximum of 12 hours in the AWS console and then enter the same time duration in this property. |
| Encryption Type | Encrypts the data in the Amazon S3 staging directory. You can select from the available encryption types. Default is None. |
| Download S3 Files in Multiple Parts | Downloads large Amazon S3 objects in multiple parts. When the file size of an Amazon S3 object is greater than 8 MB, you can choose to download the object in multiple parts in parallel. Default is 5 MB. Does not apply to an advanced cluster. |
| Multipart Download Threshold Size | The maximum threshold size to download an Amazon S3 object in multiple parts. Default is 5 MB. Does not apply to an advanced cluster. |
| Schema Name | Overrides the default schema name. You cannot configure a custom query when you use the schema name. |
| Source Table Name | Overrides the default source table name. When you select the source type as Multiple Objects or Query, you cannot use the Source Table Name option. |
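The Temporary Credential Duration bounds above (default 900 seconds, AWS console maximum of 12 hours) can be sketched as a clamp. Treating 900 seconds as the lower bound is an assumption in this sketch; the function and constant names are illustrative only:

```python
from typing import Optional

DEFAULT_DURATION = 900        # seconds, the documented default
MAX_DURATION = 12 * 60 * 60   # 12 hours, the documented AWS console maximum

def effective_duration(requested: Optional[int]) -> int:
    """Illustrative clamp of a requested credential duration to the
    documented bounds. Assumes 900 s is also the minimum; the AWS
    service performs its own validation."""
    if requested is None:
        return DEFAULT_DURATION
    return max(DEFAULT_DURATION, min(requested, MAX_DURATION))
```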
| Option | Description |
|---|---|
| Database Name | Overrides the database name provided in the connection and the database name provided during metadata import. To read from multiple objects, ensure that you have specified the database name in the connection properties. |
| Table Name | Overrides the table name used in the metadata import with the table name that you specify. |
| SQL Override | Overrides the default SQL query used to read data from a Databricks Delta custom query source. The column names in the SQL override query must match the column names in the custom query in a SQL transformation. The metadata of the source must be the same as the SQL override to override the query. You can use this option when you run the profiling task on a Data Integration Server. |
| Staging Location | Relative directory path to the staging file storage. When you use the Unity Catalog, you must provide a pre-existing location on the user's cloud storage in the Staging Location. The Staging Location is not required for the Unity Catalog when you run the profiling task on a Data Integration Server. |
| Job Timeout | Maximum time in seconds that the Spark job can take to complete processing. If the job does not complete within the specified time, the Databricks cluster terminates the job and the mapping fails. If the job timeout is not specified, the mapping shows success or failure based on the job completion. |
| Job Status Poll Interval | Poll interval in seconds at which the Secure Agent checks the status of the job completion. Default is 30 seconds. |
| DB REST API Timeout | The maximum time in seconds for which the Secure Agent retries the REST API calls to Databricks when there is an error due to the network connection or when the REST endpoint returns a 5xx HTTP error code. Default is 10 minutes. |
| DB REST API Retry Interval | The time interval in seconds at which the Secure Agent retries the REST API call when there is an error due to the network connection or when the REST endpoint returns a 5xx HTTP error code. This value does not apply to the job status REST API. Use the job status poll interval value for the job status REST API. Default is 30 seconds. |
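The way Job Status Poll Interval and Job Timeout interact above is a standard polling loop: check the job every interval, and treat the job as failed once the timeout elapses. A minimal sketch of that loop, where `get_status` is a hypothetical callable standing in for a Databricks status check (not the connector's actual code):

```python
import time
from typing import Callable, Optional

def wait_for_job(get_status: Callable[[], str],
                 poll_interval: float = 30,
                 timeout: Optional[float] = None) -> str:
    """Illustrative polling loop. get_status returns 'RUNNING',
    'SUCCESS', or 'FAILED'. If timeout is set and exceeded, the job
    is treated as failed, mirroring the Job Timeout behavior above."""
    start = time.monotonic()
    while True:
        status = get_status()
        if status != "RUNNING":
            return status
        if timeout is not None and time.monotonic() - start >= timeout:
            return "FAILED"   # cluster terminates the job on timeout
        time.sleep(poll_interval)
```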
| Option | Description |
|---|---|
| Azure Blob Container Name | Microsoft Azure Blob Storage container name. Required if you select Azure Blob storage in the connection properties. |
| ADLS FileSystem Name | The name of the file system in Microsoft Azure Data Lake Storage Gen2. Required if you select ADLS Gen2 storage in the connection properties. You can also provide the path of a directory under the given file system. |
| Schema Name Override | Overrides the schema specified in the connection. |
| Table Name Override | Overrides the table name of the imported Microsoft Azure Synapse SQL source table. |
| Field Delimiter | Character used to separate fields in the file. Default is 0x1e. You can specify 'TAB' or 0-256 single-character printable and non-printable ASCII characters. Specify non-printable characters in hexadecimal. |
| Number of concurrent connections to Blob Store | Number of concurrent connections to extract data from Microsoft Azure Blob Storage. When reading a large blob, you can spawn multiple threads to process data. Configure Blob Part Size to partition a large blob into smaller parts. Default is 4. Maximum is 10. |
| Blob Part Size | Partitions a blob into smaller parts, each of the specified part size. When reading a large blob, consider partitioning the blob into smaller parts and configure concurrent connections to spawn the required number of threads to process data in parallel. Default is 8 MB. |
| Quote Character | The Secure Agent skips the specified character when you read data from Microsoft Azure Synapse SQL. Default is 0x1f. |
| Interim Directory | Optional. Path to the staging directory on the Secure Agent machine. Specify the staging directory where you want to stage the files when you read data from Microsoft Azure Synapse SQL. Ensure that the directory has sufficient space and that you have write permissions to the directory. The default staging directory is /tmp. You cannot specify an interim directory for an advanced cluster. |
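The Field Delimiter convention above accepts a literal character, the keyword 'TAB', or a hexadecimal spec such as 0x1e. A sketch of how such a spec resolves to an actual character; the function is hypothetical, not the connector's parser:

```python
def parse_delimiter(spec: str) -> str:
    """Illustrative parser for a Field Delimiter spec: 'TAB', a hex
    value like '0x1e', or a literal single character."""
    if spec.upper() == "TAB":
        return "\t"
    if spec.lower().startswith("0x"):
        return chr(int(spec, 16))   # e.g. '0x1e' -> record separator
    return spec
```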
| Option | Description |
|---|---|
| Pre SQL | The SQL query that the Secure Agent runs before reading data from the source. |
| Post SQL 1 | The SQL query that the Secure Agent runs after reading data from the source. |
| Fetch Size | The number of rows that the Secure Agent fetches from the database in a single call. |
| Table Name | Overrides the table name used in the metadata import with the table name that you specify. |
| Schema Name | Overrides the schema name of the source object. If you specify the schema name both in the connection and the source properties, the Secure Agent uses the schema name specified in the source properties. |
| SQL Override | The SQL statement to override the default query and the object name that is used to read data from the JDBC V2 source. |

1 Doesn't apply to mappings in advanced mode.
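Fetch Size above limits how many rows come back from the database per call. The effect can be mimicked in plain Python by chunking any row iterable; this is a sketch of the batching pattern, not a database driver:

```python
def read_in_batches(rows, fetch_size: int = 1000):
    """Yield rows in chunks of at most fetch_size, mimicking how
    Fetch Size bounds the rows retrieved per database call."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == fetch_size:
            yield batch
            batch = []
    if batch:                # emit the final, possibly short, batch
        yield batch
```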
| Option | Description |
|---|---|
| Folder Path | Overrides the folder path value in the Oracle Cloud Object Storage connection. |
| File Name | Overrides the Oracle Cloud Object Storage source file name. |
| Staging Directory | Path of the local staging directory. Ensure that the user has write permissions on the directory and that there is sufficient space to stage the entire file. The default staging directory is the /temp directory on the machine that hosts the Secure Agent. The temporary files are created within the new directory. |
| Multipart Download Threshold | Minimum threshold size to download an Oracle Cloud Object Storage object in multiple parts. To download the object in multiple parts in parallel, ensure that the file size of the object is greater than the value you specify in this property. Range: |
| Download Part Size | Part size, in bytes, in which an Oracle Cloud Object Storage object is downloaded. Range: |
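The threshold-and-part-size pair above drives a simple decision: objects at or below the threshold are downloaded in one piece; larger objects are split by part size. A sketch of that decision with hypothetical names:

```python
def download_parts(object_size: int, threshold: int, part_size: int) -> int:
    """Illustrative sketch: number of parts used to download an object,
    per the Multipart Download Threshold and Download Part Size rules."""
    if object_size <= threshold:
        return 1                       # single-part download
    return -(-object_size // part_size)  # ceiling division for multipart
```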
| Option | Description |
|---|---|
| Packet size in MB | Size of the HTTP packet that SAP sends to the Secure Agent, in MB. Default is 10 MB. |
| Package size in ABAP in rows | Number of rows that are read and buffered in SAP at a time. Default is 1000 rows. |
| Enable Compression | When selected, the ABAP program compresses the data in the gzip format before it sends the data to the Secure Agent. If the Secure Agent and the SAP system are not on the same network, you might want to enable the compression option to optimize performance. Default is not selected. |
| Option | Description |
|---|---|
| Number of rows to be fetched | The number of rows that are randomly retrieved from the SAP table. The default value of zero retrieves all the rows in the table. |
| Number of rows to be skipped | The number of rows to skip. |
| Packet size in MB | Packet size. Default is 10 MB. |
| Data extraction mode | The mode used to read data from an SAP table. You can select from the available data extraction modes. For more information, see the Data Extraction mode section in the Performance Tuning Guidelines for SAP Table Reader Connector How-To Library article. |
| Enable Compression | Enables compression. If the Secure Agent and the SAP system are not located on the same network, you might want to enable the compression option to optimize performance. |
| Update Mode | When you read data from SAP tables, you can configure a mapping to perform delta extraction. Select an option based on the update mode that you want to use. For more information, see the Update modes for delta extraction section in the SAP connector help. |
| Parameter Name for Update Mode | The parameter name that you defined for the update mode in the parameter file. |
| Override Table Name for Delta Extraction | Overrides the SAP table name with the SAP structure name from which you want to extract delta records that are captured with the structure name in the CDPOS table. |