Table of Contents

  1. Preface
  2. Data Profiling
  3. Profiles
  4. Profile results
  5. Tuning data profiling task performance
  6. Troubleshooting

Data Profiling

Formatting Options

You can optionally configure the formatting options if you choose a file as a source object.
Flat File
You can run a profile on delimited flat files with multi-byte characters and fixed-width flat files.
The following table lists the options that you can configure for a delimited flat file:
Option
Description
Delimiter
Indicates the boundary between two columns of data.
Choose one of the following options:
  • Comma
  • Tab
  • Colon
  • Semicolon
  • Non Printable. When you choose this option, the Non-printable character drop-down list appears. Select a non-printable character to use as the delimiter.
  • Other. Select this option and specify the character to use as the delimiter.
    • If you specify a comma, colon, or semicolon, the corresponding option is selected.
    • If the character that you specify matches a value in the Non-printable character drop-down list, that value appears in the Non-printable character drop-down list.
If you use the escape character or the quote character as the delimiter, or if you use the same character for both the delimiter and the text qualifier, you might receive unexpected results.
Default is comma.
Text Qualifier
Character that defines the boundaries of text strings.
If you select a quote character, Data Profiling ignores delimiters within quotes.
Default is double quote (").
Escape Character
Character that immediately precedes a column delimiter character embedded in an unquoted string, or immediately precedes the quote character in a quoted string.
When you specify an escape character, Data Profiling reads the delimiter character as a regular character.
Field Labels
Choose one of the following options to display the column names in profile results:
  • Auto-generate. Data Profiling auto-generates the column names.
  • Import from Row <row_number>. Imports the column name from the specified row number.
First Data Row <row_number>
Row number from which Data Profiling starts to read when it imports the file. For example, if you enter 2, Data Profiling skips the first row.
Data Profiling sets the First Data Row automatically when you set the Import from Row option. For example, if you set the Import from Row option to 10, Data Profiling sets the First Data Row to 11.
To run a profile on a fixed-width flat file, select the fixed-width format that you configured in Data Integration.
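The Delimiter, Text Qualifier, and Escape Character options work together when a row is split into columns. The following sketch mimics that interaction with Python's csv module. It only illustrates the behavior described above and is not the parser that Data Profiling uses; the sample row is hypothetical.

    import csv
    import io

    # Hypothetical sample row: the second field holds a comma inside quotes,
    # and the third field escapes a comma in an unquoted string.
    sample = 'Jane Doe,"Austin, TX",ID\\,42\n'

    reader = csv.reader(
        io.StringIO(sample),
        delimiter=",",    # Delimiter: boundary between two columns
        quotechar='"',    # Text Qualifier: delimiters inside quotes are ignored
        escapechar="\\",  # Escape Character: the next delimiter is read as data
    )
    print(next(reader))   # ['Jane Doe', 'Austin, TX', 'ID,42']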
Amazon S3 v2
The following table lists the options for the delimited format type:
Option
Description
Schema Source
You must specify the schema of the source file. You can select one of the following options to specify a schema:
  • Read from data file. Amazon S3 V2 Connector imports the schema from the file in Amazon S3.
  • Import from schema file. Imports schema from a schema definition file in your local machine.
Default is Read from data file.
Delimiter
Character used to separate columns of data. You can use a comma, tab, colon, semicolon, or another character. To set a tab as a delimiter, you must type the tab character in any text editor. Then, copy and paste the tab character into the Delimiter field. If you specify a multibyte character as a delimiter in the source object, the mapping fails.
Default is comma (,).
Escape Character
Character immediately preceding a column delimiter character embedded in an unquoted string, or immediately preceding the quote character in a quoted string.
Default is backslash (\).
Text Qualifier
Character that defines the boundaries of text strings.
If you select a quote character, Data Profiling ignores delimiters within quotes.
Default is double quote (").
Qualifier Mode
Specify the qualifier behavior for the target object. You can select one of the following options:
  • Minimal. Default mode. Applies the qualifier only to data that contains the delimiter or a special character. Otherwise, the Secure Agent does not apply the qualifier when it writes data to the target.
  • All. Applies qualifier to all data.
Default is Minimal.
Code Page
Select the code page that the Secure Agent must use to read data. Select UTF-8 for Unicode and non-Unicode data.
Header Line Number
Specify the line number that you want to use as the header when you read data from Amazon S3. You can also read data from a file that does not have a header. Default is 1.
To read data from a file with no header, set the value of the Header Line Number field to 0. To read data from a file with a header, set the value of the Header Line Number field to a value that is greater than or equal to one.
This property applies during run time and data preview when you read a file, and during data preview when you write a file.
First Data Row
Specify the line number from which you want the Secure Agent to read data. You must enter a value that is greater than or equal to one. To read data from the header, the values of the Header Line Number and the First Data Row fields must be the same. Default is 2.
This property applies during run time and data preview when you read a file, and during data preview when you write a file.
Row Delimiter
Character used to separate rows of data. You can set values as \r\n, \n, and \r.
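The following sketch shows how the Header Line Number and First Data Row values map to 1-based line numbers, using the Amazon S3 V2 defaults of 1 and 2. It illustrates the behavior described above and is not the connector's implementation; the helper function and sample lines are hypothetical.

    from typing import List, Optional, Tuple

    def read_delimited(lines: List[str], header_line: int,
                       first_data_row: int) -> Tuple[Optional[List[str]], List[List[str]]]:
        """Split raw lines into a header and data rows using 1-based line numbers."""
        # Header Line Number 0 means the file has no header.
        header = lines[header_line - 1].rstrip("\n").split(",") if header_line >= 1 else None
        data = [line.rstrip("\n").split(",") for line in lines[first_data_row - 1:]]
        return header, data

    lines = ["id,name\n", "1,Alice\n", "2,Bob\n"]

    # Defaults: header on line 1, data starts on line 2.
    print(read_delimited(lines, header_line=1, first_data_row=2))
    # (['id', 'name'], [['1', 'Alice'], ['2', 'Bob']])

    # Setting First Data Row equal to Header Line Number also reads the header line as data.
    print(read_delimited(lines, header_line=1, first_data_row=1))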
The following table lists the options for the avro and parquet format type:
Option
Description
Schema Source
The schema of the source or target file. You can select one of the following options to specify a schema:
  • Read from data file. Default. Amazon S3 V2 Connector reads the schema from the source file that you select.
  • Import from Schema File. Imports schema from a schema definition file in your local machine.
Schema File
Upload a schema definition file. You cannot upload a schema file when you create a target at runtime.
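If you choose Import from Schema File for the Avro format type, the schema definition file is typically an Avro schema in JSON form. The following sketch writes a minimal, hypothetical schema file that could then be selected from the local machine; the record and field names, and the assumption about the file layout, are for illustration only.

    import json

    # Hypothetical Avro-style schema for a customer record.
    customer_schema = {
        "type": "record",
        "name": "customer",
        "fields": [
            {"name": "id", "type": "int"},
            {"name": "name", "type": "string"},
            {"name": "email", "type": ["null", "string"], "default": None},
        ],
    }

    # Save the schema definition to a local file, for example customer.avsc.
    with open("customer.avsc", "w", encoding="utf-8") as f:
        json.dump(customer_schema, f, indent=2)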
The following table lists the options for the JSON format type:
Option
Description
Schema Source
The schema of the source or target file. You can select one of the following options to specify a schema:
  • Read from data file. Default. Amazon S3 V2 Connector reads the schema from the source file that you select.
  • Import from Schema File. Imports schema from a schema definition file in your local machine.
Schema File
Upload a schema definition file. You cannot upload a schema file when you create a target at runtime.
Sample Size
Specify the number of rows to read to find the best match to populate the metadata.
Memory Limit
The memory that the parser uses to read the JSON sample schema and process it.
The default value is 2 MB. If the file size is more than 2 MB, you might encounter an error. Set the value to the file size that you want to read.
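Sample Size limits how many records are inspected to derive the metadata. The following sketch shows the idea for a JSON Lines file (one object per line); it is only a conceptual illustration under that assumption, not the logic that the connector uses, and the file name is hypothetical.

    import json
    from itertools import islice

    def infer_fields(path: str, sample_size: int = 100) -> dict:
        """Infer a field-to-type mapping from the first sample_size records."""
        fields: dict = {}
        with open(path, encoding="utf-8") as f:
            for line in islice(f, sample_size):
                record = json.loads(line)
                for key, value in record.items():
                    # Keep the first type seen for each field.
                    fields.setdefault(key, type(value).__name__)
        return fields

    # Example call: infer_fields("orders.json", sample_size=50) might return
    # {'order_id': 'int', 'amount': 'float', 'status': 'str'}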
Azure Data Lake Store Gen2
The following table lists the options for the delimited format type:
Option
Description
Schema Source
You must specify the schema of the source file. You can select one of the following options to specify a schema:
  • Read from data file. Azure Data Lake Store Gen2 Connector imports the schema from the file in Azure Data Lake Store.
  • Import from schema file. Imports schema from a schema definition file in your local machine.
Default is Read from data file.
Delimiter
Character used to separate columns of data. You can use a comma, tab, colon, semicolon, or another character.
You cannot set a tab as a delimiter directly in the Delimiter field. To set a tab as a delimiter, you must type the tab character in any text editor. Then, copy and paste the tab character into the Delimiter field.
Default is comma (,).
Escape Character
Character immediately preceding a column delimiter character embedded in an unquoted string, or immediately preceding the quote character in a quoted string.
Default is backslash (\).
Text Qualifier
Character that defines the boundaries of text strings.
If you select a quote character, Data Profiling ignores delimiters within quotes.
Default is double quote (").
Qualifier Mode
Specify the qualifier behavior for the target object. You can select one of the following options:
  • Minimal. Default mode. Applies the qualifier only to data that contains the delimiter or a special character. Otherwise, the Secure Agent does not apply the qualifier when it writes data to the target.
  • All. Applies qualifier to all data.
Default is Minimal.
Code Page
Select the code page that the Secure Agent must use to read data.
Microsoft Azure Data Lake Storage Gen2 Connector supports only UTF-8. Ignore the rest of the code pages.
Header Line Number
Specify the line number that you want to use as the header when you read data from Microsoft Azure Data Lake Storage Gen2. You can also read data from a file that does not have a header. To read data from a file with no header, set the value of the Header Line Number field to 0.
This property is applicable when you perform data preview.
Default is 1.
First Data Row
Specify the line number from which you want the Secure Agent to read data. You must enter a value that is greater than or equal to one. To read data from the header, the values of the Header Line Number and the First Data Row fields must be the same. Default is 2.
This property is applicable when you perform data preview.
Row Delimiter
Character used to separate rows of data. You can set values as \r\n, \n, and \r.
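The difference between the Minimal and All qualifier modes is easiest to see in the output. The following sketch mimics the two modes with Python's csv quoting constants; it illustrates the described behavior only and is not the Secure Agent's implementation, and the sample row is hypothetical.

    import csv
    import io

    row = ["1", "Smith, John", "Austin"]

    for label, quoting in [("Minimal", csv.QUOTE_MINIMAL), ("All", csv.QUOTE_ALL)]:
        out = io.StringIO()
        csv.writer(out, delimiter=",", quotechar='"', quoting=quoting).writerow(row)
        print(label, "->", out.getvalue().strip())

    # Minimal -> 1,"Smith, John",Austin     (qualifier only where the data contains the delimiter)
    # All     -> "1","Smith, John","Austin" (qualifier around every value)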
The following table lists the options for the avro and parquet format type:
Option
Description
Schema Source
The schema of the source or target file. You can select one of the following options to specify a schema:
  • Read from data file. Default. Azure Data Lake Store Gen2 Connector reads the schema from the source file that you select.
  • Import from Schema File. Imports schema from a schema definition file in your local machine.
Schema File
Upload a schema definition file. You cannot upload a schema file when you create a target at runtime.
The following table lists the options for the JSON format type:
Option
Description
Schema Source
The schema of the source or target file. You can select one of the following options to specify a schema:
  • Read from data file. Default. Azure Data Lake Store Gen2 Connector reads the schema from the source file that you select.
  • Import from Schema File. Imports schema from a schema definition file in your local machine.
Schema File
Upload a schema definition file. You cannot upload a schema file when you create a target at runtime.
Sample Size
Specify the number of rows to read to find the best match to populate the metadata.
Memory Limit
The memory that the parser uses to read the JSON sample schema and process it.
The default value is 2 MB. If the file size is more than 2 MB, you might encounter an error. Set the value to the file size that you want to read.
Google Cloud Storage V2
The following table lists the options for the delimited format type:
Option
Description
Schema Source
You must specify the schema of the source file. You can select one of the following options to specify a schema:
  • Read from data file. Google Cloud Storage V2 Connector imports the schema from the file in Google Cloud Storage.
  • Import from schema file. Imports schema from a schema definition file in your local machine.
Default is Read from data file.
Delimiter
Character used to separate columns of data. You can use a comma, tab, colon, semicolon, or another character. To set a tab as a delimiter, you must type the tab character in any text editor. Then, copy and paste the tab character into the Delimiter field.
If you specify a multibyte character as a delimiter in the source object, the mapping fails.
Escape Character
Character immediately preceding a column delimiter character embedded in an unquoted string, or immediately preceding the quote character in a quoted string.
Text Qualifier
Character that defines the boundaries of text strings.
If you select a quote character, Data Profiling ignores delimiters within quotes.
Default is double quote (").
Qualifier Mode
Specify the qualifier behavior for the target object. You can select one of the following options:
  • Minimal. Default mode. Applies the qualifier only to data that contains the delimiter or a special character.
  • All. Applies qualifier to all data.
  • Non_Numeric. Not applicable.
  • All_Non_Null. Not applicable.
Code Page
Select the code page that the Secure Agent must use to read or write data. Google Cloud Storage V2 Connector supports the following code pages:
  • MS Windows Latin 1. Select for ISO 8859-1 Western European data.
  • UTF-8. Select for Unicode and non-Unicode data.
  • Shift-JIS. Select for double-byte character data.
  • ISO 8859-15 Latin 9 (Western European).
  • ISO 8859-2 Eastern European.
  • ISO 8859-3 Southeast European.
  • ISO 8859-5 Cyrillic.
  • ISO 8859-9 Latin 5 (Turkish).
  • IBM EBCDIC International Latin-1.
Header Line Number
Specify the line number that you want to use as the header when you read data from Google Cloud Storage. You can also read a file that doesn't have a header. Default is 1.
To read data from a file with no header, specify the value of the Header Line Number field as 0. To read data from a file with a header, set the value of the Header Line Number field to a value that is greater than or equal to one. Ensure that the value of the Header Line Number field is less than or equal to the value of the First Data Row field. This property applies during run time and data preview when you read a file. When you create a mapping in advanced mode, set the value of the header line number to 0, 1, or empty to run the mapping successfully.
First Data Row
Specify the line number from which you want the Secure Agent to read data. You must enter a value that is greater than or equal to one.
To read data from the header, the values of the Header Line Number and the First Data Row fields must be the same. Default is 1.
This property applies during run time and data preview when you read a file, and during data preview when you write a file.
Row Delimiter
Not applicable.
Character used to separate rows of data. You can set values as \r\n, \n, and \r.
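The code page tells the Secure Agent how to turn bytes into characters, so the profile must use the code page that the file was written with. The following sketch shows the effect with a few of the code pages listed above; the sample text is hypothetical and the snippet is only a general illustration of character encodings.

    text = "München, São Paulo, 東京"

    # The same text needs a different number of bytes, or cannot be written at all,
    # depending on the code page.
    for encoding in ("utf-8", "latin-1", "shift_jis"):
        try:
            print(encoding, "->", len(text.encode(encoding)), "bytes")
        except UnicodeEncodeError as err:
            print(encoding, "-> cannot encode:", err.reason)

    # Decoding with the wrong code page garbles the data, which is why the profile
    # results depend on selecting the correct code page.
    print("München".encode("utf-8").decode("latin-1"))  # MÃ¼nchen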
The following table lists the options for the avro and parquet format type:
Option
Description
Schema Source
The schema of the source or target file. You can select one of the following options to specify a schema:
  • Read from data file. Default. Google Cloud Storage V2 Connector reads the schema from the source file that you select.
  • Import from Schema File. Imports schema from a schema definition file in your local machine.
Schema File
Upload a schema definition file. You cannot upload a schema file when you create a target at runtime.
The following table lists the options for the JSON format type:
Option
Description
Schema Source
The schema of the source or target file. You can select one of the following options to specify a schema:
  • Read from data file. Default. Google Cloud Storage V2 Connector reads the schema from the source file that you select.
  • Import from Schema File. Imports schema from a schema definition file in your local machine.
Schema File
Upload a schema definition file. You cannot upload a schema file when you create a target at runtime.
Sample Size
Specify the number of rows to read to find the best match to populate the metadata.
Memory Limit
The memory that the parser uses to read the JSON sample schema and process it.
The default value is 2 MB. If the file size is more than 2 MB, you might encounter an error. Set the value to the file size that you want to read.
Read multiple-line JSON files
Not applicable.
Oracle Cloud Object Storage
The following table lists the options for the delimited format type:
Option
Description
Schema Source
You must specify the schema of the source file. You can select one of the following options to specify a schema:
  • Read from data file. Oracle Cloud Object Storage Connector imports the schema from the file in Oracle Cloud Object Storage.
  • Import from schema file. Imports schema from a schema definition file in your local machine.
Default is Read from data file.
Delimiter
Character used to separate columns of data. You can use a comma, tab, colon, semicolon, or another character.
You cannot set a tab as a delimiter directly in the Delimiter field. To set a tab as a delimiter, you must type the tab character in any text editor. Then, copy and paste the tab character into the Delimiter field.
Default is comma (,).
Escape Character
Character immediately preceding a column delimiter character embedded in an unquoted string, or immediately preceding the quote character in a quoted string.
Default is backslash (\).
Text Qualifier
Character that defines the boundaries of text strings.
If you select a quote character, Data Profiling ignores delimiters within quotes.
Default is double quote (").
Qualifier Mode
Specify the qualifier behavior for the target object. You can select one of the following options:
  • Minimal. Default mode. Applies the qualifier only to data that contains the delimiter or a special character. Otherwise, the Secure Agent does not apply the qualifier when it writes data to the target.
  • All. Applies qualifier to all data.
Default is Minimal.
Code Page
Select the code page that the Secure Agent must use to read data.
Oracle Cloud Object Storage Connector supports only UTF-8. Ignore the rest of the code pages.
Header Line Number
Specify the line number that you want to use as the header when you read data from Oracle Cloud Object Storage. You can also read data from a file that does not have a header. To read data from a file with no header, set the value of the Header Line Number field to 0.
This property is applicable when you perform data preview.
Default is 1.
First Data Row
Specify the line number from which you want the Secure Agent to read data. You must enter a value that is greater than or equal to one. To read data from the header, the values of the Header Line Number and the First Data Row fields must be the same. Default is 2.
This property is applicable when you perform data preview.
Row Delimiter
Character used to separate rows of data. You can set values as \r\n, \n, and \r.
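The row delimiter determines where one row ends and the next begins. The following sketch splits the same hypothetical content with each of the supported values and shows what happens when the wrong delimiter is used; it is only an illustration, not the connector's reader.

    def split_rows(content: str, row_delimiter: str) -> list:
        """Split file content into rows using the configured row delimiter."""
        return [row for row in content.split(row_delimiter) if row]

    windows_content = "id,name\r\n1,Alice\r\n2,Bob\r\n"  # rows end with \r\n
    unix_content = "id,name\n1,Alice\n2,Bob\n"           # rows end with \n

    print(split_rows(windows_content, "\r\n"))  # ['id,name', '1,Alice', '2,Bob']
    print(split_rows(unix_content, "\n"))       # ['id,name', '1,Alice', '2,Bob']

    # Using the wrong delimiter leaves stray carriage returns in the data.
    print(split_rows(windows_content, "\n"))    # ['id,name\r', '1,Alice\r', '2,Bob\r']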
