Profile Settings

You can choose a sampling option for the profile run. You can also choose whether to drill down on the profile results.
The following table lists the options that you can choose in the Profile Settings area:
Property: Run profile on
Description: Choose one of the following sampling options to run the profile:
  • All rows. The profile runs on all the rows in the source object.
  • First n rows. The profile runs on the first n rows in the source.
  • Random sample n rows. The profile runs on the configured number of random rows.

Property: Drilldown
Description: Choose one of the following drill-down options:
  • Choose On to drill down on the profile results to display specific data. In the profiling results, when you choose a data type, pattern, or value, Data Profiling displays the relevant data in the Data Preview area. If you choose this option, you can run queries on the source object after you run the profile.
  • Choose Off to not drill down on the source object.
To drill down and to query the source object, you need Data Preview privileges in Data Profiling.
You cannot drill down on the profile results or run queries if you select an Avro or Parquet source object for an Amazon S3 or Azure Data Lake Store connection.
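The three sampling options for the Run profile on property can be illustrated with a small Python sketch. This only shows what each option means for the set of rows that gets profiled; it is not how Data Profiling implements sampling, and the function name `sample_rows`, the option strings, and the `seed` parameter are hypothetical names chosen for the illustration.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_rows(
    rows: Sequence[T],
    option: str,
    n: Optional[int] = None,
    seed: Optional[int] = None,
) -> List[T]:
    """Return the rows a profile would read under a given sampling option.

    option is one of "all", "first_n", or "random_n" (hypothetical labels
    for All rows, First n rows, and Random sample n rows).
    """
    if option == "all":
        # All rows: every row in the source object is profiled.
        return list(rows)

    if n is None or n < 0:
        raise ValueError("n must be a non-negative integer for this option")

    if option == "first_n":
        # First n rows: the leading n rows, in source order.
        return list(rows[:n])

    if option == "random_n":
        # Random sample n rows: n distinct rows chosen at random.
        # If the source has fewer than n rows, all of them are returned.
        k = min(n, len(rows))
        return random.Random(seed).sample(list(rows), k)

    raise ValueError(f"unknown sampling option: {option!r}")
```

For example, `sample_rows(rows, "first_n", 100)` returns the first 100 rows in source order, while `sample_rows(rows, "random_n", 100)` returns 100 distinct rows in no particular order. A first-n sample is cheap to read but can be biased when the source is sorted; a random sample is usually more representative.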
The following table lists the connections and supported sampling options:
Connection                    Sampling Option
Amazon Athena                 All Rows, First N Rows
Amazon Redshift V2            All Rows, Random N Rows
Amazon S3 v2                  All Rows
Azure Data Lake Store Gen2    All Rows
Databricks Delta              All Rows (Data Integration Server and advanced mode execution)
                              Sample N Rows (Data Integration Server execution)
Flat File                     All Rows
Google Big Query v2           All Rows
Google Cloud Storage V2       All Rows
JDBC V2                       All Rows, First N Rows
Mapplets                      All Rows
Microsoft Azure Synapse SQL   All Rows, First N Rows, Random N Rows
ODBC                          All Rows
                              First N Rows. For Postgres and IBM DB2 data sources over an ODBC connection.
Oracle                        All Rows, First N Rows
SAP BW Reader                 All Rows
SAP Table                     All Rows
                              To retrieve a random number of rows from the data source, you can configure the
                              Number of rows to be fetched option in the advanced options for the source connection.
SQL Server                    All Rows, First N Rows
Salesforce                    All Rows, First N Rows
Snowflake Data Cloud          All Rows, First N Rows, Random N Rows
To run a Databricks profile in advanced mode, ensure you can access an advanced cluster.
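The support matrix above can be expressed as a lookup table, which is convenient if you want to check a connection and sampling option pair in a script before configuring a profile. The following is a minimal sketch that transcribes the table; the names `SUPPORTED_SAMPLING` and `supports_sampling` are hypothetical and are not part of any Data Profiling API. Conditions attached to an option in the table, such as the execution mode for Databricks Delta or the data source type for ODBC, are noted in comments only and are not enforced by the code.

```python
from typing import Dict, Set

# Transcribed from the connection / sampling option table.
SUPPORTED_SAMPLING: Dict[str, Set[str]] = {
    "Amazon Athena": {"All Rows", "First N Rows"},
    "Amazon Redshift V2": {"All Rows", "Random N Rows"},
    "Amazon S3 v2": {"All Rows"},
    "Azure Data Lake Store Gen2": {"All Rows"},
    # Sample N Rows applies to Data Integration Server execution only.
    "Databricks Delta": {"All Rows", "Sample N Rows"},
    "Flat File": {"All Rows"},
    "Google Big Query v2": {"All Rows"},
    "Google Cloud Storage V2": {"All Rows"},
    "JDBC V2": {"All Rows", "First N Rows"},
    "Mapplets": {"All Rows"},
    "Microsoft Azure Synapse SQL": {"All Rows", "First N Rows", "Random N Rows"},
    # First N Rows applies to Postgres and IBM DB2 data sources only.
    "ODBC": {"All Rows", "First N Rows"},
    "Oracle": {"All Rows", "First N Rows"},
    "SAP BW Reader": {"All Rows"},
    "SAP Table": {"All Rows"},
    "SQL Server": {"All Rows", "First N Rows"},
    "Salesforce": {"All Rows", "First N Rows"},
    "Snowflake Data Cloud": {"All Rows", "First N Rows", "Random N Rows"},
}


def supports_sampling(connection: str, option: str) -> bool:
    """Return True if the table lists `option` for `connection`.

    Unknown connections return False rather than raising an error.
    """
    return option in SUPPORTED_SAMPLING.get(connection, set())
```

For example, `supports_sampling("Oracle", "Random N Rows")` returns `False`, because the table lists only All Rows and First N Rows for Oracle.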
