You can create a custom profile or a default profile. When you create a custom profile, you can configure the columns, sample rows, and drill-down options. When you create a default profile, the column profile and data domain discovery run on the entire data set with all the data domains.
In the Discovery workspace, click Profile, or select New > Profile from the header area.
You can also right-click the data object in the Library workspace and create a profile. In this profile, the profile name, location name, and data object are extracted from the data object properties. You can create a default profile or customize the settings to create a custom profile.
The New Profile wizard appears.
The Single source option is selected by default. Click Next.
In the Specify General Properties screen, enter a name and an optional description for the profile. In the Location field, select the project or folder where you want to create the profile. Click Next.
In the Select Source screen, click Choose to select a data object, or click New to import a data object. Click Next.
In the Choose Data Object dialog box, select a data object. Click OK.
The Properties pane displays the properties of the selected data object. The Data Preview pane displays the columns in the data object.
In the New Data Object dialog box, choose the connection, schema, and table or view that you want to create a profile on, select a location, and optionally create a folder to import the data object into. Click OK.
In the Select Source screen, select the columns that you want to run a profile on. Optionally, select Name to select all the columns. Click Next.
All the columns are selected by default. For each column, the Analyst tool lists column properties such as the name, data type, precision, scale, whether the column is nullable, and whether it participates in the primary key.
In the Specify Settings screen, choose to run a column profile, data domain discovery, or both a column profile and data domain discovery. By default, the column profile option is selected.
Choose Run column profile to run a column profile.
Choose Run data domain discovery to perform data domain discovery. In the Data domain pane, select the data domains that you want to discover, select the conformance criteria, and select the columns for data domain discovery in the Edit columns selection for data domain discovery dialog box.
Choose Run column profile and Run data domain discovery to run the column profile and data domain discovery. Select the data domain options in the Data domain pane.
By default, the columns that you select apply to both the column profile and data domain discovery. Click Edit to select or deselect columns for data domain discovery.
Choose whether to run data domain discovery on Data, Columns, or Data and Columns.
Choose a sampling option. You can choose All rows (complete analysis), Sample first, Random sample, Random sample (auto), Limit n, or Random percentage as a sampling option in the Run profile on pane. The sampling option applies to column profile and data domain discovery.
Choose a drilldown option. In the Drilldown pane, choose Live or Staged, or choose Off to disable drilldown. Optionally, click Select Columns to select columns to drill down on. You can choose to omit data type and data domain inference for columns with an approved data type or data domain.
Choose Native, Blaze, Spark, or Databricks as the run-time environment. If you choose Blaze or Spark, click Choose to select a Hadoop connection in the Select a Hadoop Connection dialog box. If you choose Databricks, click Choose to select a Databricks connection.
Click Next.
The Specify Rules and Filters screen opens.
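As a conceptual aid only, the sampling and run-time environment choices from the Specify Settings screen can be sketched in Python. This is not an Analyst tool API: `RunSettings`, `sample_rows`, and `REQUIRED_CONNECTION` are hypothetical names invented for illustration. The sketch assumes that Sample first takes the leading rows, that Random sample draws a fixed number of rows, and that Random percentage draws a fixed share of rows. The Random sample (auto) and Limit n options are left out because their behavior is not described here.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Connection type that each run-time environment needs, per the steps above.
# Native needs no extra connection.
REQUIRED_CONNECTION = {
    "Native": None,
    "Blaze": "Hadoop",
    "Spark": "Hadoop",
    "Databricks": "Databricks",
}


@dataclass
class RunSettings:
    """Hypothetical container for a few Specify Settings choices."""
    sampling: str                           # one of the option names handled in sample_rows
    runtime: str                            # a key of REQUIRED_CONNECTION
    sample_size: Optional[int] = None       # used by "Sample first" and "Random sample"
    percentage: Optional[float] = None      # 0-100, used by "Random percentage"
    connection_type: Optional[str] = None   # "Hadoop" or "Databricks" when required

    def validate(self) -> None:
        """Raise ValueError if the run-time environment lacks its connection."""
        if self.runtime not in REQUIRED_CONNECTION:
            raise ValueError(f"unknown run-time environment: {self.runtime}")
        needed = REQUIRED_CONNECTION[self.runtime]
        if needed is not None and self.connection_type != needed:
            raise ValueError(f"{self.runtime} requires a {needed} connection")


def sample_rows(rows: Sequence, settings: RunSettings,
                seed: Optional[int] = None) -> List:
    """Return the rows a profile run would read under the assumed semantics."""
    rng = random.Random(seed)
    if settings.sampling == "All rows":
        return list(rows)
    if settings.sampling in ("Sample first", "Random sample"):
        if settings.sample_size is None:
            raise ValueError("sample_size is required for this sampling option")
        k = min(settings.sample_size, len(rows))
        if settings.sampling == "Sample first":
            return list(rows[:k])            # leading k rows, in order
        return rng.sample(list(rows), k)     # k distinct rows, random order
    if settings.sampling == "Random percentage":
        if settings.percentage is None:
            raise ValueError("percentage is required for this sampling option")
        k = round(len(rows) * settings.percentage / 100)
        return rng.sample(list(rows), k)
    raise ValueError(f"sampling option not modeled: {settings.sampling}")
```

For example, `RunSettings(sampling="Sample first", runtime="Spark", sample_size=100, connection_type="Hadoop")` passes `validate()`, while the same settings without `connection_type` raise an error, mirroring the requirement to select a Hadoop connection for Blaze or Spark.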
In the Specify Rules and Filters screen, you can perform the following tasks:
Create, edit, or delete a rule. You can apply existing rules to the profile.
Create, edit, or delete a filter.
When you create a scorecard on this profile, you can reuse the filters that you create for the profile.