You can exclude null values when you perform data domain discovery on a data source. When you select the Minimum percentage of rows option together with the exclude null values option, the conformance percentage is the number of matching rows divided by the number of non-null rows in the column, that is, the total number of rows minus the number of null values.
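The ratio described above can be written as a formula. The symbols are introduced here for illustration only and are not product terminology: M is the number of rows that match the data domain, T is the total number of rows evaluated, and N is the number of null values in the column.

```latex
\[
\text{conformance percentage} = \frac{M}{T - N} \times 100
\]
```

Because N is subtracted from the denominator, excluding null values can only keep the conformance percentage the same or raise it.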
The data domain discovery results differ depending on how you combine the Exclude null values from data domain discovery option with a sampling option and filters.
The following scenarios explain the data domain discovery results when you choose the exclude null values option along with a sampling option and filters:
With All rows as the sampling option and no filters. Data domain discovery ignores all the null values in the column.
With a sampling option other than All rows and no filters. Data domain discovery ignores all the null values in the sampled data and runs on the rest of the sampled data.
With All rows as the sampling option and with filters. Data domain discovery ignores all the null values in the filtered data and runs on the rest of the filtered data.
With a sampling option other than All rows and with filters. Data domain discovery ignores the null values in the filtered data within the sample and runs on the rest of the filtered, sampled data.
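The four scenarios above can be sketched as a small Python model. This is an illustration under stated assumptions, not the product's implementation: the function name `values_for_discovery`, the "first N rows" sampling, the sample-then-filter order, and the sample rows are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def values_for_discovery(
    rows: List[Row],
    column: str,
    sample_size: Optional[int] = None,
    row_filter: Optional[Callable[[Row], bool]] = None,
) -> List[Any]:
    """Return the column values that remain after sampling, filtering, and null exclusion."""
    # Assumption: sample_size=None models "All rows"; otherwise take the first N rows.
    sampled = rows if sample_size is None else rows[:sample_size]
    # Assumption: the filter is applied to the sampled rows.
    filtered = sampled if row_filter is None else [r for r in sampled if row_filter(r)]
    # Exclude null values from whatever data is left.
    return [r[column] for r in filtered if r[column] is not None]


# Hypothetical rows, for illustration only.
rows = [
    {"region": "east", "comments": "123-45-6789"},
    {"region": "west", "comments": None},
    {"region": "east", "comments": None},
    {"region": "east", "comments": "call back"},
    {"region": "west", "comments": "987-65-4321"},
]


def is_east(row: Row) -> bool:
    return row["region"] == "east"


all_rows_no_filter = values_for_discovery(rows, "comments")
sampled_no_filter = values_for_discovery(rows, "comments", sample_size=3)
all_rows_filtered = values_for_discovery(rows, "comments", row_filter=is_east)
sampled_filtered = values_for_discovery(rows, "comments", sample_size=4, row_filter=is_east)
```

In each case the null values are dropped last, so the set of rows that discovery evaluates shrinks with every sampling or filter step applied before it.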
Example
You have a data source with 10,000 rows where 3,000 rows have Social Security Numbers in the Comments column. You create a profile with column profiling and data domain discovery, and choose the following options:
Select the Exclude null values from data domain discovery option.
Select All rows as the sampling option.
Select the Minimum percentage of rows option and configure the option to 12%.
When you run the profile, it runs on the entire data set, and data domain discovery ignores the null values in the Comments column when it calculates the conformance percentage.
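The arithmetic behind this example can be sketched in Python. The example does not state how many rows in the Comments column are null, so the figure of 2,000 null rows below is a hypothetical value chosen only to make the calculation concrete. The function name `conformance_percentage` is also illustrative.

```python
def conformance_percentage(matching_rows: int, total_rows: int, null_rows: int = 0) -> float:
    """Matching rows as a percentage of the non-null rows in the column."""
    non_null_rows = total_rows - null_rows
    if non_null_rows <= 0:
        raise ValueError("the column has no non-null rows to evaluate")
    return 100.0 * matching_rows / non_null_rows


TOTAL_ROWS = 10_000        # from the example
MATCHING_ROWS = 3_000      # rows with Social Security Numbers, from the example
MIN_PERCENTAGE = 12.0      # Minimum percentage of rows, from the example
HYPOTHETICAL_NULL_ROWS = 2_000  # assumption: not stated in the example

# Null values counted in the denominator: 3,000 / 10,000.
without_exclusion = conformance_percentage(MATCHING_ROWS, TOTAL_ROWS)

# Null values excluded from the denominator: 3,000 / (10,000 - 2,000).
with_exclusion = conformance_percentage(MATCHING_ROWS, TOTAL_ROWS, HYPOTHETICAL_NULL_ROWS)

meets_threshold = with_exclusion >= MIN_PERCENTAGE
```

Whatever the actual number of null rows is, excluding them only shrinks the denominator, so the conformance percentage for the Comments column is at least 30%, which is above the 12% minimum.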