Profiling and Discovery Sizing Guidelines

Profiling Warehouse Worksheet

You can use this worksheet to estimate the amount of tablespace required to store the profiling results. The worksheet has two parts. Part 1 gathers tentative values for the expected data characteristics and result sizes. Part 2 consists of several smaller worksheets that use the values from Part 1 to calculate the final estimate.

Worksheet - Part 1

Enter the values for each metric in the Value column of the following worksheet:

| Metric | Description | Value |
| --- | --- | --- |
| Average Value Length for a Profile (AVL) | The average length of a value, in characters, across all columns and all tables that you run a profile on. | |
| Average Value Length for a Scorecard (AVLsc) | The average length of a value, in characters, across all columns and all tables that you run a scorecard on. | |
| Average Cardinality for a Profile (AC) | The cardinality of a column is the number of unique values in the column, expressed as a percentage. AC is the average cardinality across all columns and all tables that you run a profile on. | |
| Average Cardinality for a Scorecard (ACsc) | The average cardinality across all columns and all tables that you run a scorecard on. | |
| Average Number of Columns Across All Tables (NC) | The average number of columns across all tables that you run a profile on. | |
| Average Number of Columns Across All Scorecards (NCsc) | The average number of columns across all scorecards. | |
| Average Number of Tables or Schemas (NT) | The average number of tables or schemas. For overlap discovery and foreign key discovery, this parameter represents the number of tables. | |
| Max Value Frequencies (MVF) | The maximum number of value frequencies in the profiling warehouse for each column. The same number applies to both profiles and scorecards. The default value is 16000. | |
| Average Number of Primary Keys (APK) | The average number of primary keys for each primary key discovery profile. A general guideline is to set this parameter to 100. | |
| Average Number of Functional Dependencies (AFD) | The average number of functional dependencies for each functional dependency discovery profile. A general guideline is to set this parameter to 1000. | |
| Average Overlapping Pairs (AOPP) | The estimated average number of overlapping pairs between tables, as a percentage. A general guideline is to set this parameter to 0.01 (1%). | |
| Average Foreign Keys for Each Table (FKT) | The estimated average number of foreign keys for each table. A general guideline is to set this parameter to 4. | |
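If you prefer to script the estimate, the Part 1 metrics can be held in one small record. The following is a minimal sketch, not part of the product: the class name `Part1Inputs`, the field names, and the sample values are hypothetical. The defaults are the general guidelines from the table above. The sketch assumes that you enter the percentage metrics (AC, ACsc, AOPP) as decimal fractions, in the same way that the worksheet writes 1% as 0.01.

```python
from dataclasses import dataclass


@dataclass
class Part1Inputs:
    """Part 1 worksheet values (hypothetical container, names are illustrative)."""
    avl: float       # AVL: average value length for a profile, in characters
    avl_sc: float    # AVLsc: average value length for a scorecard
    ac: float        # AC: average cardinality for a profile (assumed fraction, 0.5 = 50%)
    ac_sc: float     # ACsc: average cardinality for a scorecard (assumed fraction)
    nc: float        # NC: average number of columns across all tables
    nc_sc: float     # NCsc: average number of columns across all scorecards
    nt: float        # NT: average number of tables or schemas
    mvf: int = 16000     # MVF: default stated in the worksheet
    apk: float = 100     # APK: general guideline
    afd: float = 1000    # AFD: general guideline
    aopp: float = 0.01   # AOPP: general guideline (1%)
    fkt: float = 4       # FKT: general guideline


# Sample values only; replace them with estimates for your own data.
inputs = Part1Inputs(avl=20, avl_sc=20, ac=0.5, ac_sc=0.5, nc=30, nc_sc=10, nt=200)
```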

Worksheet - Part 2

After you complete Part 1 of the worksheet, fill in the worksheets in Part 2.
In the Value column of each worksheet, enter the result of the formula in the Calculation column. For metrics that have no formula, such as the number of scorecards and the number of column profiles, enter values based on your development deployment.
Scorecard
Use the following worksheet to record the values for a scorecard:
| Metric | Calculation | Value |
| --- | --- | --- |
| Average scorecard size | NCsc × [((2 × AVLsc) + 64) × (MVF × ACsc)] | |
| Number of scorecards | - | |
| Average number of versions for each scorecard | - | |

Calculation
To calculate the tablespace required to store the scorecard results in the profiling warehouse, multiply the three values in the Value column.
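The scorecard calculation can be sketched as a short function. The function name, parameter names, and sample inputs are hypothetical. The sketch assumes that ACsc is entered as a decimal fraction (0.5 for 50%). The worksheet does not state a unit for the result, so the code does not assign one.

```python
def scorecard_tablespace(nc_sc, avl_sc, mvf, ac_sc, num_scorecards, avg_versions):
    """Scorecard worksheet: average scorecard size x scorecards x versions."""
    # Average scorecard size: NCsc x [((2 x AVLsc) + 64) x (MVF x ACsc)]
    avg_size = nc_sc * (((2 * avl_sc) + 64) * (mvf * ac_sc))
    # Multiply the three values in the Value column.
    return avg_size * num_scorecards * avg_versions


# Sample inputs only: 10 columns, 20-character values, 50% cardinality,
# 5 scorecards with 3 versions each.
estimate = scorecard_tablespace(
    nc_sc=10, avl_sc=20, mvf=16000, ac_sc=0.5, num_scorecards=5, avg_versions=3
)
```

With these sample inputs, the average scorecard size is 8,320,000 and the total is 124,800,000.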
Column Profile
Use the following worksheet to record the values for a column profile:
| Metric | Calculation | Value |
| --- | --- | --- |
| Average profile size | NC × [((2 × AVL) + 64) × (MVF × AC)] | |
| Number of column profiles | - | |

Calculation
To calculate the tablespace required to store the column profile results in the profiling warehouse, multiply the two values in the Value column.
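As a worked example of the column profile formula, the sketch below uses hypothetical names and sample inputs, and assumes that AC is entered as a decimal fraction.

```python
def column_profile_tablespace(nc, avl, mvf, ac, num_profiles):
    """Column profile worksheet: average profile size x number of profiles."""
    # Average profile size: NC x [((2 x AVL) + 64) x (MVF x AC)]
    avg_size = nc * (((2 * avl) + 64) * (mvf * ac))
    return avg_size * num_profiles


# Sample inputs only: 30 columns, 20-character values, 50% cardinality,
# 100 column profiles.
estimate = column_profile_tablespace(nc=30, avl=20, mvf=16000, ac=0.5, num_profiles=100)
```

Because MVF and AC are multiplied together, lowering either the value frequency limit or the expected cardinality reduces the estimate proportionally.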
Data Domain Discovery
Use the following worksheet to record the values for data domain discovery:
| Metric | Calculation | Value |
| --- | --- | --- |
| Average data domain size | NC × 254 | |
| Number of data domain discovery profiles | - | |

Calculation
To calculate the total tablespace required to store the data domain discovery results in the profiling warehouse, multiply the two values in the Value column.
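The data domain discovery estimate depends only on the column count and the number of profiles. A minimal sketch, with hypothetical names and sample inputs:

```python
def data_domain_tablespace(nc, num_profiles):
    """Data domain discovery worksheet: (NC x 254) x number of profiles."""
    avg_size = nc * 254  # average data domain size
    return avg_size * num_profiles


# Sample inputs only: 30 columns on average, 10 data domain discovery profiles.
estimate = data_domain_tablespace(nc=30, num_profiles=10)
```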
Primary Key Discovery
Use the following worksheet to record the values for primary key discovery:
| Metric | Calculation | Value |
| --- | --- | --- |
| Average primary key result size | APK × [128 + (32 × AVL)] | |
| Number of primary key discovery profiles | - | |

Calculation
To calculate the total tablespace required to store the primary key discovery results in the profiling warehouse, multiply the two values in the Value column.
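The primary key discovery formula can be checked with a few lines of code. The names and sample inputs below are hypothetical. APK defaults to 100, the general guideline from Part 1.

```python
def primary_key_tablespace(avl, num_profiles, apk=100):
    """Primary key discovery worksheet: average result size x number of profiles."""
    # Average primary key result size: APK x [128 + (32 x AVL)]
    avg_size = apk * (128 + (32 * avl))
    return avg_size * num_profiles


# Sample inputs only: 20-character values, 10 primary key discovery profiles.
estimate = primary_key_tablespace(avl=20, num_profiles=10)
```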
Functional Dependency Discovery
Use the following worksheet to record the values for functional dependency discovery:
| Metric | Calculation | Value |
| --- | --- | --- |
| Average functional dependency result size | AFD × [160 + (32 × AVL)] | |
| Number of functional dependency discovery profiles | - | |

Calculation
To calculate the total tablespace required to store the functional dependency discovery results in the profiling warehouse, multiply the two values in the Value column.
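The functional dependency estimate follows the same pattern as primary key discovery, with a larger fixed overhead for each result. The sketch below is hypothetical in its names and sample inputs. It also assumes that the first factor in the formula is AFD, the Part 1 metric defined for functional dependencies, with the general guideline of 1000 as the default.

```python
def functional_dependency_tablespace(avl, num_profiles, afd=1000):
    """Functional dependency worksheet: average result size x number of profiles."""
    # Average result size, assuming AFD as the multiplier: AFD x [160 + (32 x AVL)]
    avg_size = afd * (160 + (32 * avl))
    return avg_size * num_profiles


# Sample inputs only: 20-character values, 10 functional dependency profiles.
estimate = functional_dependency_tablespace(avl=20, num_profiles=10)
```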
Overlap Discovery
Use the following worksheet to record the values for overlap discovery:
| Metric | Calculation | Value |
| --- | --- | --- |
| Signatures | (NC × NT) × 3600 | |
| Overlapping pairs | (NC × NT)² × AOPP | |

Calculation
To calculate the total tablespace required to store the overlap discovery results in the profiling warehouse, add the two values in the Value column.
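Unlike the earlier worksheets, the overlap discovery worksheet adds its two rows instead of multiplying them. A minimal sketch with hypothetical names and sample inputs follows. It assumes that the overlapping pairs factor is the Part 1 metric AOPP, entered as a decimal fraction with the guideline value 0.01 as the default.

```python
def overlap_tablespace(nc, nt, aopp=0.01):
    """Overlap discovery worksheet: signatures + overlapping pairs."""
    total_columns = nc * nt
    signatures = total_columns * 3600             # (NC x NT) x 3600
    overlapping_pairs = total_columns ** 2 * aopp  # (NC x NT)^2 x AOPP
    return signatures + overlapping_pairs


# Sample inputs only: 30 columns per table on average, 200 tables.
estimate = overlap_tablespace(nc=30, nt=200)
```

The overlapping pairs term grows with the square of the total column count, so it can dominate the estimate when you run overlap discovery on many wide tables.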
Foreign Key Discovery
Use the following worksheet to record the values for foreign key discovery:
| Metric | Calculation | Value |
| --- | --- | --- |
| Signatures | (NC × NT) × 3600 | |
| Foreign keys | (FKT × NT) × [224 + (2048 × AVL)] | |

Calculation
To calculate the total tablespace required to store the foreign key discovery results in the profiling warehouse, add the two values in the Value column.
If you run overlap discovery and foreign key discovery on the same set of tables, the two jobs share the disk space for signature computation, so the Signatures value needs to be counted only once.
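The foreign key discovery calculation, including the shared signature space, can be sketched as follows. The names and sample inputs are hypothetical. The `signatures_shared` flag is an assumption about how to apply the note above: when it is set, the sketch leaves out the Signatures row on the basis that the overlap discovery estimate for the same tables already includes it.

```python
def foreign_key_tablespace(nc, nt, avl, fkt=4, signatures_shared=False):
    """Foreign key discovery worksheet: signatures + foreign keys."""
    # Signatures: (NC x NT) x 3600, skipped if already counted for overlap discovery.
    signatures = 0 if signatures_shared else (nc * nt) * 3600
    # Foreign keys: (FKT x NT) x [224 + (2048 x AVL)]
    foreign_keys = (fkt * nt) * (224 + (2048 * avl))
    return signatures + foreign_keys


# Sample inputs only: 30 columns per table, 200 tables, 20-character values.
standalone = foreign_key_tablespace(nc=30, nt=200, avl=20)
shared = foreign_key_tablespace(nc=30, nt=200, avl=20, signatures_shared=True)
```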
Final Calculation
To calculate the total tablespace required for all the profile operations, add the tablespace values for the following profile operations:
  • Scorecard
  • Column profile
  • Data domain discovery
  • Primary key discovery
  • Functional dependency discovery
  • Overlap discovery
  • Foreign key discovery
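The final step is a plain sum over the seven operations. In the sketch below, the function name and the dictionary keys are hypothetical, and the figures are sample values for illustration, not measurements.

```python
def total_tablespace(per_operation):
    """Final calculation: add the tablespace values for all profile operations."""
    return sum(per_operation.values())


# Sample per-operation estimates only; substitute the results of your own worksheets.
estimates = {
    "scorecard": 124_800_000,
    "column_profile": 2_496_000_000,
    "data_domain_discovery": 76_200,
    "primary_key_discovery": 768_000,
    "functional_dependency_discovery": 8_000_000,
    "overlap_discovery": 21_960_000,
    "foreign_key_discovery": 32_947_200,
}
total = total_tablespace(estimates)
```

Keeping the per-operation values in one place makes it easier to see which operation drives the total. In this sample, it is the column profile results.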
