Profiling and Discovery Sizing Guidelines

Profiling Warehouse Guidelines for Key and Functional Dependency Discovery

The disk space required for key and functional dependency discovery depends on the number of inferred keys, the number of inferred functional dependencies, and the number of rows that violate them. These items can take up a large amount of space in the profiling warehouse if you set a large number for key and functional dependency discovery.
Use the following formulas to compute the disk space in bytes. If you set the confidence parameter to 100%, the profiling warehouse does not store violating rows, so you can omit the violating-rows term (the part of each formula after the plus sign) from the computation.
Keys
Number of Keys X Average Number of Key Columns X 32 + Number of Keys X (32 + (2 Bytes for Each Character X Average Column Size)) X Average Number of Key Columns X Average Number of Violating Rows
Where
  • Number of Keys is the number of inferred keys.
  • Average Number of Key Columns is the average number of columns in the key.
  • The value 32 is the number of bytes used to store one column in the key.
  • Average Column Size is the average number of characters in the columns when numbers and dates are converted to the String datatype.
  • The value 2 Bytes for Each Character is the typical number of bytes used for a single Unicode character.
  • Average Number of Violating Rows is the average number of rows that violate the key.
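The key formula can be sketched as a small Python function. This is a minimal sketch and not part of the product: the function name, parameter names, and sample values are hypothetical, and it assumes that the term (32 + 2 X Average Column Size) is grouped as one per-column cost for each violating row, which is one reading of the parentheses in the formula.

```python
def key_discovery_bytes(num_keys, avg_key_columns, avg_column_size,
                        avg_violating_rows, bytes_per_char=2,
                        bytes_per_column=32):
    """Estimate profiling warehouse bytes used by inferred keys.

    Hypothetical helper. Assumes each violating row costs
    (32 + 2 * avg_column_size) bytes per key column.
    """
    # Storage for the key definitions themselves: 32 bytes per key column.
    key_storage = num_keys * avg_key_columns * bytes_per_column

    # Storage for the rows that violate each key.
    per_column_cost = bytes_per_column + bytes_per_char * avg_column_size
    violation_storage = (num_keys * per_column_cost
                         * avg_key_columns * avg_violating_rows)

    return key_storage + violation_storage


# Illustrative values only: 10 keys, 2 columns per key,
# 20 characters per column, 100 violating rows per key.
print(key_discovery_bytes(10, 2, 20, 100))  # 144640

# With the confidence parameter at 100%, no violating rows are stored.
print(key_discovery_bytes(10, 2, 20, 0))    # 640
```

Passing 0 for the violating rows reproduces the 100% confidence case, where only the key definitions contribute to the estimate.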
Functional Dependency
Number of FDs X (Average Number of LHS Columns + 1) X 32 + Number of FDs X (32 + (2 Bytes for Each Character X Average Column Size)) X Average Number of LHS Columns X Average Number of Violating Rows
Where
  • Number of FDs is the number of inferred functional dependencies.
  • Average Number of LHS Columns is the average number of columns in the determinant (left-hand side) of the functional dependency. The first term of the formula adds 1 to this value to account for the dependent column.
  • The value 32 is the number of bytes used to store one column in the functional dependency.
  • Average Column Size is the average number of characters in the columns when numbers and dates are converted to the String datatype.
  • The value 2 Bytes for Each Character is the typical number of bytes used for a single Unicode character.
  • Average Number of Violating Rows is the average number of rows that violate the functional dependency.
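The functional dependency formula can be sketched the same way. This is a minimal sketch under stated assumptions: the function name, parameter names, and sample values are hypothetical, the term (32 + 2 X Average Column Size) is assumed to be one per-column cost for each violating row, and the violating-rows term uses the LHS column count without the extra dependent column, as the formula is written.

```python
def fd_discovery_bytes(num_fds, avg_lhs_columns, avg_column_size,
                       avg_violating_rows, bytes_per_char=2,
                       bytes_per_column=32):
    """Estimate profiling warehouse bytes used by inferred
    functional dependencies.

    Hypothetical helper. Assumes each violating row costs
    (32 + 2 * avg_column_size) bytes per LHS column.
    """
    # Storage for the dependency definitions: LHS columns plus
    # one dependent column, 32 bytes each.
    fd_storage = num_fds * (avg_lhs_columns + 1) * bytes_per_column

    # Storage for the rows that violate each dependency.
    per_column_cost = bytes_per_column + bytes_per_char * avg_column_size
    violation_storage = (num_fds * per_column_cost
                         * avg_lhs_columns * avg_violating_rows)

    return fd_storage + violation_storage


# Illustrative values only: 5 functional dependencies, 3 LHS columns,
# 20 characters per column, 50 violating rows per dependency.
print(fd_discovery_bytes(5, 3, 20, 50))  # 54640

# With the confidence parameter at 100%, no violating rows are stored.
print(fd_discovery_bytes(5, 3, 20, 0))   # 640
```

In both sample calls the violating rows dominate the estimate, which is why the number of stored violations matters more for sizing than the number of inferred dependencies.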
