Profiling and Discovery Sizing Guidelines

Profiling Warehouse Guidelines for Data Domain Discovery

When you run data domain discovery, the Data Integration Service stores statistical and bookkeeping data in the profiling warehouse in addition to the data domain discovery results.
Statistical and Bookkeeping Data for Data Domain Discovery
Each data domain discovery run stores a copy of the data domain names and the associated groups. In addition, each column has a set of bookkeeping data, such as profile ID and sequence numbers, that is stored in its own tables. This data takes up very little space, and you can exclude it from disk space calculations.
Consider the disk space requirement for this data to be effectively zero.
Data Domain Discovery Result Calculation Guidelines
Each column stores the confidence computation and other metadata for each data domain. The disk space required for each column and data domain pair is small. However, the total requirement can add up if the run includes many columns or many data domains.
Use the following formula to calculate the disk size requirements:
2 × number of columns × number of data domains × 254
where
  • The value 2 indicates the number of rules for each domain except the column name rule and data membership rule.
  • Number of columns is the sum of columns and virtual columns in the data domain discovery run.
  • Number of data domains is the number of data domains in the data domain discovery run.
  • The value 254 indicates the size of the statistics and keys.
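
The formula above can be sketched as a small Python helper. This is only an illustration: the function and constant names are hypothetical, and because the guideline does not state the unit of the result, the sketch returns the raw number. Treating it as bytes would be an assumption.

```python
# Constants taken from the sizing formula in this guideline.
RULES_PER_DOMAIN = 2          # the value 2 in the formula
STATS_AND_KEYS_SIZE = 254     # size of the statistics and keys


def domain_discovery_result_size(num_columns: int, num_data_domains: int) -> int:
    """Return 2 x number of columns x number of data domains x 254.

    num_columns: sum of columns and virtual columns in the discovery run.
    num_data_domains: number of data domains in the discovery run.
    The unit of the result is not stated in the guideline (assumed, not confirmed, to be bytes).
    """
    if num_columns < 0 or num_data_domains < 0:
        raise ValueError("counts must be non-negative")
    return RULES_PER_DOMAIN * num_columns * num_data_domains * STATS_AND_KEYS_SIZE


# Hypothetical run: 100 columns (including virtual columns) and 20 data domains.
print(domain_discovery_result_size(100, 20))  # 1016000
```

Because the result grows with the product of the column count and the data domain count, doubling either input doubles the estimate.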