Profiling and Discovery Sizing Guidelines

Relational Databases for Column Profiles

The Profiling Service Module pushes as much processing as it can to the machine where the database resides. When you estimate resources for each machine, consider how the work divides between the Profiling Service Module and the database.
Depending on the rule logic, rules can be pushed down to the database or handled internally by the Profiling Service Module. A rule that is pushed down to the database is treated like a column during the profile run. Rules that run inside the Profiling Service Module are treated like columns in a flat file profile run. Before the profile runs, these rules are grouped into mappings of five output columns at a time, and the flat file sizing calculations apply.
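The grouping of internally run rules can be pictured with a small sketch. It only illustrates the "five output columns per mapping" arithmetic stated above. The function name, the column names, and the assumption that columns are grouped in order are hypothetical and are not taken from the product.

```python
from typing import List, Sequence

# From the text: rules that run inside the Profiling Service Module are
# grouped into mappings of five output columns at a time.
RULE_COLUMNS_PER_MAPPING = 5


def group_rule_outputs(
    columns: Sequence[str], size: int = RULE_COLUMNS_PER_MAPPING
) -> List[List[str]]:
    """Split rule output columns into consecutive groups of at most `size`.

    Hypothetical helper: the real grouping order is not documented here,
    so simple in-order chunking is an assumption.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(columns[i:i + size]) for i in range(0, len(columns), size)]


# Twelve hypothetical rule output columns.
rule_outputs = [f"rule_out_{n}" for n in range(1, 13)]
mappings = group_rule_outputs(rule_outputs)

for number, group in enumerate(mappings, start=1):
    print(f"mapping {number}: {len(group)} columns")
```

With twelve rule output columns, this yields three mappings of five, five, and two columns, each of which would be sized with the flat file calculations.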
The network between the relational database and the Profiling Service Module must be able to handle the data transfers. For large databases, the bandwidth required can be considerable.
The following resource guidelines apply to a single mapping that pushes the profiling logic to the relational database for each column.
Component
Requirement
CPU
Each query uses at least one CPU on the relational database. If the relational database can run a query in parallel, for example through the parallel hint in Oracle, the number of CPUs that the mapping uses increases.
Memory
The relational database requires memory in the form of a buffer cache. The larger the buffer cache, the faster the relational database runs the query. Use at least 512 MB of buffer cache.
Disk
Relational systems use temporary table space. The formula for the maximum amount of temporary table space required is:
2 × maximum number of rows × (maximum column size + frequency bytes)
where
  • The factor 2 accounts for two passes.
  • Maximum number of rows is the number of rows in the largest table.
  • Maximum column size is the size, in bytes, of the largest column in a table, excluding the very large data types that you cannot run a profile on, such as CLOB. The column size must take into account the character encoding, such as Unicode or ASCII.
  • Frequency bytes is 4 or 8 bytes and stores the frequency during the analysis. This is the default size that the database uses for COUNT(*).
In many situations, the mapping uses less disk space. Perform the disk computation and assign the temporary table space to one or more physical disks. Use one disk for each mapping, and use a maximum of four disks.
Operating System
Use a 64-bit operating system to accommodate memory sizes greater than 4 GB. A 32-bit system works if the profiling parameter fits within the memory limitations of the system.
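As a worked example of the disk guideline, the following sketch applies the temporary table space formula and the one-disk-per-mapping rule with a maximum of four disks. The function names and the sample figures (100 million rows, a 200-byte widest column, 8 frequency bytes, six mappings) are assumptions chosen for illustration only.

```python
GB = 10**9  # decimal gigabyte, used only to display the result


def max_temp_space_bytes(
    max_rows: int, max_column_bytes: int, frequency_bytes: int = 8
) -> int:
    """Upper bound on temporary table space, per the formula in the text:
    2 x maximum number of rows x (maximum column size + frequency bytes).
    """
    if max_rows < 0 or max_column_bytes < 0:
        raise ValueError("row count and column size must be non-negative")
    if frequency_bytes not in (4, 8):
        raise ValueError("frequency bytes is 4 or 8")
    return 2 * max_rows * (max_column_bytes + frequency_bytes)


def temp_disk_count(mappings: int) -> int:
    """One disk for each mapping, with a maximum of four disks."""
    if mappings < 1:
        raise ValueError("at least one mapping is required")
    return min(mappings, 4)


# Hypothetical inputs: largest table has 100 million rows and the widest
# profiled column is 200 bytes after accounting for character encoding.
estimate = max_temp_space_bytes(100_000_000, 200, frequency_bytes=8)
disks = temp_disk_count(6)

print(f"{estimate / GB:.1f} GB of temporary table space across {disks} disks")
```

For these inputs the bound is 2 × 100,000,000 × 208 = 41,600,000,000 bytes, or about 41.6 GB. As noted in the Disk row, the mapping often uses less than this maximum.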
