Profiling and Discovery Sizing Guidelines

Hardware Guidelines

The profiling warehouse contains a set of tables into which the profile mappings write their results. SQL queries fetch the results from the profiling warehouse and send them to the client application. In addition, you can use profiling warehouse views to run external reports on the profiling warehouse data.
The usage pattern for profile runs varies from small sets of results to multiple tables when the Data Integration Service writes the results to the profiling warehouse. The largest of these results is the value frequency data, which is at most 16,000 rows for relational sources and 80,000 rows for flat file sources. The exception is the mappings that stage the source data. These mappings can store millions of rows at one time and can consume significant resources over a longer period of time.
CPU Cores
The queries to the profiling warehouse usually do not include aggregations or other large computations. Storing and selecting data account for most of the computational cost. Because the SQL queries are brief, the CPU usage might not be continuous or significant.
To determine the number of CPU cores, you can use the following procedure:
  1. Determine the minimum number of CPU cores. The minimum is the larger of 4 and the Maximum Profile Execution Pool Size value divided by 4.
  2. Add 1 to 2 cores to the minimum number of CPU cores if the dominant column profile use case is to stage the data.
  3. To the result of step 2, add 1 core for every 2 to 4 concurrent reports on the profiling warehouse.
The sum of all the values is the recommended number of CPU cores for the profiling warehouse.
The recommended number of CPU cores applies to the maximum resource usage of the Profiling Service Module. The number of CPU cores can vary based on the usage of the Profiling Service Module.
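The procedure above can be sketched as a small calculation. This is only an illustration, not part of the product: the function name and parameters are hypothetical, and rounding fractional values up is an assumption, since the guideline does not say how to round. Because steps 2 and 3 give ranges, the sketch returns a low and a high estimate.

```python
import math


def recommended_cpu_cores(max_pool_size, staging_dominant=False, concurrent_reports=0):
    """Return a (low, high) estimate of CPU cores for the profiling warehouse.

    Hypothetical helper: names and rounding behavior are assumptions.
    """
    # Step 1: the larger of 4 and Maximum Profile Execution Pool Size / 4
    # (rounded up, which is an assumption).
    minimum = max(4, math.ceil(max_pool_size / 4))
    low = high = minimum

    # Step 2: add 1 to 2 cores if staging data is the dominant use case.
    if staging_dominant:
        low += 1
        high += 2

    # Step 3: add 1 core for every 2 to 4 concurrent reports.
    # One core per 4 reports gives the low end, one per 2 the high end.
    if concurrent_reports > 0:
        low += math.ceil(concurrent_reports / 4)
        high += math.ceil(concurrent_reports / 2)

    return low, high


if __name__ == "__main__":
    # Pool size 20, staging is dominant, 4 concurrent reports.
    print(recommended_cpu_cores(20, staging_dominant=True, concurrent_reports=4))
```

With a pool size of 20, staging as the dominant use case, and 4 concurrent reports, the sketch yields a range of 7 to 9 cores.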
Memory
The profiling warehouse performs better with additional memory because the database can cache more data.
The following table provides guidelines for the minimum amount of memory for the buffer cache:
Use Case                 Memory
Single user or laptop    0.5 GB
2 to 5 users             2 GB
5 to 10 users            4 GB
> 10 users               8 GB
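The memory guidelines can also be expressed as a simple lookup. This is a minimal sketch with a hypothetical function name. The table lists 5 users in two tiers, so the sketch assumes that exactly 5 users falls in the lower tier; adjust the comparison if you prefer the more conservative reading.

```python
def min_buffer_cache_gb(users):
    """Return the guideline minimum buffer cache size in GB.

    Hypothetical helper. Assumption: exactly 5 users maps to the
    2 GB tier, because the table lists 5 in both adjacent tiers.
    """
    if users < 1:
        raise ValueError("users must be at least 1")
    if users == 1:
        return 0.5  # single user or laptop
    if users <= 5:
        return 2.0  # 2 to 5 users
    if users <= 10:
        return 4.0  # 5 to 10 users
    return 8.0      # more than 10 users


if __name__ == "__main__":
    for n in (1, 3, 8, 25):
        print(n, "users:", min_buffer_cache_gb(n), "GB")
```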
Disk
The profiling warehouse performs better when the tablespace is spread across more disks, which is a general recommendation that works for all use cases. Because most of the queries fetch or store data instead of analyzing it, there is little need for temporary tablespace. Therefore, there are no further specific recommendations for disk.
Network
The speed of data transfer between the Profiling Service Module and the profiling warehouse affects profile performance. A slow network can increase the time required to write the results to the profiling warehouse and to return the results to the client application. For better profile performance, use a fast, dedicated network or switch between the profiling warehouse and the Profiling Service Module.
Mixed Databases
The database that hosts the profiling warehouse usually hosts other repositories, such as the Model Repository Service. Take the minimum configuration into account when you analyze the expected usage in the environment, and plan accordingly.
