Profiling and Discovery Sizing Guidelines

Individual Profile Sizing Example

This use case summarizes the initiative of a few users in a department of a small organization to understand the quality of the data assets that they control. The users mostly run column profiles on flat files and database tables that have up to 10 million rows. The setup environment has a shared Windows Server on which the users run profiles.
The following table describes the setup environment:
| Setup Component | Description |
| --- | --- |
| Users | One or two data analysts use the Analyst tool. |
| Hardware | 1 node, 4 cores, 8 GB, 4 x 2 TB disks, and a 50% shared Windows Server. |
| Data | Flat files with 10 million rows. Relational tables with 10 million rows on an 8-core, 32 GB Linux machine. |
| Profile type | Column profile and data domain discovery. |
| Profiling warehouse | Set up on the same server. |
| Model Repository Service | Set up on the same server. |
| Analyst Service | Set up on the same server. |
The following table summarizes the recommended configuration parameters for the setup environment:
| Parameter | Value |
| --- | --- |
| Maximum Execution Pool Size | >= 10 |
| Maximum Profile Execution Pool Size | 4 |
| Maximum Concurrent Profile Jobs | 2 |
| Maximum DB Connections | 2 |
| Maximum Concurrent Profile Threads | 1 |
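The recommended values can also be recorded in a small script so that a planned configuration is checked before it is applied. The following is a minimal sketch, not part of the product: the `ProfilingSizing` class, its field names, and the `problems` method are hypothetical names chosen for illustration, and the two checks encode only the constraints stated in this example (Maximum Execution Pool Size of at least 10, and higher than Maximum Profile Execution Pool Size).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProfilingSizing:
    """Hypothetical container for the five parameters in this example."""
    max_execution_pool_size: int
    max_profile_execution_pool_size: int
    max_concurrent_profile_jobs: int
    max_db_connections: int
    max_concurrent_profile_threads: int

    def problems(self) -> List[str]:
        """Return a message for each constraint from this example that is violated."""
        found = []
        if self.max_execution_pool_size < 10:
            found.append("Maximum Execution Pool Size should be >= 10")
        if self.max_execution_pool_size <= self.max_profile_execution_pool_size:
            found.append(
                "Maximum Execution Pool Size should be higher than "
                "Maximum Profile Execution Pool Size"
            )
        return found


# Values taken from the table in this example.
RECOMMENDED = ProfilingSizing(
    max_execution_pool_size=10,
    max_profile_execution_pool_size=4,
    max_concurrent_profile_jobs=2,
    max_db_connections=2,
    max_concurrent_profile_threads=1,
)

if __name__ == "__main__":
    for message in RECOMMENDED.problems():
        print(message)
```

Calling `RECOMMENDED.problems()` returns an empty list, because the table values satisfy both checks.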
Analysis
In this basic profile scenario, the configuration depends on the concurrent profile runs that users perform on up to three smaller flat files. Column profiles on flat files with 10 million rows run quickly. However, the expected peak CPU usage can exceed the power of the Profiling Service Module hardware for short periods of time.
To continuously run a profile on three or more flat files simultaneously, you can scale down the Maximum Concurrent Profile Jobs parameter to 2. This parameter value ensures adequate throughput for all uses of the node. In this use case, one or two data analysts use the Profiling Service Module. Therefore, the users can communicate with each other to ensure that machine overload does not occur at the higher setting of three concurrent profile jobs.
The configuration also depends on the production database. You can set the Maximum DB Connections parameter value to 2 so that the two concurrent profiles do not affect the performance of the relational database. If you run all three relational profiles, the Maximum Profile Execution Pool Size parameter limits the number of concurrent profile queries to 10.
To avoid performance issues with the production database, you can reduce the Maximum Profile Execution Pool Size parameter value to 4. An example of this scenario is where the database cannot handle more than four concurrent profile queries. When you set the Maximum Profile Execution Pool Size parameter value to 4, the column profile jobs continue to run, and the setting limits the number of concurrent queries to the database.
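The effect of a pool-size cap can be illustrated with a generic simulation. The sketch below is not how the Profiling Service Module is implemented; it only assumes that a pool behaves like a fixed number of slots that queries must acquire before they run. The function name `peak_concurrency` and the sleep-based stand-in for a query are hypothetical.

```python
import threading
import time


def peak_concurrency(total_queries: int, pool_size: int,
                     work_seconds: float = 0.01) -> int:
    """Run fake queries through a pool of pool_size slots and return the
    highest number that were observed running at the same time."""
    slots = threading.BoundedSemaphore(pool_size)
    lock = threading.Lock()
    running = 0
    peak = 0

    def fake_query() -> None:
        nonlocal running, peak
        with slots:                      # wait for a free slot in the pool
            with lock:
                running += 1
                peak = max(peak, running)
            time.sleep(work_seconds)     # stand-in for query execution time
            with lock:
                running -= 1

    threads = [threading.Thread(target=fake_query) for _ in range(total_queries)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


if __name__ == "__main__":
    # Ten queued queries against a pool of four slots.
    print(peak_concurrency(total_queries=10, pool_size=4))
```

With a pool size of 4, the reported peak never exceeds 4, however many queries are queued. The remaining queries wait for a slot instead of failing, which matches the behavior described above where profile jobs continue to run under the lower limit.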
You can set the Maximum Execution Pool Size parameter to a value higher than the Maximum Profile Execution Pool Size parameter value to enable quicker drill-down operations and previews.
