Profiling and Discovery Sizing Guidelines

Departmental Profile Sizing Example

This use case describes a small competency center in a medium-to-large organization or a department in a larger organization. The data analysts need to understand and monitor current data for data quality purposes. They also need to help developers with the front-end analysis of data migration projects.
Most of the data sources range from medium to large files with 50 million to 1 billion rows. A few of the data sources can have up to 10 billion rows. A single dedicated node runs the profiles.
The following table describes the setup environment:
| Setup Component | Description |
| --- | --- |
| Users | Five data analysts use the Analyst tool and three developers use the Developer tool. |
| Hardware | 1 node, 16 cores, 32 GB, 6 x 2 TB disks, and Linux. |
| Data | Flat files with up to 10 billion rows. Relational tables with up to 10 billion rows on multiple large servers. |
| Profile type | Column profile, data domain discovery, and scorecards. |
| Profiling warehouse | Set up on a different server. |
| Model Repository Service | Set up on a different server. |
| Analyst Service | Set up on a different server. |
The following table describes the recommended configuration parameters for the setup environment:
| Parameter | Value |
| --- | --- |
| Maximum Execution Pool Size | 25 |
| Maximum Profile Execution Pool Size | 15 |
| Maximum Concurrent Profile Jobs | 8 |
| Maximum DB Connections | 5 |
| Maximum Concurrent Profile Threads | 1 |
| DIS: Temporary Directories* | 3 |

*Each directory on a different disk.

Analysis

Profile operations in the Developer tool, such as drill-down operations and previews, require additional Data Integration Service threads. To meet this requirement, set the Maximum Execution Pool Size parameter to a higher value than the Maximum Profile Execution Pool Size parameter. In this use case, three developers use the Developer tool, so the requirement is three times that of a single developer. An additional 5 to 10 threads might be appropriate for the use case; the recommended configuration adds 10 (25 compared with 15). If the use case varies, analyze the requirements of the developers and choose an appropriate number.
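The relationship between the two pool sizes can be sketched as simple arithmetic. The helper name `max_execution_pool_size` is hypothetical and is not part of any Informatica tool or API. The value of 10 extra threads is inferred from the difference between the two recommended values in the table (25 and 15).

```python
def max_execution_pool_size(profile_pool_size: int, extra_threads: int) -> int:
    """Return an execution pool size that leaves headroom above the profile pool.

    Hypothetical helper for illustration only. extra_threads is the number of
    threads reserved for Developer tool operations such as drill-downs and previews.
    """
    if profile_pool_size < 1 or extra_threads < 0:
        raise ValueError("profile pool must be positive and extra threads non-negative")
    return profile_pool_size + extra_threads


# Recommended configuration: a profile pool of 15 plus 10 extra threads.
print(max_execution_pool_size(15, 10))  # 25
```

If fewer developers run previews at the same time, a smaller headroom at the low end of the 5 to 10 range gives 20 instead of 25.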
The Profiling Service Module node can run up to eight concurrent flat file profiles during peak usage, so you can set the Maximum Concurrent Profile Jobs parameter to 8. With five analysts, this setting allows each analyst to run approximately one and a half profile or scorecard jobs at a time, which is more than adequate for most situations.
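The "one and a half jobs" figure follows from dividing the concurrent job limit by the number of analysts in the setup table. A minimal sketch, assuming the jobs are shared evenly among the five analysts (the function name is hypothetical):

```python
def jobs_per_analyst(max_concurrent_profile_jobs: int, analysts: int) -> float:
    """Return the average number of concurrent jobs available to each analyst."""
    if analysts < 1:
        raise ValueError("at least one analyst is required")
    return max_concurrent_profile_jobs / analysts


# Eight concurrent profile jobs shared by five analysts.
print(jobs_per_analyst(8, 5))  # 1.6
```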
To improve flat file performance, verify that the Temporary Directories property of the Data Integration Service lists at least three directories, each on a different physical disk drive.
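One way to spot-check the directory layout on Linux is to compare the device IDs that the operating system reports for each directory. The sketch below uses only the Python standard library, and the function names are hypothetical. Note the assumption: `st_dev` identifies a mounted file system, so two partitions on the same physical disk still report different IDs. Treat the result as a first check, not as proof of separate physical drives.

```python
import os


def distinct_device_count(directories):
    """Count the distinct device IDs (st_dev) behind the given directories."""
    return len({os.stat(path).st_dev for path in directories})


def spans_enough_devices(directories, minimum=3):
    """Return True if the directories sit on at least `minimum` distinct devices."""
    return distinct_device_count(directories) >= minimum
```

For example, calling `spans_enough_devices` with the three configured temporary directories returns `False` if any two of them share a file system.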
If you run a profile on relational sources, you can retain the default value of 5 for the Maximum DB Connections parameter because this profile environment has multiple relational database servers. If all the profiles use one database server, the Maximum Profile Execution Pool Size parameter prevents more than 15 profile queries from running at the same time. If the database servers have more than 16 cores, increase both the Maximum Profile Execution Pool Size and Maximum Execution Pool Size parameters by up to 15 additional threads.
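The scaling rule for larger database servers can be sketched as follows. The function name and its defaults are hypothetical. The defaults are the recommended values from the table, and the cap of 15 extra threads and the 16-core threshold come from the guidance above.

```python
def scaled_pool_sizes(db_server_cores, extra_threads,
                      profile_pool=15, execution_pool=25):
    """Return (profile pool, execution pool) after applying the scaling rule.

    Both pools grow by the same amount, capped at 15 extra threads, and only
    when the database servers have more than 16 cores.
    """
    if extra_threads < 0:
        raise ValueError("extra threads must be non-negative")
    if db_server_cores <= 16:
        return profile_pool, execution_pool
    extra = min(extra_threads, 15)
    return profile_pool + extra, execution_pool + extra


# A 32-core database server with the full 15 additional threads.
print(scaled_pool_sizes(32, 15))  # (30, 40)
```

Because both pools increase together, the headroom between them stays at 10 threads.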
