Profiling and Discovery Sizing Guidelines


Profiling Warehouse Database Properties

The profiling warehouse database properties apply to the Profiling Service Module across the deployment.

You can set the following parameters:

Profiling Warehouse Database: Connection name to the profiling warehouse database. In addition to the profile results, the profiling warehouse holds the persisted profile job queue. Verify that no profile jobs are running when you change the connection name. Otherwise, running profile jobs might stop, because each profile job runs on the Data Integration Service to which the Profiling Service Module submitted it.
You set the default value when you create the instance.
Maximum Ranks: The number of minimum values and maximum values to display for a profile based on the datatype of the column. Ranks are useful to understand the range that the values of a column might take and whether the column has pseudonull values. You can retain the default value.
Default is 5.
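To make the Maximum Ranks setting concrete, the following Python sketch computes the lowest and highest values of a column for a given number of ranks. It assumes that ranks are taken over distinct values in natural sort order. The function name `column_ranks` and the sample data are hypothetical and are not part of the product.

```python
import heapq

def column_ranks(values, max_ranks=5):
    """Return the max_ranks lowest and highest distinct values of a column.

    Hypothetical illustration of what Maximum Ranks controls; it assumes ranks
    use distinct values in natural sort order.
    """
    distinct = set(values)
    return heapq.nsmallest(max_ranks, distinct), heapq.nlargest(max_ranks, distinct)

# Sample column where 0 and 999 are pseudonull placeholders.
ages = [34, 0, 27, 999, 45, 0, 61, 19, 999, 52, 38]
lowest, highest = column_ranks(ages)
print(lowest)   # [0, 19, 27, 34, 38]
print(highest)  # [999, 61, 52, 45, 38]
```

With the default of 5, the placeholder values 0 and 999 appear at the ends of the two rank lists, which is how ranks help you spot pseudonull values.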
Maximum Patterns: The maximum number of patterns that each column stores. If you need to store as many patterns as possible, you can set the Maximum Patterns parameter to a large number, such as 10,000, and set the Pattern Threshold Percentage parameter to .01. Setting a high value for this parameter has negligible impact on performance.
Default is 10.
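The interaction between Maximum Patterns and Pattern Threshold Percentage can be sketched in Python as follows. This is a simplified model, not the Profiling Service Module's algorithm: it assumes that a pattern maps letters to `X` and digits to `9`, that a pattern is stored only if its share of the rows meets the threshold percentage, and that only the most frequent patterns up to the maximum are kept. The function names `to_pattern` and `stored_patterns` are hypothetical.

```python
from collections import Counter

def to_pattern(value: str) -> str:
    """Simplified pattern: letters become X, digits become 9, other characters stay."""
    return "".join("X" if c.isalpha() else "9" if c.isdigit() else c for c in value)

def stored_patterns(values, max_patterns=10, threshold_pct=5.0):
    """Return (pattern, count) pairs that meet the threshold percentage,
    most frequent first, capped at max_patterns (hypothetical model)."""
    if not values:
        return []
    counts = Counter(to_pattern(v) for v in values)
    total = len(values)
    kept = [(p, n) for p, n in counts.most_common()
            if 100.0 * n / total >= threshold_pct]
    return kept[:max_patterns]

zips = ["94063", "94063-1234", "10001", "SW1A 1AA", "12345"]
print(stored_patterns(zips))                      # all three patterns
print(stored_patterns(zips, threshold_pct=25.0))  # only the 99999 pattern (60%)
print(stored_patterns(zips, max_patterns=10_000, threshold_pct=0.01))
```

In this model, lowering the threshold to .01 percent lets rare patterns through, and a large Maximum Patterns value, such as 10,000, keeps the cap from discarding them.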
Maximum Profile Execution Pool Size: The number of profile mappings that the Profiling Service Module can run concurrently when the Data Integration Service runs on a single node or on a grid. The pool size is dependent on the aggregate processing capacity of the Data Integration Service, which you specify for each node on the Processes tab of the Administrator tool. The pool size cannot be greater than the sum of the processing capacity of all nodes.
When you plan for a deployment, consider the following types of threads:
Threads used for profile tasks.
Reserved threads for drill-down tasks on the source data and similar, quick real-time profile tasks.
Threads for all other nonprofiling purposes, such as SQL endpoints, preview, and deployed mappings.

It is important to understand the mixture of mappings and profile jobs so that you can configure the Maximum Execution Pool Size parameter. For optimal performance, verify that the total number of threads in the three categories adds up to the aggregate total of the Maximum Execution Pool Size parameter.

Default is 10.
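The following Python sketch expresses the planning rule above as a simple check. It assumes you already know the processing capacity configured for each node on the Processes tab and how many threads you intend for each of the three categories. The class `ThreadPlan`, the function `check_plan`, and the numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ThreadPlan:
    profile_threads: int    # threads for profile tasks
    drilldown_reserve: int  # reserved for drill-down and quick real-time profile tasks
    other_threads: int      # SQL endpoints, preview, deployed mappings

    def total(self) -> int:
        return self.profile_threads + self.drilldown_reserve + self.other_threads

def check_plan(plan: ThreadPlan, pool_size: int, node_capacities: list[int]) -> list[str]:
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    capacity = sum(node_capacities)
    if pool_size > capacity:
        problems.append(f"pool size {pool_size} exceeds aggregate node capacity {capacity}")
    if plan.total() != pool_size:
        problems.append(f"thread plan totals {plan.total()}, but pool size is {pool_size}")
    return problems

plan = ThreadPlan(profile_threads=10, drilldown_reserve=2, other_threads=4)
print(check_plan(plan, pool_size=16, node_capacities=[8, 8]))  # []
print(check_plan(plan, pool_size=20, node_capacities=[8, 8]))  # two problems
```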
Maximum DB Connections: The maximum number of parallel queries across all the profiling jobs for a relational source. If two profile jobs run on the same database, each profile job gets half the number of connections. If two profile jobs run on different databases, each profile job gets the maximum number of connections.
The Profiling Service Module uses the connection name to distinguish databases. If you have different connection names, the Profiling Service Module considers the connection names as different databases, even if they point to the same database. Therefore, avoid creating database aliases with two different connection names for the same database.

All databases use the same number of connections. Therefore, if you run a profile that runs on two databases with different performance characteristics, consider the database with the lowest concurrent profile run requests to configure this parameter.

Default is 5.
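The following Python sketch models how connections are shared among running profile jobs. It assumes that jobs are grouped by connection name, that connections are split evenly with integer division, and that each job gets at least one connection. These rounding rules and the function name `connections_per_job` are assumptions, not documented behavior. The last call shows why two connection names for one database are risky.

```python
from collections import Counter

def connections_per_job(job_connection_names, max_db_connections=5):
    """Return the number of connections each running profile job gets.

    job_connection_names lists the connection name used by each job. Jobs that
    share a connection name split max_db_connections between them (floor
    division, minimum of one: both assumptions). Jobs on different connection
    names each get the full value.
    """
    jobs_per_name = Counter(job_connection_names)
    return [max(1, max_db_connections // jobs_per_name[name])
            for name in job_connection_names]

print(connections_per_job(["orders_db", "orders_db"]))        # [2, 2]
print(connections_per_job(["orders_db", "crm_db"]))           # [5, 5]
# Two aliases for the same physical database count as different databases,
# so that database receives 10 concurrent queries instead of 5.
print(connections_per_job(["orders_db", "orders_db_alias"]))  # [5, 5]
```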
Profile Results Export Path: The default path where the Profiling Service Module stores the exported objects, such as Microsoft Excel spreadsheets and DDL files.
The default value is <Installation Directory>/tomcat/bin/ProfileExport.
