Profiling and Discovery Sizing Guidelines

Profile Deployment

As part of profile deployment, plan the resources for both the development environment and the production environment. The Profiling Service Module has a set of parameters that control the performance of a profiling job. You must configure the parameters for each deployment.
When you plan a profile deployment, you need to consider the profile job type, response time, user type, and data sources.
The following categories determine the system resource recommendations:
  • Resource guidelines for the Profiling Service Module and the Data Integration Service, including memory, disk space, and CPU usage.
  • Resource guidelines for column profiling, key discovery, functional dependency discovery, foreign key discovery, and overlap discovery based on the data source types and hardware capacity.

Profile Job Type

You can have multiple profile jobs when you run a profile on a data source. Each profile operation uses a different combination of resources. The mix of profile jobs determines the resource requirements. You need to balance the performance goals and resource costs effectively to optimize the deployment.
The following table summarizes the relative use of resources by each profile job type and data source:
| Profile Operation | Data Source Type | CPU | Memory | Disk Space | RDBMS | Profiling Warehouse |
|---|---|---|---|---|---|---|
| Column Profile | Flat File | Medium | Low | Medium | None | Medium |
| Column Profile | Relational | Low | Low | None | High | Medium |
| Data Domain Discovery | Flat File | High | Low | Medium | None | Low |
| Data Domain Discovery | Relational | Medium | Low | None | High | Low |
| Key Discovery | - | Low | High | High | None | Low |
| Functional Dependency Discovery | - | Low | High | High | None | Low |
| Overlap Discovery | - | High | Low | None | None | Low |
| Foreign Key Discovery | - | High | Low | None | None | Low |
| Enterprise Discovery | Flat File | High | High | High | None | High |
| Enterprise Discovery | Relational | High | High | High | High | High |
| Reporting or Viewing Results | - | Low | None | None | None | Low |
| Drilldown | Flat File | Low | None | None | None | None |
| Drilldown | Relational | Low | None | None | Low | None |
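
As a rough illustration of how the mix of profile jobs drives resource planning, the following Python sketch encodes the relative levels from the table above and reports the highest level per resource across a planned set of jobs. The names `USAGE` and `peak_demand` are hypothetical and are not part of any Informatica tool. Taking the per-resource maximum is an assumption made for illustration only: the levels are relative ratings, not measurements that can be added together.

```python
# Relative resource levels transcribed from the table above.
# Ranking the levels and taking the maximum is an illustrative assumption,
# not an Informatica sizing formula.
LEVELS = {"None": 0, "Low": 1, "Medium": 2, "High": 3}
RESOURCES = ("cpu", "memory", "disk_space", "rdbms", "profiling_warehouse")

# Key: (profile operation, data source type). None stands for the "-" rows.
USAGE = {
    ("Column Profile", "Flat File"): ("Medium", "Low", "Medium", "None", "Medium"),
    ("Column Profile", "Relational"): ("Low", "Low", "None", "High", "Medium"),
    ("Data Domain Discovery", "Flat File"): ("High", "Low", "Medium", "None", "Low"),
    ("Data Domain Discovery", "Relational"): ("Medium", "Low", "None", "High", "Low"),
    ("Key Discovery", None): ("Low", "High", "High", "None", "Low"),
    ("Functional Dependency Discovery", None): ("Low", "High", "High", "None", "Low"),
    ("Overlap Discovery", None): ("High", "Low", "None", "None", "Low"),
    ("Foreign Key Discovery", None): ("High", "Low", "None", "None", "Low"),
    ("Enterprise Discovery", "Flat File"): ("High", "High", "High", "None", "High"),
    ("Enterprise Discovery", "Relational"): ("High", "High", "High", "High", "High"),
    ("Reporting or Viewing Results", None): ("Low", "None", "None", "None", "Low"),
    ("Drilldown", "Flat File"): ("Low", "None", "None", "None", "None"),
    ("Drilldown", "Relational"): ("Low", "None", "None", "Low", "None"),
}


def peak_demand(jobs):
    """Return the highest relative level per resource across a mix of jobs."""
    peak = {resource: "None" for resource in RESOURCES}
    for job in jobs:
        for resource, level in zip(RESOURCES, USAGE[job]):
            if LEVELS[level] > LEVELS[peak[resource]]:
                peak[resource] = level
    return peak


if __name__ == "__main__":
    mix = [("Column Profile", "Relational"), ("Key Discovery", None)]
    for resource, level in peak_demand(mix).items():
        print(f"{resource}: {level}")
```

For a mix of a relational column profile and key discovery, the sketch reports low CPU demand, high memory, disk space, and RDBMS demand, and medium profiling warehouse demand, which matches reading the two rows of the table side by side.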

Response Time

The speed of a profile job run depends on the type of the profile job and resource types that the profile job uses. Most of the algorithms benefit from faster CPUs and memory because the operating system can use memory in different ways including caching data.
If the profile job uses multithreaded algorithms, you can add CPU cores to improve the response time. Some algorithms perform better with faster or additional temporary disk space.
The network speed is critical when the Data Integration Service queries or writes data to the profiling warehouse on another machine. The network speed is also important when the Data Integration Service running on one machine pushes queries to the RDBMS on another machine.
The following table shows, for each profile job type, whether adding more or better resources of each type to the Data Integration Service improves the response time:
| Profile Job Type | Faster CPU | Cores | Memory | Disk | Network |
|---|---|---|---|---|---|
| Column Profile | Yes | Yes | No | Yes | Yes |
| Data Domain Discovery | Yes | Yes | No | Yes | Yes |
| Key Discovery | Yes | No | Yes | Yes | No |
| Functional Dependency Discovery | Yes | No | Yes | Yes | No |
| Overlap Discovery | Yes | Yes | No | No | No |
| Foreign Key Discovery | Yes | Yes | No | No | No |
| Enterprise Discovery | Yes | Yes | Yes | Yes | Yes |
| Reporting or Viewing Results | Yes | No | No | No | Yes |
| Drilldown | Yes | No | No | No | Yes |
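
The table above can also be read as a lookup: given the job types you expect to run, which upgrades are worth considering? The following Python sketch shows one way to do that. The names `BENEFITS` and `helpful_upgrades` are hypothetical, and the rule "an upgrade is worth considering if it helps at least one expected job type" is an assumption made for illustration, not a documented recommendation.

```python
# Upgrade types, in the column order of the table above.
UPGRADES = ("faster_cpu", "cores", "memory", "disk", "network")

# True where the table says "Yes" for that job type and upgrade.
BENEFITS = {
    "Column Profile": (True, True, False, True, True),
    "Data Domain Discovery": (True, True, False, True, True),
    "Key Discovery": (True, False, True, True, False),
    "Functional Dependency Discovery": (True, False, True, True, False),
    "Overlap Discovery": (True, True, False, False, False),
    "Foreign Key Discovery": (True, True, False, False, False),
    "Enterprise Discovery": (True, True, True, True, True),
    "Reporting or Viewing Results": (True, False, False, False, True),
    "Drilldown": (True, False, False, False, True),
}


def helpful_upgrades(job_types):
    """Return the upgrades that help at least one of the given job types."""
    return [
        upgrade
        for index, upgrade in enumerate(UPGRADES)
        if any(BENEFITS[job][index] for job in job_types)
    ]


if __name__ == "__main__":
    print(helpful_upgrades(["Key Discovery", "Overlap Discovery"]))
```

For a workload of key discovery and overlap discovery, the sketch lists a faster CPU, more cores, more memory, and faster disk, but not network, because neither job type benefits from a faster network according to the table.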

User Types

The profile workload, including system-generated profile jobs such as periodic scorecard runs, depends on the number and type of users. As the number of users increases, more profile jobs run concurrently. The number of concurrent jobs is expressed as a range: the average number of profiling jobs of each profile type that can run successfully with the specified number of cores and amount of memory. The types of profiling jobs that you need to estimate depend on the type of user and the available resources.
Each user type might generate the following profile jobs:
Informatica Analyst user
Submits profile jobs, such as profile run, scorecard run, and drill-down jobs.
Informatica Developer user
Runs all the profile job types including enterprise discovery. In the Developer tool, the profile job type depends on the project.
infacmd command line utility user
Typically schedules scorecard runs, but can run all profile job types.

Pushdown Optimization for Data Sources

How effectively you can allocate computing resources depends on the data source type. When you run a profile on a relational source, the Profiling Service Module can transfer some of the profiling logic to the data source, so the source system must be able to accommodate the additional workload. When you run a profile on a non-relational data source, the Profiling Service Module computes the profiling job in the Data Integration Service, so you can allocate all the computing resources to the system that runs the Informatica application. The pushdown of the processing logic also depends on the rule type and profile type.
The following guidelines determine the pushdown optimization for column profiles and rules:
  • Pushdown optimization applies only to physical data sources.
  • Pushdown optimization applies only to the following rules:
    • Rules containing a single expression transformation or internal expression rule with a single Boolean output port type.
    • Reusable validation rules that contain a single validation expression transformation.
    • Rules created in the Analyst tool.
  • Pushdown optimization does not apply to the following data objects:
    • Logical data object and mapping specification
      Pushdown optimization does not apply to the profiling logic. However, the Data Integration Service machine can optimize the logical data object and mapping specification mappings and push down parts of the mappings before the Data Integration Service applies the profiling logic.
    • Mapping specification
    • Flat file
    • Mainframe source
  • Pushdown optimization does not apply to the following rules:
    • Rules with multiple transformations.
    • Rules with a single, non-Boolean output port.
    • Reusable rules.
    • Rules that contain the IIF(), Ltrim(), or Rtrim() function.
  • Pushdown optimization does not apply to columns with the Date data type.
The Profiling Service Module pushes the value frequency computation and rule logic to the data source for column profiles, data domain discovery profiles, and enterprise discovery profiles. The Profiling Service Module pushes the filter logic to the data source for key discovery and functional dependency discovery on a single table, and for overlap discovery and foreign key discovery on multiple tables.
If a column profile run does not push down the value frequencies, the Data Integration Service does not push down the rules.
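
The guidelines above amount to a short checklist, which the following Python sketch restates as code. It is a simplified reading under stated assumptions, not the logic that the Profiling Service Module actually runs: the `Rule` class, the source type strings, and the function names are all hypothetical, and the function check is a plain pattern match on the expression text. The sketch deliberately leaves out reusable rules, which the guidelines treat as a separate case.

```python
import re
from dataclasses import dataclass

# Data objects that the guidelines list as not eligible for pushdown.
# The string labels are hypothetical; they are not Informatica identifiers.
NON_PUSHDOWN_SOURCES = {
    "logical data object",
    "mapping specification",
    "flat file",
    "mainframe",
}

# Functions that the guidelines list as blocking pushdown of a rule.
BLOCKING_FUNCTIONS = re.compile(r"\b(IIF|LTRIM|RTRIM)\s*\(", re.IGNORECASE)


@dataclass
class Rule:
    """Hypothetical description of a rule; reusable rules are not modeled."""
    expression: str
    transformation_count: int = 1
    boolean_output: bool = True


def column_supports_pushdown(source_type, column_datatype):
    """Check the data object and data type guidelines for one column."""
    if source_type.lower() in NON_PUSHDOWN_SOURCES:
        return False
    return column_datatype.lower() != "date"


def rule_supports_pushdown(rule, value_frequencies_pushed):
    """Check the rule guidelines; rules follow the value frequency pushdown."""
    if not value_frequencies_pushed:
        return False
    if rule.transformation_count != 1 or not rule.boolean_output:
        return False
    return BLOCKING_FUNCTIONS.search(rule.expression) is None


if __name__ == "__main__":
    pushed = column_supports_pushdown("relational", "string")
    print(pushed, rule_supports_pushdown(Rule("LENGTH(name) > 3"), pushed))
```

The second function takes the outcome of the first as an argument because, as stated above, the Data Integration Service does not push down the rules when the column profile run does not push down the value frequencies.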
The following table summarizes the resource allocation between the Profiling Service Module and data source system based on the pushdown of the processing logic:
| Profile Job Type | Pushdown | Database | Profiling Service Module |
|---|---|---|---|
| Column Profile | Yes | Medium | Medium |
| Column Profile | No | None | High |
| Data Domain Discovery | Yes | Medium | Medium |
| Data Domain Discovery | No | None | High |
| Key Discovery | Yes | None | High |
| Key Discovery | No | None | High |
| Functional Dependency Discovery | Yes | None | High |
| Functional Dependency Discovery | No | None | High |
| Overlap Discovery | Yes | None | High |
| Overlap Discovery | No | None | High |
| Foreign Key Discovery | Yes | None | High |
| Foreign Key Discovery | No | None | High |
| Enterprise Discovery | Yes | Medium | Medium |
| Enterprise Discovery | No | None | High |
| Reporting or Viewing Results | Yes | None | Medium |
| Reporting or Viewing Results | No | None | High |
| Drilldown | Yes | Medium | Low |
| Drilldown | No | None | High |