Profiling and Discovery Sizing Guidelines

Advanced Profiling Properties

The advanced profiling properties apply to a single Data Integration Service node. You must configure the parameters for each node in the Data Integration Service.
You can configure the following advanced profiling properties:
Pattern Threshold Percent
The minimum percentage of rows, specified to up to two decimal places, that must match a pattern for the pattern to appear in the results.
Default is 5.00.
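The effect of the threshold can be illustrated with a minimal sketch. The function name, the sample pattern strings, and the choice of an inclusive comparison (greater than or equal to) are assumptions made for illustration and are not taken from the product.

```python
from collections import Counter


def patterns_above_threshold(patterns, threshold_percent=5.00):
    """Return {pattern: percent} for patterns whose row share meets the threshold.

    `patterns` holds one inferred pattern per row (hypothetical input).
    Percentages are rounded to two decimal places, mirroring the property.
    """
    total = len(patterns)
    if total == 0:
        return {}
    result = {}
    for pattern, count in Counter(patterns).items():
        percent = round(100.0 * count / total, 2)
        # Assumption: a pattern exactly at the threshold is kept.
        if percent >= threshold_percent:
            result[pattern] = percent
    return result


# Hypothetical 40-row column: 30 rows, 9 rows, and 1 row per pattern.
rows = ["X(5)"] * 30 + ["9(3)"] * 9 + ["X(2)9(2)"] * 1
kept = patterns_above_threshold(rows)
```

In this sample, the pattern that matches only 1 of 40 rows (2.5 percent) falls below the default threshold and is left out of `kept`.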
Maximum # Value Frequency Pairs
The maximum number of value frequency pairs stored in the profiling warehouse. This parameter limits only what is stored. The Profiling Service Module still computes all the value frequency pairs and the basic characteristics of a column profile run regardless of the setting. However, a higher value increases the time required to write the value frequency pairs to the profiling warehouse.
Default is 16,000.
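The distinction between computing and storing value frequency pairs can be sketched as follows. This is an illustration only: the function name is hypothetical, and keeping the most frequent values first is an assumption, since the text above does not state which pairs are retained.

```python
from collections import Counter


def pairs_to_store(values, max_pairs=16000):
    """Compute every value frequency pair, then cap what gets stored.

    Returns (stored_pairs, distinct_count). The full count is always
    computed; only the stored list is limited by `max_pairs`.
    """
    counts = Counter(values)
    # Assumption: the most frequent values are the ones retained.
    stored = counts.most_common(max_pairs)
    return stored, len(counts)


# Hypothetical column with three distinct values and a cap of two pairs.
stored, distinct = pairs_to_store(["a", "a", "a", "b", "b", "c"], max_pairs=2)
```

Raising `max_pairs` does not change the counting work in this sketch. It only lengthens the list that would be written out, which matches the write-time cost described above.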
Maximum String Length
The maximum length of a string that the Profiling Service Module mappings process internally. The default of 255 is also the maximum value. If you decrease the value, the Data Integration Service truncates strings that are longer than the value you set. A lower string length has a minor impact on the amount of tablespace required for the profiling warehouse and a negligible impact on overall performance.
Default is 255.
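As a rough illustration of the truncation behavior, the following sketch cuts a string to the configured length. The function name is hypothetical, and truncating from the right is an assumption.

```python
def truncate_for_profiling(value, max_string_length=255):
    """Return the leading `max_string_length` characters of `value`."""
    # Assumption: characters beyond the limit are dropped from the end.
    return value[:max_string_length]


long_value = "x" * 300
truncated = truncate_for_profiling(long_value)
shortened = truncate_for_profiling("profiling", max_string_length=7)
```

Note that two distinct source strings that share the same leading characters become identical after truncation, which can change the value frequencies reported for a column.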
Maximum Numeric Precision
The maximum precision, which is the number of significant digits in the number, for numeric decimal datatypes. If you set a low value for this parameter, the Data Integration Service might process additional numeric datatypes as strings instead of numbers.
Default is 38.
Maximum Concurrent Profile Jobs
The maximum number of profile jobs that can run in parallel, even if more threads are available to run mappings. Use the parameter to limit the resources that the Profiling Service Module consumes so that profile jobs do not affect the resources available to the rest of the Data Integration Service.
When you run a column profile on a relational source, the Maximum DB Connections parameter determines the number of mappings that the Profiling Service Module uses. The Profiling Service Module uses one mapping each for the other profiling jobs. When you run a column profile on flat file sources or relational sources, the Maximum Concurrent Profile Threads parameter determines the number of mappings that the Profiling Service Module uses.
You can configure the Maximum Concurrent Profile Jobs based on the capabilities and other uses of the nodes that you run the Profiling Service Module on.
Default is 5.
Maximum Concurrent Columns
The number of columns that a mapping runs in parallel. The default value of 5 is optimal for most profiling use cases. You can increase the value for columns with cardinality lower than the average value. Decrease the value for columns with cardinality higher than the average value. You might also want to decrease the value when you consistently run profiles on large source files and temporary disk space is low.
Default is 5.
Maximum Concurrent Profile Threads
The number of mappings that run in parallel when you run a column profile on a flat file data source or relational data source. Each mapping simultaneously runs a profile on a number of columns equal to the value you set for the Maximum Concurrent Columns parameter. If you increase this parameter value, the Profiling Service Module runs the profile on more columns simultaneously, which reduces the overall time for the profile run.
Default is 1.
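The relationship between Maximum Concurrent Profile Threads and Maximum Concurrent Columns is simple arithmetic, sketched below. The function names are hypothetical, and the "passes" estimate is a simplified model that assumes every column takes about the same time. It is not a description of the actual scheduler.

```python
import math


def columns_in_flight(max_concurrent_threads=1, max_concurrent_columns=5):
    """Columns profiled at the same time: mappings times columns per mapping."""
    return max_concurrent_threads * max_concurrent_columns


def estimated_passes(total_columns, max_concurrent_threads=1, max_concurrent_columns=5):
    """Rough number of sequential passes needed to cover all columns."""
    in_flight = columns_in_flight(max_concurrent_threads, max_concurrent_columns)
    return math.ceil(total_columns / in_flight)


# Hypothetical 40-column source with default and increased thread settings.
default_passes = estimated_passes(40)
tuned_passes = estimated_passes(40, max_concurrent_threads=4)
```

Under this model, a 40-column source needs 8 passes with the defaults and 2 passes with four threads, provided the node has the capacity to run the additional mappings.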
Maximum Column Heap Size
The cache size for each column profile mapping for flat files. You can increase this value to prevent the Data Integration Service from writing some parts of the intermediate profile results to temporary disk. However, this effect does not apply to large data sources. The default setting is optimal for most of the profiling use cases.
Default is 64.
Reserved Profile Threads
The number of threads that the Data Integration Service reserves to perform drill-down operations. The parameter ensures that a thread is always available for a drill-down operation, which is a quick and real-time operation. If enterprise discovery is a large part of the profile jobs, you can increase the parameter value. You can also increase the parameter value if multiple users might perform drill-down operations when the Profiling Service Module runs a profile.
Default is 1.
