Profiling and Discovery Sizing Guidelines

Hardware Guidelines for Profiling

The performance of a profile depends on the hardware architecture. The hardware components in a system are usually predefined, and it might not be possible to upgrade an individual component without replacing the computer. However, understanding how each component functions and how it affects performance can help when you plan for an increase in performance.
Consider the following hardware factors for profiles:
CPU
An increase in the processing speed of the CPU results in faster computation of the profiling results. Within the same processor family, doubling the speed of the CPU usually halves the time required to compute the profile results.
The speed of the CPU depends on the microarchitecture of the CPU, including the clock speed, instruction set, and whether the CPU is 32-bit or 64-bit. It also depends on the number of effective cores or processing threads. The Data Integration Service uses multithreaded programming techniques and can use more than one core or processing thread at a time. If the CPU has more cores, the system can scale better and achieve higher throughput.
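The rule of thumb above, that doubling CPU speed within one processor family roughly halves the computation time, can be turned into a rough planning estimate. The following Python sketch is illustrative only: the function name and parameters are hypothetical, and it assumes that the profile is CPU-bound and that time scales inversely with clock speed, which ignores disk, memory, and network effects.

```python
def estimated_profile_seconds(baseline_seconds: float,
                              baseline_ghz: float,
                              new_ghz: float) -> float:
    """Estimate compute time on a faster CPU of the same processor family.

    Assumption: compute time scales inversely with clock speed.
    """
    if baseline_seconds < 0 or baseline_ghz <= 0 or new_ghz <= 0:
        raise ValueError("time must be non-negative and clock speeds positive")
    return baseline_seconds * baseline_ghz / new_ghz


if __name__ == "__main__":
    # A profile measured at 600 seconds on a 2.0 GHz CPU, moved to 4.0 GHz.
    print(estimated_profile_seconds(600.0, 2.0, 4.0))  # prints 300.0
```

Treat the result as a best-case figure. A measured run on the target hardware is the only reliable number.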
Disk
The Data Integration Service can read flat files and similar nonrelational sources from disk.
The Data Integration Service streams and processes the data. Performance in this case depends on the disk type and the network bandwidth to the CPU. To increase the read performance in this use case, you can optimize the storage architecture for single file access.
This optimization can be complex due to the variety of storage technologies:
  • Single and RAID Disk. The important factors you need to consider are the rotation speed, seek speed, controller speed, and motherboard bus speed.
  • SAN. The storage technology is implementation dependent. The important factors you need to consider are the number and types of host controllers, transfer speed from the appliance, and motherboard bus speed.
You can also use disk for temporary storage. The Data Integration Service depends on temporary disk space and can write to multiple temporary files on different disks in parallel. This approach increases the read and write bandwidth, which results in faster processing. The system can also increase performance with a single temporary directory that uses a high-performance storage implementation.
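The idea of writing temporary files to different disks in parallel can be sketched in a few lines of Python. This is not how the Data Integration Service is implemented. The function names are hypothetical, and the sketch assumes that each directory in `temp_dirs` sits on a separate physical disk, which the code cannot verify.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_chunk(directory: str, index: int, payload: bytes) -> str:
    """Write one chunk of temporary data and return the file path."""
    path = os.path.join(directory, f"chunk_{index}.tmp")
    with open(path, "wb") as handle:
        handle.write(payload)
    return path


def spill_in_parallel(chunks: list[bytes], temp_dirs: list[str]) -> list[str]:
    """Distribute chunks round-robin over temp_dirs and write them concurrently."""
    if not temp_dirs:
        raise ValueError("at least one temporary directory is required")
    with ThreadPoolExecutor(max_workers=len(temp_dirs)) as pool:
        futures = [
            pool.submit(write_chunk, temp_dirs[i % len(temp_dirs)], i, chunk)
            for i, chunk in enumerate(chunks)
        ]
        # Results are collected in submission order, so paths[i] matches chunks[i].
        return [future.result() for future in futures]


if __name__ == "__main__":
    # Two scratch directories stand in for two disks in this demonstration.
    dirs = [tempfile.mkdtemp(), tempfile.mkdtemp()]
    for path in spill_in_parallel([b"a" * 1024, b"b" * 1024, b"c" * 1024], dirs):
        print(path)
```

If all directories share one disk, the parallel writes compete for the same bandwidth and the benefit largely disappears, which is why a single directory on high-performance storage can be a comparable alternative.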
Memory
If you add faster memory, the CPU can retrieve data from memory more quickly, which increases performance.
If you add more memory, the operating system and database can cache more data for faster access. The additional cached data increases the performance of the Profiling Service Module because it can quickly access data that is available in memory.
Network
The Data Integration Service uses the network to access databases that do not reside on the Data Integration Service machine. Multiple profile functions depend on the network to transfer large amounts of data from a database to the Data Integration Service for processing.
The following profile functions use the network:
  • Column profile. The Data Integration Service can push the data processing down to the database. However, a column profile processes the unique values locally. If the source table is large and the profile runs on a key column, the Data Integration Service transfers all the values in the column.
  • Rule profile. The Data Integration Service pushes some rules down to the database. If the Data Integration Service cannot push a rule down to the database, the Data Integration Service transfers all the values in each column that is part of the rule.
  • Data domain discovery. The Data Integration Service can push the processing down to the database. However, a column profile processes the unique values locally. If the source table is large and the profile runs on a key column, the Data Integration Service transfers all the column values.
  • Foreign key discovery. The Data Integration Service transfers all the data because the Data Integration Service reads all the values for each source table that you run a profile on.
The network speed between the Profiling Service Module and the database plays an important role in performance. The profile functions run faster if you increase the network speed and the number of host controllers. If the network also carries traffic from multiple nonprofile functions, the profile functions run slower.
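To get a feel for how much the network matters when a profile function transfers every value in a column, you can estimate a lower bound on the transfer time from the row count, the average value size, and the link speed. The following Python sketch uses hypothetical names and assumes that the link is dedicated to the transfer. It ignores protocol overhead, latency, and compression, so real transfers take longer.

```python
def transfer_seconds(row_count: int,
                     avg_value_bytes: float,
                     link_mbps: float) -> float:
    """Lower-bound time to move one column over the network.

    link_mbps is the link speed in megabits per second (10**6 bits/s).
    """
    if row_count < 0 or avg_value_bytes < 0 or link_mbps <= 0:
        raise ValueError("sizes must be non-negative and link speed positive")
    total_bits = row_count * avg_value_bytes * 8
    return total_bits / (link_mbps * 1_000_000)


if __name__ == "__main__":
    # 100 million key values of about 10 bytes each over a 1000 Mbps link.
    print(transfer_seconds(100_000_000, 10, 1000))  # prints 8.0
```

The same column over a 100 Mbps link takes at least ten times as long, which illustrates why shared or slow links slow down profiles on large key columns.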
