Informatica Data Quality
| Data Source | Description |
|---|---|
| Flat file source | The Profiling Service Module reads each row in a flat file source. The Profiling Service Module can construct rows by reading bytes from a flat file as required. When you run a profile on flat file data sources, the Profiling Service Module runs all the processing logic in the mapping, including sorting and buffering. |
| Relational source | Relational data sources include an SQL query engine that you can use to view the data in a front-end application. The Profiling Service Module shares the processing logic with the relational database for some profile jobs. If the Profiling Service Module and the relational source are on different machines, the Data Integration Service distributes the processing logic across the resources of the two machines. You can optimize the relational source for profile queries to improve performance. |
| Semi-structured source | Avro, JSON, Parquet, and XML formats are semi-structured data sources. You can create flat file data objects for JSON or XML data sources. You can create complex file data objects for Avro, JSON, Parquet, and XML data sources in Hadoop Distributed File System (HDFS). |
| Mainframe source | If the mainframe source is nonrelational, such as IMS or VSAM, the Profiling Service Module processes the source as a flat file. Sharing the SQL processing queries with IBM DB2 sources is not recommended because mainframe access can incur additional charges or license fees. The Profiling Service Module treats all relational mainframe sources as special flat files and performs all the processing logic itself. This method reduces the number of I/O operations on the mainframe source. |
| Other sources | The Profiling Service Module treats social media, PowerExchange, logical data object, and mapping transformation data sources as flat files. |
| Type | Requirements |
|---|---|
| Base Memory | The amount of memory required to run the Java Virtual Machine that the Data Integration Service uses, approximately 640 MB. |
| Variable Memory | The amount of memory required to run each Data Transformation Manager thread. One Data Transformation Manager thread is required to run each mapping that computes a part of a profile job. This overhead depends on the Maximum Execution Pool Size property in the service properties. The default value of this property is 10, and the corresponding overhead is approximately 1,000 MB. A mapping requires additional memory to read address or identity reference data. A profile that reads the output of an address validation rule can incur an additional 1 GB of memory to read and cache the address validation reference data. |
| CPU | The Profiling Service Module uses less than 1 CPU. When you calculate the number of CPUs required for Data Transformation Manager operations, round the total number up to the nearest integer. Disk space is a one-time cost when the Data Integration Service is installed. CPU overhead is minimal when the Data Integration Service is not running jobs. |
| Memory | No additional memory is required beyond the minimum needed to run the mapping. |
| Disk | No disk space is required. |
| Operating System | Use a 64-bit operating system if possible, as a 64-bit system can handle memory sizes greater than 4 GB. A 32-bit system works if the profiling job fits within the memory limitations of the system. |
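The figures above can be combined into a rough capacity estimate. The sketch below is illustrative only: the function names and the 100 MB-per-thread constant (derived from the quoted ~1,000 MB overhead at the default pool size of 10) are assumptions, not an Informatica API or an official sizing formula.

```python
import math

# Illustrative constants taken from the figures quoted above.
BASE_MEMORY_MB = 640              # JVM memory for the Data Integration Service
MEMORY_PER_DTM_THREAD_MB = 100    # ~1,000 MB at the default pool size of 10
ADDRESS_REFERENCE_DATA_MB = 1024  # extra cache when reading address validation reference data

def estimate_memory_mb(max_execution_pool_size: int,
                       uses_address_validation: bool = False) -> int:
    """Rough memory estimate: base JVM plus per-thread overhead,
    plus optional address validation reference data cache."""
    total = BASE_MEMORY_MB + max_execution_pool_size * MEMORY_PER_DTM_THREAD_MB
    if uses_address_validation:
        total += ADDRESS_REFERENCE_DATA_MB
    return total

def required_cpus(cpu_fractions: list[float]) -> int:
    """Each profile job uses a fraction of a CPU; sum the fractions
    and round up to the nearest integer, as described above."""
    return math.ceil(sum(cpu_fractions))

# Default Maximum Execution Pool Size of 10: 640 + 10 * 100 = 1,640 MB.
print(estimate_memory_mb(10))
# Three concurrent jobs at 0.8, 0.8, and 0.5 CPU: ceil(2.1) = 3 CPUs.
print(required_cpus([0.8, 0.8, 0.5]))
```

Increasing Maximum Execution Pool Size raises the variable memory overhead linearly in this model, which matches the proportional relationship the table describes.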