Common Content for Data Integration 10.5.4
The following properties configure the execution options for the Data Integration Service:
**Use Operating System Profiles and Impersonation**

Runs mappings, workflows, and profiling jobs with operating system profiles. In a Hadoop environment, the Data Integration Service uses the Hadoop impersonation user to run mappings, workflows, and profiling jobs.

You can select this option if the Data Integration Service runs on UNIX or Linux. To apply changes, restart the Data Integration Service.
**Launch Job Options**

Runs jobs in the Data Integration Service process, in separate DTM processes on the local node, or in separate DTM processes on remote nodes. Configure the property based on whether the Data Integration Service runs on a single node or a grid, and based on the types of jobs that the service runs.

Choose one of the following options:

- In the service process. Runs jobs in the Data Integration Service process.
- In separate local processes. Runs jobs in separate DTM processes on the local node.
- In separate remote processes. Runs jobs in separate DTM processes on remote nodes.

Default is In separate local processes.

If the Data Integration Service uses operating system profiles, configure the service to run jobs in separate local processes.

If the Data Integration Service runs on UNIX and is configured to run jobs in separate local or remote processes, verify that the hosts file on each node with the compute role contains a localhost entry (see the sketch after this entry). Otherwise, jobs that run in separate processes fail.
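To check for the required localhost entry, you can scan the hosts file on each compute node. The following is a minimal sketch in Python, assuming the standard `/etc/hosts` location; it is not an Informatica utility.

```python
# Minimal sketch: verify that the hosts file on a compute node contains a
# localhost entry before configuring separate local or remote processes.
# The path and the check are assumptions; adapt them to your environment.

def has_localhost_entry(hosts_path="/etc/hosts"):
    with open(hosts_path) as f:
        for line in f:
            line = line.split("#", 1)[0]  # strip comments
            fields = line.split()
            # A valid entry maps an address to the "localhost" hostname.
            if len(fields) >= 2 and "localhost" in fields[1:]:
                return True
    return False

if __name__ == "__main__":
    if not has_localhost_entry():
        print("No localhost entry found; jobs in separate processes may fail.")
```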
**Maximum On-Demand Execution Pool Size**

Maximum number of on-demand jobs that can run concurrently. Jobs include data previews, profiling jobs, REST and SQL queries, web service requests, and mappings run from the Developer tool. All jobs that the Data Integration Service receives contribute to the on-demand pool size. The Data Integration Service runs an on-demand job immediately if enough resources are available. Otherwise, it rejects the job. Default is 10.

When you size the on-demand pool, consider that a Developer tool client can run at most 10 concurrent jobs on a Data Integration Service.
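The admission rule above (run immediately if a slot is free, otherwise reject) can be pictured with a short sketch. The class and method names below are hypothetical, chosen only to illustrate the behavior:

```python
# Hypothetical sketch of the on-demand admission rule: a job runs
# immediately if the pool has a free slot, otherwise it is rejected
# (not queued). Names are illustrative, not product APIs.

class OnDemandPool:
    def __init__(self, max_size=10):  # Default is 10
        self.max_size = max_size
        self.running = 0

    def submit(self, job):
        if self.running < self.max_size:
            self.running += 1
            return f"running {job}"
        return f"rejected {job}"  # on-demand jobs are rejected, not queued

pool = OnDemandPool(max_size=2)
print(pool.submit("preview-1"))   # running
print(pool.submit("sql-query-1")) # running
print(pool.submit("profile-1"))   # rejected
```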
**Maximum Native Batch Execution Pool Size**

Maximum number of deployed jobs that can run concurrently in the native environment. The Data Integration Service moves native mapping jobs from the queue to the native job pool when enough resources are available. Default is 10.

**Maximum Hadoop Batch Execution Pool Size**

Maximum number of deployed jobs that can run concurrently in the Hadoop environment. The Data Integration Service moves Hadoop jobs from the queue to the Hadoop job pool when enough resources are available. Default is 100.
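In contrast to the on-demand pool, deployed batch jobs are queued rather than rejected. The following hypothetical sketch illustrates the queue-to-pool behavior described above; the names are illustrative, not product APIs:

```python
# Hypothetical sketch of the batch pools: deployed jobs wait in a queue
# and move into the pool when a slot frees up, unlike on-demand jobs,
# which are rejected outright when the pool is full.

from collections import deque

class BatchPool:
    def __init__(self, max_size):
        self.max_size = max_size  # e.g. 10 for native, 100 for Hadoop
        self.queue = deque()
        self.running = set()

    def submit(self, job):
        self.queue.append(job)
        self._dispatch()

    def finish(self, job):
        self.running.discard(job)
        self._dispatch()  # a freed slot pulls the next queued job

    def _dispatch(self):
        while self.queue and len(self.running) < self.max_size:
            self.running.add(self.queue.popleft())

native = BatchPool(max_size=10)
for i in range(12):
    native.submit(f"mapping-{i}")
print(len(native.running), len(native.queue))  # 10 running, 2 queued
```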
**Maximum Memory Size**

Maximum amount of memory, in bytes, that the Data Integration Service can allocate for running all requests concurrently when the service runs jobs in the Data Integration Service process. When the Data Integration Service runs jobs in separate local or remote processes, the service ignores this value. If you do not want to limit the amount of memory that the Data Integration Service can allocate, set this property to 0.

If the value is greater than 0, the Data Integration Service uses the property to calculate the maximum total memory allowed for running all requests concurrently. The Data Integration Service calculates the maximum total memory as follows:

Maximum Memory Size + Maximum Heap Size + memory required for loading program components

Default is 0.

If you run profiles or data quality mappings, set this property to 0.
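As a worked example of the formula above, with illustrative values rather than recommendations:

```python
# Illustrative arithmetic for the maximum total memory formula above.
# The numbers are examples only, not sizing recommendations.

max_memory_size    = 6 * 1024**3    # property value: 6 GB, in bytes
max_heap_size      = 2 * 1024**3    # service JVM heap: 2 GB
program_components = 512 * 1024**2  # memory for loading program components

max_total_memory = max_memory_size + max_heap_size + program_components
print(f"{max_total_memory / 1024**3:.1f} GB allowed for concurrent requests")
```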
**Maximum Parallelism**

Maximum number of parallel threads that process a single mapping pipeline stage.

When you set the value to a number greater than 1, the Data Integration Service enables partitioning for mappings, column profiling, and data domain discovery. The service dynamically scales the number of partitions for a mapping pipeline at run time. Increase the value based on the number of CPUs available on the nodes where jobs run.

In the Developer tool, developers can change the maximum parallelism value for each mapping. When maximum parallelism is set for both the Data Integration Service and the mapping, the Data Integration Service uses the minimum value when it runs the mapping.

You cannot change the maximum parallelism value for each profile. When the Data Integration Service converts a profile job into one or more mappings, the mappings always use Auto for the mapping maximum parallelism.

You do not have to set maximum parallelism for the Data Integration Service to use multiple partitions in the Hadoop environment.

Default is 1. Maximum is 64.
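A small sketch of the rule above: when maximum parallelism is set on both the service and the mapping, the lower value wins.

```python
# Sketch of the effective-parallelism rule: the Data Integration Service
# uses the minimum of the service value and the mapping value.

def effective_parallelism(service_max, mapping_max):
    return min(service_max, mapping_max)

print(effective_parallelism(service_max=8, mapping_max=4))   # 4
print(effective_parallelism(service_max=2, mapping_max=16))  # 2
```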
**Hadoop Kerberos Service Principal Name**

Service Principal Name (SPN) of the Data Integration Service to connect to a Hadoop cluster that uses Kerberos authentication. Not required when you run the MapR Hadoop distribution. Required for other Hadoop distributions.

**Hadoop Kerberos Keytab**

The file path to the Kerberos keytab file on the machine on which the Data Integration Service runs. Not required when you run the MapR Hadoop distribution. Required for other Hadoop distributions.
**Home Directory**

Root directory accessible by the node. This is the root directory for the other service directories. Default is `<Informatica installation directory>/tomcat/bin`. If you change the default value, verify that the directory exists.

You cannot use the following characters in the directory path:

This property change does not require a restart of the Data Integration Service.
**Temporary Directories**

Directory for temporary files created when jobs are run. Default is `<home directory>/disTemp`.

Enter a list of directories separated by semicolons to optimize performance during profile operations and during cache partitioning for Sorter transformations (see the sketch after this entry).

You cannot use the following characters in the directory path:

This property change does not require a restart of the Data Integration Service.
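The directory list is a semicolon-separated string. The following hypothetical helper shows one way to split and sanity-check such a value; it is not a product API:

```python
# Hypothetical helper that splits a semicolon-separated directory list,
# such as the Temporary Directories value above, and checks each entry.

import os

def parse_directory_list(value):
    dirs = [d.strip() for d in value.split(";") if d.strip()]
    missing = [d for d in dirs if not os.path.isdir(d)]
    return dirs, missing

dirs, missing = parse_directory_list("/data/disTemp1;/data/disTemp2")
print(dirs)
if missing:
    print("missing directories:", missing)
```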
**Cache Directory**

Directory for index and data cache files for transformations. Default is `<home directory>/cache`.

Enter a list of directories separated by semicolons to increase performance during cache partitioning for Aggregator, Joiner, or Rank transformations.

You cannot use the following characters in the directory path:

This property change does not require a restart of the Data Integration Service.
**Source Directory**

Directory for source flat files used in a mapping. Default is `<home directory>/source`.

If the Data Integration Service runs on a grid, you can use a shared directory to create one directory for source files. If you configure a different directory for each node with the compute role, ensure that the source files are consistent among all source directories.

You cannot use the following characters in the directory path:

This property change does not require a restart of the Data Integration Service.
**Target Directory**

Default directory for target flat files used in a mapping. Default is `<home directory>/target`.

Enter a list of directories separated by semicolons to increase performance when multiple partitions write to the flat file target.

You cannot use the following characters in the directory path:

This property change does not require a restart of the Data Integration Service.
**Rejected Files Directory**

Directory for reject files. Reject files contain rows that were rejected when running a mapping. Default is `<home directory>/reject`.

You cannot use the following characters in the directory path:

This property change does not require a restart of the Data Integration Service.
**Cluster Staging Directory**

The directory on the cluster where the Data Integration Service pushes the binaries to integrate the native and non-native environments and to store temporary files during processing. Default is `/tmp`.
**Hadoop Staging User**

The HDFS user that performs operations on the Hadoop staging directory. The user needs write permission on the Hadoop staging directory. Default is the Data Integration Service user.
**Custom Hadoop OS Path**

The local path to the Informatica Hadoop binaries that are compatible with the Hadoop cluster operating system. Required when the Hadoop cluster and the Data Integration Service run on different supported operating systems. Download and extract the Informatica binaries for the Hadoop cluster on the machine that hosts the Data Integration Service. The Data Integration Service uses the binaries in this directory to integrate the domain with the Hadoop cluster. The Data Integration Service can synchronize the following operating systems: SUSE 12 and Red Hat 6.7.

Changes take effect after you recycle the Data Integration Service.

When you install an Informatica EBF, you must also install it in the path of the Hadoop operating system on the Data Integration Service machine.
**Data Engineering Recovery**

Indicates whether mapping jobs that run on the Spark engine are recovered when the Data Integration Service processing node fails. Default is False.

For more information, see the *Informatica Data Engineering Administrator Guide*.
**State Store**

The HDFS location on the cluster to store information about the state of the Spark job. Default is `<home directory>/State Store`.

Configure this property when you configure the run-time properties of a streaming mapping.

This property change does not require a restart of the Data Integration Service.

For more information about this property, see the *Big Data Streaming User Guide*.