Execution Options

The following list describes the execution options for the Data Integration Service:
Use Operating System Profiles and Impersonation
Runs mappings, workflows, and profiling jobs with operating system profiles.
In a Hadoop environment, the Data Integration Service uses the Hadoop impersonation user to run mappings, workflows, and profiling jobs.
You can select this option if the Data Integration Service runs on UNIX or Linux. To apply changes, restart the Data Integration Service.
Launch Job Options
Runs jobs in the Data Integration Service process, in separate DTM processes on the local node, or in separate DTM processes on remote nodes. Configure the property based on whether the Data Integration Service runs on a single node or a grid and based on the types of jobs that the service runs.
Choose one of the following options:
  • In the service process.
    Configure when you run SQL data service and web service jobs on a single node or on a grid where each node has both the service and compute roles.
  • In separate local processes.
    Configure when you run mapping, profile, and workflow jobs on a single node or on a grid where each node has both the service and compute roles.
  • In separate remote processes.
    Configure when you run mapping, profile, and workflow jobs on a grid where nodes have a different combination of roles. If you choose this option when the Data Integration Service runs on a single node, then the service runs jobs in separate local processes.
Default is in separate local processes.
If the Data Integration Service uses operating system profiles, configure to run jobs in separate local processes.
If the Data Integration Service runs on UNIX and is configured to run jobs in separate local or remote processes, verify that the hosts file on each node with the compute role contains a localhost entry. Otherwise, jobs that run in separate processes fail.
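For example, a standard localhost entry in /etc/hosts on a UNIX or Linux node looks like the following:
127.0.0.1    localhost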
Maximum Execution Pool Size
Maximum number of jobs that each Data Integration Service process can run concurrently. Jobs include data previews, mappings, profiling jobs, SQL queries, and web service requests. For example, a Data Integration Service grid includes three running service processes. If you set the value to 10, each Data Integration Service process can run up to 10 jobs concurrently. A total of 30 jobs can run concurrently on the grid. Default is 10.
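If you script service configuration, a call along the lines of the following infacmd sketch can update this value. The option name ExecutionOptions.MaxExecutionPoolSize and the domain, user, and service names shown here are assumptions for illustration; confirm the exact option name in the Command Reference for your release:
infacmd dis UpdateServiceOptions -dn MyDomain -un Administrator -pd password -sn MyDIS -o ExecutionOptions.MaxExecutionPoolSize=20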
Maximum Memory Size
Maximum amount of memory, in bytes, that the Data Integration Service can allocate for running all requests concurrently when the service runs jobs in the Data Integration Service process. When the Data Integration Service runs jobs in separate local or remote processes, the service ignores this value. If you do not want to limit the amount of memory the Data Integration Service can allocate, set this property to 0.
If the value is greater than 0, the Data Integration Service uses the property to calculate the maximum total memory allowed for running all requests concurrently. The Data Integration Service calculates the maximum total memory as follows:
Maximum Memory Size + Maximum Heap Size + memory required for loading program components
Default is 0.
If you run profiles or data quality mappings, set this property to 0.
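As a hypothetical illustration of this calculation, assume that Maximum Memory Size is set to 8 GB (entered in bytes), Maximum Heap Size is 2 GB, and loading program components requires about 1 GB. The maximum total memory allowed for running all requests concurrently is then:
8 GB + 2 GB + 1 GB = 11 GB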
Maximum Parallelism
Maximum number of parallel threads that process a single mapping pipeline stage.
When you set the value greater than 1, the Data Integration Service enables partitioning for mappings, column profiling, and data domain discovery. The service dynamically scales the number of partitions for a mapping pipeline at run time. Increase the value based on the number of CPUs available on the nodes where jobs run.
In the Developer tool, developers can change the maximum parallelism value for each mapping. When maximum parallelism is set for both the Data Integration Service and the mapping, the Data Integration Service uses the minimum of the two values when it runs the mapping, as the example below shows.
Default is 1. Maximum is 64.
This property change does not require a restart of the Data Integration Service.
Developers cannot change the maximum parallelism value for each profile. When the Data Integration Service converts a profile job into one or more mappings, the mappings always use Auto for the mapping maximum parallelism.
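For example, if maximum parallelism is 8 for the Data Integration Service and 4 for a mapping, the service runs each pipeline stage of that mapping with at most four parallel threads:
min(8, 4) = 4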
Hadoop Kerberos Service Principal Name
Service Principal Name (SPN) of the Data Integration Service to connect to a Hadoop cluster that uses Kerberos authentication.
Hadoop Kerberos Keytab
The file path to the Kerberos keytab file on the machine on which the Data Integration Service runs.
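For example, a Kerberos principal typically takes the form primary/instance@REALM, so these two properties might hold values like the following. Both values are illustrative:
dis_user/node01.example.com@EXAMPLE.COM
/etc/security/keytabs/dis_user.keytab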
Temporary Directories
Directory for temporary files created when jobs are run. Default is <home directory>/disTemp.
Enter a list of directories separated by semicolons to optimize performance during profile operations and during cache partitioning for Sorter transformations.
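For example, to spread temporary files across two disks, you might enter a value like the following. The paths are hypothetical:
/disk1/disTemp;/disk2/disTemp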
You cannot use the following characters in the directory path:
* ? < > " | , [ ]
This property change does not require a restart of the Data Integration Service.
Home Directory
Root directory accessible by the node. This is the root directory for other service directories. Default is <Informatica installation directory>/tomcat/bin. If you change the default value, verify that the directory exists.
You cannot use the following characters in the directory path:
* ? < > " | , [ ]
This property change does not require a restart of the Data Integration Service.
Cache Directory
Directory for index and data cache files for transformations. Default is <home directory>/cache.
Enter a list of directories separated by semicolons to increase performance during cache partitioning for Aggregator, Joiner, or Rank transformations.
You cannot use the following characters in the directory path:
* ? < > " | , [ ]
This property change does not require a restart of the Data Integration Service.
Source Directory
Directory for source flat files used in a mapping. Default is <home directory>/source.
If the Data Integration Service runs on a grid, you can use a shared directory to create one directory for source files. If you configure a different directory for each node with the compute role, ensure that the source files are consistent among all source directories.
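For example, every node with the compute role might mount the same shared file system at an identical path and use it as the source directory. The path below is hypothetical:
/nfs/shared/source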
You cannot use the following characters in the directory path:
* ? < > " | , [ ]
This property change does not require a restart of the Data Integration Service.
Target Directory
Default directory for target flat files used in a mapping. Default is <home directory>/target.
Enter a list of directories separated by semicolons to increase performance when multiple partitions write to the flat file target.
You cannot use the following characters in the directory path:
* ? < > " | , [ ]
This property change does not require a restart of the Data Integration Service.
Rejected Files Directory
Directory for reject files. Reject files contain rows that were rejected when running a mapping. Default is <home directory>/reject.
You cannot use the following characters in the directory path:
* ? < > " | , [ ]
This property change does not require a restart of the Data Integration Service.
Informatica Home Directory on Hadoop
The PowerCenter® Big Data Edition home directory on every data node created by the Hadoop RPM install. Type /<PowerCenterBigDataEditionInstallationDirectory>/Informatica.
Hadoop Distribution Directory
The directory containing a collection of Hive and Hadoop JAR files on the cluster from the RPM install locations. The directory contains the minimum set of JAR files required to process Informatica mappings in a Hadoop environment. Type /<PowerCenterBigDataEditionInstallationDirectory>/Informatica/services/shared/hadoop/[Hadoop_distribution_name].
Data Integration Service Hadoop Distribution Directory
The Hadoop distribution directory on the Data Integration Service node. The contents of the Data Integration Service Hadoop distribution directory must be identical to the Hadoop distribution directory on the data nodes. Type <Informatica installation directory>/Informatica/services/shared/hadoop/[Hadoop_distribution_name].
State Store
The HDFS location on the cluster to store information about the state of the Spark job. Default is <Home Directory>/State Store.
Configure this property when you configure the run-time properties of a streaming mapping.
This property change does not require a restart of the Data Integration Service.
For more information about this property, see the Informatica Intelligent Streaming User Guide.