Application Service Guide

Execution Options
The following table describes the execution options for the Data Integration Service:
Use Operating System Profiles and Impersonation
Runs mappings, workflows, and profiling jobs with operating system profiles.
In a Hadoop environment, the Data Integration Service uses the Hadoop impersonation user to run mappings, workflows, and profiling jobs.
You can select this option if the Data Integration Service runs on UNIX or Linux. To apply changes, restart the Data Integration Service.
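To restart the service from the command line, one hedged sketch uses the infacmd isp DisableService and EnableService commands; the domain, service, and credential values below are placeholders.

    # Recycle the Data Integration Service so the change takes effect
    infacmd.sh isp DisableService -dn MyDomain -sn MyDIS -un Administrator -pd '<password>'
    infacmd.sh isp EnableService -dn MyDomain -sn MyDIS -un Administrator -pd '<password>'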
Launch Job Options
Runs jobs in the Data Integration Service process, in separate DTM processes on the local node, or in separate DTM processes on remote nodes. Configure the property based on whether the Data Integration Service runs on a single node or a grid and based on the types of jobs that the service runs.
Choose one of the following options:
  • In the service process.
    Configure when you run SQL data service and web service jobs on a single node or on a grid where each node has both the service and compute roles.
  • In separate local processes.
    Configure when you run mapping, profile, and workflow jobs on a single node or on a grid where each node has both the service and compute roles.
  • In separate remote processes.
    Configure when you run mapping, profile, and workflow jobs on a grid where nodes have a different combination of roles. If you choose this option when the Data Integration Service runs on a single node, then the service runs jobs in separate local processes. You cannot run SQL data service or web service jobs in separate remote processes.
Default is in separate local processes.
If the Data Integration Service uses operating system profiles, configure to run jobs in separate local processes.
If the Data Integration Service runs on UNIX and is configured to run jobs in separate local or remote processes, verify that the hosts file on each node with the compute role contains a localhost entry. Otherwise, jobs that run in separate processes fail.
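As a quick sanity check, the following shell sketch verifies the localhost entry on a compute node; it assumes the standard /etc/hosts location.

    # Run on each node with the compute role. Jobs that run in separate
    # processes fail if this command finds no match.
    grep -w localhost /etc/hosts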
Maximum On-Demand Execution Pool Size
Maximum number of on-demand jobs that can run concurrently. Jobs include data previews, profiling jobs, REST and SQL queries, web service requests, and mappings run from the Developer tool. All jobs that the Data Integration Service receives contribute to the on-demand pool size. The Data Integration Service immediately runs on-demand jobs if enough resources are available. Otherwise, the Data Integration Service rejects the job. Default is 10.
When you size the on-demand pool, keep in mind that a single Developer tool client can run a maximum of 10 concurrent jobs on a Data Integration Service.
Maximum Native Batch Execution Pool Size
Maximum number of deployed jobs that can run concurrently in the native environment. The Data Integration Service moves native mapping jobs from the queue to the native job pool when enough resources are available. Default is 10.
Maximum Hadoop Batch Execution Pool Size
Maximum number of deployed jobs that can run concurrently in the Hadoop environment. The Data Integration Service moves Hadoop jobs from the queue to the Hadoop job pool when enough resources are available. Default is 100.
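The pool sizes can also be changed from the command line with infacmd dis UpdateServiceOptions. The sketch below uses placeholder domain, service, and credential values, and the option names are assumptions based on the ExecutionOptions naming pattern; verify them against the Command Reference for your version.

    # Hypothetical option names; confirm against the infacmd dis reference
    infacmd.sh dis UpdateServiceOptions -dn MyDomain -sn MyDIS \
        -un Administrator -pd '<password>' \
        -o ExecutionOptions.MaxExecutionPoolSize=15 ExecutionOptions.MaxHadoopBatchExecutionPoolSize=100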
Maximum Memory Size
Maximum amount of memory, in bytes, that the Data Integration Service can allocate for running all requests concurrently when the service runs jobs in the Data Integration Service process. When the Data Integration Service runs jobs in separate local or remote processes, the service ignores this value. If you do not want to limit the amount of memory the Data Integration Service can allocate, set this property to 0.
If the value is greater than 0, the Data Integration Service uses the property to calculate the maximum total memory allowed for running all requests concurrently. The Data Integration Service calculates the maximum total memory as follows:
Maximum Memory Size + Maximum Heap Size + memory required for loading program components
Default is 0.
If you run profiles or data quality mappings, set this property to 0.
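For example, with a hypothetical 2 GB Maximum Memory Size, a 640 MB Maximum Heap Size, and an assumed 256 MB for loading program components, the service would allow at most:

    2048 MB + 640 MB + 256 MB = 2944 MB of total memory for all concurrent requests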
Maximum Parallelism
Maximum number of parallel threads that process a single mapping pipeline stage.
When you set the value greater than 1, the Data Integration Service enables partitioning for mappings, column profiling, and data domain discovery. The service dynamically scales the number of partitions for a mapping pipeline at run time. Increase the value based on the number of CPUs available on the nodes where jobs run.
In the Developer tool, developers can change the maximum parallelism value for each mapping. When maximum parallelism is set for both the Data Integration Service and the mapping, the Data Integration Service uses the minimum value when it runs the mapping.
You cannot change the maximum parallelism value for each profile. When the Data Integration Service converts a profile job into one or more mappings, the mappings always use Auto for the mapping maximum parallelism.
You do not have to set maximum parallelism for the Data Integration Service to use multiple partitions in the Hadoop environment.
Default is 1. Maximum is 64.
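For example, if the Data Integration Service Maximum Parallelism is a hypothetical 4 and a developer sets the mapping maximum parallelism to 8, the service runs the mapping with the lower of the two values:

    min(service value 4, mapping value 8) = 4 threads for each pipeline stage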
Hadoop Kerberos Service Principal Name
Service Principal Name (SPN) of the Data Integration Service to connect to a Hadoop cluster that uses Kerberos authentication.
Not required when you run the MapR Hadoop distribution. Required for other Hadoop distributions.
Hadoop Kerberos Keytab
The file path to the Kerberos keytab file on the machine on which the Data Integration Service runs.
Not required when you run the MapR Hadoop distribution. Required for other Hadoop distributions.
Home Directory
Root directory accessible by the node. This is the root directory for other service directories. Default is <Informatica installation directory>/tomcat/bin. If you change the default value, verify that the directory exists.
You cannot use the following characters in the directory path:
* ? < > " | , [ ]
This property change does not require a restart of the Data Integration Service.
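Because the same character restrictions apply to every directory property below, a small POSIX shell sketch can screen a candidate path before you set it; the path shown is hypothetical.

    # Hypothetical path to validate; the grep class covers * ? < > " | , [ ]
    dir='/opt/informatica/dis_home'
    if printf '%s' "$dir" | grep -q '[][*?<>",|]'; then
        echo "Path contains characters the Data Integration Service rejects"
    fi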
Temporary Directories
Directory for temporary files created when jobs are run. Default is <home directory>/disTemp.
Enter a list of directories separated by semicolons to optimize performance during profile operations and during cache partitioning for Sorter transformations.
You cannot use the following characters in the directory path:
* ? < > " | , [ ]
This property change does not require a restart of the Data Integration Service.
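For example, a list that spreads temporary files across three hypothetical mount points looks like this:

    /u01/disTemp;/u02/disTemp;/u03/disTemp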
Cache Directory
Directory for index and data cache files for transformations. Default is <home directory>/cache.
Enter a list of directories separated by semicolons to increase performance during cache partitioning for Aggregator, Joiner, or Rank transformations.
You cannot use the following characters in the directory path:
* ? < > " | , [ ]
This property change does not require a restart of the Data Integration Service.
Source Directory
Directory for source flat files used in a mapping. Default is <home directory>/source.
If the Data Integration Service runs on a grid, you can use a shared directory to create one directory for source files. If you configure a different directory for each node with the compute role, ensure that the source files are consistent among all source directories.
You cannot use the following characters in the directory path:
* ? < > " | , [ ]
This property change does not require a restart of the Data Integration Service.
Target Directory
Default directory for target flat files used in a mapping. Default is <home directory>/target.
Enter a list of directories separated by semicolons to increase performance when multiple partitions write to the flat file target.
You cannot use the following characters in the directory path:
* ? < > " | , [ ]
This property change does not require a restart of the Data Integration Service.
Rejected Files Directory
Directory for reject files. Reject files contain rows that were rejected when running a mapping. Default is <home directory>/reject.
You cannot use the following characters in the directory path:
* ? < > " | , [ ]
This property change does not require a restart of the Data Integration Service.
Cluster Staging Directory
The directory on the cluster where the Data Integration Service pushes the binaries to integrate the native and non-native environments and to store temporary files during processing. Default is /tmp.
Hadoop Staging User
The HDFS user that performs operations on the Hadoop staging directory. The user needs write permission on the Hadoop staging directory. Default is the Data Integration Service user.
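A hedged way to confirm that the staging user can write to the staging directory, assuming the Hadoop client is installed and the default /tmp staging directory from Cluster Staging Directory above:

    # Run as the Hadoop staging user; both commands must succeed
    hdfs dfs -mkdir -p /tmp/dis_staging_check
    hdfs dfs -rm -r /tmp/dis_staging_check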
Custom Hadoop OS Path
The local path to the Informatica Hadoop binaries compatible with the Hadoop operating system. Required when the Hadoop cluster and the Data Integration Service are on different supported operating systems. Download and extract the Informatica binaries for the Hadoop cluster on the machine that hosts the Data Integration Service. The Data Integration Service uses the binaries in this directory to integrate the domain with the Hadoop cluster. The Data Integration Service can synchronize the following operating systems:

    SUSE 11 and Red Hat 6.5

Changes take effect after you recycle the Data Integration Service.
When you install an Informatica EBF, you must also install it in the path of the Hadoop operating system on the Data Integration Service machine.
Data Engineering Recovery
Indicates whether mapping jobs that run on the Spark engine are recovered when the Data Integration Service processing node fails. Default is False.
For more information, see the Informatica Data Engineering Administrator Guide.
State Store
The HDFS location on the cluster to store information about the state of the Spark job. Default is <home directory>/State Store.
Configure this property when you configure the run-time properties of a streaming mapping.
This property change does not require a restart of the Data Integration Service.
For more information about this property, see the Big Data Streaming User Guide.
