Table of Contents

  1. Preface
  2. Advanced clusters
  3. Setting up AWS
  4. Setting up Google Cloud
  5. Setting up Microsoft Azure
  6. Setting up a self-service cluster
  7. Setting up a local cluster
  8. Advanced configurations
  9. Troubleshooting
  10. Appendix A: Command reference

Advanced Clusters

Self-service cluster properties

Create an advanced configuration to configure properties for an advanced cluster. The properties describe where you want to start the cluster on your cloud platform and the infrastructure that you want to use.
The basic properties describe the advanced configuration and define the cloud platform that hosts the self-service cluster. To configure the cluster, set the platform and runtime properties.
To learn about the minimum resource specifications that you need to set up a self-service cluster to run a mapping, see Resource requirements for cluster nodes.

Basic configuration

The following table describes the basic properties:

  • Name. Name of the advanced configuration.
  • Description. Description of the advanced configuration.
  • Runtime Environment. Runtime environment to associate with the advanced configuration. The runtime environment can contain only one Secure Agent, and a runtime environment cannot be associated with more than one configuration.
    If you don't select a runtime environment, the validation process can't validate the communication link to the Secure Agent or verify that the Secure Agent meets the minimum runtime requirements to start a cluster.
  • Cloud Platform. Cloud platform that hosts the cluster. Select Self-Service Cluster.

Platform configuration

The following table describes the platform properties:

  • Kubeconfig File Path. Path of the kubeconfig file. A kubeconfig file organizes information about clusters, users, and authentication mechanisms.
    Example: <directory name>/<file_name>.yaml
    You can save the YAML file in any directory on the Secure Agent machine.
  • Kube Context Name. Name of the cluster context. A context defines a named cluster and user tuple that is used to send requests to the specified cluster using the provided authentication information.
  • Cluster Version. Version of the Kubernetes cluster server. The advanced configuration validates the major and minor versions of the Kubernetes cluster server, but does not validate the patch release version numbers.
  • Namespace. Namespace where Informatica deploys resources.
  • Number of Worker Nodes. Number of worker nodes in the cluster. Specify the minimum and maximum number of worker nodes.
  • Cluster Idle Timeout. Amount of time before Informatica-created cluster resource objects are deleted due to inactivity.
  • Mapping Task Timeout. Amount of time to wait for a mapping task to complete before it is terminated. By default, a mapping task does not have a timeout. If you specify a timeout, a value of at least 10 minutes is recommended. The timeout begins when the mapping task is submitted to the Secure Agent.
  • Staging Location. Complete path of the cloud location for staging data. Specify the path in one of the following formats:
      • AWS.
        s3://<bucket name>/<folder path>
        Specify an S3 bucket in the same region as the cluster to decrease latency.
      • Microsoft Azure.
        abfs(s)://<file system>@<storage account>.dfs.core.windows.net/<folder path>&:<resource group>/<region>
        The region is optional. The default region is westus2.
    The Secure Agent needs permission to access the staging location to store staging files at run time. You must grant appropriate IAM access permissions to both the Secure Agent machine and the worker nodes in the cluster.
  • Log Location. Complete path of the cloud location for storing logs. Specify the path in one of the following formats:
      • AWS.
        s3://<bucket name>/<folder path>
        Specify an S3 bucket in the same region as the cluster to decrease latency.
      • Microsoft Azure.
        abfs(s)://<file system>@<storage account>.dfs.core.windows.net/<folder path>&:<resource group>/<region>
        The region is optional. The default region is westus2.
    The Secure Agent needs permission to access the log location to store log files at run time. You must grant appropriate IAM access permissions to both the Secure Agent machine and the worker nodes in the cluster.
  • Labels. Key-value pairs that Informatica attaches to the Kubernetes objects that it creates in the self-service cluster. You can use labels to organize and select subsets of objects. Each object can have a set of key-value labels, and each key must be unique for a given object. You cannot use the @ symbol in a label. For more information about the supported syntax and character set, see the Kubernetes documentation.
  • Node Selector Labels. Node selector labels identify the nodes in the cluster on which Informatica can create Kubernetes objects.
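The Kubeconfig File Path and Kube Context Name properties refer to a standard kubeconfig file. As a minimal sketch, a kubeconfig file that defines one context might look like the following (the cluster, user, and context names, the server address, and the file paths are all illustrative):

```yaml
apiVersion: v1
kind: Config
clusters:
- name: selfservice-cluster          # illustrative cluster name
  cluster:
    server: https://203.0.113.10:6443
    certificate-authority: /home/agent/ca.crt
users:
- name: selfservice-user             # illustrative user name
  user:
    client-certificate: /home/agent/client.crt
    client-key: /home/agent/client.key
contexts:
- name: selfservice-context          # the value to enter in Kube Context Name
  context:
    cluster: selfservice-cluster
    user: selfservice-user
current-context: selfservice-context
```

The path to this file on the Secure Agent machine is the value to enter in Kubeconfig File Path.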

Advanced configuration

The following table describes the advanced properties:

  • Annotations. Key-value pairs that attach arbitrary non-identifying metadata to objects. You can define annotations only for Pods in the cluster. For more information about annotations, see the Kubernetes documentation.
  • Tolerations. Key-value pairs that ensure that Pods are scheduled on appropriate nodes. When you configure a toleration, set the following properties:
      • Key
      • Operator
      • Value
      • Effect
      • Toleration Seconds
    For more information about tolerations, see the Kubernetes documentation.
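The toleration properties listed above map directly onto the tolerations field in a Kubernetes Pod specification. As a sketch, a toleration that sets all five properties might look like the following (the key and value are illustrative):

```yaml
tolerations:
- key: "dedicated"          # Key: the taint key to match (illustrative)
  operator: "Equal"         # Operator: Equal matches key and value; Exists matches key only
  value: "informatica"      # Value: the taint value to match (illustrative)
  effect: "NoExecute"       # Effect: NoSchedule, PreferNoSchedule, or NoExecute
  tolerationSeconds: 3600   # Toleration Seconds: how long the Pod stays bound after
                            # the taint is added; applies only to the NoExecute effect
```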

Runtime configuration

The following table describes the runtime properties:

  • Encrypt Data. Indicates whether temporary data on the cluster is encrypted. Encrypting temporary data might slow down job performance.
  • Runtime Properties. Custom properties that customize the cluster and the jobs that run on the cluster.

COMMENTS

Alessio Giordani - November 27, 2024

Hello documentation team,

In the context of a self-service cluster, I've understood that:

  • only the maximum number of worker nodes is considered, meaning that Informatica will not use more nodes than the number specified even if the cluster contains more nodes. The minimum number is not taken into consideration and is determined by the AKS/EKS setup.
  • the Cluster Idle Timeout controls when the AKS/EKS cluster is cleaned of the artifacts that Informatica deploys (pods, secrets, and so on), not when the cluster itself is deleted.

Can you please update the documentation?

thanks,

Alessio

Informatica Documentation Team - November 27, 2024

Thanks for reaching out, Alessio!

We received the following response to your query from our development team:

“For a self-service cluster, the maximum number of nodes is used by Data Integration to calculate the resource quota available for IICS, and to control job submission to avoid sending too many Spark jobs to the cluster at the same time. Otherwise, there can be a resource deadlock for Spark execution. For example, if the maximum number of nodes is 3, and each node has 8 CPUs, IICS ensures that the Spark drivers submitted to the cluster occupy at most 8 CPUs, one third of the resources that IICS can use.

Once the Spark jobs are submitted to the cluster, it’s up to the cluster’s own Pod scheduler to decide where to run the jobs. It might run the jobs on more nodes than the maximum configured number of nodes.

If you want to make sure that Data Integration doesn’t use more than the maximum configured number of nodes, you should define multiple node groups in the cluster, each of which has a specific node label, and provide the corresponding node labels in the advanced cluster configuration so that the Data Integration resources are only allocated on nodes in a certain group.

Stopping a self-service cluster means removing the resources that IICS created in the cluster.”

We'll get the documentation updated in an upcoming release.