Advanced Clusters

Self-service cluster properties

Create an advanced configuration to configure properties for an advanced cluster. The properties describe where you want to start the cluster on your cloud platform and the infrastructure that you want to use.

The basic properties describe the advanced configuration and define the cloud platform that hosts the self-service cluster. To configure the cluster, configure the platform and runtime properties.

To learn about the minimum resource specifications that you need to set up a self-service cluster to run a mapping, see Resource requirements for cluster nodes.

Basic configuration

The following table describes the basic properties:

Property	Description
Name	Name of the advanced configuration.
Description	Description of the advanced configuration.
Runtime Environment	Runtime environment to associate with the advanced configuration. The runtime environment can contain only one Secure Agent. A runtime environment cannot be associated with more than one configuration. If you don't select a runtime environment, the validation process can't verify the communication link to the Secure Agent or confirm that the Secure Agent meets the minimum runtime requirements to start a cluster.
Cloud Platform	Cloud platform that hosts the cluster. Select Self-Service Cluster.

Platform configuration

The following table describes the platform properties:

Property	Description
Kubeconfig File Path	Path of the kubeconfig file. A kubeconfig file organizes information about clusters, users, and authentication mechanisms. You can save the YAML file in any directory on the Secure Agent machine. Example: <directory name>/<file_name>.yaml
Kube Context Name	Name of the cluster context. A context defines a named pairing of a cluster and a user. Requests are sent to the specified cluster using the authentication information of that user.
Cluster Version	Version of the Kubernetes cluster server. The advanced configuration validates the major and minor versions of the Kubernetes cluster server, but does not validate the patch release version numbers.
Namespace	Namespace where Informatica deploys resources.
Number of Worker Nodes	Number of worker nodes in the cluster. Specify the minimum and maximum number of worker nodes. The maximum limits how many jobs are submitted to the cluster at the same time, which reduces the chance of resource deadlock. However, after the jobs reach the cluster, the cluster's own Pod scheduler might run them on more nodes than the configured maximum. To prevent the system from using more than the maximum number of nodes, do the following: 1. In the cluster, define multiple node groups, each with specific node labels. 2. Add the node labels to the advanced cluster configuration. Resources are then allocated only to nodes in the specified node group.
Cluster Idle Timeout	Amount of time before Informatica-created cluster resource objects are deleted due to inactivity. The cluster itself is not deleted.
Mapping Task Timeout	Amount of time to wait for a mapping task to complete before it is terminated. By default, a mapping task does not have a timeout. If you specify a timeout, a value of at least 10 minutes is recommended. The timeout begins when the mapping task is submitted to the Secure Agent.
Staging Location	Complete path of the cloud location for staging data. Specify the path in one of the following formats: AWS: s3://<bucket name>/<folder path> (specify an S3 bucket in the same region as the cluster to decrease latency). Microsoft Azure: abfs(s)://<file system>@<storage account>.dfs.core.windows.net/<folder path>&:<resource group>/<region> (the region is optional; the default region is westus2). The Secure Agent needs permissions to access the staging location to store staging files at run time. You must provide appropriate IAM access permissions to both the Secure Agent machine and the worker nodes running in your cluster to access the staging location.
Log Location	Complete path of the cloud location for storing logs. Specify the path in one of the following formats: AWS: s3://<bucket name>/<folder path> (specify an S3 bucket in the same region as the cluster to decrease latency). Microsoft Azure: abfs(s)://<file system>@<storage account>.dfs.core.windows.net/<folder path>&:<resource group>/<region> (the region is optional; the default region is westus2). The Secure Agent needs permissions to access the log location to store log files at run time. You must provide appropriate IAM access permissions to both the Secure Agent machine and the worker nodes running in your cluster to access the log location.
Labels	Key-value pairs that Informatica attaches to the Kubernetes objects that it creates in the self-service cluster. You can use labels to organize and select subsets of objects. Each object can have a set of key-value labels defined. Each key must be unique for a given object. You cannot use the @ symbol in a label. For more information about the supported syntax and character set, see the Kubernetes documentation.
Node Selector Labels	Use node selector labels to identify the nodes in the cluster on which Informatica can create Kubernetes objects.
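
The Cluster Version check described above compares only the major and minor version numbers. The following is a minimal sketch of that kind of comparison, not Informatica's implementation. The function name `same_major_minor` and the accepted version string shapes (an optional leading `v`, then `major.minor`, then anything else) are assumptions made for illustration.

```python
import re
from typing import Tuple


def _major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '1.27', 'v1.27.9', or '1.27+'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def same_major_minor(configured: str, server: str) -> bool:
    """Return True when both versions share major and minor numbers.

    The patch component is ignored on purpose, mirroring the behavior
    described for the Cluster Version property.
    """
    return _major_minor(configured) == _major_minor(server)


print(same_major_minor("1.27", "v1.27.9"))  # True: patch release differs only
print(same_major_minor("1.27", "v1.28.1"))  # False: minor version differs
```

A check like this can be useful before saving a configuration, because a mismatch in the minor version is reported by validation while a mismatch in the patch release is not.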

Advanced configuration

The following table describes the advanced properties:

Property	Description
Annotations	Key-value pairs that are used to attach arbitrary non-identifying metadata to objects. You can only define annotations for Pods in a cluster. For more information about annotations, see the Kubernetes documentation.
Tolerations	Key-value pairs that are used to ensure that Pods are scheduled on appropriate nodes. When you configure a toleration, set the following properties: Key, Operator, Value, Effect, and Toleration Seconds. For more information about tolerations, see the Kubernetes documentation.
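
To illustrate how the toleration properties interact, the following sketch models the matching rule that Kubernetes applies between a toleration and a node taint. It assumes that the Key, Operator, Value, Effect, and Toleration Seconds properties correspond to the Kubernetes toleration fields of the same names. The `Taint` and `Toleration` classes, the `tolerates` function, and the sample taint `dedicated=informatica:NoSchedule` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # an empty effect matches every taint effect
    toleration_seconds: Optional[int] = None  # only relevant for NoExecute


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True when the toleration matches the taint."""
    # A non-empty effect must equal the taint's effect.
    if tol.effect and tol.effect != taint.effect:
        return False
    # A non-empty key must equal the taint's key.
    if tol.key and tol.key != taint.key:
        return False
    # Kubernetes only accepts an empty key together with the Exists operator.
    if not tol.key and tol.operator != "Exists":
        return False
    if tol.operator == "Exists":
        return True  # the value is not compared
    if tol.operator == "Equal":
        return tol.value == taint.value
    return False


taint = Taint(key="dedicated", value="informatica", effect="NoSchedule")
print(tolerates(Toleration("dedicated", "Equal", "informatica", "NoSchedule"), taint))
print(tolerates(Toleration("dedicated", "Equal", "other", "NoSchedule"), taint))
```

The first call matches because the key, value, and effect all agree with the taint. The second does not match because the value differs.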

Runtime configuration

The following table describes the runtime properties:

Property	Description
Encrypt Data	Indicates whether temporary data on the cluster is encrypted. Encrypting temporary data might slow down job performance.
Runtime Properties	Custom properties to customize the cluster and the jobs that run on the cluster.


Comments


Alessio Giordani - November 27, 2024

Hello documentation team,

In the context of a self-service cluster, I've understood that:

only the maximum number of worker nodes is considered, meaning that Informatica will not use more nodes than the number specified even if the cluster contains more nodes. The minimum number is not taken into consideration and is decided by the AKS/EKS setup.
the cluster idle timeout controls the cleanup of the artifacts that Informatica deployed in the AKS/EKS cluster (pods, secrets, etc.), not the deletion of the cluster.

can you please update?

thanks,

Alessio

Informatica Documentation Team - November 27, 2024

Thanks for reaching out, Alessio!

We received the following response to your query from our development team:

“For a self-service cluster, the maximum number of nodes is used by Data Integration to calculate the resource quota available for IICS, and to control job submission to avoid sending too many Spark jobs to the cluster at the same time. Otherwise, there can be resource deadlock for Spark execution. For example, if the maximum number of nodes is 3, and each node has 8 CPUs, IICS ensures that the Spark drivers submitted to the cluster occupy at most 8 CPUs, one third of the resources that IICS can use.

Once the Spark jobs are submitted to the cluster, it’s up to the cluster’s own Pod scheduler to decide where to run the jobs. It might run the job on more nodes than the maximum configured number of nodes.

If you want to make sure that Data Integration doesn’t use more than the maximum configured number of nodes, you should define multiple node groups in the cluster, each of which has a specific node label, and provide the corresponding node labels in the advanced cluster configuration so that the Data Integration resources are only allocated on nodes in a certain group.

Stopping a self-service cluster means removing the resources that IICS created in the cluster.”
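
The arithmetic in the example from the development team can be sketched as follows. The one-third share for Spark drivers is taken from that single example. Whether the share is fixed or configurable is not stated, so it appears here as an assumed default, and the function name `driver_cpu_cap` is hypothetical.

```python
from fractions import Fraction


def driver_cpu_cap(max_nodes, cpus_per_node, driver_share=Fraction(1, 3)):
    """Upper bound on CPUs that Spark drivers may occupy at one time.

    driver_share is an assumption inferred from the quoted example
    (3 nodes x 8 CPUs gives a cap of 8 CPUs for drivers).
    """
    total_cpus = max_nodes * cpus_per_node  # quota derived from the maximum node count
    return total_cpus * driver_share


print(driver_cpu_cap(3, 8))  # 8
```

Using `Fraction` keeps the result exact, so the cap for 3 nodes with 8 CPUs each comes out as exactly 8 of the 24 available CPUs.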

We'll get the documentation updated in an upcoming release.
