Data Profiling

Runtime environment

You can choose a runtime environment to run the task. If you do not choose a runtime environment, the profile runs on the default runtime environment configured for the connection.
You can create, view, edit, or delete runtime environments in Administrator.
Data Profiling displays runtime environments based on the source object that you select. For example, if the source object that you select is Avro, Parquet, or JSON, Data Profiling lists all the runtime environments that have the Elastic Server service enabled. If you select any other source object, Data Profiling lists all the runtime environments that have the Data Integration Server service enabled.
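The filtering rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not Informatica code: the `RuntimeEnvironment` class, the function names, and the sample environment names are all hypothetical.

```python
from dataclasses import dataclass, field

# Source formats that, per the text, require the Elastic Server service.
ELASTIC_FORMATS = {"avro", "parquet", "json"}


@dataclass
class RuntimeEnvironment:
    """Hypothetical stand-in for a runtime environment and its enabled services."""
    name: str
    services: set = field(default_factory=set)


def required_service(source_format: str) -> str:
    """Return the service a runtime environment must have enabled for this source."""
    if source_format.strip().lower() in ELASTIC_FORMATS:
        return "Elastic Server"
    return "Data Integration Server"


def eligible_environments(source_format, environments):
    """Keep only the environments that have the required service enabled."""
    needed = required_service(source_format)
    return [env.name for env in environments if needed in env.services]


# Sample data (assumed names, for illustration only).
envs = [
    RuntimeEnvironment("agent-group-a", {"Data Integration Server"}),
    RuntimeEnvironment("agent-group-b", {"Data Integration Server", "Elastic Server"}),
]
print(eligible_environments("Parquet", envs))       # ['agent-group-b']
print(eligible_environments("Oracle table", envs))  # ['agent-group-a', 'agent-group-b']
```

If the list of runtime environments looks shorter than expected for an Avro, Parquet, or JSON source, the likely reason under this rule is that the missing environments do not have the Elastic Server service enabled.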

Serverless runtime environment

A serverless runtime environment is an advanced serverless deployment solution that does not require you to download, install, configure, or maintain a Secure Agent or Secure Agent group. You can use a serverless runtime environment in the same way that you use a runtime environment when you configure a connection or some types of tasks in Data Profiling.
The following table lists the options that you can choose in the Serverless Usage Properties area:

Option | Description
Max Compute Units | Maximum number of serverless compute units corresponding to machine resources that the task can use. Overrides the corresponding property in the serverless runtime environment. By default, for a data profiling task, the maximum number of compute units is set to two.
Task Timeout | Amount of time in minutes to wait for the task to complete before it is terminated. The timeout ensures that serverless compute units are not unproductive when the task hangs. By default, the timeout is the value that is configured in the serverless runtime environment.

For more information, see the Runtime environments document.
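As a rough illustration of how the two defaults in the table differ, the following Python sketch resolves the values a data profiling task would end up with. The function and parameter names are hypothetical and do not correspond to any Informatica API; the only facts taken from the text are that Max Compute Units defaults to two for a data profiling task and that Task Timeout defaults to the value configured in the serverless runtime environment.

```python
from typing import Optional

# Default stated in the table for a data profiling task.
DEFAULT_PROFILING_MAX_COMPUTE_UNITS = 2


def effective_settings(
    env_timeout_minutes: int,
    max_compute_units: Optional[int] = None,
    task_timeout_minutes: Optional[int] = None,
) -> dict:
    """Resolve task-level serverless usage properties (hypothetical helper).

    env_timeout_minutes: timeout configured on the serverless runtime environment.
    max_compute_units / task_timeout_minutes: values set on the task, or None if unset.
    """
    return {
        # An unset task value falls back to the profiling default of two.
        "max_compute_units": (
            max_compute_units
            if max_compute_units is not None
            else DEFAULT_PROFILING_MAX_COMPUTE_UNITS
        ),
        # An unset task value falls back to the environment's timeout.
        "task_timeout_minutes": (
            task_timeout_minutes
            if task_timeout_minutes is not None
            else env_timeout_minutes
        ),
    }


print(effective_settings(env_timeout_minutes=120))
# {'max_compute_units': 2, 'task_timeout_minutes': 120}
print(effective_settings(120, max_compute_units=4, task_timeout_minutes=30))
# {'max_compute_units': 4, 'task_timeout_minutes': 30}
```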

Advanced clusters

An advanced cluster is a Kubernetes cluster that provides a distributed processing environment on the cloud. Fully managed and self-service clusters can run data logic using a scalable architecture, while local clusters use a single node to quickly onboard projects for advanced use cases.
To use an advanced cluster, perform the following steps:
  1. Set up your cloud environment so that the Secure Agent can connect to and access cloud resources.
  2. In Administrator, create an advanced configuration to define the cluster and the cloud resources.
  3. In Monitor, monitor cluster health and activity while developers in your organization create and run jobs on the cloud.
To run a profile on an Avro, Parquet, or JSON file, you need to configure the Amazon S3 V2 or Azure Data Lake Store connection with the respective advanced cluster.
For more information about setting up AWS, Microsoft Azure, and local clusters, see Advanced Clusters help.

Comments

Alessio Giordani - October 29, 2025

Hello documentation team,

My understanding is that data profiling on a serverless runtime environment is currently supported only on AWS and only with the "Data Integration" runtime option, which means that Avro, Parquet, or JSON files cannot be profiled even if you select the "Advanced Data Integration" serverless runtime.

Can you please add these remarks to this page?

Best regards,

Alessio

Informatica Documentation Team - October 29, 2025

Hi Alessio Giordani,

We’re working to address your comments and will get back to you.

Thanks,

Informatica Documentation team


Alessio Giordani - November 05, 2025

Hello documentation team,

While configuring my self-service cluster on Azure (using AKS), I noticed that the pods created on the cluster need to be able to reach the Secure Agent machine to encrypt data using the Informatica Encryption service. This is also confirmed by this KB article: https://knowledge.informatica.com/s/article/ERROR-Advanced-Mode-Profiling-fails-with-java-net-NoRouteToHostException-No-route-to-host?language=en_US

Can you please add a remark here, also pointing to the KB article, so that users check this as a requirement?

In fact, for a Data Integration mapping, this is not mandatory unless Informatica Encryption is flagged in the advanced properties when writing to S3.

Regards,

Alessio

Informatica Documentation Team - November 05, 2025

Hi Alessio Giordani,

We’re working to address your comments and will get back to you.

Thanks,

Informatica Documentation team