Data Profiling

Runtime environment

You can choose a runtime environment to run the task. If you do not choose a runtime environment, the profile runs on the default runtime environment configured for the connection.
You can create, view, edit, or delete runtime environments in Administrator.
Data Profiling displays runtime environments based on the source object that you select. For example, if the source object that you select is Avro, Parquet, or JSON, Data Profiling lists all the runtime environments that have the Elastic Server service enabled. If you select any other source object, Data Profiling lists all the runtime environments that have the Data Integration Server service enabled.
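The selection rule described above can be sketched as a small filter. This is an illustration only, not product code: the `RuntimeEnvironment` class, its `services` field, the environment names, and the `list_runtime_environments` function are hypothetical. Only the three file formats and the two service names come from the text.

```python
from dataclasses import dataclass, field

# Source formats that, per the text above, require the Elastic Server service.
ELASTIC_FORMATS = {"avro", "parquet", "json"}


@dataclass
class RuntimeEnvironment:
    """Hypothetical stand-in for a runtime environment record."""
    name: str
    services: set = field(default_factory=set)


def required_service(source_format: str) -> str:
    """Return the service an environment must have enabled for this source."""
    if source_format.lower() in ELASTIC_FORMATS:
        return "Elastic Server"
    return "Data Integration Server"


def list_runtime_environments(environments, source_format):
    """Keep only the environments that have the required service enabled."""
    service = required_service(source_format)
    return [env for env in environments if service in env.services]


# Hypothetical environments used only to exercise the rule.
environments = [
    RuntimeEnvironment("agent-group-a", {"Data Integration Server"}),
    RuntimeEnvironment("agent-group-b", {"Elastic Server", "Data Integration Server"}),
]

print([env.name for env in list_runtime_environments(environments, "Parquet")])
print([env.name for env in list_runtime_environments(environments, "csv")])
```

In this sketch, a Parquet source narrows the list to environments with the Elastic Server service, while any other source keeps every environment with the Data Integration Server service.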

Serverless runtime environment

A serverless runtime environment is an advanced serverless deployment solution that does not require you to download, install, configure, or maintain a Secure Agent or Secure Agent group. You can use a serverless runtime environment in the same way that you use a runtime environment when you configure a connection or some types of tasks in Data Profiling.
The following table lists the options that you can choose in the Serverless Usage Properties area:

| Option | Description |
|---|---|
| Max Compute Units | Maximum number of serverless compute units corresponding to machine resources that the task can use. Overrides the corresponding property in the serverless runtime environment. By default, for a data profiling task, the maximum number of compute units is set to two. |
| Task Timeout | Amount of time in minutes to wait for the task to complete before it is terminated. The timeout ensures that serverless compute units are not unproductive when the task hangs. By default, the timeout is the value that is configured in the serverless runtime environment. |
For more information, see the Runtime environments document.
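As a rough sketch of how the two Serverless Usage Properties might resolve, the following example applies the defaults stated above. It is one reading of the descriptions, not the product's implementation: the class names, field names, and `effective_settings` function are hypothetical, and a value of `None` is assumed to mean that the option is not set on the task.

```python
from dataclasses import dataclass
from typing import Optional

# Default stated above for a data profiling task.
DEFAULT_PROFILING_MAX_COMPUTE_UNITS = 2


@dataclass
class ServerlessEnvironment:
    """Hypothetical environment-level settings."""
    max_compute_units: int
    task_timeout_minutes: int


@dataclass
class TaskUsageProperties:
    """Hypothetical task-level options; None means 'not set on the task'."""
    max_compute_units: Optional[int] = None
    task_timeout_minutes: Optional[int] = None


def effective_settings(env: ServerlessEnvironment, task: TaskUsageProperties):
    """Resolve the values the task would run with, under the assumptions above."""
    # Max Compute Units: the task value overrides the environment property;
    # a data profiling task defaults to two when nothing is set.
    if task.max_compute_units is not None:
        max_units = task.max_compute_units
    else:
        max_units = DEFAULT_PROFILING_MAX_COMPUTE_UNITS

    # Task Timeout: falls back to the value configured in the environment.
    if task.task_timeout_minutes is not None:
        timeout = task.task_timeout_minutes
    else:
        timeout = env.task_timeout_minutes

    return max_units, timeout


# Hypothetical values chosen only for illustration.
env = ServerlessEnvironment(max_compute_units=8, task_timeout_minutes=120)
print(effective_settings(env, TaskUsageProperties()))
print(effective_settings(env, TaskUsageProperties(max_compute_units=4, task_timeout_minutes=30)))
```

The point of the sketch is the asymmetry between the two options: the compute unit default comes from the task type, whereas the timeout default comes from the serverless runtime environment.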

Advanced clusters

An advanced cluster is a Kubernetes cluster that provides a distributed processing environment on the cloud. Fully managed and self-service clusters can run data logic using a scalable architecture, while local clusters use a single node to quickly onboard projects for advanced use cases.
To use an advanced cluster, you perform the following steps:
  1. Set up your cloud environment so that the Secure Agent can connect to and access cloud resources.
  2. In Administrator, create an advanced configuration to define the cluster and the cloud resources.
  3. In Monitor, monitor cluster health and activity while developers in your organization create and run jobs on the cloud.
To run a profile on an Avro, Parquet, or JSON file, you must configure the Amazon S3 V2 or Azure Data Lake Store connection with the corresponding advanced cluster.
For more information about setting up AWS, Microsoft Azure, and local clusters, see Advanced Clusters help.
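The prerequisite for profiling Avro, Parquet, or JSON files can be expressed as a simple pre-flight check. This is a minimal sketch under stated assumptions: the function name, its parameters, and the message strings are hypothetical, and the connection names are taken verbatim from the text rather than from any API.

```python
# File formats and connection types named in the text above.
ADVANCED_FILE_FORMATS = {"avro", "parquet", "json"}
SUPPORTED_CONNECTIONS = {"Amazon S3 V2", "Azure Data Lake Store"}


def check_file_profile(file_format: str, connection_type: str,
                       has_advanced_cluster: bool) -> list:
    """Return a list of unmet prerequisites; an empty list means none were found."""
    problems = []
    # The rule applies only to Avro, Parquet, and JSON files.
    if file_format.lower() not in ADVANCED_FILE_FORMATS:
        return problems
    if connection_type not in SUPPORTED_CONNECTIONS:
        problems.append("connection must be Amazon S3 V2 or Azure Data Lake Store")
    if not has_advanced_cluster:
        problems.append("connection must be configured with an advanced cluster")
    return problems


print(check_file_profile("Parquet", "Amazon S3 V2", True))
print(check_file_profile("JSON", "Amazon S3 V2", False))
```

Returning a list of problems rather than a single boolean lets a caller report every missing prerequisite at once.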
