Table of Contents

  1. Preface
  2. Monitoring jobs
  3. Monitoring Data Integration jobs
  4. Data Integration job log files
  5. Monitoring Data Accelerator for Azure jobs
  6. Monitoring Data Profiling jobs
  7. Monitoring imports and exports
  8. Monitoring file transfer jobs
  9. Monitoring advanced clusters
  10. Monitoring source control logs

Data Integration job log files

Data Integration generates log files to help you monitor running, failed, and completed jobs. You can access some of the log files from the All Jobs, Running Jobs, and My Jobs pages, and from the job details.
Data Integration generates the following types of log files:
Error rows file
Data Integration generates error rows files for synchronization task and masking task instances. An error rows file shows the rows that failed and the reason why each row failed. The error rows file includes the first 50 fields of a source error row.
For example, the following error appears in the error rows file when the task tries to insert two records with the same external ID into a Salesforce target:
Error loading into target [HouseholdProduct__c] : Error received from salesforce.com. Fields [ExternalId__c]. Status code [DUPLICATE_VALUE]. Message [Duplicate external id specified: 1.0].
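If you need to triage a large error rows file, a short script can summarize the failure reasons. The following is a minimal sketch, not part of Data Integration: the file name error_rows.csv is a placeholder, and the script relies only on the bracketed status code format shown in the example above, so adjust it for the layout of your own error rows file.

# Hypothetical sketch: tally the status codes that appear in an error rows file.
# Assumes each failed row carries a message like "Status code [DUPLICATE_VALUE]",
# as in the Salesforce example above; adjust the pattern for other targets.
import re
from collections import Counter

STATUS_RE = re.compile(r"Status code \[([A-Z_]+)\]")

def tally_status_codes(path):
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            match = STATUS_RE.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts

for code, count in tally_status_codes("error_rows.csv").most_common():
    print(f"{code}: {count}")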
Session log file
Data Integration generates a session log file for each job. This log gives you a high-level view of the time spent on different operations.
The session log provides mapping compilation time, translation time, simplification time, optimization time, total time to create the LDTM, Spark task submission time, Spark task [InfaSpark0] execution start and end time, and total time to perform the LDTM operation.
If a job fails, analyze the session log file first to help you troubleshoot the job.
Reject file
Data Integration creates a reject file for each flat file and Oracle target in a mapping or mapping task that contains error rows. The reject file contains information about each rejected target row and the reason that the row was rejected.
Data Integration saves the reject file to the following default folder:
$PMBadFileDir/<task federated ID>
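To see which targets produced reject files on the Secure Agent machine, you can walk the reject file directory. This is a hypothetical sketch only: it assumes that $PMBadFileDir resolves to an environment variable on the agent machine (the fallback path below is just a placeholder) and that each task writes to a subfolder named with its federated ID, as described above.

# Hypothetical sketch: list non-empty reject files under the default reject folder.
# The fallback path "/tmp/badfiles" is a placeholder; use your agent's actual setting.
import os
from pathlib import Path

bad_file_dir = Path(os.environ.get("PMBadFileDir", "/tmp/badfiles"))

for reject_file in sorted(p for p in bad_file_dir.rglob("*") if p.is_file()):
    size = reject_file.stat().st_size
    if size > 0:
        print(f"{reject_file} ({size} bytes)")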
Execution plan
Data Integration generates an execution plan that shows the Scala code that an advanced cluster uses to run the data logic in a mapping in advanced mode. You can use the Scala code to debug issues in the mapping.
Agent job log
Data Integration generates an agent job log that shows the logic that the Secure Agent uses to push the Spark execution workflow for a mapping in advanced mode to an advanced cluster for processing.
The agent job log contains information such as metering, the time the application was submitted to the cluster, and the time the application completed. This log can help you troubleshoot delays in running the Spark task that appear in the session log, and it shows when the Spark task was processed on the Secure Agent.
Spark driver and Spark executor logs
An advanced cluster generates Spark driver and Spark executor logs to show the logic that the cluster uses to run a job. Use these logs to identify issues or errors with the Spark process. These logs also contain information about the different executors being created and the tasks that are starting or have completed.
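When you review the driver and executor logs, it can help to filter out only the warning and error entries. The following is a minimal sketch, assuming a locally downloaded log file named spark_driver.log and the space-delimited timestamp and level format shown in the driver log example later in this topic.

# Hypothetical sketch: surface WARN and ERROR lines from a downloaded Spark driver
# or executor log.
def show_problems(log_path):
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if " ERROR " in line or " WARN " in line:
                print(line.rstrip())

show_problems("spark_driver.log")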
Initialization script log
If an initialization script runs on an advanced cluster, the cluster generates an init script log to show the script output.
Cloud-init log
If an initialization script runs on the advanced cluster, the cluster generates a cloud-init log that contains information about how cluster nodes were initialized and bootstrapped. You can use the cloud-init log to check if any init scripts failed to run.
You can view the cloud-init log only in an AWS environment.
Spark event log
An advanced cluster generates a Spark event log to stream runtime events for tasks that run on the cluster.
The Spark event log records different events in a JSON-encoded format while the application is running. This log contains the events associated with the Spark application, such as the different jobs in the application, different stages, individual tasks, and interaction between entities.
The Spark event log also contains events related to the software infrastructure like driver information, executor creation, memory usage by executors, environment configuration, and the logical and physical plans of the Spark application. Use this log to trace what happened during every step of the Spark application run.
To find the Spark event log, open the Spark driver log and search for SingleEventLogFileWriter. The result of the search shows the path of the Spark event log. For example:
23/01/09 04:38:35 INFO SingleEventLogFileWriter - Logging events to s3://bucket/log_location_in_cluster_configuration/eventLogs/atscaleagent/spark-a7bea557ede14382b4807d35b5404b97.inprogress
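Rather than searching by hand, you can extract the path programmatically from a downloaded copy of the driver log. This is a minimal sketch that assumes a local file named spark_driver.log and relies only on the SingleEventLogFileWriter message format shown above.

# Hypothetical sketch: pull the Spark event log path out of a local Spark driver log.
import re

EVENT_LOG_RE = re.compile(r"SingleEventLogFileWriter.*Logging events to (\S+)")

def find_event_log_path(driver_log_path):
    with open(driver_log_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            match = EVENT_LOG_RE.search(line)
            if match:
                return match.group(1)
    return None

print(find_event_log_path("spark_driver.log"))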
When the application completes, download the Spark event log from the location s3://bucket/log_location_in_cluster_configuration/eventLogs/atscaleagent/ as the file spark-a7bea557ede14382b4807d35b5404b97.
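If you prefer to script the download, a short boto3 call can fetch the completed event log. This is a sketch only: the bucket and key below are the placeholders from the example path above, and it assumes the machine you run it on already has AWS credentials with read access to the log location.

# Hypothetical sketch: download the completed Spark event log from S3 with boto3.
# Replace the bucket and key with the values from your own cluster configuration.
import boto3

s3 = boto3.client("s3")
s3.download_file(
    "bucket",
    "log_location_in_cluster_configuration/eventLogs/atscaleagent/spark-a7bea557ede14382b4807d35b5404b97",
    "spark-event.log",
)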
To interpret the Spark event log, import it into a Spark history server and examine the log using the History server monitor. Check the following tabs:
  • The Jobs tab shows all the detailed metrics.
  • The Stages tab lists all the completed stages. You can see detailed information on the total number of tasks succeeded or failed, input and output data volume, and shuffle read and shuffle write data volume. Click on any stage to view the DAG visualization diagram.
  • The Environments tab shows the Spark-related parameters used to run the Spark job.
  • The Executors tab shows detailed information about the executor Pods and driver Pod.
For more information, refer to the Apache Spark documentation.
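For a quick look at a downloaded event log before importing it into a history server, you can count the event types directly. The following is a minimal sketch that relies only on the JSON-encoded, one-event-per-line format described above; it assumes the log was saved locally as spark-event.log, for example with the earlier download sketch.

# Hypothetical sketch: count how often each Spark event type occurs in an event log.
import json
from collections import Counter

def count_events(event_log_path):
    counts = Counter()
    with open(event_log_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                counts[json.loads(line).get("Event", "unknown")] += 1
    return counts

for event_type, count in count_events("spark-event.log").most_common():
    print(f"{event_type}: {count}")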
Advanced logs
The Advanced Log Location contains the Spark executor logs in addition to the Spark driver and agent job logs. The executor logs can help you troubleshoot issues with individual executors.
