Table of Contents

  1. Preface
  2. Monitoring jobs
  3. Monitoring Data Integration jobs
  4. Data Integration job log files
  5. Monitoring Data Accelerator for Azure jobs
  6. Monitoring Data Profiling jobs
  7. Monitoring imports and exports
  8. Monitoring file transfer jobs
  9. Monitoring advanced clusters
  10. Monitoring source control logs

Data Integration job log files

Data Integration generates log files to help you monitor running, failed, and completed jobs. You can access some of the log files from the All Jobs, Running Jobs, and My Jobs pages, and from the job details.
Data Integration generates the following types of log files:
Error rows file
Data Integration generates error rows files for synchronization task and masking task instances. An error rows file shows the rows that failed and the reason why each row failed. The error rows file includes the first 50 fields of a source error row.
For example, the following error appears in the error rows file when the task tries to insert two records with the same external ID into a Salesforce target:
Error loading into target [HouseholdProduct__c] : Error received from salesforce.com. Fields [ExternalId__c]. Status code [DUPLICATE_VALUE]. Message [Duplicate external id specified: 1.0].
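If you need to triage a large error rows file, a short script can summarize the failure reasons. The following is a minimal sketch, not part of Data Integration: the file name error_rows.csv is a placeholder, and the script relies only on the bracketed status code format shown in the example above, so adjust it for the layout of your own error rows file.

# Hypothetical sketch: tally the status codes that appear in an error rows file.
# Assumes each failed row carries a message like "Status code [DUPLICATE_VALUE]",
# as in the Salesforce example above; adjust the pattern for other targets.
import re
from collections import Counter

STATUS_RE = re.compile(r"Status code \[([A-Z_]+)\]")

def tally_status_codes(path):
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            match = STATUS_RE.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts

for code, count in tally_status_codes("error_rows.csv").most_common():
    print(f"{code}: {count}")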
Session log file
Data Integration generates a session log file for each job. This log gives you a high-level view of the time spent on different operations.
The session log provides mapping compilation time, translation time, simplification time, optimization time, total time to create the LDTM, Spark task submission time, Spark task [InfaSpark0] execution start and end time, and total time to perform the LDTM operation.
If a job fails, analyze the session log file first to help you troubleshoot the job.
Reject file
Data Integration creates a reject file for each flat file and Oracle target in a mapping or mapping task that contains error rows. The reject file contains information about each rejected target row and the reason that the row was rejected.
Data Integration saves the reject file to the following default folder:
$PMBadFileDir/<task federated ID>
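To see which targets produced reject files on the Secure Agent machine, you can walk the reject file directory. This is a hypothetical sketch only: it assumes that $PMBadFileDir resolves to an environment variable on the agent machine (the fallback path below is just a placeholder) and that each task writes to a subfolder named with its federated ID, as described above.

# Hypothetical sketch: list non-empty reject files under the default reject folder.
# The fallback path "/tmp/badfiles" is a placeholder; use your agent's actual setting.
import os
from pathlib import Path

bad_file_dir = Path(os.environ.get("PMBadFileDir", "/tmp/badfiles"))

for reject_file in sorted(p for p in bad_file_dir.rglob("*") if p.is_file()):
    size = reject_file.stat().st_size
    if size > 0:
        print(f"{reject_file} ({size} bytes)")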
Execution plan
Data Integration generates an execution plan that shows the Scala code that an advanced cluster uses to run the data logic in a mapping in advanced mode. You can use the Scala code to debug issues in the mapping.
Agent job log
Data Integration generates an agent job log that shows the logic that the Secure Agent uses to push the Spark execution workflow for a mapping in advanced mode to an advanced cluster for processing.
The agent job log contains information such as metering, the time the application was submitted to the cluster, and the time the application completed. This log can help you troubleshoot delays in running the Spark task that appear in the session log, and it shows when the Spark task was processed on the Secure Agent.
Spark driver and Spark executor logs
An advanced cluster generates Spark driver and Spark executor logs to show the logic that the cluster uses to run a job. Use these logs to identify issues or errors with the Spark process. These logs also contain information about the different executors being created and the tasks that are starting or have completed.
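When you review the driver and executor logs, it can help to filter out only the warning and error entries. The following is a minimal sketch, assuming a locally downloaded log file named spark_driver.log and the space-delimited timestamp and level format shown in the driver log example later in this topic.

# Hypothetical sketch: surface WARN and ERROR lines from a downloaded Spark driver
# or executor log.
def show_problems(log_path):
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if " ERROR " in line or " WARN " in line:
                print(line.rstrip())

show_problems("spark_driver.log")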
Initialization script log
If an initialization script runs on an advanced cluster, the cluster generates an init script log to show the script output.
Cloud-init log
If an initialization script runs on the advanced cluster, the cluster generates a cloud-init log that contains information about how cluster nodes were initialized and bootstrapped. You can use the cloud-init log to check if any init scripts failed to run.
You can view the cloud-init log only in an AWS environment.
Spark event log
An advanced cluster generates a Spark event log to stream runtime events for tasks that run on the cluster.
The Spark event log records different events in a JSON-encoded format while the application is running. This log contains the events associated with the Spark application, such as the different jobs in the application, different stages, individual tasks, and interaction between entities.
The Spark event log also contains events related to the software infrastructure like driver information, executor creation, memory usage by executors, environment configuration, and the logical and physical plans of the Spark application. Use this log to trace what happened during every step of the Spark application run.
To find the Spark event log, open the Spark driver log and search for SingleEventLogFileWriter. The result of the search shows the path of the Spark event log. For example:
23/01/09 04:38:35 INFO SingleEventLogFileWriter - Logging events to s3://bucket/log_location_in_cluster_configuration/eventLogs/atscaleagent/spark-a7bea557ede14382b4807d35b5404b97.inprogress
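Rather than searching by hand, you can extract the path programmatically from a downloaded copy of the driver log. This is a minimal sketch that assumes a local file named spark_driver.log and relies only on the SingleEventLogFileWriter message format shown above.

# Hypothetical sketch: pull the Spark event log path out of a local Spark driver log.
import re

EVENT_LOG_RE = re.compile(r"SingleEventLogFileWriter.*Logging events to (\S+)")

def find_event_log_path(driver_log_path):
    with open(driver_log_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            match = EVENT_LOG_RE.search(line)
            if match:
                return match.group(1)
    return None

print(find_event_log_path("spark_driver.log"))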
When the application completes, download the Spark event log from the location s3://bucket/log_location_in_cluster_configuration/eventLogs/atscaleagent/ as the file spark-a7bea557ede14382b4807d35b5404b97.
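If you prefer to script the download, a short boto3 call can fetch the completed event log. This is a sketch only: the bucket and key below are the placeholders from the example path above, and it assumes the machine you run it on already has AWS credentials with read access to the log location.

# Hypothetical sketch: download the completed Spark event log from S3 with boto3.
# Replace the bucket and key with the values from your own cluster configuration.
import boto3

s3 = boto3.client("s3")
s3.download_file(
    "bucket",
    "log_location_in_cluster_configuration/eventLogs/atscaleagent/spark-a7bea557ede14382b4807d35b5404b97",
    "spark-event.log",
)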
To interpret the Spark event log, import it into a Spark history server and examine the log using the History server monitor. Check the following tabs:
  • The Jobs tab shows all the detailed metrics.
  • The Stages tab lists all the completed stages. You can see detailed information on the total number of tasks succeeded or failed, input and output data volume, and shuffle read and shuffle write data volume. Click on any stage to view the DAG visualization diagram.
  • The Environments tab shows the Spark-related parameters used to run the Spark job.
  • The Executors tab shows detailed information about the executor Pods and driver Pod.
For more information, refer to the Apache Spark documentation.
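For a quick look at a downloaded event log before importing it into a history server, you can count the event types directly. The following is a minimal sketch that relies only on the JSON-encoded, one-event-per-line format described above; it assumes the log was saved locally as spark-event.log, for example with the earlier download sketch.

# Hypothetical sketch: count how often each Spark event type occurs in an event log.
import json
from collections import Counter

def count_events(event_log_path):
    counts = Counter()
    with open(event_log_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                counts[json.loads(line).get("Event", "unknown")] += 1
    return counts

for event_type, count in count_events("spark-event.log").most_common():
    print(f"{event_type}: {count}")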
Advanced logs
The Advanced Log Location contains the Spark executor logs in addition to the Spark driver and agent job logs. The executor logs can help you troubleshoot issues with individual executors.
