Advanced Clusters

Troubleshooting an advanced cluster subtask

The job failed but there are many logs I can view. Where do I start?
Troubleshoot the job by examining the logs in the following order:
  1. Execution plan. Debug the Scala code for the job.
  2. Session log. Debug the logic that compiles the job and generates the Spark execution workflow.
  3. Agent job log. Debug how the Secure Agent pushes the Spark execution workflow to the advanced cluster for processing.
  4. Spark driver and executor logs. Debug how the advanced cluster runs the job.
You can download the execution plan, session log, agent job log, and Spark driver log in Monitor.
To find the Spark executor log, copy the advanced log location for a specific Spark task that failed. Then, navigate to the log location on your cloud platform and download the log.
I can't find all of the log files for the job that failed. I've tried to download the logs from both Monitor and the log location on my cloud platform.
The logs that are available for the job depend on the step where the job failed during processing.
For example, if the job fails before the job is pushed to the advanced cluster, the Spark driver and executor logs are not generated in the log location, and Monitor cannot query the logs from the cloud platform either.
You can recover some of the log files, but you might have to use other types of logs to troubleshoot the job.
I can't find the Spark driver and Spark executor logs. Can I recover them?
If you can't download the Spark driver log from the user interface, you can recover the log using the Spark driver Pod. You cannot recover Spark executor logs.
When the Secure Agent pushes a job to an advanced cluster, the Secure Agent creates one Spark driver Pod and multiple Spark executor Pods to run the Spark tasks. You can use the Spark driver Pod to recover the Spark driver log. You cannot recover the Spark executor logs because the Spark driver Pod deletes the Spark executor Pods immediately after a job succeeds or fails.
By default, the Spark driver Pod is deleted 5 minutes after a job succeeds or fails. If you need to increase the limit to assist troubleshooting, contact Informatica Global Customer Support.
To recover the Spark driver log, perform the following tasks:
  1. Find the name of the Spark driver Pod in the agent job log. For example, see the name of the Spark driver Pod in the following message:
    2019/04/09 11:10:15.511 : INFO :Spark driver pod [spark-passthroughparquetmapping-veryvery-longlongname-1234567789-infaspark02843891945120475434-driver] was successfully submitted to the cluster.
    If you cannot download the agent job log in Monitor, the log is available in the following directory on the Secure Agent machine:
    <Secure Agent installation directory>/apps/At_Scale_Server/<version>/logs/job-logs/
    The file name of the agent job log uses the format AgentLog-<Spark job ID>.log. You can find the Spark job ID in the session log. For example, the Spark job ID is 0c2c5f47-5f0b-43af-a867-da011452c19dInfaSpark0 in the following message of the session log:
    2019-05-09T03:07:52.129+00:00 <LdtmWorkflowTask-pool-1-thread-9> INFO: Registered job to status checker with Id 0c2c5f47-5f0b-43af-a867-da011452c19dInfaSpark0
  2. Confirm that the Spark driver Pod exists. If the driver Pod was deleted, you cannot retrieve the Spark driver log.
    To confirm that the driver Pod exists, navigate to the following directory on the Secure Agent machine:
    <Secure Agent installation directory>/apps/At_Scale_Server/<version>/mercury/services/shared/kubernetes/kubernetes_<version>/bin
    In the directory, run the following command:
    ./kubectl get pods
  3. Find the cluster instance ID in one of the following ways:
    • Locate the cluster instance ID in the session log. For example, you might see the following message:
      2019/05/07 16:22:00.20 : INFO :[SPARK_2005] Uploading the local file in the path [/export/home/builds/ws/yxiao_hadoopvm_ML/Mercury/platformdiscale/main/components/cluster/hadoop-tests/cats/edtm/spark/./target/hadoop3a0b1db6-76ea-4317-8272-5b3a8dfd2171_InfaSpark0/log4j_infa_spark.properties] to the following shared storage location: [s3a://soki-k8s-local-state-store/k8s-infa/testcluster2.k8s.local/staging/sess4280021555102778947/log4j_infa_spark.properties].
      Note the following cloud storage location that you see in the message:
      s3a://soki-k8s-local-state-store/k8s-infa/testcluster2.k8s.local/staging/
      The cluster instance ID is the path segment that follows "k8s-infa/". In this case, the ID is testcluster2.k8s.local.
    • Locate the cluster instance ID in the ccs-operation.log file. The file is located in the following directory on the Secure Agent machine:
      <Secure Agent installation directory>/apps/At_Scale_Server/<version>/ccs_home/
  4. Log in to the Secure Agent machine as the sudo user that started the agent.
  5. Set the environment variable KUBECONFIG on the Secure Agent machine to the following value, where <cluster ID> is the cluster instance ID that you found in step 3:
    <Secure Agent installation directory>/apps/At_Scale_Server/<version>/ccs_home/<cluster ID>/.kube/kubeconfig.yaml
  6. To retrieve the Spark driver log, navigate to the following directory on the Secure Agent machine:
    <Secure Agent installation directory>/apps/At_Scale_Server/<version>/mercury/services/shared/kubernetes/kubernetes_<version>/bin
    In the directory, run the following command:
    ./kubectl logs <Spark driver pod name>
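
The recovery steps above can be sketched as a few shell helpers. This is a minimal sketch, not an Informatica-provided script: the function names and arguments are hypothetical, the patterns assume that the log messages match the sample messages shown in the steps, and passing KUBECONFIG to both kubectl calls is an assumption about how the steps fit together.

```shell
# Sketch only: function names are illustrative assumptions.

# Step 1: print the Spark driver Pod name found in an agent job log.
# Assumes the message format "Spark driver pod [<name>] was successfully submitted".
driver_pod_from_agent_log() {
  sed -n 's/.*Spark driver pod \[\([^]]*\)] was successfully submitted.*/\1/p' "$1" | tail -n 1
}

# Step 1: print the Spark job ID from a session log.
# The ID identifies the agent job log file, AgentLog-<Spark job ID>.log.
spark_job_id_from_session_log() {
  sed -n 's/.*Registered job to status checker with Id \([^[:space:]]*\).*/\1/p' "$1" | tail -n 1
}

# Step 3: print the cluster instance ID from a session log.
# The ID is the path segment that follows "k8s-infa/" in the storage location.
cluster_id_from_session_log() {
  sed -n 's|.*/k8s-infa/\([^/]*\)/.*|\1|p' "$1" | head -n 1
}

# Steps 2, 5, and 6: confirm that the driver Pod exists, then save its log.
# Arguments:
#   $1  <Secure Agent installation directory>/apps/At_Scale_Server/<version>
#   $2  directory that contains kubectl (the bin directory in steps 2 and 6)
#   $3  cluster instance ID
#   $4  Spark driver Pod name
#   $5  output file for the recovered log
recover_driver_log() {
  local kubeconfig="$1/ccs_home/$3/.kube/kubeconfig.yaml"
  local kubectl="$2/kubectl"
  local pod="$4"
  local out="$5"

  # If the Pod was already deleted, the log cannot be retrieved.
  if ! KUBECONFIG="$kubeconfig" "$kubectl" get pods | grep -qF -- "$pod"; then
    echo "Spark driver Pod $pod not found; the driver log cannot be recovered." >&2
    return 1
  fi

  KUBECONFIG="$kubeconfig" "$kubectl" logs "$pod" > "$out"
}

# Example usage (paths are placeholders):
#   pod=$(driver_pod_from_agent_log AgentLog-<Spark job ID>.log)
#   cluster=$(cluster_id_from_session_log session.log)
#   recover_driver_log "<server dir>" "<kubectl bin dir>" "$cluster" "$pod" driver.log
```

Because the Spark driver Pod is deleted shortly after the job finishes, run the recovery as soon as possible after the job succeeds or fails.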
