Troubleshooting an elastic cluster on Microsoft Azure

After I set up staging and log locations on Blob Storage, the elastic mapping fails with the following error message in the session log:
2020-02-11T00:52:43.273+00:00 <WorkflowExecutorThread20> INFO: [LDTM_0075] Total time to perform the LDTM operation: 84,962 ms
2020-02-11T00:52:43.305+00:00 <InfaDisnextHadoopMappingExecutor-3-64> SEVERE: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: Failed to upload the local file in the path [/mnt/resource/informatica/secureagent/apps/At_Scale_Server/33.0.1.1/metadata/0100edc7-f043-43f7-a5e1-a39f0774c2c7InfaSpark0/submit_InfaSpark0_staticCode.jar] to the following shared storage location: [<Blob Storage location>] due to the following error: [java.lang.RuntimeException: [org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: The account being accessed does not support http.]].
2020-02-11T00:52:43.306+00:00 <InfaDisnextHadoopMappingExecutor-3-64> INFO: Spark Mapping Ended with state: Failed
The error occurs because the storage account is configured to accept only HTTPS requests, but the Secure Agent sends requests to Blob Storage over HTTP. To resolve the error, use the Azure portal to disable the Secure transfer required option for the storage account that holds the staging and log locations.
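Alternatively, you can make the same change from the command line. The following Azure CLI sketch disables the Secure transfer required option; the account name and resource group are placeholders that you must replace with your own values:

az storage account update \
    --name <storage account name> \
    --resource-group <resource group> \
    --https-only false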
What should I do if the status of the elastic cluster is Unknown?
When the cluster status is Unknown, first verify that the Secure Agent is running. If the agent is not running, enable the agent and check whether the cluster starts running.
If the cluster does not start running, an administrator can run the command that lists clusters. If the command output shows the cluster state as partial or in-use, the administrator can run the command that deletes the cluster. For more information about the commands, see the Administrator help.
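Before you run the commands, you can confirm the agent state from the Secure Agent machine. The following sketch assumes a Linux machine with a default installation layout; the infaagent script name is based on typical installs and might differ in your version:

# Check whether the Secure Agent process is running.
ps -ef | grep -i infaagent

# If it is not running, start it from the installation directory
# (placeholder path; adjust for your machine).
cd "<Secure Agent installation directory>"
./infaagent startup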
I restarted the Secure Agent machine and now the status of the elastic cluster is Error.
Make sure that the Secure Agent machine and the Secure Agent are running. Then, stop the elastic cluster in Monitor. In an Azure environment, the cluster might take 10 minutes to stop. After the cluster stops, you can run an elastic job to start the cluster again.
How do I find the initialization script logs for the nodes where the init script failed?
To find the init script logs, complete the following tasks:
  1. Locate the ccs-operation.log file in the following directory on the Secure Agent machine:
    <Secure Agent installation directory>/apps/At_Scale_Server/<version>/ccs_home/
  2. In the ccs-operation.log file, find a message that is similar to the following message:
Failed to run the init script for cluster [<cluster instance ID>] on the following nodes: [<cluster node IDs>]. Review the log in the following S3 file path: [<cloud platform location>].
  3. Navigate to the cloud platform location that is provided in the message.
  4. Match the cluster node IDs to the init script log file names for the nodes where the init script failed.
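As a shortcut for steps 1 and 2, you can search the log from the command line on the Secure Agent machine. This is a minimal sketch; replace the placeholders with your installation directory and version:

grep "Failed to run the init script" \
    "<Secure Agent installation directory>/apps/At_Scale_Server/<version>/ccs_home/ccs-operation.log"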
The init script failed with the following standard error on some nodes in the elastic cluster:
Created symlink from /etc/systemd/system/apt-daily.service to /dev/null.
Created symlink from /etc/systemd/system/apt-daily-upgrade.service to /dev/null.
Removed symlink /etc/systemd/system/timers.target.wants/apt-daily.timer.
Removed symlink /etc/systemd/system/timers.target.wants/apt-daily-upgrade.timer.
E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?
The init script failed because the node was running an internal process, such as an automatic apt update, that held the dpkg lock at the same time as the init script. If you continue to see the error, make the init script wait for the internal process to complete by adding a sleep loop that polls the lock.
For example, you might use a sleep loop as follows:
#!/bin/sh
# Wait until no process holds the dpkg frontend lock.
while sudo lsof /var/lib/dpkg/lock-frontend
do
    echo "Sleeping 10s"
    sleep 10
done
sudo apt-get -y update
sudo apt-get install -y expect
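The loop above waits indefinitely if the lock is never released. If you prefer the init script to fail fast instead of hanging, you can bound the wait; the 30-attempt limit below is an arbitrary illustration, not a documented value:

#!/bin/sh
# Hypothetical variant: give up after 30 attempts (about 5 minutes)
# so a stuck lock fails the init script instead of hanging it.
attempts=0
while sudo lsof /var/lib/dpkg/lock-frontend
do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge 30 ]; then
        echo "dpkg lock still held after 5 minutes; giving up" >&2
        exit 1
    fi
    echo "Sleeping 10s"
    sleep 10
done
sudo apt-get -y update
sudo apt-get install -y expect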
How are the resource requirements calculated in the following error message for an elastic cluster?
2019-04-26T19:04:11.762+00:00 <Thread-16> SEVERE: java.lang.RuntimeException: [java.lang.RuntimeException: The Cluster Computing System rejected the Spark task [InfaSpark0] due to the following error: [[CCS_10252] Cluster [6bjwune8v4bkt3vneokii9.k8s.local] doesn't have enough resources to run the application [spark--infaspark0e6674748-b038-4e39-a2a9-3fd49e63f289infaspark0-driver] which requires a minimum resource of [(KB memory, mCPU)]. The cluster must have enough nodes, and each node must have at least [(KB memory, mCPU)] to run this job.].]
The first resource requirement in the message is the total amount of resources that the Spark driver and the Spark executor require together.
The second resource requirement is the minimum amount of resources that each worker node needs to run at least one Spark process, where a Spark process can be either a Spark driver process or a Spark executor process. The per-node requirement is calculated using the following formulas:
Memory: MAX(driver_memory, executor_memory)
CPU: MAX(driver_CPU, executor_CPU)
The cluster must have two nodes where each node fulfills the minimum requirements to run either the driver or the executor, or the cluster must have one node with enough resources to run both the driver and the executor.
The resource requirements for the driver and executor depend on how you configure the following advanced session properties in the mapping task:

    spark.driver.memory
    spark.executor.memory
    spark.executor.cores
For more information about minimum resource requirements, see the Administrator help.


Updated August 03, 2020