Troubleshoot failed Linux compute node agent extension installation

This article discusses how to troubleshoot a scenario in which the HPC Pack Linux node agent extension doesn't install successfully on a node in a high-performance computing (HPC) cluster.

Troubleshooting checklist

To troubleshoot a failed installation of the Microsoft.HpcPack.LinuxNodeAgent2016U1 extension on a Linux compute node, examine the extension log file, and then run a local test that provisions a new Linux compute node on an infrastructure as a service (IaaS) virtual machine (VM).

Step 1: Examine the extension log

The extension log file might be able to help you determine why the node agent wasn't installed successfully. To find and view the extension log file, follow these steps:

  1. Open an administrative PowerShell console.

  2. Run the following commands to enable the Secure Shell (SSH) connection feature on the head node:

    dism /Online /Add-Capability /CapabilityName:OpenSSH.Server~~~~0.0.1.0
    Start-Service sshd
    Set-Service -Name sshd -StartupType 'Automatic'
    Set-Service -Name ssh-agent -StartupType 'Automatic'
    Start-Service ssh-agent
    
  3. On the head node, run the following command to sign in to the Linux compute node:

    ssh <domain-administrator-name>@<private-ip-address-of-linux-compute-node>
    
  4. Enter the account password of the domain administrator.

  5. Run the following command to verify that the extension log file exists on the node:

    sudo ls -la /var/log/azure/Microsoft.HpcPack.LinuxNodeAgent2016U1/extension.log
    
  6. Open the extension log file in your preferred text viewer or editor, and review its contents.
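
As a rough illustration of the last two steps, the following shell sketch defines two helper functions that you could use on the Linux compute node to review the extension log. The log path comes from this article. The function names (show_log_tail, show_log_errors) are hypothetical, and the search keywords (error, fail, exception) are an assumption, because this article doesn't describe the log format. Adjust them to match what the log contains.

```shell
#!/bin/sh
# Sketch only: helper functions for inspecting the extension log on the Linux node.
# The path comes from the article; the search keywords below are an assumption.
EXT_LOG=/var/log/azure/Microsoft.HpcPack.LinuxNodeAgent2016U1/extension.log

# Print the last N lines of a log file (N defaults to 50).
show_log_tail() {
    log="$1"
    lines="${2:-50}"
    if [ ! -r "$log" ]; then
        echo "Cannot read $log (missing file, or root access needed)" >&2
        return 1
    fi
    tail -n "$lines" "$log"
}

# List up to 20 of the most recent lines that look like failures, with line numbers.
show_log_errors() {
    log="$1"
    if [ ! -r "$log" ]; then
        echo "Cannot read $log (missing file, or root access needed)" >&2
        return 1
    fi
    # Hypothetical keywords; adjust them to whatever the log actually contains.
    grep -n -i -E 'error|fail|exception' "$log" | tail -n 20
}
```

If the log file is readable only by root, load these functions in a root shell (for example, after running sudo -i), and then call show_log_tail "$EXT_LOG" or show_log_errors "$EXT_LOG".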

Step 2: Do a local test to burst to an IaaS VM

To run a local test that bursts to an IaaS VM, follow these steps:

  1. Follow the steps to create an Azure IaaS node template. When you reach the Specify VM Image section of the template creation wizard, specify the following settings before you finish creating the node template.

    Field name     Value
    -----------    ----------------------------
    Image Type     MarketplaceImage
    OS Type        Linux
    Image Label    Red Hat Enterprise Linux 7.8

  2. Follow the steps to create the IaaS compute nodes and manage them. When you reach the Specify New Nodes section of the Add Node wizard, specify the following settings before you finish adding the node.

    Field name          Value
    ----------------    ------------------------------------------------------
    Node template       The name of the node template that you created earlier
    Number of nodes     1
    VM Size of nodes    A1 (1 core, 1.75 GB Memory)

  3. Follow the steps to create a new job in HPC Cluster Manager. When you reach the Resource Selection section, select LinuxNodes in the Available node groups list, and then select the Add button to move the item to the Selected node groups list. After you submit the new job, the Linux node should be provisioned correctly.

Contact us for help

If you have questions or need help, create a support request, or ask Azure community support. You can also submit product feedback to the Azure feedback community.