Azure Stack HCI: unable to reinstall AKS service (after a previous corrupted AKS installation)

Christopher Low Kin Siong 121 Reputation points Microsoft Vendor
2021-08-11T10:09:38.07+00:00

Due to some corruption in the AKS service or nodes, I am unable to run Get-AksHciBillingStatus.

I intend to cleanly redo the AKS installation.
a) All servers are patched to 10.0.20348 with the latest August updates.
Uninstall-AksHci was run on all nodes with no errors.

b) Using the GUI, my first attempts failed after 10 minutes 18 seconds at Install-AksHci with: "The operation has timed out. Clean up your host environment and restart the setup process."

c) I reran Uninstall-AksHci.

I then tried PowerShell:
update-module az.accounts -RequiredVersion 2.5.1
Install-Module -Name AksHci -Repository PSGallery -force -acceptlicense
Import-Module Az.Accounts
Import-Module Az.Resources
Import-Module AzureAD
Import-Module AksHci
Connect-AzAccount -devicecode
WARNING: To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code xxx
to authenticate.

WARNING: TenantId '295be6d3-mytenantid' contains more than one active subscription. First one will be
selected for further use. To select another subscription, use Set-AzContext.

Account                        SubscriptionName TenantId            Environment
-------                        ---------------- --------            -----------
admin@mydomain.onmicrosoft.com mysub            295be6d3-mytenantid AzureCloud

Set-AzContext -Subscription "50fb2758-mysubscription"

Name                Account             SubscriptionName Environment TenantId
----                -------             ---------------- ----------- --------
mysub (50fb2758-... admin@mydomain.o... mysub            AzureCloud  295be6d3-5142-4...

Register-AzResourceProvider -ProviderNamespace Microsoft.Kubernetes

ProviderNamespace : Microsoft.Kubernetes
RegistrationState : Registered
ResourceTypes : {connectedClusters, locations, locations/operationStatuses, registeredSubscriptions...}
Locations : {West Europe, East US, West Central US, South Central US...}

Register-AzResourceProvider -ProviderNamespace Microsoft.KubernetesConfiguration

ProviderNamespace : Microsoft.KubernetesConfiguration
RegistrationState : Registered
ResourceTypes : {sourceControlConfigurations, extensions, operations}
Locations : {East US, West Europe, West Central US, West US 2...}

PS C:\Users\mylogon>
Set-AksHciRegistration -subscriptionid "50fb2758-mysubscription" -tenantid "295be6d3-mytenantid" -resourcegroupname DellAzureHCISEA -Region SouthEastAsia
Set-AksHciRegistration : Cannot bind argument to parameter 'version' because it is an empty string.
At line:1 char:1
+ Set-AksHciRegistration -subscriptionid "50fb2758-mysubscriptionid ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : InvalidData: (:) [Set-AksHciRegistration], ParameterBindingValidationException
    + FullyQualifiedErrorId : ParameterArgumentValidationErrorEmptyStringNotAllowed,Set-AksHciRegistration

d) I tried the GUI again (without running Uninstall-AksHci after the PowerShell attempt).

Failed with errors
Install-AksHci - Importing Configuration Completed
Duration: 0 minutes 3 seconds
[Install-AksHci]:The operation has timed out.

Azure Stack HCI

Accepted answer
  1. MattMcSpirit-MSFT 561 Reputation points
    2021-08-11T16:10:03.967+00:00

    You can try collecting the logs with Get-AksHciLogs. In the current state I'm not sure what it will return, but it's worth a try.

    You shouldn't typically need to run Uninstall-AksHci on all physical nodes; running it on one node is usually sufficient. In this situation, though, running it on every node just to be sure is probably a good idea. Here's what I've provided to others in the past:

    Firstly, run these:
    Uninstall-AksHci
    Uninstall-Moc
    Uninstall-AksHci

    Then:

    In the registry delete
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\AksHciPS
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MocPS
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\KvaPS
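
As a rough sketch (not part of the original answer), the three registry keys above could be removed from an elevated PowerShell session on each node. `Remove-StaleKey` is a hypothetical helper name introduced here for illustration; it only wraps the built-in `Test-Path` and `Remove-Item` cmdlets, and the example call uses `-WhatIf` so nothing is deleted until you have reviewed the output.

```powershell
# Sketch only: remove leftover AKS-HCI registry keys named in the answer.
# Remove-StaleKey is a hypothetical helper, not part of any AKS-HCI module.
function Remove-StaleKey {
    [CmdletBinding(SupportsShouldProcess)]
    param([Parameter(Mandatory)][string[]]$Path)

    foreach ($p in $Path) {
        if (Test-Path -Path $p) {
            # ShouldProcess honours -WhatIf and -Confirm.
            if ($PSCmdlet.ShouldProcess($p, 'Remove key and all subkeys')) {
                Remove-Item -Path $p -Recurse -Force
            }
        } else {
            Write-Verbose "Not found, skipping: $p"
        }
    }
}

$keys = @(
    'HKLM:\SOFTWARE\Microsoft\AksHciPS',
    'HKLM:\SOFTWARE\Microsoft\MocPS',
    'HKLM:\SOFTWARE\Microsoft\KvaPS'
)

# Preview first; drop -WhatIf once the list looks right.
Remove-StaleKey -Path $keys -WhatIf
```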

    Delete the PS modules from the WindowsPowerShell\Modules folder on each node.
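
To find which module folders to delete, something like the following may help. The module names `AksHci`, `Moc` and `Kva` are an assumption inferred from the registry key names above, so adjust the list to whatever is actually installed on your nodes.

```powershell
# Sketch only: list where the AKS-HCI related modules are installed so the
# folders can be deleted by hand. The names below are an assumption inferred
# from the registry keys (AksHciPS, MocPS, KvaPS).
$moduleNames = 'AksHci', 'Moc', 'Kva'

$found = @(Get-Module -ListAvailable -Name $moduleNames |
    Select-Object Name, Version, ModuleBase |
    Sort-Object Name, Version)
$found

# After reviewing the output, each ModuleBase folder can be removed, e.g.:
# Remove-Item -LiteralPath '<ModuleBase from the list>' -Recurse -Force
```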

    Delete the following folders and subfolders on the host machine:
    C:\AKSHCI
    C:\Program Files\AksHci

    Under the profile of the user who installed AKS-HCI there are a few folders to delete as well:
    .AksHci
    d----- 3/17/2021 1:44 PM .kube
    d----- 3/16/2021 2:35 PM .Kva
    d----- 3/16/2021 2:35 PM .Moc
    d----- 3/12/2021 10:19 AM .ssh
    d----- 3/16/2021 2:36 PM .wssd
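
The folder cleanup above could be previewed with a sketch like this. It assumes it is run as the user who installed AKS-HCI, so that `$env:USERPROFILE` points at the right profile. Note that `.kube` and `.ssh` can also hold unrelated kubeconfigs and keys, so check them before deleting.

```powershell
# Sketch only: preview and remove the leftover AKS-HCI folders listed above.
# Assumes the current user is the one who installed AKS-HCI.
$folders = @(
    'C:\AKSHCI',
    'C:\Program Files\AksHci'
) + ('.AksHci', '.kube', '.Kva', '.Moc', '.ssh', '.wssd' |
        ForEach-Object { Join-Path $env:USERPROFILE $_ })

# Keep only the folders that actually exist on this node.
$present = @($folders | Where-Object { Test-Path -LiteralPath $_ })
$present

# -WhatIf only prints what would be deleted; remove it to delete for real.
$present | ForEach-Object {
    Remove-Item -LiteralPath $_ -Recurse -Force -WhatIf
}
```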

    Delete all VMs created by AKS-HCI if any are running.
    Delete the cluster object if it hasn’t been cleaned up already.
    Make sure you do that on all physical nodes in the cluster.


1 additional answer

  1. Trent Helms - MSFT 2,536 Reputation points Microsoft Employee
    2021-08-11T12:16:49.053+00:00

    Hi @Christopher Low Kin Siong ,

    As I understand it, the Uninstall-AksHci cmdlet should be run directly on each of the cluster nodes and should clean up the environment correctly. However, if this fails, you can manually perform a cleanup by doing the following:

    On each node of the cluster:

    • Remove wssdcloudagent service
    • Remove wssdagent service
    • Remove folder C:\Program Files\AksHci
    • Remove all VMs that are created from this process
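
For the two services, a minimal sketch might look like the following, run elevated on each node. It uses `sc.exe delete` because `Remove-Service` is not available in Windows PowerShell 5.1. This is an illustration of the manual step, not an official cleanup script.

```powershell
# Sketch only: stop and delete the two agent services named above, if present.
foreach ($name in 'wssdcloudagent', 'wssdagent') {
    $svc = Get-Service -Name $name -ErrorAction SilentlyContinue
    if ($null -eq $svc) {
        Write-Host "Service not found, skipping: $name"
        continue
    }
    if ($svc.Status -ne 'Stopped') {
        Stop-Service -Name $name -Force
    }
    # Removes the service registration from the Service Control Manager.
    sc.exe delete $name
}
```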

    On the cluster:

    • Run Get-ClusterGroup. If you have a cluster group with a name in the format ‘ca-guid’, or any whose name includes ‘management cluster’, run Remove-ClusterGroup on that cluster group.
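
The cluster group check could be sketched as a small filter. `Select-AksHciClusterGroup` is a hypothetical helper, and the name patterns are an assumption based on the description above ("ca-" followed by a GUID, or a name containing "management cluster"). Review the matches before removing anything.

```powershell
# Sketch only: pick out cluster groups that look like AKS-HCI leftovers.
# Select-AksHciClusterGroup is a hypothetical helper; the patterns are an
# assumption based on the description above.
function Select-AksHciClusterGroup {
    param([Parameter(ValueFromPipeline)]$Group)
    process {
        # -match is case-insensitive by default.
        if ($Group.Name -match '^ca-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$' -or
            $Group.Name -match 'management[ -]?cluster') {
            $Group
        }
    }
}

# Usage on a cluster node (requires the FailoverClusters module):
# Get-ClusterGroup | Select-AksHciClusterGroup            # review matches
# Get-ClusterGroup | Select-AksHciClusterGroup |
#     Remove-ClusterGroup -RemoveResources -Force         # then remove
```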

    After this, restart WAC and attempt your setup once again.

    Thanks so much,
    Trent