Azure HCI 23H2 LCM Controller Extension Failing

Will Baumgardner 30 Reputation points
2024-02-09T17:37:40.75+00:00

After numerous attempts to uninstall and reinstall the LCM controller extension its still failing to load. How can i troubleshoot this? Is this a known issue with Azure HCI stack 23H2?

Azure Stack HCI
Azure Stack HCI
A hyperconverged infrastructure operating system delivered as an Azure service that provides security, performance, and feature updates.
300 questions
Windows Server
Windows Server
A family of Microsoft server operating systems that support enterprise-level management, data storage, applications, and communications.
12,516 questions
Windows Server Clustering
Windows Server Clustering
Windows Server: A family of Microsoft server operating systems that support enterprise-level management, data storage, applications, and communications.Clustering: The grouping of multiple servers in a way that allows them to appear to be a single unit to client computers on a network. Clustering is a means of increasing network capacity, providing live backup in case one of the servers fails, and improving data security.
973 questions
0 comments No comments
{count} votes

Accepted answer
  1. Szych, Elmar 75 Reputation points
    2024-02-13T13:32:21.2033333+00:00

    This did the trick for me:

    Remove-AzConnectedMachineExtension -MachineName $nodeName -Name "AzureEdgeLifecycleManager" -ResourceGroupName $ResourceGroupName -SubscriptionId $SubscriptionID
    $regkey ="hklm:\SOFTWARE\Microsoft\LCMAzureStackStampInformation"
    $name = "InitializationComplete"
    Get-ItemProperty -Path $regkey -Name $name | Remove-Item
    

    Wait until the Extension shows as removed in the Azure Portal. !!! Reboot the host Here !!!

    New-AzConnectedMachineExtension -MachineName $nodeName -Name "AzureEdgeLifecycleManager" -ResourceGroupName $ResourceGroupName -SubscriptionId $subscriptionId -Location $location -Publisher "Microsoft.AzureStack.Orchestration" -ExtensionType "LcmController"
    

4 additional answers

Sort by: Most helpful
  1. Marcus 15 Reputation points
    2024-02-13T11:32:01.4233333+00:00

    YES! @Natthaphol Suntudkarn Thank you! Worked for us!

    Complete Solution:

    1. Remove the LCM Extension from Portal/CLI/Azure PS
    2. Log on to the seed node and delete:
      • del -R C:\ecestore
      • del -R C:\CloudDeployment
      • del -R C:\nugetstore
      • Remove-Item HKLM:\Software\Microsoft\LCMAzureStackStampInformation
    3. Reboot the seed node
    4. Reinstall the LCM extension using PowerShell from workstation or Cloud Shell. ex: New-AzConnectedMachineExtension -MachineName $Hostname -Name "AzureEdgeLifecycleManager" -ResourceGroupName $RG -SubscriptionId $Subscription -Location $region -Publisher "Microsoft.AzureStack.Orchestration" -ExtensionType "LcmController"

    Best,

    Marcus

    3 people found this answer helpful.
    0 comments No comments

  2. vipullag-MSFT 25,611 Reputation points
    2024-02-12T07:42:14.7633333+00:00

    Hello Will Baumgardner

    Welcome to Microsoft Q&A Platform, thanks for posting your query here.

    I checked with internal team on this (AzureEdgeLifecycleManager(version:30.2402.08) extension shows Failed State in Azure portal), this is reported and a know issue.

    If this is a new deployment where it fails after ARC enabling the server, then you can try below steps to resolve the issue:

    1. Remove the LCM Extension from Portal/CLI/Azure PS
    2. Log on to the seed node and delete the initialization registry key:
      $regkey ="hklm:\SOFTWARE\Microsoft\LCMAzureStackStampInformation"
      $name = "InitializationComplete"
      Get-ItemProperty -Path $regkey -Name $name | Remove-Item
    3. Log out of the seed node
    4. Reinstall the LCM extension using PowerShell from workstation or Cloud Shell. ex: New-AzConnectedMachineExtension -MachineName $Hostname -Name "AzureEdgeLifecycleManager" -ResourceGroupName $RG -SubscriptionId $Subscription -Location $region -Publisher "Microsoft.AzureStack.Orchestration" -ExtensionType "LcmController"

    Hope this helps.

    1 person found this answer helpful.

  3. Marcus 15 Reputation points
    2024-02-12T15:11:03.1033333+00:00

    Hello,
    we have the same issue!
    We have installed a fresh Azure Stack HCI 23H2.
    After trying your workaround, the problem still exists.
    Here's a snip of the Errormessage: ........ VERBOSE: Importing function 'Get-EceInterfaceParameters'. VERBOSE: Importing function 'Test-EceInterface'. VERBOSE: Initialization action plan status:Error VERBOSE: Node Initialization failed with error: Type 'StagePackageForDeployment' of Role 'DeploymentService' raised an exception: Action: Invocation of step 20 failed. Stopping invocation of action plan. InterfaceInvocationFailedException at CloudEngine.Actions.PowerShellHost.WaitAndReceiveJob(Job job, CancellationToken token, UInt32 timeoutSeconds, Stopwatch watch) in C:__w\1\s\src\EceEngine\ece\CloudEngine\Actions\Source\PowerShellHost.cs:line 248 at CloudEngine.Actions.PowerShellHost.Invoke(InterfaceParameters parameters, CancellationToken token, UInt32 timeoutSeconds, ThrottlingDescription throttlingDesc) in C:__w\1\s\src\EceEngine\ece\CloudEngine\Actions\Source\PowerShellHost.cs:line 156 at CloudEngine.Actions.InterfaceTask.Invoke(Configuration roleConfiguration, String startStep, String endStep, String[] skip, Nullable1 retries, Nullable1 interfaceTimeout, CancellationToken token, Dictionary`2 runtimeParameter, Boolean runInProcess) in C:__w\1\s\src\EceEngine\ece\CloudEngine\Actions\Source\InterfaceTask.cs:line 236CloudEngine.Actions.InterfaceInvocationFailedException: Type 'StagePackageForDeployment' of Role 'DeploymentService' raised an exception: Action: Invocation of step 20 failed. Stopping invocation of action plan. .......

    Please help.
    Regards,
    Marcus

    0 comments No comments

  4. Khai Mai 5 Reputation points
    2024-02-13T00:31:43.77+00:00

    I am having the same issue. I am setting up 4-nodes cluster. I only had successful on Node 1 and Node 2 for installation LCM Controller Extension. For node 3, and node 4, I have been trying to re-install LCM Controller Extension, but they failed on those 2 systems for LCM Controller Extension installation.

    Also, I tried to follow your instruction above to remove the LCM Extension and use command you provided to reinstall, but still failed on installation with error messages. Sumary

    Install Extension

    Error Message.pdf

    Please help, otherwise, I will be stuck on the validation for the new system with new version 23H2. Thank you,

    KM

    0 comments No comments