How do we resolve disks stuck in stopping maintenance mode

VMAX_Lapras 25 Reputation points
2024-02-01T15:24:05.7033333+00:00

Hi there, we run Server 2022 Datacenter in a 3-node cluster (running Storage Spaces Direct). The disks in one of the nodes are all stuck in 'Stopping maintenance mode, OK'. We have tried restarting the node in question.

Windows Server Clustering
Windows Server: A family of Microsoft server operating systems that support enterprise-level management, data storage, applications, and communications. Clustering: The grouping of multiple servers in a way that allows them to appear to be a single unit to client computers on a network. Clustering is a means of increasing network capacity, providing live backup in case one of the servers fails, and improving data security.

Accepted answer
  1. Barry Gibson 80 Reputation points
    2024-03-06T13:47:41.5366667+00:00

    Hi, I have a resolution for this, cobbled together from various forums. As long as you are certain there is no underlying issue and the status is purely cosmetic (false labelling), you can clear the flags so that your disks show 'OK' in Windows Admin Center (WAC).

    We were experiencing this exact issue: WAC was showing 'Stopping maintenance mode, OK', but Get-StorageJob, Get-ClusterStorageSpacesDirect, Get-StorageHealthReport and Get-PhysicalDisk in PowerShell reported the disks' operational status as OK with no jobs running, so we presumed the status shown in WAC was cosmetic.

    The fix involves importing a PowerShell script, identifying the disks concerned, and repeating a command for each affected disk:

    1. Download the script 'Clear-PhysicalDiskHealthData.ps1' from https://go.microsoft.com/fwlink/?linkid=2034205 and import it.
    2. Run Get-PhysicalDisk | Select-Object SerialNumber,UniqueID to list the IDs of all your disks.
    3. Once you have the ID of the disk you want to resolve, run Clear-PhysicalDiskHealthData -Intent -Policy -UniqueID xxxxx -Verbose -Force (replacing xxxxx with your disk ID).

    And that's that: the disk now shows as OK in WAC (give it 5 minutes to refresh). I think this is caused by a recent-ish Windows update, as one server in the cluster hasn't received the latest update yet and is fine.
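
    When several disks are affected, the per-disk command can be wrapped in a loop. The following is a minimal sketch rather than a tested procedure: the wrapper name Clear-StuckMaintenanceFlag and the path in the comment are hypothetical, and it assumes the downloaded script has already been imported so that Clear-PhysicalDiskHealthData is available in the session. Only run it once you have confirmed that the status is cosmetic.

```powershell
# Assumption: Clear-PhysicalDiskHealthData.ps1 was downloaded and imported first, e.g.
#   Import-Module C:\Temp\Clear-PhysicalDiskHealthData.ps1   # hypothetical path

function Clear-StuckMaintenanceFlag {
    [CmdletBinding()]
    param(
        # One or more UniqueId values taken from Get-PhysicalDisk output.
        [Parameter(Mandatory)]
        [string[]]$UniqueId
    )
    foreach ($id in $UniqueId) {
        # Same command as in the answer, repeated once per affected disk.
        Clear-PhysicalDiskHealthData -Intent -Policy -UniqueId $id -Verbose -Force
    }
}

# Usage (xxxxx and yyyyy are placeholders for real disk IDs):
#   Clear-StuckMaintenanceFlag -UniqueId 'xxxxx', 'yyyyy'
```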


3 additional answers

  1. Ian Xue 36,751 Reputation points Microsoft Vendor
    2024-02-05T06:30:11.26+00:00

    Hi VMAX,

    Thanks for your post. In general, start with these steps:

    1. Confirm the make and model of SSD is certified for Windows Server 2016 and Windows Server 2019 by using the Windows Server Catalog. Confirm with the vendor that the drives are supported for Storage Spaces Direct.
    2. Inspect the storage for any faulty drives. Use storage management software to check the status of the drives. If any of the drives are faulty, work with your vendor.
    3. Update the storage and drive firmware if necessary. Ensure that the latest Windows Updates are installed on all nodes. You can get the latest updates for Windows Server 2016 from Windows 10 and Windows Server 2016 update history. Get the latest updates for Windows Server 2019 from Windows 10 and Windows Server 2019 update history.
    4. Update the network adapter drivers and firmware.
    5. Run cluster validation and review the Storage Space Direct section. Ensure that the drives you use for the cache are reported correctly and have no errors.
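
    Step 5 can also be run from PowerShell. A minimal sketch, assuming the FailoverClusters module is installed and that you pass your own node names; the wrapper name Invoke-S2DValidation is hypothetical, and the extra categories alongside 'Storage Spaces Direct' are a commonly used set rather than a requirement. Review the impact before running validation on a production cluster.

```powershell
function Invoke-S2DValidation {
    [CmdletBinding()]
    param(
        # Names of the cluster nodes to validate.
        [Parameter(Mandatory)]
        [string[]]$Node
    )
    # Limit validation to the categories relevant to Storage Spaces Direct.
    Test-Cluster -Node $Node -Include 'Storage Spaces Direct', 'Inventory', 'Network', 'System Configuration'
}

# Usage (node names are placeholders):
#   Invoke-S2DValidation -Node 'Node1', 'Node2', 'Node3'
```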

    Reference: Storage Spaces Direct troubleshooting | Microsoft Learn

    I have also found a similar issue with the same error, which I hope is helpful as a reference: Storage Spaces Direct / S2D - Disks stuck in maintenance mode (nuvotex.de)

    Best Regards,

    Ian Xue



  2. Net Runner 610 Reputation points
    2024-02-06T09:45:09.2966667+00:00

    First, make sure that what you see is not a UI glitch. I prefer using PowerShell for that purpose. Run the following commands against your cluster, storage pools, and storage volumes to check whether the error persists.

    Get-ClusterStorageSpacesDirect
    Get-StorageSubSystem Cluster* | Get-StorageHealthReport

    Consider checking if any storage optimization jobs are currently running on the affected node.

    Get-StorageJob
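
    To compare what PowerShell reports against the UI, the per-disk operational status can be filtered for maintenance-mode states. A minimal sketch: the function name Select-MaintenanceModeDisk is hypothetical, and it assumes that the OperationalStatus property of Get-PhysicalDisk output contains the text 'Maintenance Mode' for disks that are entering, in, or leaving maintenance mode.

```powershell
function Select-MaintenanceModeDisk {
    [CmdletBinding()]
    param(
        # Disk objects, typically piped in from Get-PhysicalDisk.
        [Parameter(Mandatory, ValueFromPipeline)]
        [object]$Disk
    )
    process {
        # -match works on a single string or an array of status strings;
        # any hit means the disk reports a maintenance-mode state.
        if ($Disk.OperationalStatus -match 'Maintenance Mode') {
            $Disk
        }
    }
}

# Usage on a cluster node:
#   Get-PhysicalDisk | Select-MaintenanceModeDisk |
#       Select-Object FriendlyName, SerialNumber, UniqueId, OperationalStatus
```

    If this returns nothing and Get-StorageJob shows no running jobs, that would suggest the status shown in the UI is stale rather than a real fault.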

    Since Storage Spaces Direct is known for these kinds of problems, your best course of action would be to rebuild the cluster node, either partially (evict, clean storage pools, rejoin) or entirely from scratch. That approach is guaranteed to fix the problem and, in most cases, takes less time than investigating further (one of the benefits of an HCI environment).

    Alternatively, you may consider replacing Storage Spaces Direct with virtual SAN software (https://www.starwindsoftware.com/vsan) that offers the same feature set but runs isolated from the Microsoft Failover Cluster subsystem, which makes it more reliable and easier to maintain.


  3. Deleted

    This answer has been deleted due to a violation of our Code of Conduct. The answer was manually reported or identified through automated detection before action was taken. Please refer to our Code of Conduct for more information.

