Manage storage infrastructure for Azure Stack Hub
This article describes the health and operational status of Azure Stack Hub storage infrastructure resources. These resources include storage drives and volumes. The information in this article helps you troubleshoot various issues, like when a drive can't be added to a pool.
Volume states
To find out what state volumes are in, use the following PowerShell commands:
$scaleunit_name = (Get-AzsScaleUnit)[0].name
$subsystem_name = (Get-AzsStorageSubSystem -ScaleUnit $scaleunit_name)[0].name
Get-AzsVolume -ScaleUnit $scaleunit_name -StorageSubSystem $subsystem_name | Select-Object VolumeLabel, HealthStatus, OperationalStatus, RepairStatus, Description, Action, TotalCapacityGB, RemainingCapacityGB
Here's an example of output showing a detached volume and a degraded/incomplete volume:
VolumeLabel | HealthStatus | OperationalStatus |
---|---|---|
ObjStore_1 | Unknown | Detached |
ObjStore_2 | Warning | {Degraded, Incomplete} |
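To focus on the volumes that need attention, the output can be filtered by health state. The following is a minimal sketch rather than an official procedure. The function name `Select-VolumeNeedingAttention` is hypothetical, the sample objects only mirror the example output above (`ObjStore_3` is an invented healthy volume added for contrast), and the sketch assumes `HealthStatus` compares as a plain string.

```powershell
# Hypothetical helper: pass through only volumes whose HealthStatus isn't Healthy.
function Select-VolumeNeedingAttention {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $Volume
    )
    process {
        if ($Volume.HealthStatus -ne 'Healthy') {
            $Volume
        }
    }
}

# Sample objects that mirror the example output; this isn't real cmdlet output.
$sampleVolumes = @(
    [pscustomobject]@{ VolumeLabel = 'ObjStore_1'; HealthStatus = 'Unknown'; OperationalStatus = @('Detached') }
    [pscustomobject]@{ VolumeLabel = 'ObjStore_2'; HealthStatus = 'Warning'; OperationalStatus = @('Degraded', 'Incomplete') }
    [pscustomobject]@{ VolumeLabel = 'ObjStore_3'; HealthStatus = 'Healthy'; OperationalStatus = @('OK') }
)

$needAttention = @($sampleVolumes | Select-VolumeNeedingAttention)
$needAttention | Format-Table VolumeLabel, HealthStatus, OperationalStatus
```

Against a live system, the same filter could be applied by piping the `Get-AzsVolume` command shown earlier into the function instead of the sample objects.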
The following sections list the health and operational states:
Volume health state: Healthy
Operational state | Description |
---|---|
OK | The volume is healthy. |
Suboptimal | Data isn't written evenly across drives. Action: Contact Support to optimize drive usage in the storage pool. Before you do, start the log file collection process using the guidance from https://aka.ms/azurestacklogfiles. You may have to restore from backup after the failed connection is restored. |
Volume health state: Warning
When the volume is in a Warning health state, it means that one or more copies of your data are unavailable but Azure Stack Hub can still read at least one copy of your data.
Operational state | Description |
---|---|
In service | Azure Stack Hub is repairing the volume, like after adding or removing a drive. When the repair is complete, the volume should return to the OK operational state. Action: Wait for Azure Stack Hub to finish repairing the volume and check the status afterward. |
Incomplete | The resilience of the volume is reduced because one or more drives failed or are missing. However, the missing drives contain up-to-date copies of your data. Action: Reconnect any missing drives, replace any failed drives, and bring online any servers that are offline. |
Degraded | The resilience of the volume is reduced because one or more drives failed or are missing, and the drives hold outdated copies of your data. Action: Reconnect any missing drives, replace any failed drives, and bring online any servers that are offline. |
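Several of the actions above amount to waiting for a repair and then checking the status again. One way to do that is a bounded polling loop, sketched below. The function name `Wait-ForOkStatus` and its parameters are hypothetical, and the status source is simulated so the loop can be tried without a connection to Azure Stack Hub.

```powershell
# Hypothetical helper: poll a status source until it reports 'OK' or the attempts run out.
function Wait-ForOkStatus {
    param(
        [Parameter(Mandatory = $true)]
        [scriptblock]$GetStatus,   # returns the current operational status
        [int]$MaxAttempts = 10,
        [int]$DelaySeconds = 60
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        $status = & $GetStatus
        if ($status -eq 'OK') {
            return $true
        }
        Write-Host "Attempt ${attempt}: status is '$status'; waiting."
        Start-Sleep -Seconds $DelaySeconds
    }
    return $false
}

# Simulated status source for illustration: reports 'In service' twice, then 'OK'.
$script:calls = 0
$fakeStatus = {
    $script:calls++
    if ($script:calls -lt 3) { 'In service' } else { 'OK' }
}

$repaired = Wait-ForOkStatus -GetStatus $fakeStatus -MaxAttempts 5 -DelaySeconds 0
```

In practice, the script block would wrap a call such as the `Get-AzsVolume` query shown earlier and return the `OperationalStatus` of the volume being watched. Bounding the number of attempts keeps the loop from running indefinitely if the repair stalls.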
Volume health state: Unhealthy
When a volume is in an Unhealthy health state, some or all of the data on the volume is currently inaccessible.
Operational state | Description |
---|---|
No redundancy | The volume has lost data because too many drives failed. Action: Contact Support. Before you do, start the log file collection process using the guidance from https://aka.ms/azurestacklogfiles. |
Volume health state: Unknown
A volume is in the Unknown health state if the virtual disk has become detached.
Operational state | Description |
---|---|
Detached | A storage device failure occurred, which may cause the volume to be inaccessible. Some data may be lost. Action: 1. Check the physical and network connectivity of all storage devices. 2. If all devices are connected correctly, contact Support. Before you do, start the log file collection process using the guidance from https://aka.ms/azurestacklogfiles. You may have to restore from backup after the failed connection is restored. |
Drive states
Use the following PowerShell commands to monitor the state of drives:
$scaleunit_name = (Get-AzsScaleUnit)[0].name
$subsystem_name = (Get-AzsStorageSubSystem -ScaleUnit $scaleunit_name)[0].name
Get-AzsDrive -ScaleUnit $scaleunit_name -StorageSubSystem $subsystem_name | Select-Object StorageNode, PhysicalLocation, HealthStatus, OperationalStatus, Description, Action, Usage, CanPool, CannotPoolReason, SerialNumber, Model, MediaType, CapacityGB
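On a system with many drives, a per-node summary can be easier to scan than the full list. The following sketch groups drives by storage node and health state. The function name `Get-DriveHealthSummary`, the node names, and the sample objects are all hypothetical; only the property names come from the `Select-Object` list above.

```powershell
# Hypothetical helper: count drives per storage node and health state.
function Get-DriveHealthSummary {
    param(
        [object[]]$Drive
    )
    $Drive |
        Group-Object -Property StorageNode, HealthStatus |
        ForEach-Object {
            [pscustomobject]@{
                StorageNode  = $_.Group[0].StorageNode
                HealthStatus = $_.Group[0].HealthStatus
                DriveCount   = $_.Count
            }
        }
}

# Sample drive objects for illustration only; this isn't real cmdlet output.
$sampleDrives = @(
    [pscustomobject]@{ StorageNode = 'Node01'; HealthStatus = 'Healthy'; OperationalStatus = 'OK' }
    [pscustomobject]@{ StorageNode = 'Node01'; HealthStatus = 'Healthy'; OperationalStatus = 'OK' }
    [pscustomobject]@{ StorageNode = 'Node02'; HealthStatus = 'Healthy'; OperationalStatus = 'OK' }
    [pscustomobject]@{ StorageNode = 'Node02'; HealthStatus = 'Warning'; OperationalStatus = 'Predictive failure' }
)

$summary = @(Get-DriveHealthSummary -Drive $sampleDrives)
$summary | Format-Table StorageNode, HealthStatus, DriveCount
```

With a live connection, the result of the `Get-AzsDrive` command could be passed to the `-Drive` parameter in place of the sample objects.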
The following sections describe the health states a drive can be in:
Drive health state: Healthy
Operational state | Description |
---|---|
OK | The drive is healthy. |
In service | The drive is doing some internal housekeeping operations. When the action is complete, the drive should return to the OK operational state. |
Drive health state: Warning
A drive in the Warning state can read and write data successfully but has an issue.
Operational state | Description |
---|---|
Lost communication | Connectivity has been lost to the drive. Action: Bring all servers back online. If that doesn't fix it, reconnect the drive. If this state persists, replace the drive to ensure full resiliency. |
Predictive failure | A failure of the drive is predicted to occur soon. Action: Replace the drive as soon as possible to ensure full resiliency. |
IO error | There was a temporary error accessing the drive. Action: If this state persists, replace the drive to ensure full resiliency. |
Transient error | There was a temporary error with the drive. This error usually means the drive was unresponsive, but it could also mean that the Storage Spaces Direct protective partition was inappropriately removed from the drive. Action: If this state persists, replace the drive to ensure full resiliency. |
Abnormal latency | The drive is sometimes unresponsive and is showing signs of failure. Action: If this state persists, replace the drive to ensure full resiliency. |
Removing from pool | Azure Stack Hub is in the process of removing the drive from its storage pool. Action: Wait for Azure Stack Hub to finish removing the drive, and check the status afterward. If the status remains, contact Support. Before you do, start the log file collection process using the guidance from https://aka.ms/azurestacklogfiles. |
Starting maintenance mode | Azure Stack Hub is in the process of putting the drive in maintenance mode. This state is temporary; the drive should soon be in the In maintenance mode state. Action: Wait for Azure Stack Hub to finish the process and check the status afterward. |
In maintenance mode | The drive is in maintenance mode, which halts reads and writes from the drive. This state usually means that an Azure Stack Hub administration task, such as patch and update (PNU) or field replaceable unit (FRU) replacement, is operating on the drive. An admin can also place the drive in maintenance mode. Action: Wait for Azure Stack Hub to finish the administration task, and check the status afterward. If the status remains, contact Support. Before you do, start the log file collection process using the guidance from https://aka.ms/azurestacklogfiles. |
Stopping maintenance mode | Azure Stack Hub is in the process of bringing the drive back online. This state is temporary; the drive should soon be in another state, ideally Healthy. Action: Wait for Azure Stack Hub to finish the process and check the status afterward. |
Drive health state: Unhealthy
A drive in the Unhealthy state can't currently be written to or accessed.
Operational state | Description |
---|---|
Split | The drive has become separated from the pool. Action: Replace the drive with a new disk. If you must use this disk, remove the disk from the system, make sure there's no useful data on the disk, erase the disk, and then reseat the disk. |
Not usable | The physical disk is quarantined because it's not supported by your solution vendor. Only disks that are approved for the solution and have the correct disk firmware are supported. Action: Replace the drive with a disk that has an approved manufacturer and model number for the solution. |
Stale metadata | The replacement disk was previously used and may contain data from an unknown storage system. The disk is quarantined. Action: Replace the drive with a new disk. If you must use this disk, remove the disk from the system, make sure there's no useful data on the disk, erase the disk, and then reseat the disk. |
Unrecognized metadata | Unrecognized metadata was found on the drive, which usually means that the drive has metadata from a different pool on it. Action: Replace the drive with a new disk. If you must use this disk, remove the disk from the system, make sure there's no useful data on the disk, erase the disk, and then reseat the disk. |
Failed media | The drive failed and won't be used by Storage Spaces anymore. Action: Replace the drive as soon as possible to ensure full resiliency. |
Device hardware failure | There was a hardware failure on this drive. Action: Replace the drive as soon as possible to ensure full resiliency. |
Updating firmware | Azure Stack Hub is updating the firmware on the drive. This state is temporary and usually lasts less than a minute. During this time, other drives in the pool handle all reads and writes. Action: Wait for Azure Stack Hub to finish the update and check the status afterward. |
Starting | The drive is getting ready for operation. This state should be temporary; once complete, the drive should transition to a different operational state. Action: Wait for Azure Stack Hub to finish the operation and check the status afterward. |
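The drive tables above repeat a small set of first actions, mainly waiting or replacing the drive. For quick triage, some of the states could be mapped to that first action, as in the sketch below. The mapping is deliberately partial and only restates rows from the tables above; the variable `$driveActionByState` and the function `Get-DriveActionHint` are hypothetical names, and any state not listed falls back to a pointer to the tables.

```powershell
# Partial, illustrative mapping from drive operational state to the first action in the tables above.
$driveActionByState = @{
    'Predictive failure'        = 'Replace'
    'Failed media'              = 'Replace'
    'Device hardware failure'   = 'Replace'
    'Updating firmware'         = 'Wait'
    'Starting'                  = 'Wait'
    'Starting maintenance mode' = 'Wait'
    'Stopping maintenance mode' = 'Wait'
}

# Hypothetical helper: return the mapped action, or a fallback for states not in the map.
function Get-DriveActionHint {
    param(
        [string]$OperationalStatus
    )
    if ($driveActionByState.ContainsKey($OperationalStatus)) {
        $driveActionByState[$OperationalStatus]
    }
    else {
        'See table'
    }
}

$hintForFailedMedia = Get-DriveActionHint -OperationalStatus 'Failed media'
$hintForSplit       = Get-DriveActionHint -OperationalStatus 'Split'
```

A hint like this is only a starting point. States such as Split or Stale metadata carry conditions that a single keyword can't capture, so the full Action text in the tables remains the reference.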
Reasons a drive can't be pooled
Some drives just aren't ready to be in an Azure Stack Hub storage pool. You can find out why a drive isn't eligible for pooling by looking at the CannotPoolReason property of the drive. The following table gives a little more detail on each of the reasons.
Reason | Description |
---|---|
Hardware not compliant | The drive isn't in the list of approved storage models specified by using the Health Service. Action: Replace the drive with a new disk. |
Firmware not compliant | The firmware on the physical drive isn't in the list of approved firmware revisions specified by using the Health Service. Action: Replace the drive with a new disk. |
In use by cluster | The drive is currently used by a Failover Cluster. Action: Replace the drive with a new disk. |
Removable media | The drive is classified as a removable drive. Action: Replace the drive with a new disk. |
Not healthy | The drive isn't in a healthy state and might need to be replaced. Action: Replace the drive with a new disk. |
Insufficient capacity | There are partitions taking up the free space on the drive. Action: Replace the drive with a new disk. If you must use this disk, remove the disk from the system, make sure there's no useful data on the disk, erase the disk, and then reseat the disk. |
Verification in progress | The Health Service is checking to see if the drive or firmware on the drive is approved for use. Action: Wait for Azure Stack Hub to finish the process, and check the status afterward. |
Verification failed | The Health Service couldn't check to see if the drive or firmware on the drive is approved for use. Action: Contact Support. Before you do, start the log file collection process using the guidance from https://aka.ms/azurestacklogfiles. |
Offline | The drive is offline. Action: Contact Support. Before you do, start the log file collection process using the guidance from https://aka.ms/azurestacklogfiles. |
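To see which drives are affected and why, the drive list can be narrowed to the entries that can't be pooled. The following is a minimal sketch under stated assumptions: the function name `Get-UnpoolableDrive`, the node names, and the serial numbers are hypothetical, and the sketch assumes that `CanPool` is a Boolean, which this article doesn't confirm.

```powershell
# Hypothetical helper: list drives that can't be pooled, with the reported reason.
# Assumes CanPool is a Boolean; adjust the test if it's returned as text.
function Get-UnpoolableDrive {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $Drive
    )
    process {
        if (-not $Drive.CanPool) {
            $Drive | Select-Object StorageNode, SerialNumber, CannotPoolReason
        }
    }
}

# Sample drive objects for illustration only; this isn't real cmdlet output.
$sampleDrives = @(
    [pscustomobject]@{ StorageNode = 'Node01'; SerialNumber = 'SN-0001'; CanPool = $true;  CannotPoolReason = $null }
    [pscustomobject]@{ StorageNode = 'Node01'; SerialNumber = 'SN-0002'; CanPool = $false; CannotPoolReason = 'Insufficient capacity' }
    [pscustomobject]@{ StorageNode = 'Node02'; SerialNumber = 'SN-0003'; CanPool = $false; CannotPoolReason = 'Verification in progress' }
)

$unpoolable = @($sampleDrives | Get-UnpoolableDrive)
$unpoolable | Format-Table StorageNode, SerialNumber, CannotPoolReason
```

On a live system, the output of the `Get-AzsDrive` command shown earlier could be piped into the function, and each reported reason can then be looked up in the table above.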