Troubleshooting Deduplication When Volume Becomes Full

Summary

If the deduplication volume becomes full the following symptoms result:

  • For VDI and virtualized backup workloads, any running Hyper-V virtual machines with virtual hard disks on the volume will be put into a pause-critical state. The VMs will remain paused and unresponsive.
  • General Purpose File Servers (GPFS) using deduplication will report the operating system's standard out-of-free-space errors.

Note: There is no event log entry indicating that deduplication is not keeping up with the data churn and that the volume is running out of free space. The only event logged is for “disk full”, which arrives too late to act on.

Cause

As with any storage system, a deduplication-enabled volume becomes full when enough data is written to it. The amount of data a deduplicated volume can store depends on the underlying physical device, the level of deduplication in the dataset, and the ability of the system (storage I/O subsystem, available memory, CPU speed) to complete deduplication optimization processing for the amount of daily data churn.

A volume with deduplication enabled can become full when the deduplication savings percentage is not high enough, or when the dedup optimization job cannot keep up with the data churn.

Troubleshooting Steps

To recover from this condition, it is necessary to free up space on the volume. When deduplication is involved, simply deleting files may not be sufficient. Often, additional steps will be necessary to free up space.

Note that the exact amount of space freed is difficult to predict, because a garbage collection job must run to clean up deduplication container data that is no longer referenced by existing files.

For dedup volumes using the HyperV or Backup UsageType, one recovery method is to Storage Migrate the virtual machine’s virtual hard disk files to a backup location and then run a Garbage Collection (GC) job on the full deduplication volume to reclaim free space.

Specifically,

  1. Create an additional volume and share using the same configuration and settings as the existing share, including the NTFS and deduplication settings.
  2. Storage Migrate the virtual hard disks of the paused virtual machines to the new volume. See Storage Live Migration on Windows Server 2012 for a how-to video.
  3. Run a data deduplication garbage collection (GC) job on the original volume that was full:
     Start-DedupJob -Volume <targetvolume> -Type GarbageCollection
  4. The GC job should succeed and reclaim free space.
  5. Restart the affected VMs.
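
The migration and GC steps above can be sketched in PowerShell. The VM name "VDI-01", the new volume F:, and the original volume E: are placeholders for this example, not values from the original scenario:

```powershell
# Storage-migrate the paused VM's virtual hard disks and configuration
# to the new volume (placeholder names used throughout).
Move-VMStorage -VMName "VDI-01" -DestinationStoragePath "F:\VMs\VDI-01"

# Reclaim space on the original, full volume; -Wait blocks until the job finishes.
Start-DedupJob -Volume "E:" -Type GarbageCollection -Wait

# Confirm free space was reclaimed.
Get-DedupStatus -Volume "E:" | Format-List
```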

For virtualized backup scenarios, refer to the DPM-Dedup blog post. The “Monitoring Backup Storage” section documents specific steps for recovering a DPM server that runs out of free space.

For dedup volumes using the General Purpose File Server UsageType, moving some files to an alternate location and then manually running a Garbage Collection job, as above, will reclaim some free space.
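
As a sketch of this recovery path, assuming a full volume E: and a staging share used to hold the moved files (both paths are placeholders for this example):

```powershell
# Move some files off the full volume to temporary staging storage.
Move-Item -Path "E:\Shares\Archive\*" -Destination "\\BackupServer\Staging"

# Run garbage collection to reclaim the space the moved files referenced.
Start-DedupJob -Volume "E:" -Type GarbageCollection -Wait
Get-DedupStatus -Volume "E:" | Format-List
```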

As a best practice, regularly monitor each volume’s current dedup savings rate and free space to avoid running out of space. The amount of free space available on a dedup volume is displayed in the Server Manager management UI under ‘File and Storage Services’ -> ‘Volumes’, or can be queried via PowerShell using the command Get-DedupStatus | fl.
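
A simple monitoring pass over all dedup-enabled volumes might look like the following sketch. The 20% free-space threshold is an illustrative value chosen for this example, not a product recommendation:

```powershell
# Rescan so the reported numbers are current.
Update-DedupStatus | Out-Null

# Report free space and savings for every dedup-enabled volume.
foreach ($v in Get-DedupStatus) {
    $pctFree = [math]::Round(100 * $v.FreeSpace / $v.Capacity, 1)
    "{0}  Free: {1}%  SavedSpace: {2:N0} bytes" -f $v.Volume, $pctFree, $v.SavedSpace
    if ($pctFree -lt 20) { Write-Warning "$($v.Volume) is below 20% free space" }
}
```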

 

Determine proper deduplication volume sizing

The following blog article details how to determine if a deduplication configuration is able to keep up with its regular workload.

http://blogs.technet.com/b/filecab/archive/2014/12/04/sizing-volumes-for-data-deduplication-in-windows-server.aspx

Having a properly sized volume for a workload is critical for avoiding scenarios where the deduplication volume completely runs out of free space. If a deduplication server cannot optimize all the new data fast enough in the amount of time given to it, the dedup volume may run out of space.

Because deduplication jobs are single-threaded, overall optimization throughput can be greatly improved by enabling deduplication on multiple volumes, so that the optimization jobs run in parallel and take advantage of multiple CPUs. Completing optimization faster helps deduplication volumes keep up with the data churn.
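
Sketched in PowerShell, with placeholder volume letters: each enabled volume gets its own single-threaded optimization job, so the two jobs below can run in parallel on separate CPUs.

```powershell
# Enable deduplication on two volumes (placeholder drive letters).
Enable-DedupVolume -Volume "E:" -UsageType HyperV
Enable-DedupVolume -Volume "F:" -UsageType HyperV

# Start optimization on both volumes; the jobs run in parallel.
Start-DedupJob -Volume "E:" -Type Optimization
Start-DedupJob -Volume "F:" -Type Optimization

# Check job progress.
Get-DedupJob
```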

 

Check that the latest deduplication QFEs for performance are installed

It is important that all the deduplication QFEs for optimization and Garbage Collection are installed. Without these QFEs, GC and optimization jobs may fail to keep up with churn, resulting in volumes becoming completely full.

Verify deduplication server is not in an unsupported configuration

Deduplication has been fully tested in only a limited number of officially supported configurations. VDI and virtualized backup configurations on deduplicated volumes are officially supported only when configured on a Scale-Out File Server. Running deduplication on the same host as Hyper-V may leave deduplication with too little of the system’s resources to keep up with the expected data churn, which can cause dedup volumes to fill up prematurely.

Other causes for running out of space on the dedup volume

Verify dedup schedules are enabled. If the regular garbage collection or optimization jobs were inadvertently disabled, the volume can run out of space sooner than expected.

PowerShell cmdlet:  Get-DedupSchedule | fl
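
If a schedule turns out to be disabled, it can be re-enabled as sketched below. "BackgroundOptimization" and "WeeklyGarbageCollection" are the default schedule names; adjust if the schedules on your server were renamed.

```powershell
# List schedules and their enabled state.
Get-DedupSchedule | Format-Table Name, Type, Enabled

# Re-enable any schedules that were inadvertently disabled.
Set-DedupSchedule -Name "BackgroundOptimization" -Enabled $true
Set-DedupSchedule -Name "WeeklyGarbageCollection" -Enabled $true
```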

Verify whether optimization is keeping up. If the optimization window is not long enough to optimize all the changing data, the dedup volume may fill up sooner than expected. The dedup volume sizing blog post details ways to measure this.
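
One rough way to check, sketched below with a placeholder volume: if InPolicyFilesCount stays well above OptimizedFilesCount from one day to the next, optimization is falling behind.

```powershell
# Compare the count of files eligible for optimization against the
# count actually optimized, and check the last optimization result.
Get-DedupStatus -Volume "E:" |
    Format-List Volume, InPolicyFilesCount, OptimizedFilesCount,
                LastOptimizationTime, LastOptimizationResult
```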

Verify Garbage Collection is keeping up and is configured properly. For instance, the GC window may be too short, or the job may run too infrequently.
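
A sketch of adjusting the GC configuration, using the default schedule name and a placeholder volume; the 8-hour duration is an illustrative value that should be sized for your actual churn:

```powershell
# Give the weekly GC job a longer window and higher priority.
Set-DedupSchedule -Name "WeeklyGarbageCollection" -DurationHours 8 -Priority High

# -Full forces a full garbage collection pass, which reclaims more
# space than a regular pass; run one manually when a volume is tight.
Start-DedupJob -Volume "E:" -Type GarbageCollection -Full
```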

Verify there are no other unexpected processes consuming memory. Memory pressure could cause the optimization jobs to run too slowly to keep up with the amount of data churn.
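
To spot unexpected memory pressure, and to cap how much memory a dedup job itself uses, the following sketch may help (the 50% cap is an illustrative value):

```powershell
# List the top ten memory consumers by working set.
Get-Process |
    Sort-Object WorkingSet64 -Descending |
    Select-Object -First 10 Name,
        @{n='WorkingSetMB'; e={[math]::Round($_.WorkingSet64 / 1MB)}}

# Dedup job memory use can also be capped explicitly (percent of RAM).
Start-DedupJob -Volume "E:" -Type Optimization -Memory 50
```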