Windows Server: Understanding interactions between dedup and VSS

Background

In Windows Server, Data Deduplication (dedup) and Volume Shadow Copy Service (VSS) are two rich storage-related features. Dedup optimizes a storage volume using a set of jobs that optimize, garbage collect, and scrub the file data to reduce the physical storage size required. VSS captures and copies stable volume images for backup on running server systems.

Since dedup modifies the files on the volume (to save physical storage space) and VSS captures modifications to the volume (to create stable volume images), these features can interact in complex ways. These interactions depend on a variety of factors, e.g., volume size, the amount of free space, daily churn, the type of churn (e.g., append versus delete/create), and the amount of shadow copy history required. Given the range of factors involved, the default configuration settings for dedup and/or VSS may need to be modified to optimize the operation of a particular system configuration and workload.

Issues

Issues between dedup and VSS typically surface as the loss of VSS snapshot history, i.e., VSS snapshot deletion. Two primary factors affect this:

  1. The space available on the volume for writing new data and saving snapshots
  2. The amount of data modified by dedup

Workarounds

There are two primary workarounds for these issues, addressing the two primary factors listed in the previous section. One or both of these workarounds can be implemented on the system.

VSS configuration

Configure VSS to use a separate (possibly dedicated) volume for its diff area (the “shadow storage area”). This can be set using vssadmin.exe as well as other tools. (See Vssadmin add shadowstorage for instructions on using the vssadmin command.)

Note: There are other performance benefits to having the diff area on one or more dedicated volumes.
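
As a minimal sketch of the vssadmin approach, the commands below assume a deduplicated data volume D: and a separate volume E: that will hold the shadow storage. Both drive letters and the 100GB limit are hypothetical values chosen for illustration; run the commands from an elevated prompt.

```powershell
# Show the current shadow storage associations and how much space each uses.
vssadmin list shadowstorage

# Place the shadow storage for D: on E:, capped at 100GB.
# D:, E:, and 100GB are placeholder values; substitute your own.
vssadmin add shadowstorage /for=D: /on=E: /maxsize=100GB

# Confirm the association for D:.
vssadmin list shadowstorage /for=D:
```

If D: already has a shadow storage association, the add command is likely to fail. In that case the existing association may need to be resized (vssadmin resize shadowstorage) or removed before a new one can be created.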

Dedup garbage collection (GC) configuration

By default, dedup GC jobs are scheduled to run weekly, with every fourth GC job running in “full” mode automatically (so full GC runs approximately monthly). In “full” mode, the GC job rewrites every dedup container file that holds any unreferenced blocks. Depending on a variety of factors on the volume, this can create a large amount of changed data for VSS to snapshot, which in turn can cause VSS to delete older snapshots. (See Deduplication Garbage Collection Overview for further details on GC and these settings.)

Full GC can be disabled via the registry with the following command:

reg add HKLM\System\CurrentControlSet\Services\ddpsvc\Settings /v DeepGCInterval /t REG_DWORD /d 0xffffffff 

Notes:

  • Full GC can still be run manually on demand using PowerShell: Start-DedupJob -Type GarbageCollection -Full
  • For Windows Server 2012 installations, please make sure KB2897997 is installed. (This is not needed for Windows Server 2012 R2.)
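
The following sketch shows one way to check the setting and to run full GC by hand. The volume D: is a hypothetical placeholder; the registry path and value name are the ones used in the reg add command above.

```powershell
# Read back the DeepGCInterval value set by the reg add command.
# A value of 0xffffffff corresponds to the disabled state described above.
reg query HKLM\System\CurrentControlSet\Services\ddpsvc\Settings /v DeepGCInterval

# List the dedup job schedules to see when the weekly GC job runs.
Get-DedupSchedule

# Start a full GC on demand for one volume (D: is a placeholder),
# ideally at a time when the extra churn on the volume is acceptable.
Start-DedupJob -Volume "D:" -Type GarbageCollection -Full

# Check the state and progress of running or queued dedup jobs.
Get-DedupJob
```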

Additional Details

As noted above, dedup and VSS are rich features of Windows Server with many configuration options. While we are always learning from customers and try to adjust the defaults and improve the level of auto-tuning in every release, it isn’t feasible for all of the defaults and auto-tuning to be optimal for all customers. Therefore, we expose PowerShell options (and sometimes registry keys) for customers who need to customize these settings for their particular system configurations and workloads.

A full Garbage Collection (GC) job can reclaim more free space than “regular” GC. However, it causes more churn on the volume, because every chunk container that holds any unreferenced chunks is compacted (rewritten). (See Deduplication Garbage Collection Overview for further details on GC and these settings.)

This ‘churn’ can cause increased VSS snapshot deletion in the following cases: 

  • A workload has many file deletes or in-place file writes: this can cause many chunks to become unreferenced. Also, a broad distribution of deletes across the volume can increase the amount of churn, because many chunk containers holding a mix of old and new chunks are then compacted.

  • A system with relatively little physical free space: NTFS first allocates new data in free space, which does not consume the snapshot diff area. If the volume has little free space, however, NTFS allocates space for new files in areas that trigger ‘copy on write’. When the diff area runs out of space, VSS deletes snapshots, starting with the oldest.

In general, even without dedup enabled, it is not straightforward to predict how much shadow copy storage space is needed to maintain a certain amount of history. For example, changing files in place usually takes more shadow copy space than appending to them, and changing the same file extent repeatedly takes less shadow storage space than changing different extents of a file.
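
The difference between rewriting one extent and writing many extents can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the block numbers are made up, and the model assumes that after a snapshot only the first write to a given block copies the old contents into the diff area.

```powershell
# Toy model only: real VSS tracks fixed-size blocks on the volume, and the
# block numbers below are invented for illustration.
# Assumption: after a snapshot, only the FIRST write to a block copies its
# old contents into the diff area; later writes to that block copy nothing.

$sameExtentWrites = @(5, 5, 5, 5, 5, 5, 5, 5, 5, 5)   # ten writes to one block
$spreadWrites     = @(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)  # ten writes to ten blocks

# Blocks preserved in the diff area = number of distinct blocks written.
$sameExtentCopies = @($sameExtentWrites | Sort-Object -Unique).Count
$spreadCopies     = @($spreadWrites | Sort-Object -Unique).Count

"Same extent rewritten 10 times: $sameExtentCopies block(s) copied"
"Ten different extents written:  $spreadCopies block(s) copied"
```

In this model the first pattern preserves one block and the second preserves ten, even though both perform the same number of writes.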

What customers usually do is balance the length of history they maintain (e.g., two weeks of shadow copies) with frequency (e.g., once a day) and storage space they are willing to dedicate. These same considerations apply when dedup is enabled. One recommendation is to monitor the shadow-storage and snapshot history (e.g., using Vssadmin list shadowstorage) so the settings can be adjusted as needed.
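
A minimal monitoring sketch, assuming the deduplicated volume is D: (a hypothetical drive letter), might combine the vssadmin listings with the dedup status for the same volume:

```powershell
# Used, allocated, and maximum shadow storage space per association.
vssadmin list shadowstorage

# Existing shadow copies for the volume with their creation times,
# which shows how far back the snapshot history currently reaches.
vssadmin list shadows /for=D:

# Dedup status for the same volume, including free space and the times
# of the most recent optimization and garbage collection jobs.
Get-DedupStatus -Volume "D:" | Format-List
```

Running these on a schedule and comparing the oldest shadow copy date with the history you intend to keep gives an early signal that the shadow storage size or the GC settings need adjusting.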