Troubleshooting Data Deduplication Corruptions

Beyond the significant storage savings it provides, the Data Deduplication feature in Windows Server can also detect, report and even repair data corruption.

Because a large number of deduplicated files may reference a single popular chunk, corruption of that one chunk can affect all of them, so deduplication takes data integrity very seriously. A number of features are built into deduplication to help protect against corruption.

**Detection:** Checksums are automatically validated on all data and metadata on a deduplication-enabled volume whenever they are read or written.

**Reporting:** By default, a weekly scrubbing job automatically inspects all data for corruption and logs anything it finds to the Deduplication Scrubbing event log.

**Redundancy:** Redundant copies of all metadata and of popular data chunks are kept and used as backup copies if corruption is detected.

**Repair:** When corruption is detected, deduplication attempts to repair it. For soft corruptions such as bit flips or torn writes, it replaces the corrupted data using its own redundant copies. If mirrored Storage Spaces are in use, deduplication uses the good copy from the mirror.
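To confirm that the default weekly scrubbing schedule is still present and enabled on a server, the scrubbing schedules can be listed with the Deduplication module's Get-DedupSchedule cmdlet. This is a minimal sketch: the wrapper name Show-ScrubbingSchedule is hypothetical, and the properties displayed vary by Windows Server version.

```powershell
# Hypothetical helper: list the deduplication scrubbing schedules on this server.
# Requires the Deduplication module, installed with the Data Deduplication role service.
function Show-ScrubbingSchedule {
    [CmdletBinding()]
    param()
    # -Type Scrubbing limits the output to scrubbing schedules
    # (such as the default weekly schedule, if it still exists).
    Get-DedupSchedule -Type Scrubbing | Format-List -Property *
}
```

Run Show-ScrubbingSchedule on the server that hosts the deduplicated volume. If no scrubbing schedule is listed, or it is disabled, the regular weekly scrubbing report will not be produced.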

Because of these extra validations, the deduplication subsystem is often the first component to report early signs of data corruption in the hardware or file system.

Causes of Corruption

Despite using checksums, redundancy and repair jobs, there are still cases where deduplication will not be able to automatically recover from corruption.

Some of the most common causes for deduplication to report corruption are:

  1. Incompatible Robocopy options used when copying data
  2. Incompatible Backup/Restore program used on a dedup volume
  3. Migrating a deduplicated volume to a down-level Windows Server version
  4. Enabling compression on the root of a volume that also has deduplication enabled
  5. Hardware issues
  6. File system corruption

Below are recommendations for troubleshooting each of these types of deduplication corruption issues.

1) If Robocopy was used on the volume:

Running Robocopy with the /MIR option and the volume root as the target will wipe the deduplication store. Refer to this KB article for a detailed description of the issue and a workaround: http://support.microsoft.com/kb/2834834
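As an illustration only, the sketch below wraps Robocopy so that a mirror never targets a bare volume root and excludes the System Volume Information folder, where the deduplication chunk store lives. The function name Invoke-DedupSafeMirror is hypothetical, and relying on /XD to keep /MIR from purging that folder is an assumption to check against the KB article, not a restatement of its workaround.

```powershell
# Hypothetical wrapper around robocopy.exe; Source and Destination are placeholders.
function Invoke-DedupSafeMirror {
    param(
        [Parameter(Mandatory)] [string] $Source,
        [Parameter(Mandatory)] [string] $Destination
    )
    # Refuse a bare volume root such as 'E:' or 'E:\' as the /MIR target.
    if ($Destination -match '^[A-Za-z]:\\?$') {
        throw "Mirror into a subfolder, not the volume root: $Destination"
    }
    # /MIR mirrors the tree and deletes extra items at the destination.
    # /XD excludes 'System Volume Information' (assumed to also protect it from purging).
    robocopy $Source $Destination /MIR /XD 'System Volume Information'
    # Robocopy exit codes of 8 or higher indicate that at least one failure occurred.
    if ($LASTEXITCODE -ge 8) { throw "robocopy failed with exit code $LASTEXITCODE" }
}
```

For example, Invoke-DedupSafeMirror -Source 'D:\Data' -Destination 'E:\Data' mirrors into a subfolder, while a destination of 'E:\' is rejected before Robocopy runs.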

2) If Backup/Restore was used on the volume:

Verify that the backup solution supports deduplication:

  1. Some backup vendors support “unoptimized backup”, which rehydrates the files during backup (the files are backed up as normal, full-size files)
  2. Some backup vendors support “optimized backup” for full volume backup, which backs up the files as-is (as reparse point stubs) together with the chunk store
  3. Some backup vendors support both

Restoring with an unsupported backup solution will introduce corruption.

The backup vendor should be able to confirm what their product supports, and in which versions.

This is explained in detail here: http://msdn.microsoft.com/en-us/library/hh769304(v=vs.85).aspx

3) If the deduplicated volume was migrated to a different server:

When deduplication on an older version of Windows Server tries to access files that were optimized by a later version of the operating system, it reports a “File is corrupted” message.

Verify that the server accessing the deduplicated data runs the same version of Windows Server as, or a later version than, the server that deduplicated the data on the volume.

Deduplication is backward compatible but not forward compatible: you can upgrade and migrate to a newer version of Windows Server, but older versions of Windows Server deduplication cannot read data deduplicated by a newer version and will report that data as corrupted.

The data remains accessible from any deduplication-enabled server running the same version as, or a later version than, the one that deduplicated it.
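Before a migration, a rough check can compare the Windows version of the server that will read the data with that of the server that optimized it. This sketch uses the operating system version as a stand-in for the deduplication version. Test-DedupReadCompatibility is a hypothetical helper, not a built-in cmdlet, and the source version has to be recorded beforehand (for example with [Environment]::OSVersion.Version on the original server).

```powershell
# Hypothetical helper: returns $true when the reading server's OS version is the same as,
# or newer than, the OS version of the server that deduplicated the volume.
function Test-DedupReadCompatibility {
    param(
        # OS version of the server that optimized the data, e.g. '6.3.9600'
        [Parameter(Mandatory)] [version] $OptimizedOn,
        # OS version of the server that will read the data; defaults to this server
        [version] $ReadingOn = [Environment]::OSVersion.Version
    )
    # [version] objects compare component by component (major, minor, build, revision).
    return $ReadingOn -ge $OptimizedOn
}
```

For example, data optimized on Windows Server 2012 R2 (6.3.9600) and read on Windows Server 2012 (6.2.9200) gives $false: Test-DedupReadCompatibility -OptimizedOn 6.3.9600 -ReadingOn 6.2.9200.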

4) Enabling compression on the root of a volume with deduplication also enabled

Deduplication is not supported on volumes that have compression enabled at the root. This configuration can cause deduplicated files to become corrupt and inaccessible.

Note: deduplication of files in compressed folders is supported and functions normally.
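To check whether a volume root has compression enabled, the NTFS Compressed attribute of the root directory can be inspected. A minimal sketch, assuming that compressing the drive (for example from the drive's Properties dialog) sets this attribute on the root; Test-VolumeRootCompressed is a hypothetical helper.

```powershell
# Hypothetical helper: reports whether the Compressed attribute is set on a volume root.
function Test-VolumeRootCompressed {
    param([Parameter(Mandatory)] [string] $Root)   # e.g. 'E:\'
    $attributes = (Get-Item -LiteralPath $Root -Force).Attributes
    # FileAttributes is a flags enum; HasFlag tests the Compressed bit and returns a Boolean.
    return $attributes.HasFlag([System.IO.FileAttributes]::Compressed)
}
```

A result of $true from Test-VolumeRootCompressed -Root 'E:\' on a deduplicated volume points to the unsupported configuration described above.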

5) Hardware Issues

Many hardware storage issues are detected early by the deduplication scrubbing job. Check the Deduplication Scrubbing event log for early file corruption and for repairs attempted by the scrubbing job (see ‘Check Event Logs for details of corruption’ below). In addition, hardware events in the System event log and the Storage Spaces event logs often give more information about hardware issues.

Verify any early reports of corruption in the deduplication scrubbing event log: look for signs of hardware issues, check for file system corruption with CHKDSK, and make sure current backups are in place if the hardware is suspect.

Running Get-DedupMetadata | fl is a quick way to see whether the scrubbing job is reporting any corruption.
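On servers with several deduplicated volumes, the Get-DedupMetadata output can be filtered to show only volumes with logged corruptions. This sketch assumes the output includes a CorruptionLogEntryCount property; confirm with Get-DedupMetadata | fl on your server first. Select-DedupCorruption is a hypothetical helper.

```powershell
# Hypothetical helper: pass through only the metadata objects that report corruption log entries.
# Assumes each input object has Volume and CorruptionLogEntryCount properties.
function Select-DedupCorruption {
    param([Parameter(ValueFromPipeline)] $Metadata)
    process {
        if ($Metadata.CorruptionLogEntryCount -gt 0) {
            $Metadata | Select-Object -Property Volume, CorruptionLogEntryCount
        }
    }
}
```

Usage on a server with the Deduplication module: Get-DedupMetadata | Select-DedupCorruption. No output means no volume reported corruption log entries.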

General Corruption Troubleshooting:

1) Check Event Logs for details of corruption

Any corruption detected by deduplication is logged to the event log. The Scrubbing channel lists the corruptions that were detected and the files the job attempted to fix.

The deduplication event logs can be found here:

Event Viewer -> Applications and Services Logs -> Microsoft -> Windows -> Deduplication -> Scrubbing

Because there may be a large number of scrubbing events, they can be difficult to work through in Event Viewer.

A publicly available script on the Microsoft TechNet Script Center, “Get-DedupScrubbingReport”, generates an easy-to-read HTML report that highlights detected corruptions and the results of any corruption fixes attempted by the scrubbing job.
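Before generating a full report, a quick count of scrubbing events by event ID shows which kinds of events dominate. A minimal sketch: Get-ScrubbingEventSummary is hypothetical, and the channel name Microsoft-Windows-Deduplication/Scrubbing is assumed to correspond to the Event Viewer path above (Get-WinEvent -ListLog *Dedup* lists the actual channel names).

```powershell
# Hypothetical helper: count events per event ID, most frequent first.
# Accepts any objects with an Id property, such as the records returned by Get-WinEvent.
function Get-ScrubbingEventSummary {
    param([object[]] $Events = @())
    $Events |
        Group-Object -Property Id |
        Sort-Object -Property Count -Descending |
        Select-Object @{ Name = 'EventId'; Expression = { [int]$_.Name } }, Count
}
```

Usage: Get-ScrubbingEventSummary -Events (Get-WinEvent -LogName 'Microsoft-Windows-Deduplication/Scrubbing' -ErrorAction SilentlyContinue). The -ErrorAction setting avoids an error when the channel contains no events.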

2) Run CHKDSK in Read-Only Mode

Run Chkdsk.exe in read-only mode to get additional information about corruptions.

Note: running Chkdsk.exe with no switches (only the target volume) performs a read-only scan.

Example:

Chkdsk.exe <targetvolume>

3) Run a “Deep Scrubbing Job” to fix detected corruptions

Deep scrubbing is a must for corruption investigations. Run a deep scrubbing job so that all corruptions are logged in the deduplication Scrubbing channel. The scrubbing events provide a breakdown of the corruptions (corrupted chunks and affected files), the exact container offsets of the corruption, and the list of affected files (up to 10,000 files).

PowerShell command to start a deep scrubbing job manually:

Start-DedupJob <targetvolume> -Type Scrubbing -Full

Note: Deep scrubbing is never run by default and needs to be run manually.
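The deep scrubbing step can be wrapped in a helper that starts the job, waits for it to finish and then returns the most recent scrubbing events for review. This is a sketch: Start-DeepScrub is hypothetical, the event count of 50 is an arbitrary default, and the Scrubbing channel name is assumed to match the Event Viewer path given earlier.

```powershell
# Hypothetical helper: run a deep (full) scrubbing job on one volume, wait for it,
# then return the most recent events from the Scrubbing channel.
function Start-DeepScrub {
    param(
        [Parameter(Mandatory)] [string] $Volume,   # e.g. 'E:'
        [int] $MaxEvents = 50                      # arbitrary number of events to return
    )
    # -Full requests deep scrubbing; -Wait blocks until the job completes.
    Start-DedupJob -Volume $Volume -Type Scrubbing -Full -Wait
    # Get-WinEvent raises an error when no events exist, so suppress it here.
    Get-WinEvent -LogName 'Microsoft-Windows-Deduplication/Scrubbing' -MaxEvents $MaxEvents -ErrorAction SilentlyContinue
}
```

Usage: Start-DeepScrub -Volume 'E:'. On large volumes the job can take a long time; if blocking is undesirable, start the job without -Wait and follow its progress with Get-DedupJob.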