Recover Documents from MOSS 2007 Database

Introduction

A couple years ago I wrote a prior post on how a simple VBS script can be used to extract a document from a SharePoint 2003/WSS 2.0 database (Recover Documents from SharePoint 2003 Database). After seeing the traffic routing to the post and with the adoption rate of MOSS 2007 it looks like an update is necessary.

This approach obviates the need to restore a content database into a MOSS 2007 environment if the intention is to extract a few critical documents.

Disclaimer

A few disclaimers are necessary. This script goes directly against a MOSS 2007 content database which is generally discouraged. Any code using the database directly will not be supported by MS Product Support Services. Instead, the SharePoint and WSS APIs are the way to go. If the database is modified directly rather than through the published SharePoint and WSS APIs Product Support Services cannot properly troubleshoot any unexpected issues. This script reads from the database so we're safe from errant modifications. However, the data structure the script queries could change in a future service pack.

Script

With that out of the way, let's proceed.  

The script queries the dbo.AllDocs table which contains the application documents and retrieves the most current version based on the document name which is then streamed out as binary data to a file:

Dim contentDatabase
Dim leaf
Dim outputPath

server = "[SERVERNAME]"
contentDatabase = "[CONTENTDATABASE]"
leaf = "[LEAFNODE]"
outputPath = "[OUTPUTPATH]"

ExtractDoc server, contentDatabase, leaf, outputPath

Sub ExtractDoc(server, contentDatabase, leaf, outputPath)

  Dim conStr, selectStr

  conStr = "Provider=SQLOLEDB;data Source=" + server + ";Initial Catalog=" + contentDatabase + ";Trusted_Connection=yes"

  selectStr = "SELECT dbo.AllDocStreams.Content FROM dbo.AllDocs "
  selectStr = selectStr + "INNER JOIN dbo.AllDocStreams "
  selectStr = selectStr + "  ON dbo.AllDocs.ID= dbo.AllDocStreams.ID "
  selectStr = selectStr + " AND dbo.AllDocs.Level = dbo.AllDocStreams.Level "
  selectStr = selectStr + " where LeafName='" + leaf +"' AND IsCurrentVersion=1"

  Set cn = CreateObject("ADODB.Connection")
  Set rs = CreateObject("ADODB.Recordset")
  cn.Open conStr
  Set rs = cn.Execute(selectStr)
  Set mstream = CreateObject("ADODB.Stream")
  mstream.Type = 1
  mstream.Open
  mstream.Write rs.Fields("Content").Value
  mstream.SaveToFile outputPath, 2
  rs.Close
  cn.Close
End Sub

Copy this code into Notepad and replace [SERVERNAME], [CONTENTDATABASE], [LEAFNODE] and [OUTPUTPATH] with appropriate values. Save this file as a VBS script and execute from the command line as:

C:\>CSCRIPT ExtractDoc.vbs

The SQL Query is a bit more complicated than the SharePoint 2003 version. It joins the dbo.AllDocs table with the dbo.AllDocStreams table which actually contains the blob Content field. There is also a dbo.AllDocVersions table, however, with versioning enabled this table does not appear to be updated as new versions are added. With each new version a new row is added to both the dbo.AllDocs and dbo.AllDocStreams tables. Conveniently, there is an IsCurrentVersion boolean field in the dbo.AllDocs table. The join between the dbo.AllDocs and dbo.AllDocStreams is done between the mutual uniqueidentifier ID fields and a Level field which appears to increment with each new version.

The LeafNode is the name of the file to retrieve. This sample script assumes that the document is in the root of the containing document library. If it were in a subdirectory an addtional DirName would need to be used in the query and passed as a parameter.

NOTE: This was tested with a MOSS 2007 content database. This was not tested with a WSS 3.0 content database, however, I expect the schema is the same. 

This is certainly not something to use in a production environment where automated document retrieval must be a repeatable and reliable process. But it is a quick and dirty means of extracting a document from a restored database that spares you from the overhead of restoring a SharePoint/WSS environment.

This was written using a simple VBS script so that production support folks can use it easily without having to compile a .NET assembly.

References

How to recover SharePoint document once deleted from recycle bin - He talks about using textcopy, which is part of the SQL Server 2000 Resource Kit, to perform the same task. In addition he has a number of screenshots and additional instructions which may prove helpful. The blog entry mentions that, although textcopy is not officially supported for use with SQL Server 2005, it does work. The kit is available for download if you are a MSDN subscriber. Otherwise, it's included in the book,

Comments

  • Anonymous
    June 23, 2008
    PingBack from http://blogs.msdn.com/field_notes/archive/2006/09/17/recover-documents-from-sharepoint-database.aspx

  • Anonymous
    June 24, 2008
    Excellent timing, i spent some time yesterday trying to sort this out. Works perfectly thank you.

  • Anonymous
    September 19, 2008
    With your tool I finally restore a file that was overwrite (not deleted) by a user with other different file. First I restore the previous backup in a test sharepoint environment, only the content database, and then use your tool to extract it from the restored database. Perfect!!! very helpfully. Thanks a lot.

  • Anonymous
    September 20, 2008
    In the case of an overwritten file in a document library it sounds like versioning is enabled. You may have been able to recover the document by downloading the specific version rather than through the use of this script. Nonetheless, I'm glad it worked for you.

  • Anonymous
    October 24, 2008
    I recently threw together a little tool I thought might be helpful to those who find your script helpful.  It is instead a .NET 3.5 Windows Application version of the script you have above.  It is completely untested in any environment other than my own and am not entirely sure yet if it will do the trick for anyone else, but, if you don't mind being an alpha tester of sorts, you're welcome to play with it.  There is a link to the application on my website, or you can go directly to http://www.jonmedel.com/MOSSFileGrabber.rar Please let me know if this tool is helpful to anyone!  Jon

  • Anonymous
    December 05, 2008
    Fantastic, thank you! I needed to extract one particular version of a document that MOSS was seeing as corrupt. Maybe this will help someone else. I started with a SQL query to find the correct version: The sql query that I used to identify the doc and version:  SELECT dbo.AllDocs.TimeLastModified fields , dbo.AllDocs.DirName, dbo.AllDocs.LeafName, dbo.AllDocs.Version FROM dbo.AllDocs INNER JOIN dbo.AllDocStreams ON dbo.AllDocs.ID = dbo.AllDocStreams.ID AND dbo.AllDocs.Level = dbo.AllDocStreams.Level where dbo.AllDocs.LeafName='my file name aka 'LEAFNODE' I then replaced 'AND IsCurrentVersion=1' with 'dbo.AllDocs.Version=[version number from table]' and successfully extracted the particular version of the file that I needed. -Mindy

  • Anonymous
    December 09, 2008
    Thanks so much for this excellent utility! Saved me a bunch of hassle!

  • Anonymous
    April 01, 2009
    The script is extremely helpful and I'm sure I will use it - however my current problem is that the file I want to recover has an ' in the name.  The script errors out with a syntax error even though i've encased it with "".  can anyone tell me how to get around this issue? Thanks much!

  • Anonymous
    April 17, 2009
    The comment has been removed

  • Anonymous
    May 13, 2009
    sh, Sorry for the late response. By now, you've probably figured this out. The output path needs to be a path to a file, not a just a directory.

  • Anonymous
    September 16, 2009
    The comment has been removed

  • Anonymous
    September 17, 2009
    Are you going against a MOSS 2007 content database? You may be hitting an admin database or a SharePoint 2003 database, in which case you need the script referenced here: http://blogs.msdn.com/field_notes/archive/2006/09/17/recover-documents-from-sharepoint-database.aspx

  • Anonymous
    November 06, 2009
    Major props, this worked like a charm and saved my colleague a ton of time.

  • Anonymous
    December 14, 2009
    Thanks so much, this was the part I needed. I was able to modify this to export all the images from our news list so they can be consumed by our external site.

  • Anonymous
    December 16, 2009
    And...is there any way to restore all versions of some document? Because user used webdav for file moving, not web, so he lost all versions of that document.