Modifying Open XML Documents that are in SharePoint Document Libraries using Web Services

When using the Open XML SDK with SharePoint web services, one of the most basic operations is to get a document from a document library using web services, modify it using the Open XML SDK (and LINQ to XML), and save it back to the document library.  This post describes how to do this, and provides a sample in C#.

It is simple to extend this sample to iterate through all documents in a library, apply some changes to each one, and save them back.  In an upcoming post, I’ll present a sample to ‘sanitize’ (remove comments, accept revisions, and remove personal information) all documents in a document library.  This is pretty useful.  I keep a library of documents that I send externally as needed, and it’s always best to not have personal information embedded in the documents.  By running this upcoming sample, I can regularly check to make sure that the document library is clean, even if other folks are editing documents in the library.

For a brief tutorial on SharePoint web services, see “Getting Started with SharePoint (WSS) Web Services using LINQ to XML”.  For this example, you need to add two references to web services (both Lists and Copy).  The procedure for adding a reference to the Copy web service is the same as adding a reference to the Lists web service.

This code uses the Open XML SDK.  Remember to add a reference to the Open XML SDK assembly.  This code uses V1 of the SDK.  It should work with V2 CTP but I haven't tried it.

The code references the System.IO.FileFormatException class, which is in the WindowsBase assembly, so add a reference to it.

This code uses the technique of converting XmlNode to XElement (and back again), as detailed in “Convert XElement to XmlNode (and Convert XmlNode to XElement)”, so that we can use LINQ to XML instead of XmlDocument.
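To make the conversion concrete, here is a minimal sketch that uses the same two extension methods as the full listing, run against a made-up stand-in for the XML that GetListCollection returns.  The sample XML, the ListTitles helper, and the XmlConversionDemo class name are assumptions for illustration only; a real response carries many more attributes.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class XmlConversionDemo
{
    // Copy an XmlNode into a new LINQ to XML tree and return its root element.
    public static XElement GetXElement(this XmlNode node)
    {
        XDocument xDoc = new XDocument();
        using (XmlWriter xmlWriter = xDoc.CreateWriter())
            node.WriteTo(xmlWriter);
        return xDoc.Root;
    }

    // Load an XElement into a new XmlDocument (which is itself an XmlNode).
    public static XmlNode GetXmlNode(this XElement element)
    {
        using (XmlReader xmlReader = element.CreateReader())
        {
            XmlDocument xmlDoc = new XmlDocument();
            xmlDoc.Load(xmlReader);
            return xmlDoc;
        }
    }

    // Hypothetical helper: the Title attribute of every List element.
    public static string[] ListTitles(XmlNode listCollection)
    {
        XNamespace s = "http://schemas.microsoft.com/sharepoint/soap/";
        return listCollection.GetXElement()
            .Elements(s + "List")
            .Select(l => (string)l.Attribute("Title"))
            .ToArray();
    }

    public static void Main()
    {
        // Made-up stand-in for the XmlNode that a web service call returns.
        XmlDocument response = new XmlDocument();
        response.LoadXml(
            "<Lists xmlns='http://schemas.microsoft.com/sharepoint/soap/'>" +
            "<List Title='Open XML Documents' ID='{1}' />" +
            "<List Title='Tasks' ID='{2}' />" +
            "</Lists>");

        // XmlNode -> XElement, then query with LINQ to XML.
        foreach (string title in ListTitles(response.DocumentElement))
            Console.WriteLine(title);

        // XElement -> XmlNode, ready to pass as a web service argument.
        XmlNode queryOptions =
            new XElement("QueryOptions", new XElement("Folder")).GetXmlNode();
        Console.WriteLine(queryOptions.OuterXml);
    }
}
```

The point of the round trip is that the web service proxies only speak XmlNode, while the queries are much easier to write against XElement.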

One important aspect of the code is that you retrieve the document as a byte array:

ModifyDoc.CopyWebService.FieldInformation[] fields;
byte[] byteArray;
copy.GetItem(url, out fields, out byteArray);

After retrieving the byte array, you can write the byte array to a MemoryStream, and use the MemoryStream to open an in-memory Open XML document.  After modifying the in-memory document, you can convert it back to a byte array and serialize back to the SharePoint document library.  The technique is described in the post, “Working with In-Memory Open XML Documents”.
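One detail in that sequence is easy to miss: the byte array is written into an empty MemoryStream rather than wrapped with new MemoryStream(byteArray).  A stream that wraps an existing array cannot grow, so edits that make the document larger would fail.  The following sketch shows the difference using only the base class library; the class and method names are made up for this illustration, and the four-byte array stands in for a real document.

```csharp
using System;
using System.IO;

public static class InMemoryStreamDemo
{
    // Copy the bytes into a stream that can grow when content is added.
    public static MemoryStream ToResizableStream(byte[] bytes)
    {
        MemoryStream mem = new MemoryStream();
        mem.Write(bytes, 0, bytes.Length);
        mem.Position = 0;
        return mem;
    }

    // Try to append one byte; report whether the stream could expand.
    public static bool CanGrow(MemoryStream mem)
    {
        try
        {
            mem.Seek(0, SeekOrigin.End);
            mem.WriteByte(0xFF);
            return true;
        }
        catch (NotSupportedException)
        {
            // Thrown when the stream wraps a fixed-size array.
            return false;
        }
    }

    public static void Main()
    {
        byte[] original = { 1, 2, 3, 4 };

        using (MemoryStream fixedSize = new MemoryStream(original))
        using (MemoryStream resizable = ToResizableStream(original))
        {
            Console.WriteLine("Wrapped array can grow: " + CanGrow(fixedSize));
            Console.WriteLine("Copied stream can grow: " + CanGrow(resizable));

            // ToArray returns only the bytes written so far, which is
            // what you would send back to the document library.
            Console.WriteLine("Bytes to upload: " + resizable.ToArray().Length);
        }
    }
}
```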

Here is the code to serialize it back to the SharePoint document library:

string[] urls = { url };
ModifyDoc.CopyWebService.CopyResult[] copyResults;
copy.CopyIntoItems(url, urls, fields, mem.ToArray(), out copyResults);
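It is worth inspecting copyResults after the call, because per-destination problems (a checked-out document, for example) come back there.  Here is a rough sketch of such a check.  To keep it self-contained, it declares its own CopyResult and CopyErrorCode stand-ins; these are assumptions that mirror the shape of the types I'd expect the Copy web reference to generate, and in the real program you would use the generated ModifyDoc.CopyWebService types instead.  The Failures helper is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-ins (assumptions) for the proxy types generated by the web reference.
public enum CopyErrorCode
{
    Success, DestinationInvalid, DestinationMWS, SourceInvalid,
    DestinationCheckedOut, InvalidUrl, Unknown
}

public class CopyResult
{
    public CopyErrorCode ErrorCode { get; set; }
    public string ErrorMessage { get; set; }
    public string DestinationUrl { get; set; }
}

public static class CopyResultCheck
{
    // Hypothetical helper: describe every destination that did not succeed.
    public static List<string> Failures(CopyResult[] copyResults)
    {
        if (copyResults == null)
            return new List<string> { "No results were returned." };

        return copyResults
            .Where(r => r.ErrorCode != CopyErrorCode.Success)
            .Select(r => string.Format("{0}: {1} ({2})",
                r.DestinationUrl, r.ErrorCode, r.ErrorMessage))
            .ToList();
    }

    public static void Main()
    {
        // Made-up results, as if two destination URLs had been passed.
        CopyResult[] copyResults =
        {
            new CopyResult
            {
                ErrorCode = CopyErrorCode.Success,
                DestinationUrl = "http://localhost/Docs/A.docx"
            },
            new CopyResult
            {
                ErrorCode = CopyErrorCode.DestinationCheckedOut,
                ErrorMessage = "The document must be checked out first.",
                DestinationUrl = "http://localhost/Docs/B.docx"
            }
        };

        foreach (string failure in Failures(copyResults))
            Console.WriteLine(failure);
    }
}
```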

Now that we’ve covered these basics, in the near future, I'll show how to use SharePoint web services and the Open XML SDK to do some more interesting stuff.

Here is the complete listing (the code is added as an attachment to this post):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

namespace ModifyDoc
{
    public static class MyExtensions
    {
        public static XDocument GetXDocument(this OpenXmlPart part)
        {
            XDocument xdoc = part.Annotation<XDocument>();
            if (xdoc != null)
                return xdoc;
            using (StreamReader sr = new StreamReader(part.GetStream()))
            using (XmlReader xr = XmlReader.Create(sr))
                xdoc = XDocument.Load(xr);
            part.AddAnnotation(xdoc);
            return xdoc;
        }

        public static void PutXDocument(this OpenXmlPart part)
        {
            XDocument xdoc = part.GetXDocument();
            if (xdoc != null)
            {
                // Serialize the XDocument object back to the package.
                using (XmlWriter xw =
                    XmlWriter.Create(part.GetStream
                    (FileMode.Create, FileAccess.Write)))
                {
                    xdoc.Save(xw);
                }
            }
        }

        public static string StringConcatenate(
            this IEnumerable<string> source)
        {
            return source.Aggregate(
                new StringBuilder(),
                (s, i) => s.Append(i),
                s => s.ToString());
        }

        public static XElement GetXElement(this XmlNode node)
        {
            XDocument xDoc = new XDocument();
            using (XmlWriter xmlWriter = xDoc.CreateWriter())
                node.WriteTo(xmlWriter);
            return xDoc.Root;
        }

        public static XmlNode GetXmlNode(this XElement element)
        {
            using (XmlReader xmlReader = element.CreateReader())
            {
                XmlDocument xmlDoc = new XmlDocument();
                xmlDoc.Load(xmlReader);
                return xmlDoc;
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            string documentLibraryName = "Open XML Documents";
            string documentName = "Test.docx";

            // XML namespace names are compared as exact strings, so this
            // must use the http scheme.
            XNamespace s = "http://schemas.microsoft.com/sharepoint/soap/";
            XNamespace rs = "urn:schemas-microsoft-com:rowset";
            XNamespace z = "#RowsetSchema";

            // Make sure that you use the correct namespace, as well as the correct reference
            // name. The namespace (by default) is the same as the name of the application
            // when you created it. You specify the reference name in the Add Web Reference
            // dialog box.
            //
            // Namespace  Reference Name
            //     |            |
            //     V            V
            ModifyDoc.ListsWebService.Lists lists =
                new ModifyDoc.ListsWebService.Lists();

            // Fix Namespace and Reference Name for the Copy web service too
            ModifyDoc.CopyWebService.Copy copy =
                new ModifyDoc.CopyWebService.Copy();

            // Update the following URL to point to the Lists web service for
            // your SharePoint site.
            lists.Url = "https://localhost/_vti_bin/Lists.asmx";

            lists.Credentials = System.Net.CredentialCache.DefaultCredentials;
            copy.Credentials = System.Net.CredentialCache.DefaultCredentials;

            XElement listCollection = lists.GetListCollection().GetXElement();

            // get the node for the library that we want
            XElement library = listCollection
                .Elements(s + "List")
                .Where(l => (string)l.Attribute("Title") == documentLibraryName)
                .FirstOrDefault();

            if (library == null)
            {
                Console.WriteLine("Library {0} doesn't exist.", documentLibraryName);
                Environment.Exit(0);
            }

            // get the ID of the library
            string libId = (string)library.Attribute("ID");

            XElement item = GetItemByLinkFilename(lists, libId, documentName);

            if (item == null)
            {
                Console.WriteLine("Document {0} doesn't exist.", documentName);
                Environment.Exit(0);
            }

            // get the document from the doc library as a byte array
            string url = item.Attribute("ows_EncodedAbsUrl").Value;

            ModifyDoc.CopyWebService.FieldInformation[] fields;
            byte[] byteArray;
            copy.GetItem(url, out fields, out byteArray);

            // create a memory stream from the byte array
            using (MemoryStream mem = new MemoryStream())
            {
                mem.Write(byteArray, 0, (int)byteArray.Length);
                try
                {
                    // create a WordprocessingDocument from the memory stream
                    using (WordprocessingDocument wordDoc =
                        WordprocessingDocument.Open(mem, true))
                    {
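                        // The WordprocessingML namespace name uses the http
                        // scheme; with https the elements below are not found.
                        XNamespace w =
                            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";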

                        // modify the document as necessary
                        // for this example, we'll insert a simple paragraph at the
                        // beginning of the document
                        XDocument doc = wordDoc.MainDocumentPart.GetXDocument();
                        doc.Element(w + "document")
                            .Element(w + "body")
                            .AddFirst(
                                new XElement(w + "p",
                                    new XElement(w + "r",
                                        new XElement(w + "t", "Hello, there")
                                    )
                                )
                            );

                        // write the XDocument back into the Open XML document
                        wordDoc.MainDocumentPart.PutXDocument();
                    }

                    // use the Copy web service to save the document back to the
                    // document library.
                    string[] urls = { url };
                    ModifyDoc.CopyWebService.CopyResult[] copyResults;
                    copy.CopyIntoItems(url, urls, fields, mem.ToArray(), out copyResults);
                }
                catch (System.IO.FileFormatException e)
                {
                    // document is invalid
                    Console.WriteLine(e);
                    Environment.Exit(0);
                }
            }
        }

        private static XElement GetItemByLinkFilename(
            ModifyDoc.ListsWebService.Lists lists, string libId,
            string documentName)
        {
            XNamespace z = "#RowsetSchema";

            // get the XElement for the row that contains info about the document
            // that we want to modify
            XElement queryOptions = new XElement("QueryOptions",
                new XElement("Folder"),
                new XElement("IncludeMandatoryColumns", false)
            );
            XElement viewFields = new XElement("ViewFields");
            XElement item = lists.GetListItems(libId, "", null,
                viewFields.GetXmlNode(), "", queryOptions.GetXmlNode(), "")
                .GetXElement()
                .Descendants(z + "row")
                .Where(i => (string)i.Attribute("ows_LinkFilename") == documentName)
                .FirstOrDefault();
            return item;
        }
    }
}

ModDocument.cs
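
If you want to experiment with the row lookup in GetItemByLinkFilename without a SharePoint server at hand, a sketch like the following runs the same LINQ to XML query against hand-written XML.  The sample XML is a trimmed, made-up approximation of a GetListItems response (real rows carry many more ows_ attributes), and FindDocumentUrl is a hypothetical helper, not part of the listing above.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class RowsetQueryDemo
{
    // Find the row whose file name matches and return its absolute URL,
    // or null when the library has no such document.
    public static string FindDocumentUrl(XElement listItems, string documentName)
    {
        XNamespace z = "#RowsetSchema";
        XElement row = listItems
            .Descendants(z + "row")
            .FirstOrDefault(r =>
                (string)r.Attribute("ows_LinkFilename") == documentName);
        return row == null ? null : (string)row.Attribute("ows_EncodedAbsUrl");
    }

    public static void Main()
    {
        // Made-up approximation of the XML returned by GetListItems.
        XElement listItems = XElement.Parse(
            "<listitems xmlns='http://schemas.microsoft.com/sharepoint/soap/'" +
            " xmlns:rs='urn:schemas-microsoft-com:rowset'" +
            " xmlns:z='#RowsetSchema'>" +
            "<rs:data ItemCount='2'>" +
            "<z:row ows_LinkFilename='Test.docx'" +
            " ows_EncodedAbsUrl='http://localhost/Docs/Test.docx' />" +
            "<z:row ows_LinkFilename='Other.docx'" +
            " ows_EncodedAbsUrl='http://localhost/Docs/Other.docx' />" +
            "</rs:data>" +
            "</listitems>");

        Console.WriteLine(FindDocumentUrl(listItems, "Test.docx"));

        // A missing document yields null rather than an exception.
        Console.WriteLine(FindDocumentUrl(listItems, "Missing.docx") == null);
    }
}
```

Casting the attribute to string (instead of reading .Value) is what keeps the query from throwing when a row lacks the attribute.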

Comments

  • Anonymous
    January 08, 2009
    At the start of 2009, with many people on vacation, the web has not overflowed with a

  • Anonymous
    January 09, 2009
    Here is a list of links that I want to share with you. LINQ for Office Developers Some Office solutions

  • Anonymous
    February 08, 2009
    Dear Eric, All works well but the changes are not copied back to the SharePoint site (Services 3.0). When I look at 'copyresults' it says the document must be checked out first before changes..? Would like to merge this method for accessing the docs together with your "move-insert-delete-paragraphs-in-word-processing-documents" post, is this feasible? Look forward to hearing your comments! Thanks

  • Anonymous
    February 09, 2009
    Hi Kerry, I made a mistake in this blog post.  The Copy web service protocol specification, located at http://msdn.microsoft.com/en-us/library/cc313170.aspx, states, "This protocol does not provide a way to control whether the overwriting of files during the copy operation is allowed."  It also states, "Consider using different protocols for copying files when the protocol client needs to control whether the overwriting of existing files during the copy operation is allowed".  I believe that overwriting works with plain WSS, but perhaps doesn't with MOSS.  I haven't verified this, and the protocol spec doesn't indicate the circumstances when you can't overwrite. I also believe (but haven't verified) that it is possible to always overwrite using FrontPage Server Extensions.  I intend to update this post soon with new code and explanations. Regarding whether it would be feasible to merge this method for accessing the docs with the move-insert-delete-paragraphs code - absolutely!  One of the overloads of the BuildOpenDocument method takes a stream - this can be used with a memory stream.  You can then get the byte array to upload from the memory stream.  This would enable powerful scenarios. -Eric

  • Anonymous
    February 09, 2009
    Thanks Eric, Look forward to your update, it's the missing piece in the jigsaw! Kerry

  • Anonymous
    February 24, 2009
    How would you go about just reading the file instead of copying it? I don't see a way of getting a read stream from just the Lists service. Thanks!

  • Anonymous
    February 24, 2009
    Hi Ryan, Actually, what you get is a byte array.  From that you can create a memory stream, if a stream is what you need.  Does this help? -Eric

  • Anonymous
    March 06, 2009
    These past few weeks were quite full and complex, and I lacked the time to share with

  • Anonymous
    May 12, 2010
    Hi Eric, I have made use of your excellent example called "Generating Documents from SharePoint with Open XML Content Controls" on how to generate a document from a template held in a SharePoint document library. It handles repeating rows in tables perfectly but I cannot work out how I can change it to also pull out single values from a SharePoint list and place them in the body of the same document. I am struggling with the LINQ to XML coding. Any chance of a simple example which will work in the same solution?