Document Assembly: Merging Excel, PowerPoint, and Word Content Together

Document assembly seems to be a hot topic these days especially when combined with the power of SharePoint. Today, I want to show you a pretty rich document assembly solution that is able to take multiple Word, Excel and PowerPoint documents and merge them all together to form a final Word document. I showed this solution at both PDC and SPC, so I am going to take the opportunity in this blog to discuss some of the details around the solution.

If you want to jump straight into the code, feel free to download the solution here.

Scenario

Imagine a scenario where I work for a company that analyzes stocks and generates reports for every company/stock analyzed. These reports are typically quite rich and usually involve more than one person contributing to the content. Content is separated out into multiple Word, Excel, and PowerPoint documents where each document is assigned to an individual. Once all the content has been written, the content is all assembled into a final report as a Word document. My company has asked me to write a solution that will be able to merge all these documents programmatically.

Solution

Before I get into the details of my solution, I want to talk about a brand new feature of SharePoint 2010, called Document Sets, which I will leverage to help solve this problem. Document Sets gives users a new way to manage a collection of documents as a single object. Think of this feature as allowing for a binder of related content.

In the case of this solution, I have defined a custom Document Set as having a set of files (six in my case) that correspond to the various components of the final analysis report. Here is a screenshot of a Document Set for a company called Contoso:

image

 

Using Document Sets and given the scenario I talked about above we will need to take the following actions:

  1. Create a set of documents that represent the Document Set for my solution. One of the documents will be a Word template that represents the look and feel of the final report
  2. Create and add a custom Document Set to a SharePoint library
  3. Create a web part for the SharePoint library that has a button labeled "Assemble Documents", which will be used to merge all the content together into a final document. This web part will be added to the Document Set library
  4. Using the Open XML SDK, open the template document from the Document Set and look for all content controls
  5. For every content control found, find the corresponding document content in the library and merge that content into the final document
  6. Once the document assembly is complete, offer the user the ability to open or save the report

Step 1 – Creating the Right Template

Setting up the right template makes all the difference when creating Office document solutions. In the case of my solution I have a Document Set with six files, where one of the files is my template file. All these documents are empty except for the template document. As in many of my previous posts, the template document represents the final look of the final report. I will leverage content controls within my template to specify semantic regions within my document to be used to merge content together. The title of my content controls will represent the type of content to be merged. For example, a Word document, a chart from a spreadsheet, a table from a spreadsheet, or a SmartArt graphic from a presentation. The content of my content control will represent the name of the file that contains the content to be merged. For example, here is a screenshot of my template document highlighting one of my content controls labeled "Word:Document" with content set to "Introduction":

image

 

This content control will represent the region where I will merge the Introduction Word document into my template file. For sake of completeness the other content controls are labeled "Spreadsheet:Chart", "Spreadsheet:Table", and "Presentation:SmartArt".

You can find the files that represent my Document Set here.

Step 2 – Setting Up Document Sets in SharePoint 2010

Perform the following steps to enable Document Sets on your SharePoint 2010 site:

  1. Site Actions | Site Settings (top level site settings)
  2. Site Collections Administration | Site collection features
  3. Activate Document Sets feature, if not already activated
  4. Go to the Document Library where you want to add the Document Sets feature
  5. Site Actions | Site Settings
  6. Galleries | Site content types
  7. Create
  8. Set "Select parent content type from:" to "Document Set Content Types"
  9. Specify a name and click OK
  10. In the Site Content Types list click on the link to the created Document Set
  11. Document Set settings
  12. Specify the default content (in my case I added six documents to the set) and click OK
  13. Go to the Document Library where you want to add the Document Sets feature
  14. Library Tools | Library | Library Settings
  15. Advanced Settings
  16. Choose "Yes" for "Allow management of content types?" and click OK
  17. Content Types | Add from existing site content types
  18. Add your custom content type for the Document Set

At this point you should have a library set up with a Document Set content type.

Step 3 – Create a Web Part with an Assemble Documents Command

In order to make this solution usable, we need to add a command within our document library that allows users to merge documents together. The easiest way to accomplish this task is to create a Web Part within Visual Studio 2010. Once you've created the Web Part modify the CreateChildControls

method as follows:

protected override void CreateChildControls() { Control control = this.Page.LoadControl(_ascxPath); Controls.Add(control); base.CreateChildControls(); Button btnSubmit = new Button(); btnSubmit.Text = "Assemble Documents"; btnSubmit.Click += new EventHandler(btnSubmit_Click); Controls.Add(btnSubmit); }

This method will add a new button control where we can add our logic to merge the documents contained within a given Document Set (the merge code will be called from void btnSubmit_Click(object sender, EventArgs e).

Once we've created this web part, the next step is to add the button to our Document Set. The easiest way to accomplish this task is to use SharePoint Designer 2010. Perform the following steps to add your custom web part to the Document Set library:

  1. Open your SharePoint site within SharePoint Designer 2010
  2. Navigate to your document library that contains the Document Set
  3. All Files | Custom document library name | Forms | Document Set name | XXXXhomepage.aspx
  4. Click on the bottom of WebPartZone_CenterMain
  5. Insert | Web Part | select your custom web part created from Visual Studio image
  6. Save and Close

At this point you should see an "Assemble Documents" command show up for any created Document Set:

image

 

Step 4 – Finding the Content Controls

Finding content controls within a document involves the following steps:

  1. Open template document from the SharePoint site
  2. Load document into memory
  3. Open document (from memory) with the Open XML SDK
  4. Loop through all content controls within the document
  5. For every content control figure out the title
  6. Based on the title perform a particular action

The following code accomplishes the steps outlined above:

void btnSubmit_Click(object sender, EventArgs e) { SPFolder folder = SPContext.Current.ListItem.Folder; char[] splitter = { '/' }; string[] folderName = folder.Name.Split(splitter); string filePrefix = @"Stock Analysis Demo/" + folderName[0] + "/" + folderName[0];   SPFile template = folder.Files[filePrefix + " - Template.docx"]; SPFile file; byte[] byteArray = template.OpenBinary();   using (MemoryStream mem = new MemoryStream()) { mem.Write(byteArray, 0, (int)byteArray.Length);   using (WordprocessingDocument myDoc = WordprocessingDocument.Open(mem, true)) { MainDocumentPart mainPart = myDoc.MainDocumentPart;   foreach (Word.SdtElement sdt in mainPart.Document .Descendants<Word.SdtElement>().ToList()) { Word.Alias alias = sdt.Descendants<Word.Alias>() .FirstOrDefault(); if (alias != null) { string sdtTitle = alias.Val.Value;   if (sdtTitle == "Spreadsheet:Table") { file = folder.Files[filePrefix + " - " + sdt.InnerText + ".xlsx"]; ImportTableFromSpreadsheet(mainPart, sdt, file); } else if (sdtTitle == "Spreadsheet:Chart") { file = folder.Files[filePrefix + " - " + sdt.InnerText + ".xlsx"]; ImportChartFromSpreadsheet(mainPart, sdt, file); } else if (sdtTitle == "Presentation:SmartArt") { file = folder.Files[filePrefix + " - " + sdt.InnerText + ".pptx"]; ImportSmartArtFromPowerPoint(mainPart, sdt, file); } else if (sdtTitle == "Word:Document") { file = folder.Files[filePrefix + " - " + sdt.InnerText + ".docx"]; AddAltChunk(mainPart, sdt, file); } } } }   ... } }

Step 5 – Merging Content Together

In this solution, there are four types of content to be merged:

  1. Word documents
  2. SmartArt graphics from presentations
  3. Charts from spreadsheets
  4. Tables of data from spreadsheets

Let's talk about each of these content types.

Step 5a – Merging Word Documents Together

By far the easiest way to assemble Word documents together is to take advantage of altChunks, which I have already blogged about in the past. In any case, here is the code necessary to merge documents together on SharePoint:

protected int id = 1;   void AddAltChunk(MainDocumentPart mainPart, Word.SdtElement sdt, SPFile filename) { string altChunkId = "AltChunkId" + id; id++; byte[] byteArray = filename.OpenBinary();   AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart( AlternativeFormatImportPartType.WordprocessingML, altChunkId);   using (MemoryStream mem = new MemoryStream()) { mem.Write(byteArray, 0, (int)byteArray.Length); mem.Seek(0, SeekOrigin.Begin); chunk.FeedData(mem); }   Word.AltChunk altChunk = new Word.AltChunk(); altChunk.Id = altChunkId; //Replace content control with altChunk information   OpenXmlElement parent = sdt.Parent; parent.InsertAfter(altChunk, sdt); sdt.Remove();   }

Pretty easy stuff!

Step 5b – Import SmartArt Graphics from PowerPoint to Word

Again, I've already blogged about importing SmartArt from PowerPoint to Word. Here is the code necessary to accomplish this task:

void ImportSmartArtFromPowerPoint(MainDocumentPart mainPart, Word.SdtElement sdt, SPFile filename) { string docLayoutPartId = ""; string docDataPartId = ""; string docColorsPartId = ""; string docStylePartId = "";   byte[] byteArray = filename.OpenBinary();   using (MemoryStream mem = new MemoryStream()) { mem.Write(byteArray, 0, (int)byteArray.Length);   using (PresentationDocument myPres = PresentationDocument.Open(mem, true)) { PresentationPart presPart = myPres.PresentationPart; //Get the slide that contains the SmartArt graphic SlidePart slide = (SlidePart)presPart.GetPartById("rId3"); //Get all the appropriate parts associated with the SmartArt DiagramLayoutDefinitionPart layoutPart = slide.DiagramLayoutDefinitionParts.First(); DiagramDataPart dataPart = slide.DiagramDataParts.First(); DiagramColorsPart colorsPart = slide.DiagramColorsParts.First(); DiagramStylePart stylePart = slide.DiagramStyleParts.First();   //Get some of the appropriate properties off the SmartArt graphic PPT.GraphicFrame graphicFrame = slide.Slide.Descendants<PPT.GraphicFrame>().First(); PPT.NonVisualDrawingProperties drawingPr = graphicFrame .Descendants<PPT.NonVisualDrawingProperties>().First(); Draw.Extents extents = graphicFrame.Descendants<Draw.Extents>().First();   //Import SmartArt into Word document //Add the SmartArt parts to the Word document DiagramLayoutDefinitionPart docLayoutPart = mainPart.AddPart<DiagramLayoutDefinitionPart>(layoutPart); DiagramDataPart docDataPart = mainPart.AddPart<DiagramDataPart>(dataPart); DiagramColorsPart docColorsPart = mainPart.AddPart<DiagramColorsPart>(colorsPart); DiagramStylePart docStylePart = mainPart.AddPart<DiagramStylePart>(stylePart); //Get all the relationship ids of the added parts docLayoutPartId = mainPart.GetIdOfPart(docLayoutPart); docDataPartId = mainPart.GetIdOfPart(docDataPart); docColorsPartId = mainPart.GetIdOfPart(docColorsPart); docStylePartId = mainPart.GetIdOfPart(docStylePart);   //Use the document reflector to figure out how to add a SmartArt //graphic to Word //Change attribute values based on specifics related to the SmartArt Word.Paragraph p = new Word.Paragraph( new Word.Run( new Word.Drawing( new WP.Inline( new WP.Extent() { Cx = extents.Cx, Cy = extents.Cy }, new WP.EffectExtent() { LeftEdge = 0L, TopEdge = 0L, RightEdge = 0L, BottomEdge = 0L }, new WP.DocProperties() { Id = drawingPr.Id, Name = drawingPr.Name }, new WP.NonVisualGraphicFrameDrawingProperties(), new Draw.Graphic( new Draw.GraphicData( new Dgm.RelationshipIds() { DataPart = docDataPartId, LayoutPart = docLayoutPartId, StylePart = docStylePartId, ColorPart = docColorsPartId } ) { Uri = "https://schemas.openxmlformats.org/drawingml/2006/diagram" }) ) { DistanceFromTop = (UInt32Value)0U, DistanceFromBottom = (UInt32Value)0U, DistanceFromLeft = (UInt32Value)0U, DistanceFromRight = (UInt32Value)0U })));   //Swap out the content control for the SmartArt OpenXmlElement parent = sdt.Parent; parent.InsertAfter(p, sdt); sdt.Remove(); } } }

The only difference between the code above and the code I showed you in my previous post is that the code above works with files on SharePoint.

Step 5c – Import Charts from Excel to Word

Again, I am going to leverage the same code I showed you in a previous post called importing charts from spreadsheets to Word documents. Here is the same code modified to work with files that exist within SharePoint:

void ImportChartFromSpreadsheet(MainDocumentPart mainPart, Word.SdtElement sdt, SPFile spreadsheetFileName) { //create a new paragraph that has an inline drawing object Word.Paragraph p = new Word.Paragraph(); Word.Run r = new Word.Run(); p.Append(r); Word.Drawing drawing = new Word.Drawing(); r.Append(drawing); //These dimensions work perfectly for my template document WP.Inline inline = new WP.Inline( new WP.Extent() { Cx = 5486400, Cy = 3200400 });   byte[] byteArray = spreadsheetFileName.OpenBinary();   using (MemoryStream mem = new MemoryStream()) { mem.Write(byteArray, 0, (int)byteArray.Length);   //Open Excel spreadsheet using (SpreadsheetDocument mySpreadsheet = SpreadsheetDocument.Open(mem, true)) { //Get all the appropriate parts WorkbookPart workbookPart = mySpreadsheet.WorkbookPart; WorksheetPart worksheetPart = XLGetWorksheetPartByName(mySpreadsheet, "Sheet2"); DrawingsPart drawingPart = worksheetPart.DrawingsPart; ChartPart chartPart = (ChartPart)drawingPart.GetPartById("rId1");   //Clone the chart part and add it to my Word document ChartPart importedChartPart = mainPart.AddPart<ChartPart>(chartPart); string relId = mainPart.GetIdOfPart(importedChartPart);   //The frame element contains information for the chart GraphicFrame frame = drawingPart.WorksheetDrawing.Descendants<GraphicFrame>().First(); string chartName = frame.NonVisualGraphicFrameProperties.NonVisualDrawingProperties.Name; //Clone this node so we can add it to my Word document Draw.Graphic clonedGraphic = (Draw.Graphic)frame.Graphic.CloneNode(true); ChartReference c = clonedGraphic.GraphicData.GetFirstChild<ChartReference>(); c.Id = relId;   //Give the chart a unique id and name WP.DocProperties docPr = new WP.DocProperties(); docPr.Name = chartName; docPr.Id = GetMaxDocPrId(mainPart) + 1;   //add the chart data to the inline drawing object inline.Append(docPr, clonedGraphic); drawing.Append(inline); } } OpenXmlElement parent = sdt.Parent; parent.InsertAfter(p, sdt); sdt.Remove(); }     WorksheetPart XLGetWorksheetPartByName(SpreadsheetDocument document, string sheetName) { WorkbookPart wbPart = document.WorkbookPart;   // Find the sheet with the supplied name, and then use that Sheet object // to retrieve a reference to the appropriate worksheet. Excel.Sheet theSheet = wbPart.Workbook.Descendants<Excel.Sheet>() .Where(s => s.Name == sheetName).FirstOrDefault();   if (theSheet == null) { throw new ArgumentException("sheetName"); }   return (WorksheetPart)(wbPart.GetPartById(theSheet.Id)); }     uint GetMaxDocPrId(MainDocumentPart mainPart) { uint max = 1;   //Get max id value of docPr elements foreach (WP.DocProperties docPr in mainPart.Document.Descendants<WP.DocProperties>()) { uint id = docPr.Id; if (id > max) max = id; } return max; }

Step 5d – Import Table of Data from Excel to Word

In a previous post I showed you how to import a table from Word to Excel. Today, I will show you how to do the reverse. Here are the steps necessary to accomplish this task:

  1. Create a Word table with the Open XML SDK that will hold the data from Excel
  2. Open the Excel file with the Open XML SDK
  3. Retrieve the specific sheet that contains the table of data
  4. In the found sheet, locate the SheetData object, which contains the table of data
  5. For every row in the SheetData object, retrieve all the cell data
  6. Once all data in the row is retrieved, create a Word row with the same data
  7. Append the created Word row to the Word table

To help with the tasks listed above I am going to take advantage of some of the Open XML SDK code snippets that were published. Here is the code necessary to accomplish the tasks outlined above:

void ImportTableFromSpreadsheet(MainDocumentPart mainPart, Word.SdtElement sdt, SPFile spreadsheetFileName) { ArrayList cellText = new ArrayList();   //create Word table Word.Table tbl = new Word.Table(); Word.TableProperties tblPr = new Word.TableProperties(); Word.TableStyle tblStyle = new Word.TableStyle(); tblStyle.Val = "LightShading-Accent1"; tblPr.AppendChild(tblStyle);   Word.TableWidth tblW = new Word.TableWidth(); tblW.Width = 5000; tblW.Type = Word.TableWidthUnitValues.Pct; tblPr.Append(tblW);   tbl.AppendChild(tblPr);   byte[] byteArray = spreadsheetFileName.OpenBinary();   using (MemoryStream mem = new MemoryStream()) { mem.Write(byteArray, 0, (int)byteArray.Length);   using (SpreadsheetDocument mySpreadsheet = SpreadsheetDocument.Open(mem, true)) { WorkbookPart workbookPart = mySpreadsheet.WorkbookPart; WorksheetPart worksheetPart = XLGetWorksheetPartByName(mySpreadsheet, "Sheet1");   Excel.SheetData sheetData = worksheetPart.Worksheet.GetFirstChild<Excel.SheetData>();   foreach (Excel.Row r in sheetData) { foreach (Excel.Cell c in r) { cellText.Add(XLGetCellValue(c, workbookPart)); } Word.TableRow tr = CreateRow(cellText); tbl.Append(tr); cellText = new ArrayList(); } } }   //Swap out the content control for the SmartArt OpenXmlElement parent = sdt.Parent; parent.InsertAfter(tbl, sdt); sdt.Remove(); }   Word.TableRow CreateRow(ArrayList cellText) { Word.TableRow tr = new Word.TableRow(); //create cells with simple text foreach (string s in cellText) { Word.TableCell tc = new Word.TableCell(); Word.Paragraph p = new Word.Paragraph(); Word.Run r = new Word.Run(); Word.Text t = new Word.Text(s); r.AppendChild(t); p.AppendChild(r); tc.AppendChild(p); tr.AppendChild(tc); }   return tr; }   // Get the value of a cell, given a file name, sheet name, and address name. string XLGetCellValue(Excel.Cell c, WorkbookPart wbPart) { string value = null;   // If the cell doesn't exist, return an empty string: if (c != null) { value = c.InnerText;   // If the cell represents an integer number, you're done. // For dates, this code returns the serialized value that // represents the date. The code handles strings and booleans // individually. For shared strings, the code looks up the corresponding // value in the shared string table. For booleans, the code converts // the value into the words TRUE or FALSE. if (c.DataType != null) { switch (c.DataType.Value) { case Excel.CellValues.SharedString: // For shared strings, look up the value in the shared strings table. var stringTable = wbPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault(); // If the shared string table is missing, something's wrong. // Just return the index that you found in the cell. // Otherwise, look up the correct text in the table. if (stringTable != null) { value = stringTable.SharedStringTable.ElementAt(int.Parse(value)).InnerText; } break;   case Excel.CellValues.Boolean: switch (value) { case "0": value = "FALSE"; break; default: value = "TRUE"; break; } break; } } }   return value; }

Step 6 – Offer Assembled Document to User

The last step is to offer the assembled document to the user with an Open/Save/Cancel dialog as follows:

image

 

Here is the code snippet to create this dialog based on a document that is in memory:

void btnSubmit_Click(object sender, EventArgs e) { ... HttpResponse resp = HttpContext.Current.Response; resp.ClearContent(); resp.ClearHeaders(); resp.AddHeader("Content-Disposition", "attachment; filename=Assembled Document.docx"); resp.ContentEncoding = System.Text.Encoding.UTF8; resp.OutputStream.Write(mem.ToArray(), 0, (int)mem.Length); resp.Flush(); resp.Close(); resp.End(); }

End Result

Putting everything together and running the code we end up with a document that contains all the content contained within our library merged into a final report:

image

 

Pretty cool stuff!

Zeyad Rajabi

Comments

  • Anonymous
    January 04, 2010
    Hi, Is it possible for me to automate the whole process from creating of the word document to exporting it to PDF without any human intevention? Is is possible to create an application that runs in the background keeping an eye on the folder with word docs and creating it to PDF the moment something comes up?

  • Anonymous
    January 04, 2010
    Karim – Yes, it is possible to automate the entire process. I would recommend using Word Automation Services (http://blogs.msdn.com/microsoft_office_word/archive/2009/12/16/Word-Automation-Services_3A00_-What-It-Does.aspx) to create a PDF file from a Word document. You can leverage workflows/events to track when documents on SharePoint are created/saved. Hope that helps.

  • Anonymous
    January 04, 2010
    This is so cool and very similar in concept to my SharePoint based solution.   What I've struggling with is to find a way to extract PowerPoint slides as graphic images or individual slide object which can then be embedded within the resulting output document(Smart Art import is cool but only gets you so far).  I can do this with 3rd party tools such as Aspose.Net but is there a way to do this with the SDK?

  • Anonymous
    January 05, 2010
    The comment has been removed

  • Anonymous
    January 05, 2010
    The comment has been removed

  • Anonymous
    January 05, 2010
    PowerPoint has a function to send slides to Word for use as handouts etc.  In 2007 it would seem that each slide is embedded as a PowerPoint object which is individually editable.  I suspect that I could use the SDK to do the same by chopping a copy of the source PowerPoint into a series of single slide presentations and embedding them one by one into the output Word document. I believe when you Send to Word from older versions of PowerPoint the slides come over as jpg images (usually embedded within a Word table) and this is actually all I need.   Ideally what I'd want to do is to grab a slide and be able to save it out as a graphic file or stream and then use the SDK to import the resulting image into the Word composite document. Alternatively, I see in the SDK references to thumbnail objects.  Is there a way to create an internal thumbnail graphic for each slide which could then be accessed and embedded into Word.

  • Anonymous
    January 05, 2010
    The comment has been removed

  • Anonymous
    January 06, 2010
    @Rick – Glad to hear you got your solution to work. @Colin – Interesting scenario. Unfortunately, the Open XML SDK is unable to generate an image for each of the slides within a deck. In addition, there isn’t a PowerPoint Service within SharePoint that is able to accomplish this task. As you mention, you can embed each of the slides as a PowerPoint file within a Word document. The thumbnail approach only works if the file the “Save Thumbnail” checkbox has been checked on document save (Save As dialog). In other words, the thumbnail needs to be generated by the application, which is probably not possible with your solution.

  • Anonymous
    January 07, 2010
    Ugg!  I'm sure I can't be the only person wanting to be able to extract PowerPoint slides as images. If PowerPoint services dosen't exist it does beg the question as to how SharePoint manages to extract these images to the _w folder of a Slide Library.

  • Anonymous
    January 11, 2010
    Colin - No you're not the only one :) I've given this feedback to the PowerPoint team (it might be a good idea to ask them the same question on their blog). I believe the Slide Library feature relies on PowerPoint client to help create images for each of the slides.