Announcing the Release of the December 2009 CTP for the Open XML SDK

I'm really happy to announce the 4th CTP for the Open XML SDK 2.0 for Microsoft Office! We made four major improvements to the SDK in this release:

  1. Full support for the Office 2010 Open XML formats
  2. Office 2010 schema and semantic level validation
  3. General improvements based on a recent Open XML SDK usability study
  4. Open XML SDK tools improvement

Full Support for the Office 2010 Open XML Formats

With the latest CTP you can create, edit, and consume Office-generated Open XML files for either Office 2007 or Office 2010. Looking back at the original Open XML SDK architecture diagram I showed you when we first announced the Open XML SDK, we have extended the base level layer to include Office 2010 support:

image

What does that mean? Well, let me show you some examples.

The Open XML Packaging API component allows you to add and remove parts within an Open XML package. It does this by providing strongly typed classes for every part within a package. In Office 2010 we've added new parts to the package to support some of our new features, and with this CTP you can work with these new parts through strongly typed classes as well. For example, Word 2010 added a new part called stylesWithEffects. This part helps round trip styles that are based on Word's new text effects feature.

image
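As a rough illustration of working with the new part, here is a minimal sketch that opens a Word document read-only and checks whether it contains a stylesWithEffects part. It assumes the StylesWithEffectsPart class and the MainDocumentPart.StylesWithEffectsPart property as they appear in the released SDK 2.0 (names in this CTP may differ), and the input path passed on the command line is hypothetical:

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;

class StylesWithEffectsExample
{
    static void Main(string[] args)
    {
        // args[0] is assumed to be the path to a .docx file saved by Word 2010.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(args[0], false))
        {
            MainDocumentPart mainPart = doc.MainDocumentPart;

            // The property returns null when the package has no such part,
            // which is the normal case for documents saved by Word 2007.
            StylesWithEffectsPart effectsPart = mainPart.StylesWithEffectsPart;

            if (effectsPart == null)
            {
                Console.WriteLine("No stylesWithEffects part in this document.");
            }
            else
            {
                Console.WriteLine("Found stylesWithEffects part at " + effectsPart.Uri);
            }
        }
    }
}
```

Opening the file as editable and calling mainPart.DeletePart(effectsPart) or mainPart.AddNewPart<StylesWithEffectsPart>() would be the corresponding remove and add operations.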

The Open XML Low Level DOM component allows you to create, edit, or consume the xml stored in the parts of an Open XML package. Like the packaging API component, this component works by providing you with strongly typed classes for every element supported within the xml. For example, imagine we have the following text in a Word document, which uses the glow effect from the new text effects feature:

image

This text is represented in xml and the Open XML SDK as the following:

XML

image

Open XML SDK Code

    using W14 = DocumentFormat.OpenXml.Office2010.Word;
    ...
    Run run2 = new Run();
    RunProperties runProperties1 = new RunProperties();

    W14.Glow glow1 = new W14.Glow() { GlowRadius = 228600L };
    W14.SchemeColor schemeColor1 = new W14.SchemeColor() { Val = W14.SchemeColorValues.ExtraSchemeColor5 };
    W14.Alpha alpha1 = new W14.Alpha() { Val = 60000 };
    W14.SaturationModulation saturationModulation1 = new W14.SaturationModulation() { Val = 175000 };
    schemeColor1.Append(alpha1);
    schemeColor1.Append(saturationModulation1);

    glow1.Append(schemeColor1);
    runProperties1.Append(glow1);
    Text text2 = new Text();
    text2.Text = "text effects";
    ...

As you can see from the code snippet above, leveraging the Office 2010 Open XML functionality of the SDK is as simple as including the appropriate set of namespace references.
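To show how these objects fit into a whole document, here is a minimal sketch that wraps the same glow markup in a complete program. The element classes and values come from the snippet above; the output file name glow.docx and the surrounding document scaffolding are my own assumptions for illustration, written against the API as it appears in the released SDK 2.0:

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using W14 = DocumentFormat.OpenXml.Office2010.Word;

class GlowExample
{
    static void Main()
    {
        // "glow.docx" is a hypothetical output path.
        using (WordprocessingDocument doc =
            WordprocessingDocument.Create("glow.docx", WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = doc.AddMainDocumentPart();

            // Build the Office 2010 glow effect: scheme color with alpha and saturation.
            W14.SchemeColor schemeColor = new W14.SchemeColor() { Val = W14.SchemeColorValues.ExtraSchemeColor5 };
            schemeColor.Append(new W14.Alpha() { Val = 60000 });
            schemeColor.Append(new W14.SaturationModulation() { Val = 175000 });

            W14.Glow glow = new W14.Glow() { GlowRadius = 228600L };
            glow.Append(schemeColor);

            // Attach the effect to the run properties, then add the text to the run.
            RunProperties runProperties = new RunProperties();
            runProperties.Append(glow);

            Run run = new Run();
            run.Append(runProperties);
            run.Append(new Text("text effects"));

            // A minimal document: one body, one paragraph, one run.
            mainPart.Document = new Document(new Body(new Paragraph(run)));
            mainPart.Document.Save();
        }
    }
}
```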

Office 2010 Schema and Semantic Level Validation

In a previous post I talked about how to find Open XML errors with the Open XML SDK validation functionality. That post covered both schema errors and semantic errors (violations of constraints that are defined in the prose of the documentation but not represented in schema markup). With the latest SDK CTP you can now differentiate between errors specific to Office 2007 and errors specific to Office 2010. The code to validate a file is pretty much the same as the code I showed in that previous post. The major difference is that we moved the validation functionality into its own namespace, "using DocumentFormat.OpenXml.Validation", to make the feature easier to discover. The OpenXmlValidator class now takes an enumeration called FileFormatVersions that allows you to specify either Office 2007 or Office 2010:

image
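As a rough sketch of what version-specific validation can look like, the following program validates a Word document against the Office 2007 rules and prints each error. It assumes the ValidationErrorInfo members (ErrorType, Description, Part, Path) as they appear in the released SDK 2.0, which may differ slightly in this CTP, and the input path is hypothetical:

```csharp
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

class ValidateExample
{
    static void Main(string[] args)
    {
        // args[0] is assumed to be the path to the .docx file to validate.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(args[0], false))
        {
            // Pass FileFormatVersions.Office2010 instead to validate against Office 2010.
            OpenXmlValidator validator = new OpenXmlValidator(FileFormatVersions.Office2007);

            foreach (ValidationErrorInfo error in validator.Validate(doc))
            {
                Console.WriteLine("{0}: {1}", error.ErrorType, error.Description);

                // Part and Path locate the offending markup; guard against nulls.
                if (error.Part != null)
                {
                    Console.WriteLine("  Part: " + error.Part.Uri);
                }
                if (error.Path != null)
                {
                    Console.WriteLine("  XPath: " + error.Path.XPath);
                }
            }
        }
    }
}
```

Running the same file through both versions is a simple way to see which errors come only from Office 2010 markup.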

The validation functionality also has the ability to automatically deal with extensibility markup, like alternate content blocks, as defined by the Open XML standard.

General Open XML SDK Improvements

We recently conducted a usability study for the Open XML SDK where we asked participants to complete a set of tasks. Based on the results of the study, we found that users struggled with the complexity of the Open XML SDK Open methods. In the August 2009 CTP of the SDK, each of the classes WordprocessingDocument, SpreadsheetDocument and PresentationDocument had nine overloaded Open methods. Take the WordprocessingDocument class as an example:

  • public static WordprocessingDocument Open(Package package);
  • public static WordprocessingDocument Open(Package package, bool autoSave);
  • public static WordprocessingDocument Open(Stream stream, bool isEditable);
  • public static WordprocessingDocument Open(string path, bool isEditable);
  • public static WordprocessingDocument Open(Package package, MCMode mode, bool processMCInWholePackage);
  • public static WordprocessingDocument Open(Stream stream, bool isEditable, bool autoSave);
  • public static WordprocessingDocument Open(string path, bool isEditable, bool autoSave);
  • public static WordprocessingDocument Open(Stream stream, bool isEditable, MCMode mode, bool processMCInWholePackage);
  • public static WordprocessingDocument Open(string path, bool isEditable, MCMode mode, bool processMCInWholePackage);

Now with the latest CTP, the number of overloaded methods for Open has been reduced to six methods:

  • public static WordprocessingDocument Open(Package package);
  • public static WordprocessingDocument Open(Package package, OpenSettings openSettings);
  • public static WordprocessingDocument Open(Stream stream, bool isEditable);
  • public static WordprocessingDocument Open(string path, bool isEditable);
  • public static WordprocessingDocument Open(Stream stream, bool isEditable, OpenSettings openSettings);
  • public static WordprocessingDocument Open(string path, bool isEditable, OpenSettings openSettings);

Advanced open settings are now part of a new class called OpenSettings. This class allows you to pre-process documents according to markup compatibility, auto save files, or cap the number of characters the Open method will read from a part. Hopefully this change will make the Open method easier to work with.
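Here is a minimal sketch of the three settings in use. The property names (AutoSave, MarkupCompatibilityProcessSettings, MaxCharactersInPart) are taken from the released SDK 2.0 and are an assumption for this CTP; the character limit and the input path are arbitrary illustrative values:

```csharp
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;

class OpenSettingsExample
{
    static void Main(string[] args)
    {
        OpenSettings settings = new OpenSettings();

        // Do not write changes back automatically when the document is disposed.
        settings.AutoSave = false;

        // Pre-process markup compatibility content in every part,
        // targeting the Office 2007 version of the format.
        settings.MarkupCompatibilityProcessSettings =
            new MarkupCompatibilityProcessSettings(
                MarkupCompatibilityProcessMode.ProcessAllParts,
                FileFormatVersions.Office2007);

        // Refuse to load any part larger than this many characters (illustrative limit).
        settings.MaxCharactersInPart = 10000000;

        // args[0] is assumed to be the path to a .docx file; opened read-only here.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(args[0], false, settings))
        {
            int count = doc.MainDocumentPart.Document.Body.ChildElements.Count;
            Console.WriteLine("Body has " + count + " child elements.");
        }
    }
}
```

Grouping these options in one object is what lets the overload list shrink: each Open variant needs only one optional extra parameter instead of a separate overload per combination of flags.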

Open XML SDK Tool Improvements

The last major improvement I want to call out is around the free Open XML SDK tool that comes with the SDK. In the August 2009 CTP, the Open XML SDK shipped with three separate tools:

  • Document Reflector
  • Open XML Class Explorer
  • Open XML Diff

Based on feedback, we have separated the download of the tools from the SDK dll and, as of the December 2009 CTP, consolidated the three tools into one. In other words, the Open XML SDK download page now has two download links: one for the SDK dll and one for the new consolidated Open XML SDK tool. In addition, we've added some new functionality to the tool. Here is a video that shows you an overview of the new tool:

The new tool adds the following functionality:

  • Documentation for Office 2010 Open XML formats
  • Validation functionality. This feature allows you to validate Open XML files according to Office 2007 or Office 2010
  • New reflection feature that combines reflection with the Open XML Diff feature. This new feature allows you to automatically generate SDK code that transforms one document into another document

Let us know what you think of the new tool. I think of the new tool as the Open XML version of Macro Recording.

More Feedback Always Welcome

Please continue to send us your feedback, either on this blog or at our Microsoft Connect site for the Open XML SDK. We look forward to hearing from you.

Zeyad Rajabi

Comments

  • Anonymous
    December 14, 2009
    Hi guys, I've started to play with the new CTP. Here is a very simple sample code used to create a PowerPoint presentation from scratch. The problem is that the resulting file won't open and the Validator doesn't show any error. The error I get when opening the file is the well known "PowerPoint found unreadable content in....", and the recovery doesn't help either. Here is the code:

        Dim absoluteFileName As String = Server.MapPath("~/PowerPointTemp/Test.pptx")
        Using pptxDoc As PresentationDocument = PresentationDocument.Create(absoluteFileName, PresentationDocumentType.Presentation)
            Dim presentationPart As PresentationPart = pptxDoc.PresentationPart
            Dim slideMasterPart As SlideMasterPart = presentationPart.AddNewPart(Of SlideMasterPart)()
            For Each chart As dotnetCHARTING.Chart In charts
                Dim slide As SlidePart = presentationPart.AddNewPart(Of SlidePart)()
                Dim imagePart As ImagePart = slide.AddImagePart(ImagePartType.Png)
                Using outputStream As IO.Stream = CType(imagePart.GetStream, IO.Stream)
                    Using inputStream As IO.Stream = chart.GetChartStream
                        inputStream.Position = 0
                        Dim len As Integer = Convert.ToInt32(inputStream.Length)
                        Dim bytes() As Byte = New Byte((len) - 1) {}
                        Dim bytesRead As Integer = inputStream.Read(bytes, 0, len)
                        If (bytesRead = len) Then
                            outputStream.Write(bytes, 0, len)
                        End If
                    End Using
                End Using
            Next chart
            Dim validator As New OpenXmlValidator(FileFormatVersions.Office2007)
            Dim errors As IEnumerable(Of Validation.ValidationErrorInfo) = validator.Validate(pptxDoc)
            For Each ee As Validation.ValidationErrorInfo In errors
            Next
        End Using

  • Anonymous
    December 14, 2009
    The comment has been removed

  • Anonymous
    December 15, 2009
    Rasetti - It will be easier for me to help debug this issue if you can share your entire solution with me. Can you place the solution on a public server for me to download?

  • Anonymous
    December 29, 2009
    Why are there no Release Notes available with these CTP releases?  Where can I find out what was added/fixed/changed from the last CTP release? Surely, there is some list somewhere?

  • Anonymous
    December 29, 2009
    General information on what was changed in the latest CTP of the SDK can be found here: http://msdn.microsoft.com/en-us/library/cc471858(office.14).aspx

  • Anonymous
    December 30, 2009
    Thanks Zeyad,

    The above link was for the V1.0 and old 2008 info. The format and type of content is more along the lines of what I'm expecting. From there I could get to the Dec 2009 CTP notes: http://msdn.microsoft.com/en-us/library/cc471858(office.14,lightweight).aspx

    Unfortunately, it appears to be only a placeholder for such a document as it doesn't contain much. (As of this writing, your announcement above contains much more content.) Nothing is mentioned about the new type additions / renaming of Sdt / numbering classes and types which are breaking changes, along with the volume of changes to parameter types that are also breaking changes.

    The most puzzling is the conversion of many strongly typed Val properties now becoming just generic string properties. This is a shift from (more specific) enum/signed/unsigned integer type values to run of the mill generic String values. This (seems?) headed in the wrong direction towards removing the ability to use the compiler to help with parameter type correctness checking. Is there any insight on why this shift? (Or am I misunderstanding this somehow)

    It would be really helpful to have some actual release notes. :) I understand that this is a CTP, but surely your team has a handle on what's new/changed in this release and should make a concerted effort to share that.

  • Anonymous
    December 30, 2009
    The biggest change to the SDK with the latest CTP is the addition of Office 2010 Open XML file format support. One of the changes introduced into the file format for Office 2010 was relaxing some of the restrictions on attribute values. In fact, many of these schema changes were introduced as part of the ISO standardization of the Open XML file format. In other words, the schema changed from Ecma v1 of the Open XML standard to the ISO version of the Open XML standard. We've done work in both Office 2007 SP2 and Office 2010 to support the latest version of the Open XML standard as defined by ISO.

    For example, one of the changes introduced was the way WordprocessingML attributes store values that specify some type of measurement. Instead of just specifying a number, where the units of measurement were defined in the standard, a change was made to allow the units of measurement to be stored right within the attribute value. In other words, a schema change was made to allow string values for these attributes. For example, these types of attributes can now have a value of "2cm" or "5in". Notice the units of measurement.

    In order to support both Office 2007 and Office 2010 we decided to relax some of the strongly typed restrictions of some of our objects. That being said, our validation functionality will be able to validate values according to Office 2007 vs. Office 2010. I will make sure to write up a blog post that gives you guys more details around these changes. Thanks for the feedback.

  • Anonymous
    December 30, 2009
    The comment has been removed

  • Anonymous
    January 04, 2010
    Colin - Very cool post and solution. I will make sure to write a post that links to your post/solution. Check out the latest post (http://blogs.msdn.com/brian_jones/archive/2010/01/04/document-assembly-merging-excel-powerpoint-and-word-content-together.aspx), which showcases another document assembly type solution. Again, thanks for sharing.

  • Anonymous
    January 07, 2010
    Hi, it is great to have new preview releases of the v2 Open XML sdk, but it would be really useful to know how near you are to production release. I have read that this won't be before Office 14 is released, and that Office 14 is expected in H1 2010, but could you give us a hint how long it will take (if at all) to get the sdk released after Office 14 is available. Thanks Patrick