OPC
A New Standard For Packaging Your Data
Jack Davis and Andrey Shur
This article uses the following technologies: Open Packaging Conventions, .NET Framework 3.0
Code download available at: Packaging 2007_08.exe(984 KB)
Contents
File Format Organization
Packages
Package Parts
Package Relationships
Package Digital Signatures
Package Properties
Creating and Accessing Packages and Parts
Creating and Accessing Relationships
Pack URI
Authentication and Validation
Rights Management and Encryption
Packaging a Web Page
Exploring a Package
Reading a Package
Signing and Validating a Package
Designing a File Format
Many applications integrate content with various additional resources. A Web browser, for example, displays a page that integrates HTML, image files, style sheets, and other types of content. Similarly, a word processor builds a document that combines text, style definitions, image files, and other elements. For the most part, applications use one of two approaches to organize the content: a flat-file organization where content is stored as separate files organized on disk, or binary container files where all the content is packaged in a single custom file. More and more, applications are tending towards the latter.
In the move toward open standards, a new file packaging technology has evolved as part of the 2007 Microsoft® Office System Open XML specification that was recently approved by the ECMA International standards organization. An underlying component of this standard is Open Packaging Conventions (OPC), which defines a structured means to store application data together with related resources using a standard ZIP file. This new packaging technology is already being used in several Microsoft products, including the 2007 Office System applications. The XML Paper Specification (XPS), which defines the new print-spool and document presentation format for Windows Vista™, also implements the storage and transport of high-fidelity documents based on OPC. See the "Online Resources" sidebar for more information about the technologies discussed in this article.
So what makes the portable container technology offered by Open Packaging Conventions different? Since it is an open standard, OPC provides a container technology you can use without having to code your own custom binary container files. And it supports a number of enhanced features, including content addressable URIs, MIME types, relational structuring, and authentication and validation. With the Microsoft .NET Framework 3.0, the packaging APIs also offer options for encryption with rights management. Moreover, because they adhere to an open standard, package-based files can be accessed through high-level services such as workflow applications and virus scanners.
API support for the ECMA Open Packaging Conventions is built into Windows Vista and included as part of the .NET Framework 3.0 for use with Windows® XP and Windows Server® 2003. In this article, we’ll examine the new standard, showing how you can use the .NET Framework 3.0 APIs to organize your application’s storage of multiple data streams in a single portable package.
File Format Organization
Microsoft Word 2007 and Excel® 2007 both use Open Packaging for document storage; however, they use different schemas and file organizations. The specific organization of a package’s contents defines its format, which is typically reflected in the extension (like .docx or .xlsx).
Using packaging, you can define your own file organization, filename extension, and file type association for your application. As an example, we’ll create our own custom package type, .htmx, that will store a Web page along with the local files it depends on (style sheets, scripts, image files, and so on).
Packages
A Package is the basic storage unit of the Open Packaging Conventions standard. In the implementation for the .NET Framework 3.0, System.IO.Packaging.Package is defined as an abstract class from which specific physical implementations are derived. The default and primary physical implementation of a Package in the .NET Framework 3.0 is the ZipPackage derived class that uses a standard ZIP archive (see Figure 1). The following code shows the basic steps used to create and open a new package:
Figure 1 File Formats Using Open Packaging Conventions
using System.IO;
using System.IO.Packaging;
...
// Path and name for the package file.
string packageFile = @"C:\webpages\packaging.htmx";
// Create and open the specified package file for writing.
// (The 'using' statement ensures that the package is
// closed and disposed when it goes out of scope.)
using (Package package = Package.Open(packageFile, FileMode.Create))
{
    ... // Store content files as parts in the package.
}
Open Packaging uses ZIP files because they are a known industry format that is easy to work with, inspect, and access. In fact, you can take a package-based file (a .docx file, for example), rename it with a .zip extension, and access its contents using a standard ZIP utility, such as the Compressed Folders feature built into Windows Explorer.
A package stores two basic elements: PackageParts (which are referred to simply as parts) and PackageRelationships (which are referred to as relationships). Parts represent the actual content being packaged (see Figure 2). Higher-level components are provided by building upon the basic part and relationship elements. These include PackageDigitalSignatures, PackageProperties, and components to support encryption and rights management.
Figure 2 Basic Elements of a Package
Package Parts
A part is a data stream, analogous to a file in a file system or ZIP archive. Parts can hold any type of data—binary, text, images, and so on. Figure 3 outlines a sample package that contains several parts with different types of content.
Figure 3 Packaging Various Forms of Content
When a part is stored in a package, it is defined with a unique URI-formatted PartName along with a MIME ContentType. Part names start with a forward slash, defining an absolute path based from the package root. You can easily add or remove a part to or from the package, write to a part, update it, or read the part from the package, simply by referencing its unique URI part name.
While the PartName URIs can appear to represent a folder hierarchy, there are actually no separate folders in a package. A package cannot contain empty folders and there is no concept of a default current directory. Because there can be no empty folders, there is no need to create, delete, or otherwise manage folders separately. If the package is opened in a graphical ZIP utility, the utility will often display the paths of the part names in a folder representation. Similarly if the package is unzipped, the utility will recreate the apparent folder hierarchy. The folder representation, however, is an artifact of the ZIP utility and is not inherent in the package itself.
The code sample shown in Figure 4 illustrates the process of creating a part and then writing data from a disk file to the part stored in the package.
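As a minimal sketch of that process (this is not the Figure 4 listing), the following code stores a disk file as a part. The class name, the AddFileAsPart method, and the CopyStream helper are hypothetical names introduced for illustration; only the System.IO.Packaging calls come from the framework.

```csharp
using System;
using System.IO;
using System.IO.Packaging;

static class AddPartSketch
{
    // Hypothetical helper: copies all bytes from source to target.
    static void CopyStream(Stream source, Stream target)
    {
        byte[] buffer = new byte[4096];
        int bytesRead;
        while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            target.Write(buffer, 0, bytesRead);
        }
    }

    // Stores the file at filePath as a part named partName
    // (for example "/packaging.htm") with the given MIME content type.
    public static void AddFileAsPart(
        string packagePath, string filePath,
        string partName, string contentType)
    {
        // Part names are URIs that start with a forward slash.
        Uri partUri = PackUriHelper.CreatePartUri(
            new Uri(partName, UriKind.Relative));

        // Open the package, creating it if it does not exist yet.
        using (Package package =
            Package.Open(packagePath, FileMode.OpenOrCreate))
        {
            // CreatePart throws if a part with this name already exists.
            PackagePart part = package.CreatePart(
                partUri, contentType, CompressionOption.Normal);

            // Copy the file data into the part's stream.
            using (FileStream fileStream = File.OpenRead(filePath))
            using (Stream partStream = part.GetStream())
            {
                CopyStream(fileStream, partStream);
            }
        }
    }
}
```

Disposing the part stream before the package is closed keeps the write explicit; the package itself is flushed and closed when the outer using block ends.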
Package Relationships
A relationship defines an association between two items: a specified source and a specified target. The source of a relationship can be either the package itself (a package-level relationship) or a specified part in the package (a part-level relationship). The target of a relationship can be either a specified part in the package or a specified external resource. An external resource can be another package, a part within another package, or any other type of URI-addressable data entity. Thus, there are four combinations of relationships that can be defined:
- Between a package (source) and a specified part (target) contained in the package.
- Between a package (source) and a specified resource (target) external to the package.
- Between a specified part (source) within the package and a second specified part (target) also contained within the package.
- Between a specified part (source) within the package and a specified resource (target) external to the package.
The source part is considered the owner of the relationship. If the source part is deleted, all the relationships owned by that part are also deleted. Relationships are defined and stored separately in special relationship parts in the package. Note that creating or deleting a relationship does not physically alter the source or the target elements in any way.
Relationships offer a number of advantages:
Relational Structuring: Relationships allow associations between content and resources to be determined without needing to read or parse content streams. Preprocessing tasks can aggregate, cache, and access needed resources without having to understand the details of the content in use.
Content Discoverability: Relationship types combined with part MIME content types enable quick discoverability of the structure, associations, and content of the parts contained in the package.
Schema Independence: Content and resource associations defined by relationships are schema independent. This is especially helpful for content defined in XML markup where parsing would typically require knowledge of the specific schema in use.
Reference Integrity: For XML-based content, you can specify resources in markup by a reference to a relationship ID, rather than embedding resource URI references directly in markup. This simplifies the XML markup and eliminates the possibility that content references and relationship references might become out of sync.
Figure 5 illustrates a sample HTML page packaged with its style sheets and image file resources, as well as the relationships that define the associations. A package-level relationship is used to identify the URI of the PackagePart that contains the base HTML content for the page. By querying for the package’s root-html relationship, the URI of the HTML Web page part can be directly determined. With the HTML part as the source, part-level relationships for the style sheet, scripts, and image resources can be easily located and resolved. (For each resource referenced in the HTML page, there is a corresponding part-level relationship.)
Figure 5 Sample HTML Page Package
The code shown in Figure 6 demonstrates how to create a package, create and add a part, and then create a package relationship.
Figure 6 Create a Package, Part, and Package Relationship
using System;
using System.IO;
using System.IO.Packaging;
// Path and name of the package file.
string packageFile = @"C:\webpages\packaging.htmx";
// Path and name of the file to store in the package.
string partFile = @"C:\inetpub\wwwroot\packaging.htm";
// Part name URI, MIME content type, and compression option.
Uri partNameUri = PackUriHelper.CreatePartUri(
    new Uri("/packaging.htm", UriKind.Relative));
string partType = "text/html"; // MIME type
CompressionOption compression = CompressionOption.Normal;
// Name for the package relationship type.
string packageRelationshipType = "https://schemas.openxmlformats.org/" +
    "package/2007/relationships/htmx/root-html";
// Create and open the package file.
using (Package package = Package.Open(packageFile, FileMode.Create))
{
    // Create the part.
    PackagePart part = package.CreatePart(
        partNameUri, partType, compression);
    // Write the data from the file to the part.
    // (CopyStream is a helper method, not shown in this listing,
    // that copies all bytes from one stream to another.)
    using (FileStream fileStream = File.OpenRead(partFile))
    {
        CopyStream(fileStream, part.GetStream()); // Copy data to part.
    }
    // Create a package-level relationship to the Web page part.
    package.CreateRelationship(
        part.Uri, TargetMode.Internal, packageRelationshipType);
}
Using relationships makes it easy to open and discover the associations between the content and resource items stored in the package. Figure 7 is an example of opening a package and then using the root-html package relationship to locate the root content part contained in the package. The root part can then be queried to return relationships that identify additional resources that the root element needs.
Figure 7 Locating the Root Content Part in a Package
using System.IO;
using System.IO.Packaging;
// Path and name of the package file.
string packageFile = @"C:\webpages\packaging.htmx";
// Names for the root and resource relationship types.
string rootHtmlRelationshipType = "https://schemas.openxmlformats.org/" +
    "package/2007/relationships/htmx/root-html";
string resourceRelationshipType = "https://schemas.openxmlformats.org/" +
    "package/2007/relationships/htmx/required-resource";
// Open the package for reading.
using (Package package =
    Package.Open(packageFile, FileMode.Open, FileAccess.Read))
{
    // A package can contain multiple root items; iterate
    // through each "root-html" package relationship.
    foreach (PackageRelationship relationship in
        package.GetRelationshipsByType(rootHtmlRelationshipType))
    {
        // Get the part referenced by the relationship TargetUri.
        PackagePart rootPart = package.GetPart(relationship.TargetUri);
        // Open and access the part data stream.
        using (Stream dataStream = rootPart.GetStream())
        {
            ... // Access the root part's data stream.
        }
        // A part can have associations with other parts.
        // Locate and iterate through each "required-resource" part.
        foreach (PackageRelationship resourceRelationship in
            rootPart.GetRelationshipsByType(
                resourceRelationshipType))
        {
            // Get the resource part referenced by the relationship.
            PackagePart resourcePart = package.GetPart(
                resourceRelationship.TargetUri);
            ... // Work with the resource part.
        }
    }
}
Package Digital Signatures
Built upon a composition of parts and relationships, a package can also contain PackageDigitalSignature items (digital signatures for short). A digital signature uses an X.509 certificate to securely sign parts and relationships in the package. The signature provides two features. It identifies and authenticates the individual or entity that has signed a given set of parts and relationships, and it validates that the signed parts and relationships have not been modified.
The digital signature does not prevent a part or relationship from being changed, but a validation check against the signature will fail if the signed item has been altered. The application can then take appropriate action—for example, prevent the part from being opened or notify the user that the data is not secure.
The diagram in Figure 8 illustrates a package that contains parts and relationships along with a digital signature that has been used to sign specific items. If any of the signed items are modified in any way, validation against the digital signature will fail.
Figure 8 Signing Parts and Relationships
For more information about package digital signatures, see the MSDN® article "The Digital Signing Framework of the Open Packaging Conventions".
Package Properties
A package can also store a set of public metadata as PackageProperties. This set includes 16 common properties that describe the package and its contents. Use of these properties is optional. Figure 9 summarizes the properties available with a package.
Figure 9 Properties Available with a Package
Property | Description |
---|---|
Category | A categorization of the package content (example: Letter, Proposal, Resume). |
ContentStatus | The status of the package content (example: Draft, Reviewed, Final). |
ContentType | The type of content that is contained in the package (example: Whitepaper, Bulletin). |
Created | The date and time the package was created (see also Modified). |
Creator | The name of the individual or entity that created the package (see also LastModifiedBy). |
Description | A description of the package content. |
Identifier | A unique identification assigned to the package and its content. |
Keywords | A delimited set of keywords to support search and indexing. |
Language | An RFC 3066 tag that identifies the language for which the package content is written. |
LastModifiedBy | The name of the individual or entity that last modified the package (see also Creator). |
LastPrinted | The date and time the package content was last printed. |
Modified | The date and time the package content was last modified (see also Created). |
Revision | The revision number of the package content (see also Version). |
Subject | The topic of the package content. |
Title | A name given to the package content. |
Version | The version number of the package content (see also Revision). |
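To make the table concrete, here is a minimal sketch of setting and reading a few of these properties through the Package.PackageProperties member. The class and method names, and the "Draft" status value, are assumptions chosen for illustration.

```csharp
using System;
using System.IO;
using System.IO.Packaging;

static class PackagePropertiesSketch
{
    // Writes a handful of the optional properties to the package.
    public static void StampProperties(
        string packagePath, string title, string creator)
    {
        // The package must be opened with write access to set properties.
        using (Package package =
            Package.Open(packagePath, FileMode.OpenOrCreate))
        {
            PackageProperties properties = package.PackageProperties;
            properties.Title = title;
            properties.Creator = creator;
            properties.ContentStatus = "Draft";    // illustrative value
            properties.Created = DateTime.UtcNow;  // Created is a nullable DateTime
        }
    }

    // Reads the Title property back; returns null if it was never set.
    public static string ReadTitle(string packagePath)
    {
        using (Package package =
            Package.Open(packagePath, FileMode.Open, FileAccess.Read))
        {
            return package.PackageProperties.Title;
        }
    }
}
```

Because every property is optional, code that reads them should be prepared for null values.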
Creating and Accessing Packages and Parts
You can use the Package.Open method to create a new package or open an existing package. Package.Open provides overloads that instantiate packages, either through a specified filename or by a given file stream. In addition, Package.Open includes settings that let you specify options for file mode (System.IO.FileMode) and file access (System.IO.FileAccess).
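The stream-based overload can be sketched as follows, using a MemoryStream so that the package never touches the disk. The class name, method names, and the /note.txt part are hypothetical; this is one way to use the overload, not the only one.

```csharp
using System;
using System.IO;
using System.IO.Packaging;

static class StreamPackageSketch
{
    // Builds a package in memory that holds one text part,
    // and returns the resulting ZIP bytes.
    public static byte[] BuildPackageInMemory(string text)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            using (Package package = Package.Open(
                buffer, FileMode.Create, FileAccess.ReadWrite))
            {
                Uri partUri = PackUriHelper.CreatePartUri(
                    new Uri("/note.txt", UriKind.Relative));
                PackagePart part = package.CreatePart(partUri, "text/plain");
                using (StreamWriter writer =
                    new StreamWriter(part.GetStream()))
                {
                    writer.Write(text);
                }
            } // Closing the package flushes it to the stream.
            return buffer.ToArray();
        }
    }

    // Reopens the bytes as a read-only package and reads the part back.
    public static string ReadNote(byte[] packageBytes)
    {
        using (MemoryStream buffer = new MemoryStream(packageBytes))
        using (Package package = Package.Open(
            buffer, FileMode.Open, FileAccess.Read))
        {
            PackagePart part = package.GetPart(
                new Uri("/note.txt", UriKind.Relative));
            using (StreamReader reader =
                new StreamReader(part.GetStream()))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```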
After a package is instantiated and opened, you can use the part management methods to create and access the parts contained in the package (see Figure 10). After obtaining a part instance—either through the CreatePart method or the GetPart method—you can call the part GetStream method to return the stream for reading or writing the content data. We discuss how to access the data content of a part in more detail later.
Figure 10 Package Part Methods
Package Method | Description |
---|---|
CreatePart | Creates a new part in the package. |
DeletePart | Deletes a specified part from the package. |
GetPart | Returns the part with a specified name. |
GetParts | Returns a collection of all the parts contained in the package. |
PartExists | Determines whether a specified part exists in the package. |
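A short sketch of these methods in use follows. The class and method names are ours; the sketch assumes the package file already exists.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;

static class PartMethodsSketch
{
    // Returns "name (content type)" for every part in the package.
    public static List<string> ListParts(string packagePath)
    {
        List<string> descriptions = new List<string>();
        using (Package package =
            Package.Open(packagePath, FileMode.Open, FileAccess.Read))
        {
            foreach (PackagePart part in package.GetParts())
            {
                descriptions.Add(part.Uri + " (" + part.ContentType + ")");
            }
        }
        return descriptions;
    }

    // Deletes the named part if present; returns true if it was deleted.
    public static bool RemovePart(string packagePath, string partName)
    {
        Uri partUri = PackUriHelper.CreatePartUri(
            new Uri(partName, UriKind.Relative));
        using (Package package = Package.Open(
            packagePath, FileMode.Open, FileAccess.ReadWrite))
        {
            if (!package.PartExists(partUri))
            {
                return false;
            }
            package.DeletePart(partUri);
            return true;
        }
    }
}
```

Checking PartExists first lets the caller distinguish "nothing to delete" from a successful deletion.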
Creating and Accessing Relationships
When using the System.IO.Packaging APIs, you can create package-level and part-level relationships through the CreateRelationship methods of the Package and PackagePart classes. The CreateRelationship methods use five items to define a relationship:
- The source package or part
- A URI for the target part or external resource
- A targetMode that identifies if the target is internal or external to the package
- A URI-formatted relationshipType
- A unique relationshipID
Once created, relationships can be accessed through the GetRelationshipsByType, GetRelationship, or GetRelationships methods of the Package and PackagePart classes (see Figure 11). We provide more detail about how relationships are stored in a package in the section "Exploring a Package."
Figure 11 Relationship Methods
Package/Part Method | Description |
---|---|
CreateRelationship | Creates a package-level or part-level relationship to a specified target. |
DeleteRelationship | Deletes a specified relationship owned by this package or part. |
GetRelationshipsByType | Returns all relationships owned by this package or part that match a specified relationshipType. |
GetRelationship | Returns the relationship owned by this package or part that matches a specified relationshipID. |
GetRelationships | Returns all relationships owned by this package or part. |
RelationshipExists | Determines whether a specified relationship owned by this package or part exists. |
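The following sketch exercises several of these methods on a package-level relationship with an external target and an explicit relationship ID. The relationship type URI, the target URI, and the "rIdHome" ID are hypothetical values made up for this example.

```csharp
using System;
using System.IO;
using System.IO.Packaging;

static class RelationshipMethodsSketch
{
    // Hypothetical relationship type, used only for this sketch.
    const string HomePageType =
        "http://example.com/relationships/home-page";

    // Creates, looks up, and deletes a package-level relationship.
    // Returns true if every step behaved as expected.
    public static bool AddLookUpAndRemove(string packagePath)
    {
        using (Package package =
            Package.Open(packagePath, FileMode.Create))
        {
            // The source is the package; the target is an external URI.
            package.CreateRelationship(
                new Uri("http://www.example.com/", UriKind.Absolute),
                TargetMode.External, HomePageType, "rIdHome");

            // Look the relationship up by its ID.
            PackageRelationship relationship =
                package.GetRelationship("rIdHome");
            bool found = package.RelationshipExists("rIdHome")
                && relationship.TargetMode == TargetMode.External
                && relationship.RelationshipType == HomePageType;

            // Deleting the relationship leaves source and target untouched.
            package.DeleteRelationship("rIdHome");
            return found && !package.RelationshipExists("rIdHome");
        }
    }
}
```

If no ID is supplied, the CreateRelationship overloads without an ID parameter generate one automatically.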
Pack URI
In order to access a part contained in a package, an address for both the package and the selected part needs to be specified. A package can be identified through conventional URI schemes, such as http, ftp, or file. However, conventional URI schemes do not provide a means to specify an individual part within the package. Based on the extensible architecture for URI schemes (RFC 3986), the Open Packaging Conventions standard defines a pack URI scheme that can address individual parts contained in a package.
Creating a pack URI is fairly simple. Given a conventional URI for a package file and the name of a part contained in that package, a fully-qualified pack URI address that identifies the part can be created in two steps.
Say, for instance, the URI is http://www.proseware.com/mypackage.docx and the part name is /images/chocolate.jpg. The first step is to encode the URI of the package to form a pack authority. This is done by replacing "?", "@", ":", "%", and "," characters with their percent-encoded equivalents ("?"="%3f", "@"="%40", ":"="%3a", "%"="%25", ","="%2c") and replacing all forward slash characters with commas. Thus, the URI would look like this:
http%3a,,www.proseware.com,mypackage.docx
The second step is to combine the "pack://" scheme prefix, pack authority, and part name to form the pack URI for the specified part. The result for our sample would be:
pack://http%3a,,www.proseware.com,mypackage.docx/images/chocolate.jpg
There are additional steps to handle other forms of URIs, but understanding the basic process outlined here is sufficient for most uses.
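The two basic steps can be sketched as plain string operations. This is a hand-rolled illustration of the process described above, covering only the basic form; the class and method names are hypothetical, and production code would normally rely on the PackUriHelper class instead.

```csharp
using System;

static class PackUriSketch
{
    // Step 1: encode the package URI to form the pack authority.
    public static string ToPackAuthority(string packageUri)
    {
        return packageUri
            .Replace("%", "%25")  // first, so later escapes are not re-encoded
            .Replace("?", "%3f")
            .Replace("@", "%40")
            .Replace(":", "%3a")
            .Replace(",", "%2c")  // escape real commas before slashes become commas
            .Replace("/", ",");
    }

    // Step 2: combine the scheme prefix, pack authority, and part name.
    public static string ComposePackUri(string packageUri, string partName)
    {
        return "pack://" + ToPackAuthority(packageUri) + partName;
    }
}
```

Calling ComposePackUri with "http://www.proseware.com/mypackage.docx" and "/images/chocolate.jpg" produces the pack URI shown above. The order of the replacements matters: the percent sign has to be escaped before any other character introduces new percent signs.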
There are some important classes you ought to be familiar with. These include:
PackUriHelper: The PackUriHelper class provides methods that make it easy to create pack URIs, as well as to extract and return the original package URI or part name from a given pack URI.
PackWebRequest: To access data through URIs, .NET-based applications can use the abstract WebRequest and WebResponse classes from which scheme-specific subclasses are implemented and registered. Based on the scheme of the URI specified to the WebRequest.Create method, the appropriate subclass is automatically selected and used. With the .NET Framework 3.0, WebRequest and WebResponse directly support pack URIs. A feature specific to pack URIs is that streams returned by the GetResponseStream method are fully seekable (if seek is not supported by the network protocol, seek operations are handled client-side transparently by the pack implementation). Through the use of pack URIs, your application can directly access any part contained in a package on any URI-accessible data source.
Authentication and Validation
Packaging services include support for digital signatures that can be used to securely sign and validate content within a package. Key PackageDigitalSignatureManager class methods include VerifyCertificate, Sign, VerifySignatures, and RemoveSignature.
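As a minimal sketch of the validation side, the following method reports whether a package is signed and whether all of its signatures still verify. The class and method names are hypothetical. Note the assumption that signature validity alone is enough here: a real application would typically also check the signing certificate (for example with VerifyCertificate) before trusting the content.

```csharp
using System;
using System.IO;
using System.IO.Packaging;

static class SignatureCheckSketch
{
    // Returns true only if the package is signed and every signature
    // verifies against the current package content.
    public static bool HasValidSignatures(string packagePath)
    {
        using (Package package =
            Package.Open(packagePath, FileMode.Open, FileAccess.Read))
        {
            PackageDigitalSignatureManager signatureManager =
                new PackageDigitalSignatureManager(package);

            // An unsigned package has nothing to validate.
            if (!signatureManager.IsSigned)
            {
                return false;
            }

            // Passing true stops at the first signature that fails.
            VerifyResult result = signatureManager.VerifySignatures(true);
            return result == VerifyResult.Success;
        }
    }
}
```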
Rights Management and Encryption
Beyond the functionality defined by the Open Packaging Conventions, the System.IO.Packaging APIs in the .NET Framework 3.0 offer support for protecting packaged content both for privacy and security. Using encryption with rights management in combination with a Windows Rights Management server, packages can be encrypted to limit access to specific individuals or groups. For more information on rights management and encryption, see the "Rights Managed Package Publish" and "Rights Managed Package Viewer" .NET Framework 3.0 samples provided with the Windows SDK.
Packaging a Web Page
We’ll use a simple Web page with its local resources to illustrate some of the basic operations with packages. Note, however, that this example will limit us to some extent since Web pages are based on a fixed HTML schema and we want to avoid altering the existing page content. An extensible XML-based schema provides more flexibility for additional packaging features. We’ll discuss these and other options in the "Designing a File Format" section.
Now that we’ve covered the basics of packaging, we’ll write a sample program, PackageHtmxWrite, for creating our own custom .htmx package (the full source for PackageHtmxWrite is available on the MSDN Magazine Web site). We will store in the package a Web page together with its local style sheets, scripts, and image file resources. To identify the starting part that contains the HTML page, we’ll create a package-level relationship that points to the HTML part that’s stored in the package. We’ll also create part-level relationships that define associations between the Web page and each local resource stored in the package. The relationships will make the associations between each of the components directly discoverable and eliminate the need to search the package to locate the Web page or to parse the HTML content to locate the related resources.
Since the Web page and its local resources are typically defined through relative paths, we can use the relative path and filename of each item as the part name for storing the data stream in the package. For example, given the HTML tag <img src="images/packaging-sign.png">, we can use the path and filename of the src attribute to define the URI name /images/packaging-sign.png for storing the image as a part in the package. The local resources of the Web page are specified within the HTML, therefore we need to parse the HTML once to identify all of the resource files used in the page.
PackageHtmxWrite’s File | Open command lets the user choose a Web page to load and display in the Web browser control. After the page is loaded, the File | Save As command allows the user to specify the location and filename for storing the Web page and its local resources in an .htmx file. Specifically, File | Save As invokes methods that parse the page’s HTML and create a list of the local resource files to be included in the package.
The WritePackage method then performs a series of steps to create the package. First it creates the package with the user-specified path and filename. Then, for the root HTML Web page, it creates a part for storing the root page; this involves storing the HTML data into the Web page part and creating a package-level relationship that identifies the root HTML Web page part. For each local resource referenced in the Web page, the method creates a part for storing the resource, stores the resource’s data into the part, and creates a part-level relationship from the Web page part that targets the resource part. The WritePackage code shown in Figure 12 illustrates the process.
Figure 12 Creating a Package
private void WritePackage(
    string packageFilepath, string parsedHtml, Hashtable resourceHash)
{
    // Create and open the specified package file for writing.
    // (The 'using' statement ensures that the package is
    // closed and disposed when it goes out of scope.)
    using (Package package = Package.Open(
        packageFilepath, FileMode.Create))
    {
        string docPath =
            webBrowser.Url.AbsoluteUri.Substring(rootPrefix.Length);
        string docPartName = Path.GetFileName(packageFilepath);
        docPartName = Path.ChangeExtension(docPartName, "htm");
        if (!docPath.EndsWith("/"))
        {
            // docPath is in the format of "/folder1/folder2/1.html"
            // Make it "/folder1/folder2/"
            docPath = docPath.Remove(docPath.LastIndexOf('/') + 1);
        }
        docPartName = docPath + docPartName;
        // The Web page and its resource files will likely be an absolute
        // path with a common folder prefix. Determine the common folder
        // prefix to remove when the parts are stored.
        string commonFolderString =
            GetCommonFolderString(resourceHash.Values, docPartName);
        // Remove prefix of common folders from the Web page part name.
        docPartName = docPartName.Substring(commonFolderString.Length);
        // Create a Uri name for the Web page part.
        Uri webpagePartUri = PackUriHelper.CreatePartUri(
            new Uri(docPartName, UriKind.Relative));
        // Create a package part to store the Web page HTML.
        PackagePart webpagePart = package.CreatePart(webpagePartUri,
            System.Net.Mime.MediaTypeNames.Text.Html,
            CompressionOption.Normal);
        // Write the Web page HTML to the Web page part.
        using (StreamWriter sw = new StreamWriter(
            webpagePart.GetStream(), System.Text.Encoding.UTF8))
        {
            sw.Write(parsedHtml); // Write the Web page part.
        }
        // Create a package-level relationship to the Web page part.
        package.CreateRelationship(webpagePart.Uri,
            TargetMode.Internal, _packageRelationshipType);
        // Create and write each of the Web page's local resource parts.
        foreach (Uri resourceUri in resourceHash.Keys)
        {
            string contenttype;
            // Open the stream for each resource file.
            using (Stream s = GetResourceStream(
                resourceUri, out contenttype))
            {
                // Nothing to do if the resource file has no stream.
                if (s == null) continue;
                // Get the package part name for the resource.
                ResourceInfo ri = (ResourceInfo)resourceHash[resourceUri];
                string partName = ri.partName;
                // Remove the leading common folders.
                partName = partName.Substring(commonFolderString.Length);
                // Create the part name URI for the resource.
                Uri partUri = PackUriHelper.CreatePartUri(
                    new Uri(partName, UriKind.Relative));
                // Use normal compression unless the data has already
                // been compressed or is an octet-stream.
                CompressionOption compression = CompressionOption.Normal;
                if (contenttype.StartsWith("image")
                    || contenttype.StartsWith("video")
                    || contenttype.StartsWith("audio")
                    || contenttype.StartsWith("application/octet-stream"))
                    compression = CompressionOption.NotCompressed;
                // Create the part for storing the resource file
                // in the package.
                PackagePart resourcePart =
                    package.CreatePart(partUri, contenttype, compression);
                // Copy the data from the file to the package part.
                CopyStream(s, resourcePart.GetStream());
                // Create a part-level relationship from the Web page part
                // (owner) to the resource part (relationship target).
                webpagePart.CreateRelationship(partUri,
                    TargetMode.Internal, resourceRelationshipType);
            }
        }
    }
}
Exploring a Package
When executed, the PackageHtmxWrite sample creates an .htmx package that contains the original Web page along with its local resources. Since the package is a ZIP file, we can use the standard Windows ZIP functionality to examine the components stored in the package. Figure 13 illustrates the organization of the parts contained in the packaging.htmx sample included in the download accompanying this article. Here’s a quick tour of the contents.
Figure 13 Organization of Parts in Packaging.htmx
A [Content_Types].xml part is stored in all packages. This part contains a list of the MIME types and extensions for all of the other parts in the package. In our sample, [Content_Types].xml contains the following information:
<?xml version="1.0" encoding="utf-8" ?>
<Types xmlns=
    "http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="xml" ContentType="text/xml" />
  <Default Extension="htm" ContentType="text/html" />
  <Default Extension="html" ContentType="text/html" />
  <Default Extension="rels" ContentType=
      "application/vnd.openxmlformats-package.relationships+xml" />
  <Default Extension="jpg" ContentType="image/jpeg" />
  <Default Extension="png" ContentType="image/png" />
  <Default Extension="css" ContentType="text/css" />
</Types>
Within a package, relationships are defined in XML parts with a .rels extension; package-level relationships are defined in a part named /_rels/.rels. A package can have any number of package-level relationships. The relationship with the type /root-html is stored in the part /_rels/.rels. This is the relationship we created to identify the HTML Web page part name. We can use this relationship type in our programs later to return the part name of any HTML Web page we store. In our sample, the /_rels/.rels contains the following XML:
<?xml version="1.0" encoding="utf-8" ?>
<Relationships xmlns=
    "http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Type=
      "https://schemas.microsoft.com/opc/2007/relationships/htmx/root-html"
      Target="/packaging.htm" TargetMode="Internal"
      Id="Rf29a606b57094466" />
</Relationships>
The .htm (or .html) file is an embedded copy of the HTML file that the user selected in the application.
Within a package, part-level relationships are stored in a part with a special name. First the source part’s PartName is divided into its path and filename components. The relationship part name is then formed by appending _rels/ to the path and .rels to the filename. So, if the source PartName is /aaa/bbb/mypage.htm, the path is /aaa/bbb/ and the filename is mypage.htm. The resulting relationship part name is /aaa/bbb/_rels/mypage.htm.rels.
A part can have any number of relationships. In the packaging.htmx package, the packaging.htm part defines its relationships in the part /_rels/packaging.htm.rels.
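The naming rule can be sketched as a small string function. This is an illustration of the rule only, with hypothetical class and method names; in application code the PackUriHelper.GetRelationshipPartUri method performs this mapping on part URIs.

```csharp
using System;

static class RelsNameSketch
{
    // Derives the relationship part name for a given source part name.
    // Passing "/" (the package root) yields the package-level "/_rels/.rels".
    public static string GetRelationshipPartName(string partName)
    {
        // Split the part name into its path and filename components.
        int lastSlash = partName.LastIndexOf('/');
        string path = partName.Substring(0, lastSlash + 1);
        string fileName = partName.Substring(lastSlash + 1);

        // Append "_rels/" to the path and ".rels" to the filename.
        return path + "_rels/" + fileName + ".rels";
    }
}
```

For the source part name /aaa/bbb/mypage.htm this returns /aaa/bbb/_rels/mypage.htm.rels, and for /packaging.htm it returns /_rels/packaging.htm.rels, matching the names described above.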
Reading a Package
The accompanying PackageHtmxRead sample application makes use of a Web browser control to display a Web page packaged in an HTMX file. Since the Web browser doesn’t understand HTMX package files, PackageHtmxRead must first unzip the HTML file and its resources to a local temporary directory. The OpenHtmxPackage method shown in Figure 14 demonstrates how to unzip the HTMX content and then display the Web page.
Figure 14 Unzip Content and Display Web Page
public bool OpenHtmxPackage(string filepath)
{
    // Get the temporary folder that will hold the extracted files.
    _tempFolder = GetTempFolder();
    // Create a new tempFolder directory. If the tempFolder
    // exists, delete it and create a new empty one.
    DirectoryInfo directoryInfo = new DirectoryInfo(_tempFolder);
    if (directoryInfo.Exists) directoryInfo.Delete(true);
    directoryInfo.Create();
    // Extract the Web page and its local resources to the temp folder.
    _htmlFilepath = ExtractPackageParts(filepath, _tempFolder);
    // Check that the Web page has a valid path and filename.
    if (_htmlFilepath == null)
    {
        WritePrompt(" Error: web page not found");
        return false;
    }
    // Convert the path and filename to a URI for the browser control.
    Uri webpageUri;
    try
    {
        webpageUri = new Uri(_htmlFilepath, UriKind.Absolute);
    }
    catch (System.UriFormatException)
    {
        string msg = _htmlFilepath + "\n\nThe specified path and " +
            "filename cannot be converted to a valid URI.\n\n";
        System.Windows.MessageBox.Show(msg, "Invalid URI",
            MessageBoxButton.OK, MessageBoxImage.Error);
        return false;
    }
    // Load the Web page.
    webBrowser.ScriptErrorsSuppressed = false;
    webBrowser.Url = webpageUri;
    return true;
}
To extract the individual files, the OpenHtmxPackage method calls the ExtractPackageParts method, which queries the package for the relationship that identifies the root HTML part. The root HTML part is then copied to its relative location in the target directory.
Next, the HTML part is queried for part-level relationships that identify the associated resource parts. The resource part for each part-level relationship is then copied to its relative location in the target directory. Again, all this is done simply through the use of relationships—there’s no parsing of any part content! The sample code in Figure 15 shows the ExtractPackageParts and ExtractPart methods used to unzip the HTML file and its local resource files.
Figure 15 Unzip HTML File and its Resource Files
ExtractPackageParts
private string ExtractPackageParts(
    string packageFile, string targetDirectory)
{
    Uri uriDocumentTarget = null;
    // Open the Package.
    using (Package package =
        Package.Open(packageFile, FileMode.Open, FileAccess.Read))
    {
        PackagePart documentPart = null, resourcePart = null;
        // Examine the package-level relationships and look for
        // the relationship with the "root-html" RelationshipType.
        foreach (PackageRelationship relationship in
            package.GetRelationshipsByType(_packageRelationshipType))
        {
            // Resolve the relationship target URI so
            // the root-html document part can be retrieved.
            uriDocumentTarget = PackUriHelper.ResolvePartUri(
                new Uri("/", UriKind.Relative), relationship.TargetUri);
            // Open the document part and write its contents to a file.
            documentPart = package.GetPart(uriDocumentTarget);
            ExtractPart(documentPart, targetDirectory);
        }
        // No "root-html" relationship means there is no Web page to
        // show; return null so the caller can report the error.
        if (documentPart == null) return null;
        // Examine the root part's part-level relationships and look
        // for relationships with "required-resource" RelationshipTypes.
        Uri uriResourceTarget = null;
        foreach (PackageRelationship relationship in
            documentPart.GetRelationshipsByType(
                _resourceRelationshipType))
        {
            // Resolve the Relationship Target Uri so the resource part
            // can be retrieved.
            uriResourceTarget = PackUriHelper.ResolvePartUri(
                documentPart.Uri, relationship.TargetUri);
            // Open the resource part and write the contents to a file.
            resourcePart = package.GetPart(uriResourceTarget);
            ExtractPart(resourcePart, targetDirectory);
        }
    }
    // Return the path and filename to the file referenced
    // by the HTMX package's "root-html" package-level relationship.
    return targetDirectory + uriDocumentTarget.ToString().TrimStart('/');
}
...
private const string _packageRelationshipType =
"https://schemas.openxmlformats.org/package/2007/" +
"relationships/htmx/root-html";
private const string _resourceRelationshipType =
"https://schemas.openxmlformats.org/package/2007/" +
"relationships/htmx/required-resource";
ExtractPart
private static void ExtractPart(
    PackagePart packagePart, string targetDirectory)
{
    // Remove leading slash from the Part Uri and make a new
    // relative Uri from the result.
    string stringPart = packagePart.Uri.ToString().TrimStart('/');
    Uri partUri = new Uri(stringPart, UriKind.Relative);
    // Create an absolute file URI by combining the target directory
    // with the relative part URI created from the part name.
    Uri uriFullFilePath = new Uri(
        new Uri(targetDirectory, UriKind.Absolute), partUri);
    // Create the necessary directories based on the full part path.
    Directory.CreateDirectory(
        Path.GetDirectoryName(uriFullFilePath.LocalPath));
    // Write the file from the part's content stream, closing
    // both streams when the copy is done.
    using (Stream partStream =
        packagePart.GetStream(FileMode.Open, FileAccess.Read))
    using (FileStream fileStream =
        File.Create(uriFullFilePath.LocalPath))
    {
        CopyStream(partStream, fileStream);
    }
}
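ExtractPart relies on a CopyStream helper that is not shown in Figure 15. A minimal sketch of such a helper might look like the following. The StreamHelper class name and the 4KB buffer size are assumptions made for this illustration, and the helper in the code download may differ.

```csharp
using System;
using System.IO;

// Hypothetical helper, for illustration only.
public static class StreamHelper
{
    // Copies all remaining bytes from source to target in fixed-size chunks.
    // The caller owns both streams and is responsible for closing them.
    public static void CopyStream(Stream source, Stream target)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (target == null) throw new ArgumentNullException("target");

        byte[] buffer = new byte[4096];
        int bytesRead;
        // Read returns 0 once the end of the source stream is reached.
        while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            target.Write(buffer, 0, bytesRead);
        }
    }
}
```

Copying in chunks keeps memory use constant no matter how large a part is, which matters for packages that carry big images or other media.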
Signing and Validating a Package
The PackageHtmxSign sample program shows how digital signatures can be used to sign parts and relationships contained in a package. Using an X.509 certificate, you can identify the entity signing the content elements and create an encrypted hash that can then be used to validate that the content has not changed.
Typically an application would define a policy that lists the specific parts and relationships to be signed. To keep things simple for our .htmx format, our policy is to simply sign all parts and relationships. The SignAllParts method shows how to digitally sign all the parts and relationships in a package. The ValidateSignatures method can then be used to verify that none of the signed elements has been altered.
Designing a File Format
When planning your own package file format, there are several considerations you should keep in mind. First, use a package-level relationship to identify a starting part in the package. While you can use a fixed and mandatory part name, hardcoded part names can limit the flexibility of your package design. Relationships and relationship types enable easy discoverability without relying on predefined part names.
For parts that reference other resources, consider creating a relationship for each target resource. Building a tree of relationships and relationship types makes it easy to discover the structure of the items contained in the package without having to parse the content of each part.
For cases where all resources referenced by a part are defined with relationships, you may want to replace the references in the part content with the corresponding relationship ID values. By using relationship IDs, resource references can be changed simply by updating the appropriate relationship without having to parse and modify the content of the part itself. Note that this may not be practical if a part will be extracted or separated from the relationships that reference the actual target resources.
Try to avoid using relative references to resources outside the package. If the package is moved, the relative references to external resources will likely break.
Decide in advance whether the presence of unknown parts or relationships in your package format should be allowed. It's easy to simply ignore unknown parts and relationships when they appear, but keep in mind that third parties could add items to your package for undesirable purposes, such as tracking. To improve security, your application should treat a package that contains unknown parts or relationships as invalid (or at least display an alert). And if the content is not intended to be changed, seriously consider signing the parts and relationships in your package.
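One way to apply such a policy is to compare the relationship types found in a package against the list your format defines. The sketch below is only an illustration. The PackagePolicy class and FindUnknownRelationshipTypes method are hypothetical names, and comparing types as exact, case-sensitive strings is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class PackagePolicy
{
    // Returns every relationship type that is not in the known list,
    // in the order encountered. Types are compared as exact strings.
    public static List<string> FindUnknownRelationshipTypes(
        IEnumerable<string> relationshipTypes,
        ICollection<string> knownTypes)
    {
        List<string> unknown = new List<string>();
        foreach (string type in relationshipTypes)
        {
            if (!knownTypes.Contains(type))
            {
                unknown.Add(type);
            }
        }
        return unknown;
    }
}
```

An application could collect the RelationshipType of each relationship returned by GetRelationships on the package and on each of its parts, and then reject the package or alert the user when the returned list is not empty.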
Online Resources
- Office Open XML File Formats
- The Addressing Model of the Open Packaging Conventions
- Pack URIs in Windows Presentation Foundation
- Walkthrough: Word 2007 XML Format
Jack Davis is a Program Manager at Microsoft on the Windows Documents and Printing team. In a previous life he worked as a programmer/writer with the Windows Presentation Foundation SDK team. Jack can be reached at jack.davis@microsoft.com.
Andrey Shur is a Program Manager at Microsoft on the Windows Documents and Printing team.