Documentation on formats of stream objects in OLE files.

Parth Gupta 180 Reputation points
2024-05-31T10:52:43.25+00:00

Hi,

I am trying to parse a ".doc" file (a Microsoft Word 97-2003 document, stored as an OLE compound file), and I am looking for documentation on its contents.

I found the following references:

https://video2.skills-academy.com/en-us/openspecs/office_file_formats/ms-doc/ccd7b486-7881-484c-a137-51170af7cc22

https://video2.skills-academy.com/en-us/openspecs/windows_protocols/ms-cfb/53989ce4-7b05-4f8d-829b-d08d6148375b

With these documents, I can parse the sectors, the directory structure, the FAT, the mini FAT, and so on. However, I am looking for documentation on the format of the streams themselves, such as 'Data', '\x05DocumentSummaryInformation', '\x05SummaryInformation', '\x01CompObj', and others.

Could you point me to technical documentation for these formats, if any exists?

Thanks.

@Tom Jebo

Office Open Specifications

Accepted answer
  1. Mike Bowen 1,516 Reputation points Microsoft Employee
    2024-05-31T19:51:58.77+00:00

    Hi @Parth Gupta ,

    Several things are referred to as "Data" in the documentation, but if you mean the Data stream of a Word document, it is defined in MS-DOC section 2.1.3, Data Stream.

    The other streams you asked about are defined in:

    I hope that answers your question.

    Best regards,

    Michael Bowen

    Sr. Escalation Engineer Microsoft Open Specifications

