List of all URLs in docx file

Parth Gupta 180 Reputation points
2023-07-14T09:31:58.6933333+00:00

Hi,

I am trying to analyze various .docx files (OOXML) to detect malicious content. I know that one technique is to hide malicious URLs in them. However, because of the file structure, these files contain a lot of links that look like this:
http://schemas.microsoft.com/office/word/2012/wordml

http://schemas.openxmlformats.org/markup-compatibility/2006

http://schemas.openxmlformats.org/officeDocument/2006/relationships

etc.

Could you provide an exhaustive list of legitimate URLs found in docx files? For example, I know that URLs starting with "schemas.microsoft.com" are legitimate.

Please provide me with a list of legitimate URLs found in docx files.

Thanks,

Parth Gupta

Office Open Specifications
Office: A suite of Microsoft productivity software that supports common business tasks, including word processing, email, presentations, and data management and analysis.
Open Specifications: Technical documents for protocols, computer languages, standards support, and data portability. The goal with Open Specifications is to help developers open new opportunities to interoperate with Windows, SQL, Office, and SharePoint.

Accepted answer
  1. Mike Bowen 1,516 Reputation points Microsoft Employee
    2023-07-14T18:13:27.63+00:00

    Hi @Parth Gupta ,

    I can't provide an exhaustive list of valid URIs, but if you look through the Office Standard you can find a collection of them. As @John Korchok points out, however, those are namespace URIs used by the file format and should not pose a risk, so either way you can rule them out.
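
As a rough illustration of ruling the namespace URIs out, the sketch below separates strings that a part declares as XML namespaces from every other http(s) string it contains. It is only a minimal example under stated assumptions: the function name `split_uris` and both regular expressions are hypothetical, declarations using single quotes are not handled, and XML entities in URLs are not decoded. Relationship `Type` values are also format-defined URIs, but since they are not namespace declarations they will show up in the "other" set and still need a manual look.

```python
import re
import zipfile

# Assumption: namespace declarations look like xmlns="..." or xmlns:prefix="..."
XMLNS_RE = re.compile(r'xmlns(?::[\w.-]+)?\s*=\s*"([^"]*)"')
# Assumption: anything starting with http:// or https:// is worth collecting
URL_RE = re.compile(r'https?://[^\s"<>]+')


def split_uris(docx):
    """Return (namespace_uris, other_urls) for a .docx path or file-like object."""
    namespaces, others = set(), set()
    with zipfile.ZipFile(docx) as zf:
        for name in zf.namelist():
            # Only the XML parts and relationship parts are scanned here
            if not name.endswith((".xml", ".rels")):
                continue
            text = zf.read(name).decode("utf-8", errors="replace")
            namespaces |= set(XMLNS_RE.findall(text))
            others |= set(URL_RE.findall(text))
    # Anything declared as a namespace somewhere in the package is not reported
    others -= namespaces
    return namespaces, others
```

Calling `split_uris("sample.docx")` (a hypothetical file name) returns two sets; the second is the one to review.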

    Best Regards,

    Michael Bowen

    Escalation Engineer - Microsoft Open Specifications

    1 person found this answer helpful.

1 additional answer

  1. John Korchok 5,156 Reputation points
    2023-07-14T15:23:48.0333333+00:00

    Those are URIs, not URLs. Here is an article explaining the difference:

    https://en.wikipedia.org/wiki/Uniform_Resource_Identifier

    URIs can also function as URLs, but most of the ones in an Office file do not.
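
To find the URIs that do function as URLs, one place to look is the relationship parts (`.rels`) inside the package: as far as I know, hyperlinks and other links that point outside the file are recorded there as `Relationship` elements with `TargetMode="External"`. The sketch below lists those targets. The function name `external_targets` is hypothetical, and this is not a complete scanner: it does not cover URLs that appear in field instructions or in plain document text.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the package relationship parts (the *.rels files)
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def external_targets(docx):
    """Return (part_name, relationship_type, target) for each external relationship."""
    found = []
    with zipfile.ZipFile(docx) as zf:
        for name in zf.namelist():
            if not name.endswith(".rels"):
                continue
            # Note: for untrusted files a hardened XML parser would be a safer choice
            root = ET.fromstring(zf.read(name))
            for rel in root.iter(REL_NS + "Relationship"):
                # Internal relationships point at other parts of the same package
                if rel.get("TargetMode") == "External":
                    found.append((name, rel.get("Type"), rel.get("Target")))
    return found
```

Each returned tuple names the `.rels` part, the relationship type, and the target, so the targets can be checked against whatever reputation source you use.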

    1 person found this answer helpful.