MS-OVBA - V3 Content Normalized Data

Alexander P 21 Reputation points

Hi!
I need to implement the V3 Content Normalized Data function.
When appending the references to the buffer, the algorithm says:

IF next USHORT = REFERENCENAME.Id THEN
...
END IF

IF next USHORT = REFERENCENAME.Reserved THEN
...
END IF

Since this function does not read anything, what is the next USHORT?

Any help appreciated!

Jeff McCashland 476 Reputation points Microsoft Employee

2021-11-18T19:40:34.89+00:00

Hello Alexander,

Thank you for posting your question. One of our engineers will respond soon.

Best Regards,
Jeff McCashland
Microsoft Open Specifications
Hung-Chun Yu 976 Reputation points Microsoft Employee

2021-11-18T21:38:54.763+00:00

Hi @Alexander P

I will be helping you with this question.

Hungchun Yu
Microsoft Open Specifications
SvenSo 1 Reputation point

2021-11-19T11:20:02.567+00:00

I am currently implementing this function too (what a coincidence :D).
I was struggling there too, made some assumptions and tried multiple ways.
Unfortunately, it is very hard to find an error without sample data to validate intermediate hashes.

@Hung-Chun Yu
Do you think it would be possible to share some sample sets (vbaProject.bin + resulting V3ContentNormalizedData + resulting FormsNormalizedData + resulting NormalizeProjectStream)? This would help developers find errors themselves and interpret the spec correctly.
Alexander P 21 Reputation points

2021-11-19T11:28:33.83+00:00

Hi!
I just need to know what the next USHORT means. To me, a next USHORT would be something to read (a 2-byte value), not something to write. But this algorithm is supposed to describe the creation of the buffer! FormsNormalizedData is not the problem, because it was also used in the agile signature.
Cheers
Alex
Alexander P 21 Reputation points

2021-11-19T13:29:09.603+00:00

Hi @SvenSo ,
what are your thoughts on the two "IF next USHORT..." statements?
Hung-Chun Yu 976 Reputation points Microsoft Employee

2021-11-19T19:18:13.65+00:00

Hi @SvenSo

I will ask the Product Group whether there are sample sets that we can share with the public.

Hung-Chun Yu 976 Reputation points Microsoft Employee

@Alexander P and @SvenSo

Here is what I found out about what next USHORT means. The following is not official yet; it is still under review by the feature owner and the Product Group.

    // Peek (but do not read) the next USHORT in the stream.
    IF next USHORT = 0x0016 THEN
        APPEND Buffer WITH REFERENCENAME.Id (section 2.3.4.2.2.2)
        APPEND Buffer WITH REFERENCENAME.SizeOfName (section 2.3.4.2.2.2)
        APPEND Buffer WITH REFERENCENAME.Name (section 2.3.4.2.2.2)
    END IF

    // Peek (but do not read) the next USHORT in the stream (this may be the same USHORT as was peeked above if the above block was skipped).
    IF next USHORT = 0x003E THEN
        APPEND Buffer WITH REFERENCENAME.Reserved (section 2.3.4.2.2.2)
        APPEND Buffer WITH REFERENCENAME.SizeOfNameUnicode (section 2.3.4.2.2.2)
        APPEND Buffer WITH REFERENCENAME.NameUnicode (section 2.3.4.2.2.2)
    END IF
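
For implementers who process the dir stream directly, the peek-then-append step above could be sketched in Python roughly as follows. This is only an illustration, not official sample code: the helper names (`peek_ushort`, `read_sized_field`, `append_reference_name`) are hypothetical, and the sketch assumes that all integers are little-endian and that SizeOfName and SizeOfNameUnicode are 4-byte fields that directly follow the 2-byte Id / Reserved value.

```python
import io
import struct

REFERENCENAME_ID = 0x0016        # value taken from the pseudocode above
REFERENCENAME_RESERVED = 0x003E  # value taken from the pseudocode above


def peek_ushort(stream):
    """Return the next little-endian USHORT without consuming it (None at end of stream)."""
    pos = stream.tell()
    raw = stream.read(2)
    stream.seek(pos)  # restore the position, so nothing is consumed
    if len(raw) < 2:
        return None
    return struct.unpack("<H", raw)[0]


def read_sized_field(stream):
    """Consume a USHORT, an assumed 4-byte size, and that many bytes; return them verbatim."""
    header = stream.read(6)
    (size,) = struct.unpack_from("<I", header, 2)
    return header + stream.read(size)


def append_reference_name(stream, buffer):
    """Append the optional REFERENCENAME parts to buffer, as in the pseudocode above."""
    if peek_ushort(stream) == REFERENCENAME_ID:
        buffer.extend(read_sized_field(stream))  # Id, SizeOfName, Name
    # This may peek the same USHORT again if the block above was skipped.
    if peek_ushort(stream) == REFERENCENAME_RESERVED:
        buffer.extend(read_sized_field(stream))  # Reserved, SizeOfNameUnicode, NameUnicode
```

If neither value is found, the stream position is left unchanged, so whatever record comes next can be parsed as usual.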

Hung-Chun Yu 976 Reputation points Microsoft Employee

2021-11-29T13:01:21.41+00:00

Hi @SvenSo

I got your sample request via dochelp. It will be a while before official Microsoft samples are ready.
Thanks to Alex, who is willing to share his samples with the public.

Here are the shared links: NormalizedData_V3-8.bin and NormalizedData_V3-9.bin
Alexander P 21 Reputation points

2021-11-29T13:13:46.78+00:00

Hi @SvenSo ,
here are the links to the sample files V3-9.xlsm and V3-8.xlsm
SvenSo 1 Reputation point

2021-11-30T10:27:51.827+00:00

Dear @Alexander P and @Hung-Chun Yu

Thanks to these sample files and the updated documentation, I was able to fix the implementation within hours.
Thanks very much for your help!

Best regards,
Sven

Accepted answer

Hung-Chun Yu 976 Reputation points Microsoft Employee

2021-11-19T22:31:05.063+00:00

The previous post does not contain the revision marks.

Here is a link to the extracted section.
https://1drv.ms/w/s!AvKjnW8J-ArBgfhx-vvx9k5V-t7xMQ?e=yCpFUm

Thanks to @Alexander P, who provided the suggested updates!
Alexander P 21 Reputation points

2021-11-20T08:31:38.247+00:00

Hello @Hung-Chun Yu ,
I think I now know what you mean, but the explanation you gave is not sufficient for someone new to the documentation to fully understand the spec.
But I remembered a conversation with an MS employee about the agile signature and the part of the documentation that was missing then and is now called FormsNormalizedData. He gave me similar pseudocode to get the data, but he spoke of reading streams, in contrast to the records that are mentioned in the algorithm.

I will try to explain what I mean.
The V3 Content Normalized Data section speaks of records. These records are read when the VBA project is read (while opening the file). So these records now reside in memory, and from then on you only work with those records! So when implementing the algorithm, you do not read any stream; you just work with the records you read when opening the file.
When working with these in-memory records, you never have a "next USHORT".

Your algorithm only makes sense if you actually read the dir stream when creating the content buffer. Maybe that is what you meant by "PARAMETERS Storage as VBA Storage..." at the beginning of the function. But the storage contains more than one stream!
In my opinion it would make sense either to change the text "PARAMETERS Storage..." to "PARAMETERS dir-Stream (section 2.3.4.2)" or just to write "start reading the dir stream".

I will try to finalize the hash generation this weekend and get back to you if I have more information.

Cheers and a nice weekend
Alex

Alexander P 21 Reputation points

2021-11-20T19:21:06.977+00:00

Hi!
I have implemented the algorithm, and the cryptographic digest does not match.
Your explanation of the next USHORT just means: if there is a REFERENCENAME record within the REFERENCECONTROL record, then write it down.
I've prepared a small Excel file with a short macro and just the standard references, and I've logged the V3ContentNormalizedData, the FormsNormalizedData and the ProjectNormalizedData.
The FormsNormalizedData should be OK, because it works when I attach just the legacy and the agile signatures.

The bad thing is that I cannot upload a ZIP file with all those files. :-(

Cheers Alex

Hung-Chun Yu 976 Reputation points Microsoft Employee

2021-11-21T07:13:25.233+00:00

@Alexander P

That is excellent feedback; I will bring it to the feature owner. Do you remember the name of the Microsoft person who helped you?

Hung-Chun Yu 976 Reputation points Microsoft Employee

2021-11-21T07:16:13.617+00:00

@ AlexanderP-7851

You can share a file link via Dropbox, OneDrive, or Google Drive. Or you can email the zip file to dochelp at microsoft.com and ask in the body of the email that it be forwarded to Hung-Chun Yu. Looking forward to the file.

Hung-Chun Yu 976 Reputation points Microsoft Employee

2021-11-23T18:13:20.883+00:00

@Alexander P

Thank you for pointing out that the existing spec is somewhat misleading.

As for the customer’s comments about “records in memory” vs “stream”, this is a misreading of the documentation. All our project hashing happens at the storage / stream level and does not require loading the project into the VBA runtime, so all our documentation regarding signing is based on the storage/stream format and has nothing to do with the in-memory representation (which is an implementation detail anyway). In fact, all our format documentation for all of Office documents the on-disk format, not anything in memory.

The product group is working on a sample that we can share with implementers, which should help people understand how it works.

Alexander P 21 Reputation points

2021-11-24T07:09:28.307+00:00

The problem is that you are assuming that others do it the same way. For the FormsNormalizedData this would be the best way to implement it, because you just copy the stream buffers.
For the V3 Content Normalized Data there is more than just reading. Some values are skipped. Others are transformed (MBCS to wide char), and you always speak of records. How someone treats those records must be left to the one implementing the algorithm. In my case, I read the whole project structure when opening the Office file. I need to do that because my users can read, manipulate, and sign code. So I do not want to parse or read streams, or compress/decompress code, more than once. Doing it several times would be inefficient. When you document everything in pseudocode that creates a buffer, you should not assume that there is a "next anything". The better way in this case would be: If the ReferenceControl contains a NameRecordExtended record (section 2.3.4.2.2.2), write ReferenceName.Id, ReferenceName.SizeOfName, ...
Then your algorithm would be consistent.
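
The record-based wording suggested here could look roughly like this in Python for an implementation that keeps the parsed project in memory. This is only a sketch under assumptions: the class and field names (`ReferenceName`, `ReferenceControl`, `name_record_extended`) are hypothetical, the name bytes are assumed to be stored exactly as they appeared in the dir stream, and the Id / Reserved values 0x0016 / 0x003E and the 4-byte little-endian size fields are taken from, or assumed from, the pseudocode quoted earlier in this thread.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReferenceName:
    name: bytes          # MBCS bytes as stored in the dir stream (assumption)
    name_unicode: bytes  # UTF-16LE bytes as stored in the dir stream (assumption)


@dataclass
class ReferenceControl:
    # None when the record has no NameRecordExtended part
    name_record_extended: Optional[ReferenceName] = None


def append_name_record_extended(buffer, control):
    """Write the REFERENCENAME fields only if the in-memory record contains them."""
    rec = control.name_record_extended
    if rec is None:
        return  # nothing to append; there is no "next USHORT" to inspect
    buffer.extend(struct.pack("<HI", 0x0016, len(rec.name)) + rec.name)
    buffer.extend(struct.pack("<HI", 0x003E, len(rec.name_unicode)) + rec.name_unicode)
```

The decision is made on the presence of the parsed record rather than on bytes still waiting in a stream, which is the consistency being asked for.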
Cheers
Alex

Hung-Chun Yu 976 Reputation points Microsoft Employee

2021-11-24T07:11:33.743+00:00

Hi @Alexander P

Thank you for your suggestion.

Alexander P 21 Reputation points

2021-11-24T07:32:26.79+00:00

New insights
I made some tests with the writing of the ReferenceControl/ReferenceOriginal records.
Here your documentation, in contrast to the description of ReferenceRegistered, is OK. The LibIds are not transformed to wide char.
The documentation of ReferenceProject is also OK.
But my next test included a userform. As your documentation says, there was no change to FormsNormalizedData. But if I now add the FormsNormalizedData, the signature is considered changed.
I will zip the sample and send it to you.

Alexander P 21 Reputation points

2021-11-29T12:32:29.04+00:00

Everything is OK, but you can omit the function for the package normalized data. It is not used. Only the corrected Project Normalized Data is needed:
Package Normalized Package Data
FUNCTION NormalizePackageStream
PARAMETERS Stream AS stream
RETURNS array of bytes

    DECLARE Buffer AS array of bytes
    SET Buffer TO resizable array of bytes
    FOR EACH property in ProjectProperties (section 2.3.1.1)
        IF property NOT is ProjectId (section 2.3.1.2) OR ProjectDocModule (section 2.3.1.4) OR ProjectProtectionState (section 2.3.1.15) OR ProjectPassword (section 2.3.1.16) OR ProjectVisibilityState (section 2.3.1.17) OR ProjectPackage THEN
            APPEND Buffer WITH property name
            APPEND Buffer WITH property value
        END IF
    END FOR
    APPEND the string “Host Extender Info” to Buffer
    APPEND HostExtenderRef without NWLN to Buffer

END FUNCTION
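
The property filter in the pseudocode above might be sketched in Python as follows. This is an illustration only and covers just the FOR EACH loop, not the Host Extender part. Several things here are assumptions and not confirmed by this thread: that the PROJECT stream is a sequence of `Name=Value` lines, that the excluded properties use the keys `ID`, `Document`, `CMG`, `DPB`, `GC` and `Package`, that the properties end at the first line starting with `[`, and that name and value are appended as raw bytes exactly as stored (quotes included). The function name `filter_project_properties` is hypothetical.

```python
# Assumed PROJECT stream keys for the excluded properties (not confirmed in this thread)
EXCLUDED_KEYS = {b"ID", b"Document", b"CMG", b"DPB", b"GC", b"Package"}


def filter_project_properties(project_bytes):
    """Append name and value of every property that is not excluded."""
    buffer = bytearray()
    for line in project_bytes.splitlines():
        if line.startswith(b"["):
            break  # assumption: a section header ends the property list
        name, sep, value = line.partition(b"=")
        if not sep:
            continue  # blank line or a line that is not Name=Value
        if name in EXCLUDED_KEYS:
            continue
        buffer.extend(name)   # APPEND Buffer WITH property name
        buffer.extend(value)  # APPEND Buffer WITH property value
    return bytes(buffer)
```

Working on bytes avoids choosing a text encoding in the sketch. Whether values need further normalization before hashing should be checked against the shared sample files.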

1 additional answer

Alexander P 21 Reputation points

2021-11-21T09:20:35.567+00:00

Hi!
I've just sent you an email with the zip file and the information about the MS employee I've been talking to.
Cheers
Alex
Hung-Chun Yu 976 Reputation points Microsoft Employee

2021-11-21T13:08:49.293+00:00

@Alexander P

File received. I will post here when I get an update.
Thank you very much

Hungchun Yu
Microsoft Open Specifications

Hung-Chun Yu 976 Reputation points Microsoft Employee

2021-11-22T18:27:51.43+00:00

Hi @Alexander P

Please let us know whether the sample code that was emailed to you helped.

Hungchun Yu
Microsoft Open Specifications

Hung-Chun Yu 976 Reputation points Microsoft Employee

2021-11-23T18:16:15.053+00:00

Hi @Alexander P

Thank you for the updated sample. I shared it with the product group. I will send you an update via email.

Hung-Chun Yu 976 Reputation points Microsoft Employee

2021-11-29T11:59:24.073+00:00

Hi @Alexander P

Thank you for the great suggestions on the MS-OVBA specification and for pointing out the issues you have uncovered.
Microsoft is working on reviewing and incorporating the suggestions for future revisions.

In the meantime, could you kindly share the samples you worked on (with your company's private information removed) with other implementers? If you can provide a shared link in your reply, we would greatly appreciate it.

It would also help if you could share what you learned and the pitfalls that other implementers can avoid, to speed up their implementations.