Binary Encoding, Part 2
The binary format we developed is based on a tokenized stream of records and a few Huffman-like coding strategies. Each record starts with a one-byte record type value. The record type byte is followed by binary content whose format and size depend on the type. Each record in the stream translates into a document fragment. Concatenating all of the fragments produced from the record stream yields a document based on the original XML infoset. The format is relatively simple compared to many binary XML formats while still being highly expressive.
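To make the record model concrete, here is a minimal Python sketch of a decoder for a toy record stream. It is not the actual format: the three record type values, the one-byte length prefix, and the function names are all assumptions invented for illustration. It only shows the general shape of the idea, where a type byte selects how the payload is read, each record becomes a fragment, and the fragments are concatenated.

```python
from xml.sax.saxutils import escape

# Hypothetical record types, made up for this sketch (not the real values).
START_ELEMENT = 0x01  # payload: 1-byte length + UTF-8 element name
END_ELEMENT = 0x02    # no payload; closes the most recently opened element
TEXT = 0x03           # payload: 1-byte length + UTF-8 character data


def read_string(data, pos):
    """Read a length-prefixed UTF-8 string; return (string, next position)."""
    if pos >= len(data):
        raise ValueError("missing string length")
    end = pos + 1 + data[pos]
    if end > len(data):
        raise ValueError("truncated string payload")
    return data[pos + 1:end].decode("utf-8"), end


def decode_fragments(data):
    """Yield one document fragment per record in the byte stream."""
    pos = 0
    open_elements = []  # names of elements that still need a closing tag
    while pos < len(data):
        record_type = data[pos]  # every record starts with a one-byte type
        pos += 1
        if record_type == START_ELEMENT:
            name, pos = read_string(data, pos)
            open_elements.append(name)
            yield f"<{name}>"
        elif record_type == END_ELEMENT:
            if not open_elements:
                raise ValueError("end element record with nothing open")
            yield f"</{open_elements.pop()}>"
        elif record_type == TEXT:
            text, pos = read_string(data, pos)
            yield escape(text)  # escape &, < and > for the textual form
        else:
            raise ValueError(f"unknown record type 0x{record_type:02x}")


def decode_document(data):
    """Concatenate the fragments of every record into one document."""
    return "".join(decode_fragments(data))
```

With these made-up record types, the seven-byte stream `bytes([0x01, 1]) + b"a" + bytes([0x03, 2]) + b"hi" + bytes([0x02])` decodes to `<a>hi</a>`. Note that the end record carries no name at all, which is one small way a record stream can be more compact than the equivalent text.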
Here are the main properties of interest:
- Moderate reduction of message size. If a standard compression program, such as Gzip, could reduce your SOAP messages by 50%, then you might expect the binary encoding to reduce the same messages by, say, 30%. Those numbers are purely illustrative, as space reduction is highly data-dependent and the relationship between the binary encoding and other compression formats is often non-linear and hard to predict. You could likely still run compression after the binary encoding to recapture that difference, and typically a bit more. The binary encoding doesn’t intentionally introduce randomness into the encoded document that would interfere with compression, as you would see with processes such as encryption.
- Moderate reduction of processing cost. Text-based XML serialization can have measurable cost when network speeds are high and the computational complexity of the service is low. The binary encoding is not as fast as a direct buffer copy when the same format is used for both transfer and application data structures, but the binary encoding is cheaper in many respects than handling a textual rendering of the same content.
- Few concepts required for understanding. To understand the binary encoding, you need to know about records, string tables, and the XML infoset. There are dozens of record types, but you don’t need deep knowledge of what any particular record means. The first few sentences of today’s post have largely summed up what the binary encoding is about.
- Covers the expressive needs of data contracts and also most general-purpose XML documents. Next time I’ll talk about what is and what isn’t supported by the binary encoding.