An overview of .NET serializers
A common question developers have about serialization in .NET is what serializer they should use. The answer usually depends on two things: what format you want your serialized data to be in and whether you want to be working with shared types or shared contracts. Since this isn't always an obvious distinction, here's a bit more about the difference between shared types and shared contracts.
Shared Types
Shared types serialization happens under the assumption that any type serialized to a stream is available in the very same assembly on the machine where the stream is being deserialized. These types of serializers will actually write out the name of the type and the assembly that type belongs to so that on deserialization, the serializer knows what type to create to map the stream data back into an object. For example, consider the following possible output for BinaryFormatter (one of the Shared Types serializers):
☺ ????☺ ♀☻ KConsoleApplication10, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null♣☺ ▬ConsoleApplication10.A☺ ♠myType♦▬ConsoleApplication10.B☻ ☻ ♥ ♣♥ ▬ConsoleApplication10.B☺ ☺s☺☻ ♠♦ Youssef♂
Code:
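using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// A is a [Serializable] type defined in this console application.
IFormatter formatter = new BinaryFormatter();
MemoryStream ms = new MemoryStream();
formatter.Serialize(ms, new A());
ms.Position = 0; // rewind so the stream can be read back from the start
// Decoding the binary payload as text is lossy; it is done here only to show what gets written.
Console.WriteLine(new StreamReader(ms).ReadToEnd());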
Notice how the assembly name is written out on the wire, along with the full type names for the types being serialized. The main advantage of using shared types is that you get “tight coupling”. If you install the same assemblies on all the machines in your enterprise for example, you can be guaranteed that the type being serialized is exactly the same type being deserialized according to the CLR. Tight coupling also means that Shared Type serializers can handle OO features like inheritance much better than shared contract serializers. Another advantage of using shared types is that you don’t need to specify what type you’re serializing when creating a serializer. Any type can be serialized by a shared types serializer since the type of the graph gets serialized out to the wire.
The downsides of using shared types are that you lose performance and cannot use shared types serialization in cross-platform scenarios. Performance suffers because the assembly name and type names get serialized even when the deserializing machine already knows what type to expect. For example, say you have a server application that should only ever receive instances of a type MyType. Using BinaryFormatter to send a million messages to this server means that the assembly name of MyType gets written out on the wire a million times, even though it’s not needed at all. This can be a large performance cost. The other downside is that, for all practical purposes, shared type serializers are platform-dependent. An instance of a type serialized using BinaryFormatter cannot easily be deserialized by a Java application, or by any other platform’s serializer for that matter. Using shared contracts fixes both of these issues.
Shared Contracts
In shared contract serializers, the serializer and deserializer agree beforehand on the types being sent on the wire. So, in the previous example, the serializer and deserializer would agree that MyType would be sent on the wire, and agree about what MyType “looks like”. They might agree that MyType has a string member and an int member. The agreement can be reached manually, by creating the same type on both ends of the wire, or implicitly through the SOA paradigm: the server exposes its metadata in the form of a WSDL and XSDs, the client creates a proxy based on that metadata, and the server and client can then communicate with the understanding that they are dealing with the same set of types. The types don’t have to be exactly the same, but they have to map to the same XML in the metadata’s XSDs. This is why the coupling here is “loose” compared to the coupling used in Shared Type serializers. Because the serializer and deserializer agree on the types beforehand, the assembly and type names don’t have to be written out on the wire:
<A xmlns="http://schemas.datacontract.org/2004/07/ConsoleApplication10" xmlns:i="http://www.w3.org/2001/XMLSchema-instance"><myType><s>Youssef</s></myType></A>
Code:
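XmlObjectSerializer serializer = new DataContractSerializer(typeof(A)); // the root type is agreed on up front
XmlTextWriter writer = new XmlTextWriter(Console.Out);
serializer.WriteObject(writer, new A());
writer.Flush(); // push any pending output to the console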
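The loose coupling described above can be sketched with two different CLR types that map to the same contract. This is a minimal illustration, not the only way to set it up: the class names SenderMyType and ReceiverMyType, the member names, and the urn:example:contracts namespace are all hypothetical and chosen only for this example.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical "sender" type. Its CLR name and property names differ from the receiver's.
[DataContract(Name = "MyType", Namespace = "urn:example:contracts")]
public class SenderMyType
{
    [DataMember(Name = "s")] public string Text { get; set; }
    [DataMember(Name = "n")] public int Number { get; set; }
}

// Hypothetical "receiver" type: a different class that exposes the same contract.
[DataContract(Name = "MyType", Namespace = "urn:example:contracts")]
public class ReceiverMyType
{
    [DataMember(Name = "s")] public string S { get; set; }
    [DataMember(Name = "n")] public int N { get; set; }
}

public static class SharedContractDemo
{
    // Writes with the sender's type and reads back with the receiver's type.
    public static ReceiverMyType RoundTrip(SenderMyType value)
    {
        var writer = new DataContractSerializer(typeof(SenderMyType));
        var reader = new DataContractSerializer(typeof(ReceiverMyType));
        using (var ms = new MemoryStream())
        {
            writer.WriteObject(ms, value);
            ms.Position = 0; // rewind before reading
            return (ReceiverMyType)reader.ReadObject(ms);
        }
    }

    public static void Main()
    {
        ReceiverMyType result = RoundTrip(new SenderMyType { Text = "Youssef", Number = 42 });
        Console.WriteLine(result.S + " " + result.N);
    }
}
```

The two classes never reference each other; what matters is that the contract name, namespace, and member names line up on both sides.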
The advantages of using shared contracts are (surprise, surprise) that you enable cross-platform interoperability and gain in performance. The downside is that you have to worry about “how” your serializer and deserializer will agree on types, and you lose tight CLR coupling (which brings about known type issues).
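The known type issue mentioned above typically shows up with inheritance: the serializer was told to expect a base type, but the object graph contains a derived type that is not part of the agreed contract. Here is a small sketch of one way to declare the derived type up front, using the KnownType attribute. The Shape and Circle types and the urn:example:contracts namespace are hypothetical.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

[DataContract(Namespace = "urn:example:contracts")]
[KnownType(typeof(Circle))] // declares the derived contract ahead of time
public class Shape
{
    [DataMember] public string Name { get; set; }
}

[DataContract(Namespace = "urn:example:contracts")]
public class Circle : Shape
{
    [DataMember] public double Radius { get; set; }
}

public static class KnownTypeDemo
{
    // The serializer is created for the base type, but the instance may be a Circle.
    public static Shape RoundTrip(Shape value)
    {
        var serializer = new DataContractSerializer(typeof(Shape));
        using (var ms = new MemoryStream())
        {
            serializer.WriteObject(ms, value);
            ms.Position = 0; // rewind before reading
            return (Shape)serializer.ReadObject(ms);
        }
    }

    public static void Main()
    {
        Shape result = RoundTrip(new Circle { Name = "c", Radius = 2.0 });
        Console.WriteLine(result.GetType().Name);
    }
}
```

Without the KnownType attribute (or an equivalent list of known types passed to the serializer), writing the Circle through a serializer created for Shape fails with a SerializationException. A shared types serializer would not need the hint, because the type name travels with the data.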
Here’s a table categorizing the .NET serializers:
|        | Shared Types              | Shared Contracts                      |
|--------|---------------------------|---------------------------------------|
| Binary | BinaryFormatter           |                                       |
| XML    | NetDataContractSerializer | XmlSerializer, DataContractSerializer |
| JSON   |                           | DataContractJsonSerializer            |
I purposely didn’t include SoapFormatter, but SoapFormatter is a Shared Types serializer. This table should make it clear in most cases what serializer to use. If you pick between Shared Types and Shared Contracts and pick between the different formats, you should be left with only one conflict:
XmlSerializer vs DataContractSerializer
The short version is that you should use XmlSerializer if you need a great deal of control over the shape of the XML being emitted, and use DataContractSerializer otherwise. XmlSerializer lets you configure the emitted XML to a great extent. This is useful, for example, if you need your serialized instances to comply with a particular XML schema. DataContractSerializer gives you much less control over the XML, but it is more functional, more performant, and generally easier to use.
For example, DataContractSerializer doesn’t allow you to configure XML attributes on your emitted XML, while XmlSerializer does. There are so many differences between XmlSerializer and DataContractSerializer that I could devote an entire blog post to this issue… and I will as soon as I get a chance.
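To make the attribute difference concrete, here is a small sketch that serializes the same object with both serializers. The Item type and its member names are hypothetical. Each serializer only looks at its own attributes, so the type can carry both sets.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;
using System.Xml.Serialization;

[DataContract(Namespace = "")]
public class Item
{
    [DataMember]          // read by DataContractSerializer
    [XmlAttribute("id")]  // read by XmlSerializer: emit as an XML attribute
    public int Id { get; set; }

    [DataMember]
    [XmlElement("name")]  // read by XmlSerializer: emit as a child element
    public string Name { get; set; }
}

public static class XmlShapeDemo
{
    public static string WithXmlSerializer(Item item)
    {
        var serializer = new XmlSerializer(typeof(Item));
        using (var sw = new StringWriter())
        {
            serializer.Serialize(sw, item);
            return sw.ToString();
        }
    }

    public static string WithDataContractSerializer(Item item)
    {
        var serializer = new DataContractSerializer(typeof(Item));
        using (var ms = new MemoryStream())
        {
            serializer.WriteObject(ms, item);
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

    public static void Main()
    {
        var item = new Item { Id = 1, Name = "widget" };
        Console.WriteLine(WithXmlSerializer(item));          // Id appears as an id="1" attribute
        Console.WriteLine(WithDataContractSerializer(item)); // Id appears as an <Id> element
    }
}
```

With XmlSerializer, Id is written as an attribute on the root element. With DataContractSerializer, every data member is written as a child element, and there is no attribute-based setting to change that.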
Conclusion
The .NET Framework ships with a good number of built-in serializers. In most cases, deciding between Shared Types and Shared Contracts and choosing the format you need your serialized instances to appear in is enough to tell you which serializer is best suited to your needs.