XML, the New Database Heresy

C.J. Date, one of the most influential names in the relational database world, had some harsh words about XML's encroachment into the world of relational databases in a recent article entitled Date defends relational model  that appeared on SearchDatabases.com. Key parts of the article are excerpted below

Date reserved his harshest criticism for the competition, namely object-oriented and XML-based DBMSs. Calling them "the latest fashions in the computer world," Date said he rejects the argument that relational DBMSs are yesterday's news. Fans of object-oriented database systems "see flaws in the relational model because they don't fully understand it," he said.

Date also said that XML enthusiasts have gone overboard.

"XML was invented to solve the problem of data interchange, but having solved that, they now want to take over the world," he said. "With XML, it's like we forget what we are supposed to be doing, and focus instead on how to do it."

Craig S. Mullins, the director of technology planning at BMC Software and a SearchDatabase.com expert, shares Date's opinion of XML. It can be worthwhile, Mullins said, as long as XML is only used as a method of taking data and putting it into a DBMS. But Mullins cautioned that XML data that is stored in relational DBMSs as whole documents will be useless if the data needs to be queried, and he stressed Date's point that XML is not a real data model.

Craig Mullins points are more straightforward to answer since his comments don't jibe with the current state of the art in the XML world. He states that you can't query XML documents stored in databases but this is untrue. Almost three years ago, I was writing articles about querying XML documents stored in relational databases. Storing XML in a relational database doesn't mean it has to be stored in as an opaque binary BLOB or as a big, bunch of text which cannot effectively be queried. The next version of SQL Server will have extensive capabilities for querying XML data in relational database and doing joins across relational and XML data, a lot of this functionality is described in the article on XML Support in SQL Server 2005. As for XML not having a data model, I beg to differ. There is a data model for XML that many applications and people adhere to, often without realizing that they are doing so. This data model is the XPath 1.0 data model, which is being updated to handled typed data as the XQuery and XPath 2.0 data model.

Now to tackle the meat of C.J. Date's criticisms which is that XML solves the problem of data interchange but now is showing up in the database. The thing first point I'd like point out is that there are two broad usage patterns of XML, it  is used to represent both rigidly structured tabular data (e.g., relational data or serialized objects) and semi-structured data (e.g., office documents). The latter type of data will only grow now that office productivity software like Microsoft Office have enabled users to save their documents as XML instead of proprietary binary formats. In many cases, these documents cannot simply shredded into relational tables. Sure you can shred an Excel spreadsheet written in spreadsheetML into relational tables but is the same really feasible for a Word document written in WordprocessingML? Many enterprises would rather have their important business data being stored and queried from a unified location instead of the current situation where some data is in document management systems, some hangs around as random files in people's folders while some sits in a database management system.

As for stating that critics of the relational model don't understand it, I disagree. One of the major benefits of using XML in relational databases is that it is a lot easier to deal with fluid schemas or data with sparse entries with XML. When the shape of the data tends to change or is not fixed the relational model is simply not designed to deal with this. Constantly changing your database schema is simply not feasible and there is no easy way to provide the extensibility of XML where one can say "after the X element, any element from any namespace can appear". How would one describe the capacity to store “any data” in a traditional relational database without resorting to an opaque blob?

I do tend to agree that some people are going overboard and trying to model their data hierarchically instead of relationally which experience has thought us is a bad idea. Recently on the XML-DEV mailing list entitled Designing XML to Support Information Evolution where Roger L. Costello described his travails trying to model his data which was being transferred as XML in a hierarchical manner. Micheal Champion accurately described the process Roger Costello went through as having "rediscovered the relational model". In a response to that thread I wrote "Hierarchical databases failed for a reason".

Using hierarchy as a primary way to model data is bad for at least the following reasons

  1. Hierarchies tend to encourage redundancy. Imagine I have a <Customer> element who has one or more <ShippingAddress> elements as children as well as one or more <Order> elements as children as well. Each order was shipped to an address, so if modelled hierarchically each <Order> element also will have a <ShippingAddress> element which leads to a lot of unnecessary duplication of data.
  2. In the real world, there are often multiple groups to which a piece of data belongs which often cannot be modelled with a single hierarchy.  
  3. Data is too tightly coupled. If I delete a <Customer> element, this means I've automatically deleted his entire order history since all the <Order> elements are children of <Customer>. Similarly if I query for a <Customer>, I end up getting all the <Order> information as well.

To put it simply, experience has taught the software world that the relational model is a better way to model data than the hierarchical model. Unfortunately, in the rush to embrace XML many a repreating the mistakes from decades ago in the new millenium.

Comments

  • Anonymous
    May 28, 2004
    The comment has been removed

  • Anonymous
    May 28, 2004
    I have to disagree with your points against a hierarchical data model:

    1) I'm not sure what's bad about this -- if you're saying that having the atual ShippingAddress makes it overly verbose, well, sure, but your example can be modeled in many ways such as:
    <ShippingAddress city="San Diego" state="CA">
    <Order>
    <ProductID=1/>
    </Order>
    <Order>
    <ProductID=2/>
    </Order>
    </ShippingAddress>

    probably wouldn't want to set it up in this fashion...

    2) True, this kind of fits in your first example -- but XSD KEY/KEYREF is able to model some of this..

    3) sometimes this is very good, if I get customers the next step might be querying for customer orders -- having it right there reduces the need for another roundtrip to the DB... in any case, it depends on the situation if this is a good thing or not, and it might be feasable to reduce the granularity by structuring the data in another way or using multiple XML files..

  • Anonymous
    May 28, 2004
    Sean,
    (1) & (2) Sure, it is possible to use IDs and keys but in that case what you are doing isn't hierarchical modeling but relational modeling. The question isn't whether you can't model relationally in XML but whether you should strictly use hierarchy when modelling XML data. It seems you agree with me that using hierarchical modelling isn't the best option given your examples of using keys or IDs.

    (3) Again, it seems you agree. Your point is that you can decouple the data by removing the hierarchy but in which case you aren't primarily modelling the data hierarchically. Of course, there are some cases where the tight data coupling is good but I've seen a number of cases where our users complain about the lack of filtering or subtree pruning in languages like XPath and XQuery when performing queries such as get <Customer> but don't bother fetching the <Order>s.

  • Anonymous
    June 04, 2004
    The comment has been removed

  • Anonymous
    May 30, 2009
    PingBack from http://outdoorceilingfansite.info/story.php?id=5298

  • Anonymous
    May 31, 2009
    PingBack from http://outdoorceilingfansite.info/story.php?id=22934

  • Anonymous
    June 16, 2009
    PingBack from http://fixmycrediteasily.info/story.php?id=704