Data Access API of the Day Part II – Componentizing Data Access

Welcome to Part II of Data Access API of the Day; a brief history of the evolution of Microsoft’s Data Access APIs.

In Part I we looked at ODBC as Microsoft’s C-based Relational API, and the DAO and RDO automation interfaces that made relational databases exposed through ODBC available to languages like VB.

After discarding the idea of compromising ODBC’s relational heritage by grafting on support for non-relational stores and components, we looked to Microsoft’s Component Object Model (COM). Our goals for building reusable data access components seemed to fit in well with COM's componentized architecture, and OLE DB was born. 

[In retrospect "OLE DB" -- which stood for "OLE DataBase" -- wasn’t that great of a name; it didn’t really have anything to do with "OLE" (which originally stood for "Object Linking and Embedding", but became a generic term for COM components) and its main design premise was to work with all types of data; not just data from a database.  I used to joke that, other than the "OLE" and the "DB", the name was just fine. Except the fact that, at the time, the Microsoft Word spellchecker suggested "OddBall" as a replacement for "OLEDB", but even this didn’t seem so bad given that the suggested alternative for our competitor "IDAPI" was "Diaper Pail"… But I digress…]

In addition to being a first-class COM-based interface for working with data, OLE DB was designed to support the concept of a "federated" (or componentized) database.  The idea was to define a common base "Rowset" interface that could be used to represent a set of "tuples" (rows) regardless of where they came from or what they represented -- a query result, a base table or view, an ISAM, Text, or Excel file, the registry, email, directory services, etc.  OLE DB was even intended to be a part of "Cairo OFS" – Microsoft’s first foray into an object file system.  On top of this base Rowset interface, you could use COM’s QueryInterface to detect whether additional functionality (for example, scrolling, updating, or index-based navigation) was supported. 

Given this common data representation, we could build common components to add functionality against any type of data, such as a common indexing engine, a common cursoring component, or a common query processor that could provide database functionality on top of a multitude of both relational and non-relational data sources.  The fact that the representation of the source data matched the representation of the final results provided performance benefits as well as common model for code and tools to work with sets of data.

OLE DB's ability to expose common interfaces over a multitude of data source types made it particularly appealing to distributed services – in fact, Microsoft SQL Server today uses an OLE DB Rowset representation as its internal interface between the Relational Engine and Storage Engine, as well as supporting distributed queries among SQL Server and other heterogeneous data sources, and to support full-text indexing of SQL Server content through the Microsoft Search engine.

Like ODBC, OLE DB was a "low level" interface that used pointers, explicit memory management, and explicit lifetime control.  And like ODBC, it required a separate wrapper API to make it available (and suitable) to automation languages like Visual Basic.  So, we rearranged the letters of DAO, quickly rejected "DOA", and came up with "ADO".  "ADO" initially stood for "ActiveX Data Objects" (the "X" was silent, but had to be there due to a copyright issue) but eventually we dropped the ActiveX altogether, and it just became ADO.

ADO followed the same Connection/Command/Recordset model of DAO, but integrated technology from the acquired FOX database team for an advanced local cursor engine.  This "Disconnected Recordset" exposed the same ADO Recordset model as a forward only cursor over a stream of results from the database, and properties on the Recordset were used to determine whether the results were scrollable or updatable, and what type of isolation the results had from changes made to the store.  The Disconnected Recordset used metadata from the original query to generate insert, update, and delete statements to propagate local changes back to the store.

Together, ADO and OLE DB made up what we called "Microsoft’s Universal Data Access".

And then something happened.  Recognizing that we were losing developer mindshare to Java (COM in its original form was complex, and things like reference counting, memory handling, etc. were hard to get right) Microsoft embarked upon an ambitious project to develop a new, language-independent, managed framework for writing applications (.NET Framework) along with a new programming language (C#).  One of the strengths of the new platform was that the various components and APIs of the Framework were designed and reviewed as a cohesive unit, rather than a semi-related set of technologies shipped by different isolated teams.

Next: Part III – Disconnected Programming in a Managed Environment with ADO.NET

--

Mike Pizzo
Architect, Data Programmability

Comments