A great
post by Dare, especially his bringing into context the essence
of this matter referred to by C.J. Date as "XML the New Database
Heresy".
I have little to add to this matter as
our understanding and vision are aptly expressed via the
architecture and feature set of Virtuoso (this area was
actually addressed circa 1999).
We are heading into an era of multi-model databases:
single database engines that are capable of effectively
serving the requirements of the Hierarchical, Network, Relational,
and Object database
models. As we get closer to the unravelling of universal
storage, hopefully this will get clearer.
Back to Dare's commentary:
C.J. Date, one of the most influential names in the relational
database world, had some harsh words about XML's encroachment into
the world of relational databases in a recent article entitled
"Date defends relational model" that appeared on
SearchDatabase.com. Key parts of the article are excerpted
below:
Date reserved his harshest criticism for the competition, namely
object-oriented and XML-based DBMSs. Calling them "the latest
fashions in the computer world," Date said he rejects the argument
that relational DBMSs are yesterday's news. Fans of object-oriented
database systems "see flaws in the relational model because they
don't fully understand it," he said.
Date also said that XML enthusiasts have gone overboard.
"XML was invented to solve the problem of data interchange, but
having solved that, they now want to take over the world," he said.
"With XML, it's like we forget what we are supposed to be doing,
and focus instead on how to do it."
Craig S. Mullins, the director of technology planning at BMC
Software and a SearchDatabase.com expert, shares Date's opinion of
XML. It can be worthwhile, Mullins said, as long as XML is only
used as a method of taking data and putting it into a DBMS. But
Mullins cautioned that XML data that is stored in relational DBMSs
as whole documents will be useless if the data needs to be queried,
and he stressed Date's point that XML is not a real data model.
Craig Mullins's points are more straightforward to
answer since his comments don't jibe with the current state of the
art in the XML world. He states that you can't query XML documents
stored in databases, but this is untrue. Almost three years ago, I
was writing articles about
querying XML documents stored in relational databases. Storing
XML in a relational database doesn't mean it has to be stored
as an opaque binary BLOB or as a big bunch of text
which cannot effectively be queried. The next version of SQL Server
will have extensive capabilities for querying XML data in
a relational database and doing joins across relational and XML data;
a lot of this functionality is described in the article on
XML Support in SQL Server 2005. As for XML not having a data
model, I beg to differ. There is a data model for XML that many
applications and people adhere to, often without realizing that
they are doing so. This data model is the XPath 1.0
data model, which is being updated to handle typed data as the
XQuery and
XPath 2.0 data model.
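The point that XML held in a relational column need not be opaque can be sketched in a few lines. This is only an illustration, not the SQL Server 2005 feature set the article describes: it uses Python's built-in sqlite3 module, and the `customers` and `orders` tables, the `doc` column, and the `xpath_text` helper are all hypothetical names invented for the example. The path expressions are limited to the XPath subset that `xml.etree.ElementTree` supports.

```python
import sqlite3
import xml.etree.ElementTree as ET


def xpath_text(doc, path):
    """Return the text of the first node matching `path`, or None."""
    node = ET.fromstring(doc).find(path)
    return None if node is None else node.text


def build_db():
    """Create an in-memory database with relational rows and XML documents."""
    conn = sqlite3.connect(":memory:")
    # Expose the hypothetical helper to SQL so queries can look inside the XML.
    conn.create_function("xpath_text", 2, xpath_text)
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, doc TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")]
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (10, 1, "<order><item>book</item><city>Seattle</city></order>"),
            (11, 2, "<order><item>lamp</item><city>Boston</city></order>"),
            (12, 1, "<order><item>pen</item><city>Boston</city></order>"),
        ],
    )
    return conn


def orders_shipped_to(conn, city):
    """Join relational customer rows with values pulled out of the XML column."""
    rows = conn.execute(
        """
        SELECT c.name, xpath_text(o.doc, 'item')
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        WHERE xpath_text(o.doc, 'city') = ?
        ORDER BY o.id
        """,
        (city,),
    )
    return rows.fetchall()
```

Calling `orders_shipped_to(build_db(), "Boston")` filters on a value inside the stored documents and joins the result against an ordinary table. A real engine would index the XML rather than reparse it per row, which this sketch makes no attempt to do.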
Now to tackle the meat of C.J. Date's criticisms, which
is that XML solves the problem of data interchange but now is
showing up in the database. The first thing I'd like to point
out is that there are two broad usage patterns of XML: it is
used to represent both rigidly structured tabular data (e.g.,
relational data or serialized objects) and semi-structured data
(e.g., office documents). The latter type of data will only grow
now that office productivity software like Microsoft Office has enabled
users to save their documents as XML instead of proprietary
binary formats. In many cases, these documents cannot simply be
shredded into relational tables. Sure, you can shred an Excel
spreadsheet written in SpreadsheetML into relational tables,
but is the same really feasible for a Word document written in
WordprocessingML? Many enterprises would rather have their
important business data stored and queried from a unified
location instead of the current situation where some data is in
document management systems, some hangs around as random files in
people's folders while some sits in a database management
system.
As for stating that critics of the relational model
don't understand it, I disagree. One of the
major benefits of using XML in relational databases is
that it is a lot easier to deal with fluid schemas or data
with sparse entries with XML. When the shape of the data tends to
change or is not fixed, the relational model is simply
not designed to deal with this. Constantly changing your
database schema is simply not feasible and there is no easy way to
provide the extensibility of XML where one can say "after the
X element, any element from any
namespace can appear". How would one describe the capacity to store
“any data” in a traditional relational database without resorting
to an opaque blob?
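The "any element from any namespace" idea can be made concrete with a small sketch. The document below, its `urn:example:*` namespaces, and the `read_contacts` helper are all made up for illustration. Each `<contact>` has a fixed `<name>` followed by whatever extension elements a producer chose to add, and a consumer can read the part it knows while still seeing the rest.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: a fixed <name>, then open content from any namespace.
DOC = """<contacts xmlns:hr="urn:example:hr" xmlns:crm="urn:example:crm">
  <contact><name>Alice</name><hr:badge>42</hr:badge></contact>
  <contact><name>Bob</name><crm:tier>gold</crm:tier><crm:region>EU</crm:region></contact>
  <contact><name>Carol</name></contact>
</contacts>"""


def read_contacts(xml_text):
    """Return (name, extras) pairs; extras maps qualified tag to text."""
    result = []
    for contact in ET.fromstring(xml_text).findall("contact"):
        name = contact.findtext("name")
        # Everything after the known element is kept, whatever its namespace.
        extras = {
            child.tag: child.text for child in contact if child.tag != "name"
        }
        result.append((name, extras))
    return result
```

Here Alice, Bob, and Carol carry one, two, and zero extension elements respectively, and no schema change was needed to admit any of them. ElementTree reports namespaced tags in `{namespace}local` form, so Alice's extra entry is keyed `{urn:example:hr}badge`.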
I do tend to agree that some people are going
overboard and trying to model their data hierarchically instead of
relationally, which experience has taught us is a bad idea.
Recently there was a thread on the XML-DEV mailing list entitled Designing
XML to Support Information Evolution, where Roger L. Costello
described his travails trying to model his data, which was being
transferred as XML, in a hierarchical manner. Michael Champion
accurately described the process Roger Costello went through as
having "rediscovered the relational model". In a response to
that thread I wrote "Hierarchical databases failed for a
reason".
Using hierarchy as a primary way to model data is bad
for at least the following reasons:
-
Hierarchies tend to encourage redundancy. Imagine I have a
<Customer> element that has one or more
<ShippingAddress> elements as children as well as one or more
<Order> elements as children. Each order was shipped
to an address, so if modelled hierarchically each <Order>
element will also have a <ShippingAddress> element, which
leads to a lot of unnecessary duplication of data.
-
In the real world, there are often multiple groups to
which a piece of data belongs, which often cannot be modelled with a
single hierarchy.
-
Data is too tightly coupled. If I delete a <Customer>
element, this means I've automatically deleted his entire order
history since all the <Order> elements are children of
<Customer>. Similarly if I query for a <Customer>, I
end up getting all the <Order> information as well.
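The redundancy and coupling points above can be sketched with the same customer, address, and order example. Everything here is illustrative: the nested dictionary stands in for the hierarchical XML, and the table names and the `count_address_copies` helper are assumptions made for the sketch, using Python's built-in sqlite3 module.

```python
import sqlite3

# Hierarchical shape: each order nests its own copy of the shipping address.
customer_tree = {
    "name": "Alice",
    "shipping_addresses": ["1 Main St", "9 Oak Ave"],
    "orders": [
        {"id": 1, "shipping_address": "1 Main St"},
        {"id": 2, "shipping_address": "1 Main St"},
        {"id": 3, "shipping_address": "9 Oak Ave"},
    ],
}


def count_address_copies(tree, address):
    """Count how many times one address string is stored in the tree."""
    copies = tree["shipping_addresses"].count(address)
    copies += sum(1 for o in tree["orders"] if o["shipping_address"] == address)
    return copies


def build_relational():
    """Same data, relationally: orders refer to an address row by id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE addresses (
            id INTEGER PRIMARY KEY, customer_id INTEGER, street TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY, customer_id INTEGER, address_id INTEGER);
        INSERT INTO customers VALUES (1, 'Alice');
        INSERT INTO addresses VALUES (1, 1, '1 Main St'), (2, 1, '9 Oak Ave');
        INSERT INTO orders VALUES (1, 1, 1), (2, 1, 1), (3, 1, 2);
        """
    )
    return conn
```

In the tree, "1 Main St" is stored three times (once under the customer and once in each of two orders), while the `addresses` table holds it once. The coupling differs too: selecting from `customers` returns no order data unless a join asks for it, and because this sketch declares no cascading delete, removing the customer row leaves the three order rows in place. Whether order history should outlive a customer becomes a schema decision rather than a consequence of nesting.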
To put it simply, experience has taught the software world that
the relational model is a better way to model data than the
hierarchical model. Unfortunately, in the rush to embrace XML many
are repeating the mistakes from decades ago in the new
millennium.