Data Structures and RDF

Time to chime in on the RDF debate. There are four general ways of
storing information:

- A list, in which one has a number of items, which may or may not be
  related to one another.
- A table, in which one has a number of items (records), each with a
  fixed set of properties or columns.
- A tree, in which one has a hierarchy of items.
- A graph, in which one has a number of items (nodes), with the nodes
  connected to each other in some way.

There are others, but they are more or less just variations on these
four.
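
As a rough illustration, here is how the four shapes might look as
plain Python values. The sample data and the names (`items`, `table`,
`tree`, `graph`) are made up for this sketch; they don't come from any
particular library or format.

```python
# A list: items in sequence, with no required relationship between them.
items = ["apple", "banana", "cherry"]

# A table: records that all share the same columns.
table = [
    {"name": "apple", "color": "red"},
    {"name": "banana", "color": "yellow"},
]

# A tree: a hierarchy in which every item has exactly one parent.
tree = {
    "Bookmarks": {
        "News": {},
        "Work": {"Docs": {}},
    }
}

# A graph: nodes connected in arbitrary ways (here, an adjacency mapping).
# "c" is reachable from both "a" and "b", and "c" links back to "a".
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}
```
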

There are examples of each type all over. Arrays are examples of
lists, and of course they are used all over the place. Relational
databases typically store all of their data in tables. So do
spreadsheets. Trees are used for mail or news messages and your
bookmarks. XML is a syntax for specifying trees of information. The
Windows and Classic Macintosh file systems are presented and/or
stored as a tree. The Unix file system, however, isn't a tree. It's a
graph, because hard links and symbolic links let the same file be
reached from more than one directory. RDF is a graph. The Web is also
a graph -- it's a bunch of pages connected via links.

The four storage methods (lists, tables, trees, and graphs) increase
in complexity in that order.
Lists are simple to store. Graphs are the most difficult. Actually,
that doesn't need to be the case, but very few programming languages
come with any kind of graph structure ready to use. Because of that
complexity, you should probably store data in the simplest structure
that fits the kind of data you have.

You can always use a structure higher than what is necessary. A list
can be stored in a table with only one column; a table can be stored
in a tree, where a root node has a set of records, each with a set of
properties; and a tree is really a specialized form of graph.
However, the reverse is not true. You can't store a graph in a tree,
you can't store a tree in a table, and you can't store a table in a
list. Any place where you see someone trying to is a hack.

Many people don't know this, though, so they just store everything in
a tabular database or in XML, regardless of what it is. This causes
two problems. First, you get data that could be stored in a simpler
format stored in a more complex one. So you get people passing lists
of things around using XML, or configuration files stored in XML.
Second, you get people trying to coerce more complex data into a
simpler format, so you might see people trying to shove trees of data
into a database, or you get RDF serialized as XML.

Many people think that XML is the ultimate format for storing data.
It isn't. It can represent trees nicely, and it can do tables and
lists if you really want it to, but it can't represent graphs, not
cleanly anyway. Perhaps what is needed is an eXtensible Graph
Language, which represents graphs of data. There are RDF/XML and
XGMML, but both use a language for describing trees.
Actually, it shouldn't be called the eXtensible Graph Language,
because then people will get confused thinking it's like XML.
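
To make the one-way nature of this ladder concrete, here is a small
sketch in Python. It assumes a tree stored as nested dicts and a graph
stored as a list of (parent, child) edges; the function names
`tree_to_edges` and `is_tree` are invented for this example. Going
from tree to graph is mechanical, but a graph in which some node has
two parents, or which contains a cycle, has no direct nested form.

```python
def tree_to_edges(tree, parent=None):
    """Flatten a nested-dict tree into a list of (parent, child) edges."""
    edges = []
    for name, children in tree.items():
        if parent is not None:
            edges.append((parent, name))
        edges.extend(tree_to_edges(children, name))
    return edges


def is_tree(edges):
    """Return True if the edges form a single rooted tree.

    That requires exactly one node without a parent, exactly one parent
    for every other node, and every node reachable from the root.
    """
    nodes = {n for edge in edges for n in edge}
    parents = {}
    children = {}
    for parent, child in edges:
        if child in parents:
            return False  # a second parent: not a tree
        parents[child] = parent
        children.setdefault(parent, []).append(child)
    roots = nodes - set(parents)
    if len(roots) != 1:
        return False  # no root (a cycle) or several roots (a forest)
    # Walk down from the root; a detached cycle would never be reached.
    seen = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        seen.add(node)
        stack.extend(children.get(node, []))
    return seen == nodes


bookmarks = {"Bookmarks": {"News": {}, "Work": {"Docs": {}}}}
edges = tree_to_edges(bookmarks)

# A tree always converts to a graph that still passes the tree check.
print(is_tree(edges))  # True

# Give "Docs" a second parent, much as a Unix hard link gives a file a
# second directory entry. The result is a graph, but no longer a tree.
print(is_tree(edges + [("News", "Docs")]))  # False
```

The second case is the one that tree formats struggle with: to write
it as nested elements you would have to either duplicate "Docs" or
refer to it by some ID, which is a graph encoding layered on top of
the tree.
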
Because a tree can be represented as a graph, all data could be
represented in the Graph Language (not that it should be, of course),
unlike XML, which can't represent graphs. Of course, this assumes
there isn't some higher-level structure above the graph.

Long, long ago, people stored data in lists, because that was all
that was available. Then someone came up with the idea of storing
data in tables, so relational databases came along and people moved
up the ladder to tables. A few years ago, XML came along, so data
moved up again to trees. Can you guess what will happen next? The
Semantic Web folks want us to move to using graphs.

Should we move to graphs? It seems to be the next logical step in
information evolution.
What's holding us back? Well, it's probably too soon. The world is
still in the tree phase. One day, graphs will start to become more
popular -- it will just take time. In 30 years, someone might come
up with something beyond graphs, and we'll all slowly switch to it
as well.

There's also the RSS-in-RDF debate. Many people don't see the value
in storing RSS data in RDF. This is because the information stored in
a single RSS file isn't a graph -- it's a tree, so plain old XML
actually makes more sense.

Of course, the Semantic Web folks don't agree. Why? Because they
aren't thinking in terms of a single RSS file -- they are thinking of
building giant collections of RSS data, all linked together so that
it forms one giant -- hey, it's not a tree -- it's a graph. Then you
can search and navigate it like you can with the existing Web. But of
course, the Semantic Web lets the servers and the software you're
using know more about what you're talking about. This is unlike
current popular search engines like Google, which are pretty much
just guessing. You can make the guessing better, sure, but the best
way to achieve accuracy is for someone to tell it the answer to begin
with.
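
A toy sketch of that idea, in Python: two separate feeds are each a
small tree, but once their items are written as (subject, predicate,
object) triples and pooled, shared URLs tie them together into one
graph. The feed contents, the example.org URLs, and the predicate
names (`item`, `links_to`) are all made up for illustration; this is
not real RSS or RDF vocabulary, and no RDF library is used.

```python
# Two hypothetical feeds, each a simple tree: feed -> items -> fields.
feed_a = {
    "url": "http://example.org/a/feed",
    "items": [
        {"url": "http://example.org/a/1",
         "links": ["http://example.org/b/1"]},
    ],
}
feed_b = {
    "url": "http://example.org/b/feed",
    "items": [
        {"url": "http://example.org/b/1",
         "links": ["http://example.org/a/1"]},
    ],
}


def feed_to_triples(feed):
    """Rewrite one feed as a set of (subject, predicate, object) triples."""
    triples = set()
    for item in feed["items"]:
        triples.add((feed["url"], "item", item["url"]))
        for target in item["links"]:
            triples.add((item["url"], "links_to", target))
    return triples


# Pooling the triples from both feeds gives one combined graph.
graph = feed_to_triples(feed_a) | feed_to_triples(feed_b)


def neighbors(graph, subject):
    """Return every node that the given subject points to."""
    return sorted(obj for subj, _, obj in graph if subj == subject)


# Item a/1 links to b/1, which links back to a/1: a cycle, so the
# combined data is a graph and not a tree.
print(neighbors(graph, "http://example.org/a/1"))
print(neighbors(graph, "http://example.org/b/1"))
```
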