Data Structures and RDF

Time to chime in on the RDF debate. There are four general ways of storing information:

A list, in which one has a number of items, which may or may not be related to one another.
A table, in which one has a number of items (records), each with a distinct set of properties or columns.
A tree, in which one has a hierarchy of items.
A graph, in which one has a number of items (nodes), with the nodes connected to each other in some way.

There are others, but they are more or less just variations of the same.

There are examples of each type all over. Arrays are examples of lists, and they are used all over the place. Relational databases typically store all of their data in tables. So do spreadsheets. Trees are used for mail or news messages and your bookmarks. XML is a syntax for specifying trees of information. The Windows and Classic Macintosh file systems are presented and/or stored as a tree. The Unix file system, however, isn't a tree. It's a graph, since links let the same file appear under more than one directory. RDF is a graph. The Web is also a graph: it's a bunch of pages connected via links.

The four storage methods (lists, tables, trees, and graphs) increase in complexity as you go up. Lists are simple to store. Graphs are the most difficult. Actually, that doesn't need to be the case, but very few programming languages come with any kind of graph structure ready to use. Because of the complexity, you should probably store data in the lowest type that fits the kind of data you have.

You can always use one of the structures higher than what is necessary. A list can be stored in a table with only one column. A table can be stored in a tree, where a root node has a set of records, each with a set of properties. And a tree is really a specialized form of graph. However, the reverse is not true. You can't store a graph in a tree, you can't store a tree in a table, and you can't store a table in a list. Any place where you see someone trying to is a hack.
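To make the "you can always move up" claim concrete, here is a rough sketch in Python that pushes the same data up the ladder: a list into a one-column table, that table into a tree, and that tree into a graph. The function names and the column name "value" are made up for illustration; this is just one way of doing it, not the only one.

```python
# Hypothetical helper names; none of this comes from an existing library.

def list_to_table(items):
    """A list becomes a table with a single column (called 'value' here)."""
    return [{"value": item} for item in items]


def table_to_tree(rows):
    """A table becomes a tree: root node -> record nodes -> property leaves."""
    return {
        "name": "root",
        "children": [
            {
                "name": f"record{index}",
                "children": [
                    {"name": column, "value": value, "children": []}
                    for column, value in row.items()
                ],
            }
            for index, row in enumerate(rows)
        ],
    }


def tree_to_graph(tree):
    """A tree becomes a graph: each parent/child link turns into one edge."""
    nodes, edges = [], []

    def visit(node, parent_id):
        node_id = len(nodes)  # nodes are numbered in visiting order
        nodes.append((node["name"], node.get("value")))
        if parent_id is not None:
            edges.append((parent_id, node_id))
        for child in node["children"]:
            visit(child, node_id)

    visit(tree, None)
    return nodes, edges


table = list_to_table(["apple", "pear"])
tree = table_to_tree(table)
nodes, edges = tree_to_graph(tree)

print(table)  # [{'value': 'apple'}, {'value': 'pear'}]
print(edges)  # [(0, 1), (1, 2), (0, 3), (3, 4)]
```

Notice that the graph that comes out has exactly one fewer edge than it has nodes, which is what you get when a graph happens to be a tree. Going the other way has no such helper: a graph with a loop, or a node with two parents, has nowhere to go in the tree without a workaround.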
Many people don't know this, though. So they just store everything in a tabular database or in XML, regardless of what it is. This causes two problems. First, you get data that could be stored in a simpler format stored in some more complex format. So you get people passing lists of things around using XML, or configuration files stored in XML. Second, you get people trying to coerce more complex data into a simpler format, so you might see people trying to shove trees of data into a database. Or you get serialized RDF written as XML.

Many people think that XML is the ultimate format for storing data. It isn't. It can represent trees nicely, and it can do tables and lists if you really want it to, but it can't represent graphs, not cleanly anyway. Perhaps what is needed is an eXtensible Graph Language, which represents graphs of data. There are RDF/XML and XGMML, but both are built on a language for describing trees. Actually, it shouldn't be called the eXtensible Graph Language, because then people will get confused and think it's like XML. Because a tree can be represented as a graph, all data could be represented in the Graph Language (not that it should be, of course), unlike XML, which can't handle graphs. Of course, this assumes there isn't some higher level structure above the graph.

Long, long ago, people stored data in lists, because that was all that was available. Then, someone came up with the idea of storing data in tables. So relational databases came along and people moved up the ladder to tables. A few years ago, XML came along, so data moved up again to trees. Can you guess what will happen next? The Semantic Web folks want us to move to using graphs.

Should we move to graphs? It seems to be the next logical step in information evolution. What's holding us back? Well, it's probably too soon. The world is still in the tree phase. One day, graphs will start to become more popular. It will just take time.
In 30 years, someone might come up with something beyond graphs, and we'll all slowly switch to it as well.

There's also the RSS in RDF debate. Many people don't see the value in storing RSS data in RDF. This is because the information stored in a single RSS file isn't a graph. It's a tree, so plain-old XML actually makes more sense. Of course, the Semantic Web folks don't agree. Why? Because they aren't thinking in terms of a single RSS file. They are thinking of building giant collections of RSS data, all linked together so that it forms one giant... hey, it's not a tree, it's a graph. Then, you can search and navigate it like you can with the existing Web.

But of course, the Semantic Web lets the servers and the software you're using know more about what you're talking about. This is unlike current popular search engines such as Google, which are pretty much just guessing. You can make the guessing better, sure, but the best way to achieve accuracy is for someone to tell the software the answer to begin with.
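The RSS point can be sketched in a few lines of Python. Each feed on its own is a tree (a feed with its items underneath), but once two feeds are merged by URL, an item that both of them mention ends up with two parents, and that is no longer a tree. The feed URLs and the "lists" predicate below are invented for illustration; they are not real RSS or RDF vocabulary, and real RDF tooling would do much more than this.

```python
from collections import defaultdict

# Two hypothetical feeds. Each one alone is a tree: feed -> the items it lists.
feeds = {
    "http://example.org/feed-a": [
        "http://example.org/post-1",
        "http://example.org/post-2",
    ],
    "http://example.org/feed-b": [
        "http://example.org/post-2",
        "http://example.org/post-3",
    ],
}


def merge_feeds(feeds):
    """Merge feeds into (subject, predicate, object) triples keyed by URL."""
    triples = set()
    for feed_url, item_urls in feeds.items():
        for item_url in item_urls:
            # "lists" is a made-up predicate name, used only for this sketch.
            triples.add((feed_url, "lists", item_url))
    return triples


def parents_of(triples):
    """Map each object URL to the set of subjects that point at it."""
    parents = defaultdict(set)
    for subject, _predicate, obj in triples:
        parents[obj].add(subject)
    return parents


triples = merge_feeds(feeds)

# In a tree every node has at most one parent, so any URL listed here
# shows that the merged data has become a graph.
shared = sorted(url for url, found in parents_of(triples).items() if len(found) > 1)
print(shared)  # ['http://example.org/post-2']
```

Storing the merged data as plain triples sidesteps the question of which feed "owns" the shared post, which is exactly the question a tree format forces you to answer.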

Kingsley Idehen's Blog Data Space
