The recent Wikipedia imbroglio centered around DBpedia is the fundamental driver for this particular blog post. At time of writing this blog post, the DBpedia project definition in Wikipedia remains unsatisfactory due to the following shortcomings:
- inaccurate and incomplete definition of the Project's What, Why, Who, Where, When, and How
- inaccurate reflection of project essence, by skewing focus towards data extraction and data set dump production, which is at best a quarter of the project.
Here are some insights on DBpedia, from the perspective of someone intimately involved with the other three-quarters of the project.
What is DBpedia?
A live Web accessible RDF model database (Quad Store) derived from Wikipedia content snapshots, taken periodically. The RDF database underlies a Linked Data Space comprised of: HTML (and most recently HTML+RDFa) based data browser pages and a SPARQL endpoint.
Note: DBpedia 3.4 now exists in snapshot (warehouse) and Live Editions (currently being hot-staged). This post is about the snapshot (warehouse) edition, I'll drop a different post about the DBpedia Live Edition where a new Delta-Engine covers both extraction and database record replacement, in realtime.
When was it Created?
As an idea under the moniker "DBpedia" it was conceptualized in late 2006 by researchers at University of Leipzig (lead by Soren Auer) and Freie University, Berlin (lead by Chris Bizer). The first public instance of DBpedia (as described above) was released in February 2007. The official DBpedia coming out party occurred at WWW2007, Banff, during the inaugural Linked Data gathering, where it showcased the virtues and immense potential of TimBL's Linked Data meme.
Who's Behind It?
OpenLink Software (developers of OpenLink Virtuoso and providers of Web Hosting infrastructure), University of Leipzig, and Freie Univerity, Berlin. In addition, there is a burgeoning community of collaborators and contributors responsible DBpedia based applications, cross-linked data sets, ontologies (OpenCyc, SUMO, UMBEL, and YAGO) and other utilities. Finally, DBpedia wouldn't be possible without the global content contribution and curation efforts of Wikipedians, a point typically overlooked (albeit inadvertently).
How is it Constructed?
The steps are as follows:
- RDF data set dump preparation via Wikipedia content extraction and transformation to RDF model data, using the N3 data representation format - Java and PHP extraction code produced and maintained by the teams at Leipzig and Berlin
- Deployment of Linked Data that enables Data browsing and exploration using any HTTP aware user agent (e.g. basic Web Browsers) - handled by OpenLink Virtuoso (handled by Berlin via the Pubby Linked Data Server during the early months of the DBpedia project)
- SPARQL compliant Quad Store, enabling direct access to database records via SPARQL (Query language, REST or SOAP Web Service, plus a variety of query results serialization formats) - OpenLink Virtuoso since first public release of DBpedia
In a nutshell, there are four distinct and vital components to DBpedia. Thus, DBpedia doesn't exist if all the project offered was a collection of RDF data dumps. Likewise, it doesn't exist if you have a SPARQL compliant Quad Store without loaded data sets, and of course it doesn't exist if you have a fully loaded SPARQL compliant Quad Store is up to the cocktail of challenges presented by live Web accessibility.
Why is it Important?
It remains a live exemplar for any individual or organization seeking to publishing or exploit HTTP based Linked Data on the World Wide Web. Its existence continues to stimulate growth in both density and quality of the burgeoning Web of Linked Data.
How Do I Use it?
In the most basic sense, simply browse the HTML pages en route to discovery erstwhile relationships that exist across named entities and subject matter concepts / headings. Beyond that, simply look at DBpedia as a master lookup table in a Web hosted distributed database setup; enabling you to mesh your local domain specific details with DBpedia records via structured relations (triples or 3-tuples records) comprised of HTTP URIs from both realms e.g., owl:sameAs relations.
What Can I Use it For?
Expanding on the Master-Details point above, you can use its rich URI corpus to alleviate tedium associated with activities such as:
- List maintenance - e.g., Countries, States, Companies, Units of Measurement, Subject Headings etc.
- Tagging - as a compliment to existing practices
- Analytical Research - you're only a LINK (URI) away from erstwhile difficult to attain research data spread across a broad range of topics
- Closed Vocabulary Construction - rather than commence the futile quest of building your own closed vocabulary, simply leverage Wikipedia's human curated vocabulary as our common base.
Related