The recent Wikipedia imbroglio centered on
DBpedia is the fundamental driver for this
particular blog post. At the time of writing,
the DBpedia project definition in Wikipedia
remains unsatisfactory due to the following shortcomings:
- an inaccurate and incomplete definition of the project's What,
Why, Who, Where, When, and How
- an inaccurate reflection of the project's essence, skewing focus
towards data extraction and data set dump production, which is at
best a quarter of the project.
Here are some insights on DBpedia, from the perspective of
someone intimately involved with the other three-quarters of the
project.
What is DBpedia?
A live, Web-accessible RDF
model database (Quad Store) derived from Wikipedia content
snapshots taken periodically. The RDF database underlies a
Linked Data Space comprising HTML (and most recently
HTML+RDFa) based data browser pages and a SPARQL endpoint.
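To make the Quad Store notion concrete, here's a minimal Python
sketch (using the rdflib library) of what quad-oriented storage looks
like: triples housed in named graphs, i.e., (graph, subject,
predicate, object) 4-tuples. The graph name and triple below are
purely illustrative, not actual DBpedia records.

    from rdflib import Dataset, Literal, Namespace, URIRef
    from rdflib.namespace import RDFS

    # A quad store keeps triples inside named graphs, so each stored
    # record is effectively a (graph, subject, predicate, object)
    # 4-tuple. The graph name and triple here are illustrative only.
    ds = Dataset()
    g = ds.graph(URIRef("http://dbpedia.org"))

    dbr = Namespace("http://dbpedia.org/resource/")
    g.add((dbr["DBpedia"], RDFS.label, Literal("DBpedia", lang="en")))

    # Iterating over quads surfaces the graph name alongside each triple.
    for s, p, o, name in ds.quads((None, None, None, None)):
        print(name, s, p, o)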
Note: DBpedia 3.4 now exists in snapshot
(warehouse) and Live Editions (currently being hot-staged).
This post is about the snapshot (warehouse) edition; I'll drop a
separate post about the DBpedia Live Edition, where a new
Delta-Engine covers both extraction and database record
replacement in real time.
When was it Created?
As an idea under the moniker "DBpedia", it was conceptualized in
late 2006 by researchers at the University of Leipzig (led by Sören
Auer) and Freie Universität Berlin (led by Chris Bizer). The first
public instance of DBpedia (as described above) was released in
February 2007. The official DBpedia coming-out party occurred at
WWW2007 in Banff, during the inaugural Linked Data gathering, where
it showcased the virtues and immense potential of TimBL's Linked
Data meme.
Who's Behind It?
OpenLink Software (developers of OpenLink
Virtuoso and providers of Web hosting
infrastructure), the University of Leipzig, and Freie Universität
Berlin. In addition, there is a burgeoning community of
collaborators and contributors responsible for DBpedia-based
applications, cross-linked data sets, ontologies (OpenCyc,
SUMO, UMBEL, and YAGO), and other utilities. Finally, DBpedia
wouldn't be possible without the global content contribution and
curation efforts of Wikipedians, a point typically overlooked
(albeit inadvertently).
How is it Constructed?
The steps are as follows:
- RDF data set dump preparation via Wikipedia content extraction
and transformation to RDF model data, using the N3 data
representation format (see the first sketch after this list) - Java
and PHP extraction code produced and maintained by the teams at
Leipzig and Berlin
- Deployment of Linked Data that enables data browsing and
exploration using any HTTP-aware user agent (e.g., basic Web
browsers; see the second sketch after this list) - handled by
OpenLink Virtuoso (handled by Berlin via the Pubby Linked Data
Server during the early months of the DBpedia project)
- A SPARQL-compliant Quad Store, enabling direct access to database
records via SPARQL (query language, REST or SOAP Web Service, plus
a variety of query results serialization formats; see the third
sketch after this list) - OpenLink Virtuoso since the first public
release of DBpedia
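To make the first step concrete, here's a hedged sketch of the kind
of N3 triples extraction produces from a Wikipedia article's infobox,
parsed here with Python's rdflib library; the property names
approximate the DBpedia vocabulary rather than reproduce an actual
extractor dump.

    from rdflib import Graph

    # Illustrative N3 of the sort the extraction phase emits for a
    # Wikipedia article's infobox; real dumps use the full DBpedia
    # vocabulary and many more properties.
    sample_n3 = """
    @prefix dbr:  <http://dbpedia.org/resource/> .
    @prefix dbo:  <http://dbpedia.org/ontology/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    dbr:Berlin rdfs:label "Berlin"@en ;
               dbo:country dbr:Germany ;
               dbo:populationTotal 3431675 .
    """

    g = Graph()
    g.parse(data=sample_n3, format="n3")
    print(len(g), "triples extracted")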
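For the second step, this sketch dereferences a DBpedia resource URI
twice, once as a browser would and once as a data-oriented agent
would, showing how the Linked Data deployment routes one identifier
to different representations via content negotiation. The exact
redirect targets are deployment details and may vary.

    import urllib.request

    # One identifier, two representations: the deployment redirects
    # HTML-seeking agents to a browser page and RDF-seeking agents to
    # a data document. The targets reported by geturl() may vary.
    uri = "http://dbpedia.org/resource/Berlin"

    for accept in ("text/html", "application/rdf+xml"):
        req = urllib.request.Request(uri, headers={"Accept": accept})
        with urllib.request.urlopen(req) as resp:
            print(accept, "->", resp.geturl())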
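And for the third step, this sketch queries the public SPARQL
endpoint over plain HTTP using the standard protocol (a GET with a
query parameter) and the standard JSON results serialization. The
endpoint URL is DBpedia's published one; the query itself is
illustrative.

    import json
    import urllib.parse
    import urllib.request

    # Fetch the labels of one resource straight from the live Quad
    # Store; any SPARQL-aware client could do the same.
    query = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?label
    WHERE { <http://dbpedia.org/resource/Berlin> rdfs:label ?label }
    LIMIT 5
    """
    url = "http://dbpedia.org/sparql?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req) as resp:
        for row in json.load(resp)["results"]["bindings"]:
            print(row["label"]["value"])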
In a nutshell, there are four distinct and vital components to
DBpedia: the extraction-produced RDF data sets, the Linked Data
deployment, the SPARQL-compliant Quad Store, and the engineering
that keeps it all responsive on the live Web. Thus, DBpedia doesn't
exist if all the project offered were a collection of RDF data
dumps. Likewise, it doesn't exist without a fully populated
SPARQL-compliant Quad Store. Last but not least, it doesn't exist
if the fully loaded SPARQL-compliant Quad Store isn't up to the
cocktail of challenges (query load and complexity) presented by
live Web database accessibility.
Why is it Important?
It remains a live exemplar for any individual or organization
seeking to publish or exploit HTTP-based Linked Data on the
World Wide Web. Its existence continues to
stimulate growth in both the density and quality of the burgeoning
Web of Linked Data.
How Do I Use it?
In the most basic sense, simply browse the HTML-based resource
descriptor pages en route to discovering hitherto undiscovered
relationships that exist across named entities and subject matter
concepts / headings. Beyond that, look at DBpedia as a master
lookup table in a Web-hosted distributed database setup, enabling
you to mesh your local domain-specific details with DBpedia records
via structured relations (triples, or 3-tuple records) comprised of
HTTP URIs from both realms, e.g., via owl:sameAs relations.
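Here is a minimal sketch of that meshing pattern using Python's
rdflib; the local URI is hypothetical, while the DBpedia URI is
real. Once the owl:sameAs link is asserted, any consumer that honors
it can treat your local record and DBpedia's as two descriptions of
one entity.

    from rdflib import Graph, URIRef
    from rdflib.namespace import OWL

    # Hypothetical local identifier meshed with a real DBpedia URI;
    # owl:sameAs asserts that both names denote the same entity.
    local = URIRef("http://data.example.com/place/berlin")
    dbpedia = URIRef("http://dbpedia.org/resource/Berlin")

    g = Graph()
    g.add((local, OWL.sameAs, dbpedia))
    print(g.serialize(format="turtle"))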
What Can I Use it For?
Expanding on the master-details point above, you can use its rich
URI corpus to alleviate the tedium associated with activities such
as:
- List maintenance - e.g., countries, states, companies, units of
measurement, subject headings, etc. (see the query sketch after
this list)
- Tagging - as a complement to existing practices
- Analytical research - you're only a link (URI) away from hitherto
hard-to-attain research data spread across a broad range of topics
- Closed vocabulary construction - rather than commence the futile
quest of building your own closed vocabulary, simply leverage
Wikipedia's human-curated vocabulary as our common base.
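As an example of the list maintenance case above, here's a hedged
Python sketch that pulls a country list from DBpedia's SPARQL
endpoint; it assumes the dbo:Country class from the DBpedia ontology
fits the loaded data set.

    import json
    import urllib.parse
    import urllib.request

    # Pull a country list from DBpedia rather than hand-maintaining
    # one; dbo:Country is an assumption about the loaded vocabulary.
    query = """
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT DISTINCT ?country ?name
    WHERE {
        ?country a dbo:Country ;
                 rdfs:label ?name .
        FILTER (lang(?name) = "en")
    }
    LIMIT 20
    """
    url = "http://dbpedia.org/sparql?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req) as resp:
        for row in json.load(resp)["results"]["bindings"]:
            print(row["name"]["value"], "->", row["country"]["value"])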