RDF-ization is a term used by the Semantic Web community to describe the
process of generating RDF from non-RDF Data Sources such as
(X)HTML, Weblogs, Shared Bookmark Collections, Photo Galleries,
Calendars, Contact Managers, Feed Subscriptions, Wikis, and other
information resource collections.
If the generated RDF results in an entity-to-entity level network (graph) in
which each entity is endowed with a dereferenceable HTTP-based ID
(a URI), we end up with an enhancement to the Web: one that
adds Hyperdata linking across extracted entities
to the existing Hypertext-based Web of linked documents (pages,
images, and other information resource types). Thus, I can use the
same URL linking mechanism to reference a
broader range of "Things", i.e., documents, things that documents
are about, or things loosely associated with documents.
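To make the entity-to-entity idea concrete, here is a minimal Python sketch that RDF-izes a single shared-bookmark record into N-Triples. The bookmark fields, the `rdfize_bookmark` and `mint_uri` helpers, and the `http://example.org/resource/` namespace are all hypothetical; the predicates come from Dublin Core Terms and FOAF. The bookmarked document keeps its own URL, while each tag becomes a separate entity with its own HTTP ID, linked from the document via `foaf:topic`.

```python
from urllib.parse import quote

# Hypothetical namespace for the HTTP IDs we mint; any dereferenceable namespace you control would do.
BASE = "http://example.org/resource/"

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_TITLE = "http://purl.org/dc/terms/title"
FOAF_DOCUMENT = "http://xmlns.com/foaf/0.1/Document"
FOAF_TOPIC = "http://xmlns.com/foaf/0.1/topic"


def mint_uri(label: str) -> str:
    """Give an extracted entity an HTTP-based ID by percent-encoding its label."""
    return BASE + quote(label.strip().replace(" ", "_"), safe="")


def literal(text: str) -> str:
    """Serialize a plain N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def rdfize_bookmark(bookmark: dict) -> list[str]:
    """Turn one (non-RDF) bookmark record into N-Triples lines."""
    doc = f"<{bookmark['url']}>"
    triples = [
        f"{doc} <{RDF_TYPE}> <{FOAF_DOCUMENT}> .",
        f"{doc} <{DCT_TITLE}> {literal(bookmark['title'])} .",
    ]
    # Each tag becomes an entity with its own URI, linked from the document it describes.
    for tag in bookmark.get("tags", []):
        triples.append(f"{doc} <{FOAF_TOPIC}> <{mint_uri(tag)}> .")
    return triples
```

Calling `rdfize_bookmark({"url": "http://example.com/post", "title": "Linked Data notes", "tags": ["RDF"]})` yields three triples, the last linking the page to `<http://example.org/resource/RDF>`. Actually publishing those URIs so they dereference is a separate step this sketch does not cover.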
The Virtuoso Sponger is an example of an RDF Middleware solution from OpenLink Software. It's a built-in
component of the Virtuoso Universal Server, and is deployable in many
forms, e.g., Software as a Service (SaaS) or a traditional software
installation. It delivers RDF-ization services via a collection of
Cartridges/Providers/Drivers, each specific to a kind of Web information resource,
covering Wikipedia, Freebase, CrunchBase, WikiCompany, OpenLibrary,
Digg, eBay, Amazon, RSS/Atom/OPML feed sources, XBRL, and many others.
RDF-ization alone doesn't ensure valuable RDF-based Linked Data on the Web. The process of
producing RDF Linked Data is ultimately about the art of
effectively describing resources with an eye for context.
RDF-ization Processing Steps
- Entity Extraction
- Vocabulary/Schema/Ontology (Data Dictionary) mapping
- HTTP based Proxy URI generation
- Linked Data Cloud Lookups (e.g., perform an UMBEL lookup to
add "isAbout" fidelity to the graph, then look up DBpedia and other LOD instance data enclaves for identical
individuals and connect them via "owl:sameAs")
- RDF Linked Data Graph projection that uses the
description of the container information resource to expose the
URIs of the distilled entities.
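The processing steps above can be sketched as a toy Python pipeline. Everything here is an assumption for illustration: the input record shape, the `FIELD_MAP` data dictionary, the `http://example.org/about/id/` proxy URI scheme, and the `SAME_AS` table, which stands in for live lookups against DBpedia (no network calls are made, and the UMBEL "isAbout" step is omitted). It is not how the Virtuoso Sponger itself is implemented.

```python
from urllib.parse import quote

# Hypothetical proxy URI namespace; not the Sponger's actual scheme.
PROXY_BASE = "http://example.org/about/id/"
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_TOPIC = "http://xmlns.com/foaf/0.1/topic"

# Step 2: a toy data dictionary mapping source fields to vocabulary properties.
FIELD_MAP = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "description": "http://purl.org/dc/terms/description",
}

# Step 4 stand-in: a local table instead of live lookups against DBpedia.
SAME_AS = {"Tim Berners-Lee": "http://dbpedia.org/resource/Tim_Berners-Lee"}

Triple = tuple[str, str, str]  # objects are plain strings: URI vs. literal is not tracked


def extract_entities(record: dict) -> list[dict]:
    """Step 1: pull entity descriptions out of a non-RDF source record."""
    return record.get("entities", [])


def proxy_uri(source_url: str, label: str) -> str:
    """Step 3: mint an HTTP proxy URI for an entity found at source_url."""
    fragment = quote(label.replace(" ", "_"), safe="")
    return PROXY_BASE + quote(source_url, safe="") + "#" + fragment


def rdfize(source_url: str, record: dict) -> list[Triple]:
    """Run steps 1-5 over one source record and return the projected graph."""
    triples: list[Triple] = []
    for entity in extract_entities(record):
        subject = proxy_uri(source_url, entity["name"])
        # Step 2: keep only fields the data dictionary knows how to map.
        for field, value in entity.items():
            if field in FIELD_MAP:
                triples.append((subject, FIELD_MAP[field], value))
        # Step 4: link to an identical individual in another data space.
        if entity["name"] in SAME_AS:
            triples.append((subject, OWL_SAMEAS, SAME_AS[entity["name"]]))
        # Step 5: the container document's description exposes the entity URI.
        triples.append((source_url, FOAF_TOPIC, subject))
    return triples
```

For a page at `http://example.com/page` whose record mentions Tim Berners-Lee, the result holds a `foaf:name` triple and an `owl:sameAs` link for the proxy URI, plus a `foaf:topic` triple through which the container document's description exposes that URI. Unmapped source fields are simply dropped, which is where a richer data dictionary would pay off.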
The animation that follows illustrates the process (a 5,000-foot
view), from grabbing resources via HTTP GET to injecting RDF
Linked Data back into the Web cloud:
Note: the Shredder depicts a generic Cartridge; in practice you would have one
of these per data source type (information resource type).
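The "one Cartridge per data source type" arrangement can be sketched as a small dispatch table in Python. This is a hypothetical design, not the Sponger's cartridge interface: cartridges are plain functions keyed first by host, then by MIME type, with a generic fallback. The Wikipedia cartridge assumes DBpedia resource names mirror the Wikipedia article path, and the RSS cartridge handles only RSS 2.0 `item` elements with `title` and `link` children.

```python
import xml.etree.ElementTree as ET
from typing import Callable
from urllib.parse import urlparse

Triple = tuple[str, str, str]
Cartridge = Callable[[str, str], list[Triple]]  # (url, body) -> triples

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_DOCUMENT = "http://xmlns.com/foaf/0.1/Document"
FOAF_PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic"
DCT_TITLE = "http://purl.org/dc/terms/title"


def generic_cartridge(url: str, body: str) -> list[Triple]:
    """Fallback: all we know is that the resource is a document."""
    return [(url, RDF_TYPE, FOAF_DOCUMENT)]


def wikipedia_cartridge(url: str, body: str) -> list[Triple]:
    """Assumes DBpedia resource names mirror the Wikipedia article path."""
    title = urlparse(url).path.rsplit("/", 1)[-1]
    return generic_cartridge(url, body) + [
        (url, FOAF_PRIMARY_TOPIC, "http://dbpedia.org/resource/" + title)
    ]


def rss_cartridge(url: str, body: str) -> list[Triple]:
    """Describe each RSS 2.0 item (link, title) found in the feed body."""
    triples = generic_cartridge(url, body)
    for item in ET.fromstring(body).iter("item"):
        link, title = item.findtext("link"), item.findtext("title")
        if link and title:
            triples.append((link, DCT_TITLE, title))
    return triples


# Hypothetical registries: matched by host first, then by MIME type.
BY_HOST: dict[str, Cartridge] = {"en.wikipedia.org": wikipedia_cartridge}
BY_MIME: dict[str, Cartridge] = {"application/rss+xml": rss_cartridge}


def select_cartridge(url: str, mime_type: str) -> Cartridge:
    """Pick the most specific cartridge for a fetched resource."""
    host = urlparse(url).hostname or ""
    return BY_HOST.get(host) or BY_MIME.get(mime_type) or generic_cartridge
```

Checking host before MIME type lets a site-specific cartridge (which knows, say, that a Wikipedia page is about a DBpedia entity) win over a format-level one, while the generic cartridge guarantees every fetched resource still yields some description.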