What is Linked Data oriented RDF-ization?

Kingsley Uyi Idehen

Lexington, United States

RDF-ization is a term used by the Semantic Web community to describe the process of generating RDF from non-RDF Data Sources such as (X)HTML, Weblogs, Shared Bookmark Collections, Photo Galleries, Calendars, Contact Managers, Feed Subscriptions, Wikis, and other information resource collections.

If the generated RDF results in an entity-to-entity network (graph) in which each entity is endowed with a dereferenceable HTTP-based ID (a URI), we end up with an enhancement to the Web that adds Hyperdata linking across extracted entities to the existing Hypertext-based Web of linked documents (pages, images, and other information resource types). Thus, I can use the same URL linking mechanism to reference a broader range of "Things", i.e., documents, things that documents are about, or things loosely associated with documents.
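As a rough illustration of what a dereferenceable HTTP-based ID implies for a client, the sketch below builds (but does not send) an HTTP GET request that asks for an RDF description of an entity. The entity URI, the helper name `describe_request`, and the choice of Turtle and RDF/XML in the Accept header are assumptions made for illustration; none of them come from the text above.

```python
from urllib.parse import urldefrag
from urllib.request import Request

# Assumed preference order: Turtle first, RDF/XML as a fallback.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def describe_request(entity_uri):
    """Build a GET request for the document that describes entity_uri."""
    # A fragment such as "#this" is handled by the client and is not sent
    # to the server, so strip it before building the request.
    document_url, _fragment = urldefrag(entity_uri)
    return Request(document_url, headers={"Accept": RDF_ACCEPT})


# Hypothetical entity URI, used only to show the shape of the request.
req = describe_request("http://example.org/about/berlin#this")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup. Whether RDF comes back depends entirely on the server behind the URI.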

The Virtuoso Sponger is an example of an RDF Middleware solution from OpenLink Software. It is a built-in component of the Virtuoso Universal Server and is deployable in many forms, e.g., Software as a Service (SaaS) or traditional software installation. It delivers RDF-ization services via a collection of Web information resource specific Cartridges/Providers/Drivers covering Wikipedia, Freebase, CrunchBase, WikiCompany, OpenLibrary, Digg, eBay, Amazon, RSS/Atom/OPML feed sources, XBRL, and many more.

RDF-ization alone doesn't ensure valuable RDF-based Linked Data on the Web. The process of producing RDF Linked Data is ultimately about the art of effectively describing resources with an eye for context.

RDF-ization Processing Steps

Entity Extraction
Vocabulary/Schema/Ontology (Data Dictionary) mapping
HTTP-based Proxy URI generation
Linked Data Cloud Lookups (e.g., perform a UMBEL lookup to add "isAbout" fidelity to the graph, then look up DBpedia and other LOD instance data enclaves for identical individuals and connect them via "owl:sameAs")
RDF Linked Data Graph projection that uses the description of the container information resource to expose the URIs of the distilled entities.
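
The five steps above can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions, not the Sponger's implementation. The proxy base URL, the `KNOWN_ENTITIES` table (a stand-in for a real Linked Data cloud lookup), the use of `rdfs:label` for the mapping step, and the use of `foaf:topic` in place of a UMBEL "isAbout" relation are all hypothetical choices made for the example. The only extraction rule is that each HTML anchor counts as an entity.

```python
from html.parser import HTMLParser
from urllib.parse import quote

# Hypothetical proxy base; a real deployment would define its own URI scheme.
PROXY_BASE = "http://proxy.example.org/about/id/"

FOAF_TOPIC = "http://xmlns.com/foaf/0.1/topic"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"

# Stand-in for a Linked Data cloud lookup (assumption: label -> known URI).
KNOWN_ENTITIES = {"Berlin": "http://dbpedia.org/resource/Berlin"}


class AnchorExtractor(HTMLParser):
    """Step 1: treat each <a href> as an extracted entity with a label."""

    def __init__(self):
        super().__init__()
        self.entities = []  # list of (href, label) pairs
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.entities.append((self._href, "".join(self._text).strip()))
            self._href = None


def proxy_uri(source_url):
    """Step 3: mint an HTTP URI for the entity found at source_url."""
    return PROXY_BASE + quote(source_url, safe="") + "#this"


def literal(text):
    """Escape a string for use as an N-Triples literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def rdfize(doc_url, html):
    """Return N-Triples lines describing the entities found in html."""
    parser = AnchorExtractor()
    parser.feed(html)
    triples = []
    for href, label in parser.entities:
        entity = proxy_uri(href)
        # Step 2: map the extracted label onto a vocabulary term.
        triples.append(f"<{entity}> <{RDFS_LABEL}> {literal(label)} .")
        # Step 4: connect to an identical individual if the lookup finds one.
        if label in KNOWN_ENTITIES:
            triples.append(f"<{entity}> <{OWL_SAMEAS}> <{KNOWN_ENTITIES[label]}> .")
        # Step 5: the container document exposes the entity's URI.
        triples.append(f"<{doc_url}> <{FOAF_TOPIC}> <{entity}> .")
    return triples


page = '<p>Visit <a href="http://example.com/berlin">Berlin</a> soon.</p>'
for line in rdfize("http://example.com/page", page):
    print(line)
```

The last step is what ties the result back to the Web of documents: the description of the source page becomes the entry point from which the entity URIs can be discovered.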

The animation that follows illustrates the process (a 5,000-foot view), from grabbing resources via HTTP GET to injecting RDF Linked Data back into the Web cloud:

Note: the Shredder is a Generic Cartridge, so you would have one of these per data source type (information resource type).
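
The "one Cartridge per data source type" idea can be pictured as a simple registry that dispatches on the type of the fetched resource. The sketch below is hypothetical. Keying on MIME types, the `cartridge` decorator, and the `dispatch` function are assumptions for illustration and do not reflect the Sponger's actual interfaces. The two sample cartridges return placeholder strings rather than RDF.

```python
from typing import Callable, Dict, List

Cartridge = Callable[[str], List[str]]

# Registry mapping a data source type (here, a MIME type) to its cartridge.
CARTRIDGES: Dict[str, Cartridge] = {}


def cartridge(source_type: str):
    """Decorator that registers a function as the cartridge for source_type."""
    def register(func: Cartridge) -> Cartridge:
        CARTRIDGES[source_type] = func
        return func
    return register


@cartridge("text/html")
def html_cartridge(payload: str) -> List[str]:
    # Placeholder output; a real cartridge would emit RDF for the page.
    return [f"html document, {len(payload)} characters"]


@cartridge("application/rss+xml")
def rss_cartridge(payload: str) -> List[str]:
    # Placeholder output; a real cartridge would emit RDF per feed item.
    return [f"rss feed, {payload.count('<item>')} items"]


def dispatch(content_type: str, payload: str) -> List[str]:
    """Pick the cartridge for content_type, ignoring parameters like charset."""
    key = content_type.split(";")[0].strip().lower()
    if key not in CARTRIDGES:
        raise LookupError(f"no cartridge registered for {key!r}")
    return CARTRIDGES[key](payload)


print(dispatch("text/html; charset=utf-8", "<p>hi</p>"))
print(dispatch("application/rss+xml", "<item></item><item></item>"))
```

Supporting a new data source type then means registering one more function and leaving the dispatcher unchanged.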