Injecting Linked Data into the Web has been a major pain point for those who seek personal, service, or organization-specific variants of DBpedia. Basically, the sequence goes something like this:
The problems typically take the following form:
To start addressing these problems, here is a simple guide for generating and publishing Linked Data using Virtuoso.
Existing RDF data can be added to the Virtuoso RDF Quad Store via a variety of built-in data loader utilities.
Many options allow you to easily and quickly generate RDF data from other data sources:
Install the Faceted Browser VAD package (fct_dav.vad) which delivers the following:
Three simple steps allow you, your enterprise, and your customers to consume and exploit your newly deployed Linked Data --
A service from OpenLink Software, available at: http://uriburner.com, that enables anyone to generate structured descriptions -on the fly- for resources that are already published to HTTP based networks. These descriptions exist as hypermedia resource representations where links are used to identify:
The hypermedia resource representation outlined above is what is commonly known as an Entity-Attribute-Value (EAV) Graph. The use of generic HTTP scheme based Identifiers is what distinguishes this type of hypermedia resource from others.
The virtues (dual pronged serendipitous discovery) of publishing HTTP based Linked Data across public (World Wide Web) or private (Intranets and/or Extranets) is rapidly becoming clearer to everyone. That said, the nuance laced nature of Linked Data publishing presents significant challenges to most. Thus, for Linked Data to really blossom the process of publishing needs to be simplified i.e., "just click and go" (for human interaction) or REST-ful orchestration of HTTP CRUD (Create, Read, Update, Delete) operations between Client Applications and Linked Data Servers.
In similar vane to the role played by FeedBurner with regards to Atom and RSS feed generation, during the early stages of the Blogosphere, it enables anyone to publish Linked Data bearing hypermedia resources on an HTTP network. Thus, its usage covers two profiles: Content Publisher and Content Consumer.
The steps that follow cover all you need to do:
That's it! The discoverability (SDQ) of your content has just multiplied significantly, its structured description is now part of the Linked Data Cloud with a reference back to your site (which is now a bona fide HTTP based Linked Data Space).
HTML+RDFa based representation of a structured resource description:
<link rel="describedby" title="Resource Description (HTML)"type="text/html" href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>
JSON based representation of a structured resource description:
<link rel="describedby" title="Resource Description (JSON)" type="application/json" href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>
N3 based representation of a structured resource description:
<link rel="describedby" title="Resource Description (N3)" type="text/n3" href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>
RDF/XML based representations of a structured resource description:
<link rel="describedby" title="Resource Description (RDF/XML)" type="application/rdf+xml" href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>
As an end-user, obtaining a structured description of any resource published to an HTTP network boils down to the following steps:
If you are a developer, you can simply perform an HTTP operation request (from your development environment of choice) using any of the URL patterns presented below:
URIBurner is a "deceptively simple" solution for cost-effective exploitation of HTTP based Linked Data meshes. It doesn't require any programming or customization en route to immediately realizing its virtues.
If you like what URIBurner offers, but prefer to leverage its capabilities within your domain -- such that resource description URLs reside in your domain, all you have to do is perform the following steps:
When you install your own URIBurner instances, you also have the ability to perform customizations that increase resource description fidelity in line with your specific needs. All you need to do is develop a custom extractor cartridge and/or meta cartridge.
Motivation for this post arose from a series of Twitter exchanges between Tony Hirst and I, in relation to his blog post titled: So What Is It About Linked Data that Makes it Linked Data™ ?
At the end of the marathon session, it was clear to me that a blog post was required for future reference, at the very least :-)
"Data Access by Reference" mechanism for Data Objects (or Entities) on HTTP networks. It enables you to Identify a Data Object and Access its structured Data Representation via a single Generic HTTP scheme based Identifier (HTTP URI). Data Object representation formats may vary; but in all cases, they are hypermedia oriented, fully structured, and negotiable within the context of a client-server message exchange.
Information makes the world tick!
Information doesn't exist without data to contextualize.
Information is inaccessible without a projection (presentation) medium.
All information (without exception, when produced by humans) is subjective. Thus, to truly maximize the innate heterogeneity of collective human intelligence, loose coupling of our information and associated data sources is imperative.
Linked Data is exposed to HTTP networks (e.g. World Wide Web) via hypermedia resources bearing structured representations of data object descriptions. Remember, you have a single Identifier abstraction (generic HTTP URI) that embodies: Data Object Name and Data Representation Location (aka URL).
A structured representation of data exists when an Entity (Datum), its Attributes, and its Attribute Values are clearly discernible. In the case of a Linked Data Object, structured descriptions take the form of a hypermedia based Entity-Attribute-Value (EAV) graph pictorial -- where each Entity, its Attributes, and its Attribute Values (optionally) are identified using Generic HTTP URIs.
Examples of structured data representation formats (content types) associated with Linked Data Objects include:
You markup resources by expressing distinct entity-attribute-value statements (basically these a 3-tuple records) using a variety of notations:
You can achieve this task using any of the following approaches:
Our data access middleware heritage (which spans 16+ years) has enabled us to assemble a rich portfolio of coherently integrated products that enable cost-effective evaluation and utilization of Linked Data, without writing a single line of code, or exposing you to the hidden, but extensive admin and configuration costs. Post installation, the benefits of Linked Data simply materialize (along the lines described above).
Our main Linked Data oriented products include:
Socially enhanced enterprise and invididual collaboration is becoming a focal point for a variety of solutions that offer erswhile distinct content managment features across the realms of Blogging, Wikis, Shared Bookmarks, Discussion Forums etc.. as part of an integrated platform suite. Recently, Socialtext has caught my attention courtesy of its nice features and benefits page . In addition, I've also found the Mike 2.0 portal immensely interesting and valuable, for those with an enterprise collaboration bent.
Anyway, Socialtext and Mike 2.0 (they aren't identical and juxtaposition isn't seeking to imply this) provide nice demonstrations of socially enhanced collaboration for individuals and/or enterprises is all about:
As is typically the case in this emerging realm, the critical issue of discrete "identifiers" (record keys in sense) for data items, data containers, and data creators (individuals and groups) is overlooked albeit unintentionally.
Rather than using platform constrained identifiers such as:
It enables you to leverage the platform independence of HTTP scheme Identifiers (Generic URIs) such that Identifiers for:
simply become conduits into a mesh of HTTP -- referencable and accessible -- Linked Data Objects endowed with High SDQ (Serendipitious Discovery Quotient). For example my Personal WebID is all anyone needs to know if they want to explore:
Even when you reach a point of equilibrium where: your daily activities trigger orchestratestration of CRUD (Create, Read, Update, Delete) operations against Linked Data Objects within your socially enhanced collaboration network, you still have to deal with the thorny issues of security, that includes the following:
FOAF+SSL, an application of HTTP based Linked Data, enables you to enhance your Personal HTTP scheme based Identifer (or WebID) via the following steps (peformed by a FOAF+SSL compliant platform):
Contrary to conventional experiences with all things PKI (Public Key Infrastructure) related, FOAF+SSL compliant platforms typically handle the PKI issues as part of the protocol implementation; thereby protecting you from any administrative tedium without compromising security.
Understanding how new technology innovations address long standing problems, or understanding how new solutions inadvertently fail to address old problems, provides time tested mechanisms for product selection and value proposition comprehension that ultimately save scarce resources such as time and money.
If you want to understand real world problem solution #1 with regards to HTTP based Linked Data look no further than the issues of secure, socially aware, and platform independent identifiers for data objects, that build bridges across erstwhile data silos.
If you want to cost-effectively experience what I've outlined in this post, take a look at OpenLink Data Spaces (ODS) which is a distributed collaboration engine (enterprise of individual) built around the Virtuoso database engines. It simply enhances existing collaboration tools via the following capabilities:
Addition of Social Dimensions via HTTP based Data Object Identifiers for all Data Items (if missing)
Your Life, Profession, Web, and Internet do not need to become mutually exclusive due to "information overload".
A platform or service that delivers a point of online presence that embodies the fundamental separation of: Identity, Data Access, Data Representation, Data Presentation, by adhering to Web and Internet protocols.
Typical post installation (Local or Cloud) task sequence:
I've just outlined a snippet of the capabilities of the OpenLink Data Spaces platform. A platform built using OpenLink Virtuoso, architected to deliver: open, platform independent, multi-model, data access and data management across heterogeneous data sources.
All you need to remember is your URI when seeking to interact with your data space.
A pre-installed edition of Virtuoso for Amazon's EC2 Cloud platform.
From the DBMS engine perspective it provides you with one or more pre-configured instances of Virtuoso that enable immediate exploitation of the following services:
From a Middleware perspective it provides:
From the Web Server Platform perspective it provides an alternative to LAMP stack components such as MySQL and Apace by offering
From the general System Administrator's perspective it provides:
Higher level user oriented offerings include:
For Web 2.0 / 3.0 users, developers, and entrepreneurs it offers it includes Distributed Collaboration Tools & Social Media realm functionality courtesy of ODS that includes:
Now that the virtues of dynamic generation of RDF based Linked Data are becoming clearer, I guess it's time to unveil the Virtuoso Sponger driven Dynamic Linked Data constellation diagram.
Our diagram depicts the myriad of data sources from which RDF Linked Data is generated "on the fly" via our data source specific RDF-zation cartridges/drivers. It also unveils how the sponger leverages the Linked Data constellations of UMBEL, DBpedia, Bio2Rdf, and others for lookups.
RDF-ization is a term used by the Semantic Web community to describe the process of generating RDF from non RDF Data Sources such as (X)HTML, Weblogs, Shared Bookmark Collections, Photo Galleries, Calendars, Contact Managers, Feed Subscriptions, Wikis, and other information resource collections.
If the RDF generated, results in an entity-to-entity level network (graph) in which each entity is endowed with a de-referencable HTTP based ID (a URI), we end up with an enhancement to the Web that adds Hyperdata linking across extracted entities, to the existing Hypertext based Web of linked documents (pages, images, and other information resource types). Thus, I can use the same URL linking mechanism to reference a broader range of "Things" i.e., documents, things that documents are about, or things loosely associated with documents.
The Virtuoso Sponger is an example of an RDF Middleware solution from OpenLink Software. It's an in-built component of the Virtuoso Universal Server, and deployable in many forms e.g., Software as Service (SaaS) or traditional software installation. It delivers RDF-ization services via a collection of Web information resource specific Cartridges/Providers/Drivers covering Wikipedia, Freebase, CrunchBase, WikiCompany, OpenLibrary, Digg, eBay, Amazon, RSS/Atom/OPML feed sources, XBRL, and many more.
RDF-ization alone doesn't ensure valuable RDF based Linked Data on the Web. The process of producing RDF Linked Data is ultimately about the art of effectively describing resources with an eye for context.
The animation that follows illustrates the process (5,000 feet view), from grabbing resources via HTTP GET, to injecting RDF Linked Data back into the Web cloud:
Note: the Shredder is a Generic Cartridge, so you would have one of these per data source type (information resource type).
The sweet spot of Web 3.0 (or any other Web.vNext moniker) is all about providing Web Users with a structured and interlinked data substrate that facilitates serendipitous discovery of relevant "Things" i.e., a Linked Data Web -- a Web of Linkable Entities that goes beyond documents and other information resource (data containers) types.
Understanding potential Linked Data Web business models, relative to other Web based market segments, is best pursued via a BCG Matrix diagram, such as the one I've constructed below:
To conclude, the Linked Data Web's market opportunities are all about the evolution of the Web into a powerful substrate that offers a unique intersection of "Link Density" and "Relevance", exploitable across horizontal and vertical market segments to solutions providers. Put differently, SDQ is how you take "The Ad" out of "Advertising" when matching Web users to relevant things :-)