Kingsley Uyi Idehen
Lexington, United States
DBpedia + BBC (combined) Linked Data Space Installation Guide
What?
The DBpedia + BBC Combo Linked Dataset is a preconfigured Virtuoso Cluster (4 Virtuoso Cluster Nodes,
each comprised of one Virtuoso Instance; initial deployment is to a
single Cluster Host, but the license may be converted for physically
distributed deployment), available via the Amazon EC2 Cloud,
preloaded with the DBpedia and BBC Linked Data sets.
Why?
The BBC has been publishing Linked Data from its Web Data Space for a number of years. In line
with best practices for injecting Linked Data into the World Wide Web (Web), the BBC datasets are
interlinked with other datasets such as DBpedia and
MusicBrainz.
Typical follow-your-nose exploration using a Web Browser (or
even via sophisticated SPARQL query crawls) isn't always practical
once you get past the initial euphoria that comes from
comprehending the Linked Data concept. As your queries get more
complex, the overhead of remote sub-queries grows,
until query results take so long to return that you simply give
up.
Thus, maximizing the effects of the BBC's efforts requires
Linked Data that shares locality in a Web-accessible Data Space —
i.e., where all Linked Data sets have been loaded into the same
data store or warehouse. This holds true even when leveraging
SPARQL-FED style virtualization — there's always a need to localize
data as part of any marginally-decent locality-aware
cost-optimization algorithm.
This DBpedia + BBC dataset, exposed via a preloaded and
preconfigured Virtuoso Cluster, delivers a practical point of
presence on the Web for immediate and cost-effective exploitation
of Linked Data at the individual and/or service specific
levels.
How?
To work through this guide, you'll need to start with 90 GB of free
disk space. (Only 41 GB will be consumed after you delete the
installer archives, but starting with 90+ GB ensures enough work
space for the installation.)
Install Virtuoso
Download the Virtuoso installer archive(s). You
must deploy the Personal or Enterprise Edition; the Open Source
Edition does not support Shared-Nothing Cluster Deployment.
Obtain a Virtuoso Cluster license.
Install Virtuoso.
Set key environment variables and start the OpenLink License
Manager using this command (the exact path may vary depending on your shell and
install directory):
. /opt/virtuoso/virtuoso-enterprise.sh
Optional: To keep the default single-server configuration
file and demo database intact, set the VIRTUOSO_HOME
environment variable to a different directory, e.g.,
export VIRTUOSO_HOME=/opt/virtuoso/cluster-home/
Note: You will have to adjust this setting every time
you shift between this cluster setup and your single-server setup.
Either may be made your environment's default through the
virtuoso-enterprise.sh
and related scripts.
Set up your cluster by running the
mkcluster.sh
script. Note that initial deployment of
the DBpedia + BBC Combo requires a 4 node cluster, which is
the default for this script.
Start the Virtuoso Cluster with this command:
virtuoso-start.sh
Stop the Virtuoso Cluster with this command:
virtuoso-stop.sh
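Taken together, a typical cluster setup session looks roughly like the following sketch. The install directory, VIRTUOSO_HOME value, and script locations here are assumptions; adjust them to match your own installation.
# sketch only -- paths and script names below are assumptions, not required values
. /opt/virtuoso/virtuoso-enterprise.sh              # set environment variables, start the License Manager
export VIRTUOSO_HOME=/opt/virtuoso/cluster-home/    # optional: keep the single-server setup intact
mkcluster.sh                                        # create the default 4-node cluster configuration
virtuoso-start.sh                                   # start all cluster nodes
# ... load and query data ...
virtuoso-stop.sh                                    # stop all cluster nodes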
Using the DBpedia + BBC Combo dataset
Navigate to your installation directory.
Download the combo dataset installer script — bbc-dbpedia-install.sh.
For best results, set the downloaded script to fully executable
using this command:
chmod 755 bbc-dbpedia-install.sh
Shut down any Virtuoso instances that may be currently
running.
Optional: As above, if you have decided to keep the
default single-server configuration file and demo database intact,
set the VIRTUOSO_HOME
environment variable
appropriately, e.g.,
export VIRTUOSO_HOME=/opt/virtuoso/cluster-home/
Run the combo dataset installer script with this command:
sh bbc-dbpedia-install.sh
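Pulled together, the dataset installation sequence looks roughly like this sketch; the directory and VIRTUOSO_HOME value are illustrative assumptions rather than required paths.
# sketch only -- adjust paths to your actual installation
cd /opt/virtuoso                                    # assumed installation directory
chmod 755 bbc-dbpedia-install.sh                    # make the downloaded installer executable
virtuoso-stop.sh                                    # make sure no Virtuoso instances are running
export VIRTUOSO_HOME=/opt/virtuoso/cluster-home/    # optional, as noted above
sh bbc-dbpedia-install.sh                           # load the DBpedia + BBC combo dataset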
Verify installation
The combo dataset typically deploys to EC2 virtual machines in
under 90 minutes; your time will vary depending on your network
connection speed, machine speed, and other variables.
Once the script completes, perform the following steps:
Verify that the Virtuoso Conductor (HTTP-based Admin UI) is in
place via:
http://localhost:[port]/conductor
Verify that the Virtuoso SPARQL endpoint is in place via:
http://localhost:[port]/sparql
Verify that the Precision Search & Find UI is in place
via:
http://localhost:[port]/fct
Verify that the Virtuoso hosted PivotViewer is in place via:
http://localhost:[port]/PivotViewer
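If you prefer the command line, the same checks can be scripted with curl. This is only a sketch; 8890 is a placeholder for whatever HTTP port your cluster actually listens on.
# sketch only -- replace 8890 with your configured HTTP listener port
PORT=8890
for app in conductor sparql fct PivotViewer ; do
    # print the HTTP status code returned for each application endpoint
    curl -s -o /dev/null -w "%{http_code}  /$app\n" "http://localhost:$PORT/$app"
done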
Related
02/17/2011 17:15 GMT-0500
Modified: 03/29/2011 10:09
GMT-0500
Simple Virtuoso Installation & Utilization Guide for SPARQL Users (Update 5)
What is SPARQL?
A declarative query language from the W3C for querying
structured propositional data (in the form of
3-tuple [triple] or 4-tuple [quad] records)
stored in a deductive database (colloquially referred
to as a triple or quad store in Semantic Web and Linked Data parlance).
SPARQL is inherently platform independent. Like SQL , the query language and the backend
database engine are distinct. Database clients capture SPARQL
queries which are then passed on to compliant backend
databases.
Why is it important?
Like SQL for relational databases, it provides a powerful
mechanism for accessing and joining data across one or more data
partitions (named graphs identified by IRIs). The aforementioned
capability also enables the construction of sophisticated Views,
Reports (HTML or those produced in native form by desktop
productivity tools), and data streams for other services.
Unlike SQL, SPARQL includes result serialization formats and an
HTTP based wire protocol. Thus, the ubiquity and sophistication of
HTTP are integral to SPARQL, i.e., client-side applications (user
agents) only need to be able to perform an HTTP GET against a
URL en route to exploiting the power of
SPARQL.
How do I use it, generally?
Locate a SPARQL endpoint (DBpedia, LOD Cloud
Cache, Data.Gov, URIBurner, others), or
Install a SPARQL compliant database server (quad or triple
store) on your desktop, workgroup server, data center, or cloud
(e.g., Amazon EC2 AMI )
Start the database server
Execute SPARQL Queries via the SPARQL
endpoint.
How do I use SPARQL with Virtuoso ?
What follows is a very simple guide for using SPARQL against
your own instance of Virtuoso:
Software Download and Installation
Data Loading from Data Sources exposed at Network Addresses
(e.g. HTTP URLs) using very simple methods
Actual SPARQL query execution via SPARQL endpoint.
Installation Steps
Download Virtuoso Open Source or Virtuoso Commercial Editions
Run the installer (if using the Commercial Edition or the Windows Open
Source Edition; otherwise, follow the build guide)
Follow post-installation guide and verify installation by
typing in the command: virtuoso -? (if this fails check you've
followed installation and setup steps, then verify environment
variables have been set)
Start the Virtuoso server using the command:
virtuoso-start.sh
Verify you have a connection to the Virtuoso Server via the
command: isql localhost (assuming you're using default DB settings)
or the command: isql localhost:1112 (assuming the demo database), or
go to your browser and type in:
http://<virtuoso-server-host-name>:[port]/conductor (e.g.
http://localhost:8889/conductor for the default DB or
http://localhost:8890/conductor if using the Demo DB)
Go to SPARQL endpoint which is typically --
http://<virtuoso-server-host-name>:[port]/sparql
Run a quick sample query (since the database always has system
data in place): select distinct * where {?s ?p ?o} limit 50 .
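The same sample query can also be issued over the SPARQL Protocol with curl, which is handy for scripting. This is a sketch; the host and port are placeholders for your own instance.
# sketch only -- host and port are placeholders
curl -H "Accept: application/sparql-results+json" \
     --data-urlencode "query=select distinct * where {?s ?p ?o} limit 50" \
     "http://localhost:8890/sparql"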
Troubleshooting
Ensure environment settings are set and functional -- if you're using
Mac OS X or Windows, you don't have to worry about this; just
start and stop your Virtuoso server using the native OS services
applets
If using the Open Source Edition, follow the getting started guide -- it covers PATH
and startup directory location re. starting and stopping Virtuoso
servers.
Sponging (HTTP GETs against external Data Sources) within
SPARQL queries is disabled by default. You can enable this feature
by assigning the "SPARQL_SPONGE" privilege to the user
"SPARQL". Note: more sophisticated security is available via WebID based ACLs.
Data Loading Steps
Identify an RDF based structured data source of interest -- a
file that contains 3-tuple / triples available at an address on a
public or private HTTP based network
Determine the Address (URL) of the RDF data source
Go to your Virtuoso SPARQL endpoint and type in the following
SPARQL query: DEFINE GET:SOFT "replace" SELECT DISTINCT * FROM
<RDFDataSourceURL> WHERE {?s ?p ?o}
All the triples in the RDF resource (data source accessed via
URL) will be loaded into the Virtuoso Quad Store (using RDF Data
Source URL as the internal quad store Named Graph IRI) as part of
the SPARQL query processing pipeline.
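For example, such a query can be issued directly against your local endpoint with curl. This is a sketch; the endpoint address and the data source URL below are purely illustrative.
# sketch only -- endpoint and data source URL are illustrative placeholders
curl --data-urlencode 'query=DEFINE get:soft "replace"
SELECT DISTINCT * FROM <http://www.w3.org/People/Berners-Lee/card> WHERE {?s ?p ?o}' \
     "http://localhost:8890/sparql"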
Note: the data source URL doesn't even have to be RDF based --
which is where the Virtuoso Sponger Middleware comes into play
(download and install the VAD installer package first) since it
delivers the following features to Virtuoso's SPARQL engine:
Transformation of data from non RDF data sources (file content,
hypermedia resources, web services
output etc..) into RDF based 3-tuples (triples)
Cache Invalidation Scheme Construction -- thus, subsequent
queries do not need the define get:soft "replace" pragma,
except when you want to forcefully override the cache.
If you have very large data sources like DBpedia etc. from
CKAN, simply use our bulk loader .
SPARQL Endpoint Discovery
Public SPARQL endpoints are emerging at an ever increasing rate.
Thus, we've set up a DNS lookup service that provides access to a
large number of SPARQL endpoints. Of course, this doesn't cover all
existing endpoints, so if your endpoint is missing please ping
me.
Here are a collection of commands for using DNS-SD to discover
SPARQL endpoints:
dns-sd -B _sparql._tcp sparql.openlinksw.com -- browse for
service instances
dns-sd -Z _sparql._tcp sparql.openlinksw.com -- output results
in Zone File format
Related
Using HTTP from Ruby -- you can just
make SPARQL Protocol URLs re. SPARQL
Using SPARQL Endpoints via Ruby -- Ruby
example using DBpedia endpoint
Interactive SPARQL Query By Example (QBE)
tool -- provides a graphical user interface (as is common in
SQL realm re. query building against RDBMS engines) that works with any
SPARQL endpoint
Other methods of loading RDF data into
Virtuoso
Virtuoso Sponger -- architecture and how
it turns a wide variety of non RDF data sources into SPARQL
accessible data
Using OpenLink Data Explorer (ODE) to
populate Virtuoso -- locate a resource of interest; click on a
bookmarklet or use context menus (if using ODE extensions for
Firefox, Safari, or Chrome); and you'll have SPARQL accessible data
automatically inserted into your Virtuoso instance.
W3C's SPARQLing Data Access Ingenuity --
an older generic SPARQL introduction post
Collection of SPARQL Query Examples --
GoodRelations (Product Offers), FOAF (Profiles), SIOC
(Data Spaces -- Blogs, Wikis, Bookmarks, Feed Collections, Photo Galleries, Briefcase/DropBox, AddressBook, Calendars, Discussion Forums)
Collection of Live SPARQL Queries against LOD
Cloud Cache -- simple and advanced queries.
01/16/2011 02:06 GMT-0500
Modified: 01/19/2011 10:43
GMT-0500
URIBurner: Painless Generation & Exploitation of Linked Data (Update 1 - Demo Links Added)
What is URIBurner?
A service from OpenLink Software, available at http://uriburner.com, that enables anyone to
generate structured descriptions -- on the fly -- for resources that
are already published to HTTP based networks. These descriptions
exist as hypermedia resource representations where links are used
to identify:
the entity (data object or datum) being
described,
each of its attributes, and
each of its attribute values (optionally).
The hypermedia resource representation outlined above is what is
commonly known as an Entity-Attribute-Value (EAV) Graph. The use
of generic HTTP scheme based Identifiers is what distinguishes this
type of hypermedia resource from others.
Why is it Important?
The virtues (dual-pronged serendipitous discovery) of publishing
HTTP based Linked Data across public (World Wide Web) or private (Intranets and/or
Extranets) networks are rapidly becoming clearer to everyone. That said, the
nuance-laced nature of Linked Data publishing presents significant
challenges to most. Thus, for Linked Data to really blossom, the
process of publishing needs to be simplified, i.e., "just click and
go" (for human interaction) or REST-ful orchestration of HTTP CRUD
(Create, Read, Update, Delete) operations between Client
Applications and Linked Data Servers.
How Do I Use It?
In a similar vein to the role played by FeedBurner with regard to
Atom and RSS feed generation during the early stages of the
Blogosphere, URIBurner enables anyone to publish Linked Data bearing
hypermedia resources on an HTTP network. Thus, its usage covers two
profiles: Content Publisher and Content Consumer.
Content Publisher
The steps that follow cover all you need to do:
place a <link> tag within your HTTP based hypermedia
resource (e.g. within the <head> section for HTML)
use a URL via the @href attribute value to
identify the location of the structured description of your
resource, in this case it takes the form:
http://linkeddata.uriburner.com/about/id/{scheme-or-protocol}/{your-hostname-or-authority}/{your-local-resource}
for human visibility you may consider associating a
button (as you do with Atom and RSS) with the URL above.
That's it! The discoverability (SDQ) of your content has just
multiplied significantly, its structured description is now part of
the Linked Data Cloud with a reference back to your site (which is
now a bona fide HTTP based Linked Data Space ).
Examples
HTML+RDFa based representation of a structured
resource description:
<link rel="describedby" title="Resource Description
(HTML)"type="text/html"
href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>
JSON based representation of a structured resource
description:
<link rel="describedby" title="Resource Description
(JSON)" type="application/json"
href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>
N3 based representation of a structured resource
description:
<link rel="describedby" title="Resource Description
(N3)" type="text/n3"
href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>
RDF/XML based representations of a structured resource
description :
<link rel="describedby" title="Resource Description
(RDF/XML)" type="application/rdf+xml"
href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>
Content Consumer
As an end-user, obtaining a structured description of any
resource published to an HTTP network boils down to the following
steps:
go to: http://uriburner.com
drag the Page Metadata Bookmarklet link to your Browser's
toolbar
whenever you encounter a resource of interest (e.g. an HTML
page) simply click on the Bookmarklet
you will be presented with an HTML representation of a
structured resource description (i.e., identifier of the entity
being described, its attributes, and its attribute values will be
clearly presented).
Examples
If you are a developer, you can simply perform an HTTP operation
request (from your development environment of choice) using any of
the URL patterns presented below:
HTML:
curl -I -H "Accept: text/html"
http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}
JSON:
curl -I -H "Accept: application/json"
http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}
curl
http://linkeddata.uriburner.com/about/data/json/{scheme}/{authority}/{local-path}
Notation 3 (N3):
curl -I -H "Accept: text/n3"
http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}
curl
http://linkeddata.uriburner.com/about/data/n3/{scheme}/{authority}/{local-path}
curl -I -H "Accept: text/turtle"
http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}
curl
http://linkeddata.uriburner.com/about/data/ttl/{scheme}/{authority}/{local-path}
RDF/XML:
curl -I -H "Accept: application/rdf+xml"
http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}
curl
http://linkeddata.uriburner.com/about/data/xml/{scheme}/{authority}/{local-path}
Conclusion
URIBurner is a "deceptively simple" solution for cost-effective
exploitation of HTTP based Linked Data meshes. It doesn't require
any programming or customization en route to immediately realizing
its virtues.
If you like what URIBurner offers, but prefer to leverage its
capabilities within your own domain -- such that resource description
URLs reside in your domain -- all you have to do is perform the
following steps:
download a copy of Virtuoso (for local
desktop, workgroup, or data center installation) or
instantiate Virtuoso via the Amazon EC2 Cloud
enable the Sponger Middleware component via the RDF Mapper VAD
package (which includes cartridges for over 30 different resources
types )
When you install your own URIBurner instances, you also have the
ability to perform customizations that increase resource
description fidelity in line with your specific needs. All you need
to do is develop a custom extractor cartridge and/or meta
cartridge.
Related:
03/10/2010 12:52 GMT-0500
Modified: 03/11/2010 10:16
GMT-0500
Meshups Demonstrating How SPARQL-GEO Enhances Linked Data Exploitation (Update 2)
Deceptively simple demonstrations of how Virtuoso's SPARQL-GEO extensions to SPARQL lay a critical
foundation for Geo-Spatial solutions that seek to leverage the
burgeoning Web of Linked Data.
SPARQL Endpoint: Linked Open Data Cache (8.5 Billion+ Quad
Store which includes data from Geonames and the Linked GeoData Project Data Sets).
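To give a flavour of what SPARQL-GEO adds, a query of the following shape finds features with WGS84 coordinates near a point. This is a sketch only; the endpoint URL, the coordinates, and the exact usage of Virtuoso's bif:st_* functions shown here are assumptions to verify against your own instance.
# sketch only -- endpoint, coordinates, and bif:st_* usage are assumptions
curl --data-urlencode 'query=PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?feature ?lat ?lng
WHERE {
  ?feature geo:lat ?lat ; geo:long ?lng .
  FILTER ( bif:st_intersects ( bif:st_point (?lng, ?lat),
                               bif:st_point (-0.1278, 51.5074), 5 ) )
}
LIMIT 25' \
     "http://lod.openlinksw.com/sparql"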
Live Linked Data Meshup Links:
Related
03/06/2010 17:43 GMT-0500
Modified: 03/24/2010 11:44
GMT-0500
Revisiting HTTP based Linked Data (Update 1 - Demo Video Links Added)
Motivation for this post arose from a series of Twitter
exchanges between Tony Hirst and me, in relation to his
blog post titled: So What Is It About Linked Data that Makes it
Linked Data™?
At the end of the marathon session, it was clear to me that a blog post was required for future
reference, at the very least :-)
"Data Access by Reference " mechanism for Data
Objects (or Entities) on HTTP networks. It enables you to Identify
a Data Object and Access its structured Data Representation via a
single Generic HTTP scheme based Identifier (HTTP URI ). Data Object representation formats may
vary; but in all cases, they are hypermedia oriented, fully structured, and
negotiable within the context of a client-server message
exchange.
Why is it Important?
Information makes the world tick!
Information doesn't exist without data to contextualize.
Information is inaccessible without a projection (presentation)
medium.
All information (without exception, when produced by humans) is
subjective. Thus, to truly maximize the innate heterogeneity of
collective human intelligence, loose coupling of our information
and associated data sources is imperative.
How is Linked Data Delivered?
Linked Data is exposed to HTTP networks (e.g. World Wide Web ) via hypermedia resources
bearing structured representations of data object descriptions.
Remember, you have a single Identifier abstraction (generic HTTP
URI) that embodies: Data Object Name and Data Representation
Location (aka URL ).
How are Linked Data Object Representations Structured?
A structured representation of data exists when an Entity (Datum), its Attributes, and its
Attribute Values are clearly discernible. In the case of a Linked
Data Object, structured descriptions take the form of a hypermedia
based Entity -Attribute-Value (EAV) graph pictorial
-- where each Entity, its Attributes, and its Attribute Values
(optionally) are identified using Generic HTTP URIs.
Examples of structured data representation formats (content
types) associated with Linked Data Objects include:
text/html
text/turtle
text/n3
application/json
application/rdf+xml
Others
How Do I Create Linked Data oriented Hypermedia Resources?
You mark up resources by expressing distinct
entity-attribute-value statements (basically, 3-tuple
records) using a variety of notations (a minimal Turtle sketch follows the lists below):
(X)HTML+RDFa ,
JSON ,
Turtle ,
N3 ,
TriX ,
TriG ,
RDF/XML , and
Others (for instance you can use Atom data format extensions to
model EAV graph as per OData initiative from Microsoft).
You can achieve this task using any of the following
approaches:
Notepad
WYSIWYG Editor
Transformation of Database Records via Middleware
Transformation of XML based Web Services
output via Middleware
Transformation of other Hypermedia Resources via
Middleware
Transformation of non Hypermedia Resources via Middleware
Use a platform that delivers all of the above.
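As a concrete illustration of the statement-level markup described above, here is a minimal Turtle sketch, written to a file from the shell; every name and URI in it is a placeholder, not a real resource.
# sketch only -- all names and URIs below are placeholders
cat > example.ttl <<'EOF'
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://example.org/about#me>
    a foaf:Person ;                                    # entity -- attribute (type) -- value
    foaf:name "Jane Doe" ;                             # entity -- attribute -- literal value
    foaf:knows <http://example.net/people/joe#this> .  # entity -- attribute -- object value
EOF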
Practical Examples of What Linked Data Objects Enable
Describe Who You Are, What You Offer, and What You Need via
your structured profile, then leave your HTTP network to perform
the REST (serendipitous discovery of relevant things)
Identify (via map overlay) all items of interest within a
2km+ radius of my current location (this could include vendor
offerings or services sought by existing or future customers)
Share the latest and greatest family photos with family members
*only* without forcing them to signup for Yet Another Web 2.0
service or Social Network
No repetitive signup and username and password based login
sequences per Web 2.0 or Mobile Application combo
Going beyond imprecise Keyword Search to the new frontier of
Precision Find - Example, Find Data Objects associated with the
keywords: Tiger, while enabling the seeker disambiguate across the
"Who", "What", "Where", "When" dimensions (with negation
capability)
Determine how two Data Objects are Connected - person to
person, person to subject matter etc. (LinkedIn outside the walled
garden)
Use any resource address (e.g blog
or bookmark URL) as the conduit into a Data Object mesh that
exposes all associated Entities and their social network
relationships
Apply patterns (social dimensions) above to traditional
enterprise data sources in combination (optionally) with external
data without compromising security etc.
How Do OpenLink Software Products Enable Linked
Data Exploitation?
Our data access middleware heritage (which spans 16+ years) has
enabled us to assemble a rich portfolio of coherently integrated
products that enable cost-effective evaluation and utilization of
Linked Data, without writing a single line of code, or exposing you
to the hidden, but extensive admin and configuration costs. Post
installation, the benefits of Linked Data simply materialize (along
the lines described above).
Our main Linked Data oriented products include:
OpenLink Data Explorer -- visualizes Linked
Data or Linked Data transformed "on the fly" from hypermedia and
non hypermedia data sources
URIBurner -- a "deceptively simple" solution
that enables the generation of Linked Data "on the fly" from a
broad collection of data sources and resource types
OpenLink Data Spaces -- a platform for
enterprises and individuals that enhances distributed collaboration
via Linked Data driven virtualization of data across its native
and/or 3rd party content manager for: Blogs, Wikis, Shared
Bookmarks, Discussion Forums, Social Networks etc
OpenLink Virtuoso -- a secure and
high-performance native hybrid data server (Relational, RDF-Graph,
Document models) that includes in-built Linked Data transformation
middleware (aka. Sponger).
Related
03/04/2010 10:16 GMT-0500
Modified: 03/08/2010 09:59
GMT-0500
Linked Data & Socially Enhanced Collaboration (Enterprise or Individual) -- Update 1
Socially enhanced enterprise and individual collaboration is
becoming a focal point for a variety of solutions that offer
erstwhile distinct content management features across the realms of
Blogging, Wikis, Shared Bookmarks, Discussion Forums etc. as part
of an integrated platform suite. Recently, Socialtext
has caught my attention courtesy of its nice features and benefits page. In addition,
I've also found the Mike 2.0 portal immensely interesting and
valuable, for those with an enterprise collaboration bent.
Anyway, Socialtext and Mike 2.0 (they aren't identical, and this
juxtaposition isn't seeking to imply that they are) provide nice
demonstrations of what socially enhanced collaboration for individuals
and/or enterprises is all about:
Identifying Yourself
Identifying Others (key contributors, peers,
collaborators)
Serendipitous Discovery of key contributors, peers, and
collaborators
Serendipitous Discovery by key contributors, peers, and
collaborators
Develop and sustain relationships via socially enhanced
professional network hybrid
Utilize your new "trusted network" (which you've personally
indexed) when seeking help or propagating a meme .
As is typically the case in this emerging realm, the critical
issue of discrete "identifiers" (record keys, in a sense) for data items, data containers,
and data creators (individuals and groups) is overlooked, albeit
unintentionally.
How HTTP based Linked Data Addresses the Identifier
Issue
Rather than using platform constrained identifiers such as:
email address (a "mailto" scheme identifier),
a dbms user account,
application specific account, or
OpenID,
HTTP based Linked Data enables you to leverage the platform independence of HTTP
scheme Identifiers (Generic URIs), such that Identifiers for:
You,
Your Peers,
Your Groups, and
Your Activity Generated Data,
simply become conduits into a mesh of HTTP -- referenceable and accessible -- Linked
Data Objects endowed with High SDQ (Serendipitous Discovery
Quotient). For example, my Personal WebID is all anyone needs to know if
they want to explore:
My Profile (which includes references to data objects
associated with my interests, social-network, calendar, bookmarks
etc.)
Data generated by my activities across various data spaces (via
data objects associated with my online accounts e.g. Del.icio.us , Twitter , Last.FM )
Linked Data Meshups via URIBurner (or any
other Virtuoso instance) that provide an extended
view of my profile
How FOAF+SSL adds Socially aware Security
Even when you reach a point of equilibrium where your daily
activities trigger orchestration of CRUD (Create, Read,
Update, Delete) operations against Linked Data Objects within your
socially enhanced collaboration network, you still have to deal
with the thorny issues of security, which include the
following:
Single Sign On,
Authentication, and
Data Access Policies.
FOAF+SSL, an application of HTTP based Linked Data, enables you
to enhance your Personal HTTP scheme based Identifier (or WebID) via
the following steps (performed by a FOAF+SSL compliant
platform; a minimal OpenSSL sketch follows these steps):
Imprint WebID within a self-signed x.509 based public key
(certificate) associated with your private key (generated by
FOAF+SSL platform or manually via OpenSSL)
Store public key components (modulus and exponent) into your
FOAF based profile document which references your Personal HTTP
Identifier as its primary topic
Leverage the HTTP URL component of the WebID for making public key
components (modulus and exponent) available for x.509 certificate
based authentication challenges posed by systems secured by
FOAF+SSL (directly) or OpenID (indirectly via FOAF+SSL to OpenID
proxy services).
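For the manual OpenSSL route mentioned in the first step, a self-signed certificate carrying a WebID can be produced roughly as follows. This is a sketch; the WebID URI and file names are placeholders, and the -addext option requires OpenSSL 1.1.1 or later.
# sketch only -- WebID URI and file names are placeholders
openssl req -x509 -newkey rsa:2048 -days 365 -nodes \
    -keyout webid-key.pem -out webid-cert.pem \
    -subj "/CN=Jane Doe" \
    -addext "subjectAltName=URI:http://example.org/people/jane#me"

# print the public key modulus that goes into the FOAF profile (use -text to also see the exponent)
openssl x509 -in webid-cert.pem -noout -modulus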
Contrary to conventional experiences with all things PKI (Public
Key Infrastructure) related, FOAF+SSL compliant platforms typically
handle the PKI issues as part of the protocol implementation;
thereby protecting you from any administrative tedium without
compromising security.
Conclusions
Understanding how new technology innovations address long
standing problems, or understanding how new solutions inadvertently
fail to address old problems, provides time tested mechanisms for
product selection and value proposition comprehension that
ultimately save scarce resources such as time and money.
If you want to understand real world problem solution #1 with
regards to HTTP based Linked Data look no further than the issues
of secure, socially aware, and platform independent identifiers for
data objects, that build bridges across erstwhile data silos.
If you want to cost-effectively experience what I've outlined in
this post, take a look at OpenLink
Data Spaces (ODS), which is a distributed collaboration
engine (enterprise or individual) built around the Virtuoso
database engine. It simply enhances existing collaboration tools
via the following capabilities:
Addition of Social Dimensions via HTTP based Data Object Identifiers for all Data Items
(if missing)
Ability to integrate across a myriad of Data Source Types --
RDBMS engines, LDAP, Web Services, and
various HTTP accessible Resources (Hypermedia or Non Hypermedia
content types) -- rather than a select few
Addition of FOAF+SSL based authentication
Addition of FOAF+SSL based Access Control Lists (ACLs) for
policy based data access.
Related:
03/02/2010 15:47 GMT-0500
Modified: 03/03/2010 19:50
GMT-0500
Exploring the Value Proposition of Linked Data
What is Linked Data?
The primary topic of a meme
penned by TimBL in the form of a Design Issues Doc (note: this is how TimBL
has shared his thoughts since the Beginning of
the Web).
There are a number of dimensions to the meme, but its primary
purpose is the reintroduction of the HTTP URI -- a vital component of the Web's core
architecture.
What's Special about HTTP URIs?
They possess an intrinsic duality that combines persistent and
unambiguous Data
Identity with platform & representation format independent Data
Access. Thus, you can use a string of characters that look like a
contemporary Web URL to unambiguously achieve the
following:
Identify or Name Anything of Interest
Describe Anything of Interest by associating the Description
Subject's Identity with a constellation of Attribute and Value
pairs (technically: an Entity -Attribute-Value or
Subject-Predicate-Object graph)
Make the Description of Named Things of Interest discoverable
on the Web by implicitly binding the aforementioned to Documents
that hold their descriptions (technically: metadata documents or
information resources)
What's the basic value proposition of the Linked Data meme ?
Enabling more productive use of the Web by users and developers
alike. All of which is achieved by tweaking the Web's Hyperlinking
feature such that it now includes Hypertext and Hyperdata as link types.
Note: Hyperdata Linking is simply what an HTTP URI
facilitates.
Example problems solved by injecting Linked Data into the
Web:
Federated Identity by enabling Individuals to unambiguously
Identify themselves (Profiles++) courtesy of existing Internet and Web protocols (e.g., FOAF +SSL's WebIDs which combine Personal
Identity with X.509 certificates and HTTPs based client side
certification)
Security and Privacy challenge alleviation by delivering a
mechanism for policy based data access that feeds off federated
individual identity and social network (graph) traversal
Spam Busting via the above.
Increasing the Serendipitous Discovery Quotient (SDQ) of Web
accessible resources by embedding Rich Metadata into (X)HTML
Documents e.g., structured descriptions of your "WishLists" and
"OfferLists" via a common set of terms offered by vocabularies such
as GoodRelations and SIOC
Coherent integration of disparate data across the Web and/or
within the Enterprise via "Data Meshing" rather than "Data
Mashing"
Moving beyond imprecise statistically driven "Keyword Search"
(e.g. Page Rank) to "Precision Find" driven by typed link based
Entity Rank plus Entity Type and Entity
Property filters.
Conclusion
If all of the above still falls into the technical mumbo-jumbo
realm, then simply consider Linked Data as delivering Open Data
Access in granular form to Web accessible data -- that goes beyond
data containers (documents or files).
The value proposition of Linked Data is inextricably linked to
the value proposition of the World Wide Web . This is true, because the
Linked Data meme is ultimately about an enhancement of the current
Web; achieved by reintroducing its architectural essence -- in new
context -- via a new level of link
abstraction, courtesy of the Identity and Access duality of HTTP
URIs.
As a result of Linked Data, you can now have Links on the Web
for a Person, Document, Music, Consumer Electronics, Products &
Services, Business Opening & Closing Hours, Personal
"WishLists" and "OfferList", an Idea, etc.. in addition to links
for Properties (Attributes & Values) of the aforementioned.
Ultimately, all of these links will be indexed in a myriad of ways
providing the substrate for the next major period of Internet &
Web driven innovation, within our larger human-ingenuity driven
innovation continuum.
Related
07/23/2009 20:17 GMT-0500
Modified: 07/24/2009 08:20
GMT-0500
Important Things to Note about the World Wide Web
Based on the prevalence of confusion re. the Linked Data meme , here are a few important
points to remember about the World Wide Web .
It's an HTTP based Network Cluster within the Internet (remember: Networks are about meshes
of Nodes connected by Links)
Its underlying data model is that of a
Network (we've had Network Data models for eons; EAV/CR is an example)
Links are facilitated via URIs
Until recently, the granularity of Networking on the Web was scoped to
Data Containers (documents), due to the prevalence of URL style links
The Linked Data meme adds Data Item (Datum) level granularity
to World Wide Web networking via HTTP URIs
Data Items become Web Reference-able when you Identify/Name
them using HTTP based URIs
An HTTP URI implicitly binds a Web Reference-able
Data Item (Entity , Datum, Data Object, Resource) to its
Web Accessible Metadata
Web Accessible Metadata resides within Data Containers
(documents or information resources)
The representation of a Web Accessible Metadata container is
negotiable
I am able to write and dispatch this blog
post courtesy of the Web features listed above
You are able to explore the many dimensions of data exposed by
this blog should you decide to explore the Linked Data mesh exposed by this post's HTTP
URI (via its permalink)
The HTTP URI is the secret sauce of the Web that is powerfully
and unobtrusively reintroduced via the Linked Data meme (classic
back to the future act). This powerful sauce possesses a unique power
courtesy of its inherent duality, i.e., how it uniquely combines
Data Item Identity (think keys in traditional DBMS parlance) with
Data Access (e.g. access to negotiable representations of
associated metadata).
As you can see, I've made no mention of RDF or SPARQL , and I can still articulate the
inherent value of the "Linked Data " dimension that the "Linked Data"
meme adds to the World Wide Web.
As per usual this post is a live demonstration of Linked Data
(dog-food style) :-)
Related
07/23/2009 09:27 GMT-0500
Modified: 07/23/2009 10:33
GMT-0500
Library of Congress & Reasonable Linked Data
While exploring the Subject Headings Linked Data Space (LCSH)
recently unveiled by the Library of Congress , I noticed that the
URI for the subject heading: World Wide Web , exposes an "owl:sameAs" link
to resource URI: "info:lc/authorities/sh95000541" -- in fact, a
URI.URN that isn't HTTP protocol scheme based.
The observations above triggered a discussion thread on Twitter that
involved: @edsu, @iand, and moi.
Naturally, it morphed into a live demonstration of human vs.
machine interpretation of claims expressed in the RDF graph.
What makes this whole thing interesting?
It showcases (in Man vs Machine style) the issue of
unambiguously discerning the meaning of the owl:sameAs claim
expressed in the LCSH Linked Data Space .
Perspectives & Potential Confusion
From the Linked Data perspective, it may spook a few people to
see owl:sameAs values such as: "info:lc/authorities/sh95000541",
that cannot be de-referenced using HTTP.
It may confuse a few people or user agents that see URI
de-referencing as not necessarily HTTP specific, thereby attempting
to de-reference the URI.URN on the assumption that it's associated
with a "handle system ", for instance.
It may even confuse RDFizer / RDFization middleware that use
owl:sameAs as a data
provider attribution mechanism via hint/nudge URI values derived
from original content / data URI.URLs that de-reference to nothing
e.g., an original resource URI.URL plus "#this" which produces URI.URN-URL
-- think of this pattern as "owl:shameAs" in a sense :-)
Unambiguously Discerning Meaning
Simply bring OWL reasoning (inference rules and reasoners) into
the mix, thereby negating human dialogue about interpretation which
ultimately unveils a mesh of orthogonal view points. Remember, OWL
is all about infrastructure that ultimately enables you to express
yourself clearly i.e., say what you mean, and mean what you
say.
Path to Clarity (using Virtuoso , its in-built Sponger Middleware,
and Inference Engine):
GET the data into the Virtuoso Quad store -- what the sponger
does via its URIBurner Service (while following
designated predicates such as owl:sameAs in case they point to
other mesh-able data sources)
Query the data in Quad Store with "owl:sameAs" inference rules
enabled
Repeat the last step with the inference rules excluded.
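A sketch of the kind of query involved is shown below; the endpoint, the target URI, and the use of Virtuoso's input:same-as pragma are assumptions for illustration, not the exact queries from the discussion.
# sketch only -- endpoint, URI, and pragma usage are assumptions
curl --data-urlencode 'query=DEFINE input:same-as "yes"
SELECT DISTINCT ?p ?o
WHERE { <http://id.loc.gov/authorities/sh95000541#concept> ?p ?o }
LIMIT 50' \
     "http://linkeddata.uriburner.com/sparql"

# repeat the request without the DEFINE line to compare results with inference disabled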
Actual SPARQL Queries:
Observations:
The SPARQL queries against the Graph generated and
automatically populated by the Sponger reveal -- without human
intervention -- that "info:lc/authorities/sh95000541" is just an
alternative name for <http://id.loc.gov/authorities/sh95000541#concept>,
and that the graph produced by LCSH is
self-describing enough for an OWL reasoner to figure this all out
courtesy of the owl:sameAs property :-).
Hopefully, this post also provides a simple example of how
OWL facilitates "Reasonable Linked Data".
Related
05/05/2009 13:53 GMT-0500
Modified: 05/06/2009 14:26
GMT-0500
Time for RDBMS Primacy Downgrade is Nigh! (No Embedded Images Edition - Update 1)
As the world works its way through a "once in a generation"
economic crisis, the long overdue downgrade of the RDBMS from its pivotal position at the
apex of the data access and data management pyramid is
nigh.
What is the Data Access, and Data Management Value
Pyramid?
As depicted below, a top-down view of the data access and data
management value chain. The term: apex, simply indicates value
primacy, which takes the form of a data access API based entry
point into a DBMS realm -- aligned to an underlying data model.
Examples of data access APIs include: Native Call Level Interfaces
(CLIs), ODBC, JDBC, ADO.NET, OLE-DB, XMLA, and Web Services.
See:
AVF Pyramid Diagram.
The degree to which ad-hoc views of data managed by a DBMS can
be produced and dispatched to relevant data consumers (e.g.
people), without compromising concurrency, data durability, and
security, collectively determine the "Agility Value Factor" (AVF)
of a given DBMS. Remember, agility as the cornerstone of
environmental adaptation is as old as the concept of evolution, and
intrinsic to all pursuits of primacy.
In simpler business oriented terms, look at AVF as the degree to
which DBMS technology affects the ability to effectively implement
"Market Leadership Discipline" along the following pathways:
innovation, operation excellence, or customer intimacy.
Why has RDBMS Primacy Endured?
Historically, at least since the late '80s, the RDBMS genre of
DBMS has consistently offered the highest AVF relative to other
DBMS genres en route to primacy within the value pyramid. The
desire to improve on paper reports and spreadsheets is basically
what DBMS technology has fundamentally addressed to date, even
though conceptual level interaction with data has never been its
forte.
See:
RDBMS Primacy Diagram.
For more than 10 years -- at the very least -- the limitations of
the traditional RDBMS in the realm of conceptual level interaction
with data across diverse data sources and schemas (enterprise, Web,
and Internet) have been crystal clear to many
RDBMS technology practitioners, as indicated by some of the quotes
excerpted below:
"Future of Database Research is excellent, but what is the
future of data?"
"..it is hard for me to disagree with the conclusions in this
report. It captures exactly the right thoughts, and should be a
must read for everyone involved in the area of databases and
database research in particular."
-- Dr. Anant Jhingran, CTO, IBM Information Management Systems, commenting on
the 2007 RDBMS technology retreat attended by a
number of key DBMS technology pioneers and researchers.
"One size fits all: A concept whose time has come
and gone
They are direct descendants of System R and Ingres and were architected more than 25
years ago
They are advocating "one size fits all"; i.e. a single
engine that solves all DBMS needs.
-- Prof. Michael Stonebraker, one of the founding
fathers of the RDBMS industry.
Until this point in time, the requisite confluence of
"circumstantial pain" and "open standards" based technology
required to enable an objective "compare and contrast" of RDBMS
engine virtues and viable alternatives hasn't occurred. Thus, the
RDBMS has endured its position of primacy, albeit on a "one size fits
all" basis.
Circumstantial Pain
As mentioned earlier, we are in the midst of an economic crisis
that is ultimately about a consistent inability to connect dots
across a substrate of interlinked data sources that transcend
traditional data access boundaries with high doses of schematic
heterogeneity. Ironically, in the era of the dot-com, we haven't been
able to make meaningful connections between relevant "real-world
things" that extend beyond primitive data hosted in database tables
and content management style document containers; we've struggled
to achieve this in the most basic sense, let alone evolve our
ability to connect in line with the exponential rate at which the Internet & Web
are spawning "universes of discourse" (data spaces) that emanate
from user activity (within the enterprise and across the
Internet & Web). In a nutshell, we haven't been able to upgrade
our interaction with data such that "conceptual models" and
resulting "context lenses" (or facets) become concrete;
by this I mean: real-world entity interaction making its way into the
computer realm as opposed to the impedance we all suffer today when
we transition from conceptual model interaction (real-world) to
logical model interaction (when dealing with RDBMS based data
access and data management).
Here are some simple examples of what I can only best describe
as: "critical dots unconnected", resulting from an inability to
interact with data conceptually:
Government (Globally) -
Financial regulatory bodies couldn't effectively discern that a
Credit Default Swap is an Insurance policy in
all but literal name. And in not doing so the cost of an
unregulated insurance policy laid the foundation for
exacerbating the toxicity of fatally flawed mortgage backed
securities. Put simply: a flawed insurance policy was the fallback
on a toxic security that financiers found exotic based on
superficial packaging.
Enterprises -
Banks still don't understand that capital really does exist in
tangible and intangible forms, with the intangible being the
variant that is inherently dynamic. For example, a tech company's
intellectual capital far exceeds the value of fixtures, fittings,
and buildings, but you'd be amazed to find that in most cases this
vital asset has no significant value when banks get down to the
nitty gritty of debt collateral; instead, a buffer of flawed
securitization has occurred atop a borderline static asset class
covering the aforementioned buildings, fixtures, and fittings.
In the general enterprise arena, IT executives continued to "rip
and replace" existing technology without ever effectively
addressing the timeless inability to connect data across disparate
data silos generated by internal enterprise applications, let alone
the broader need to mesh data from the inside with external data
sources. No correlations were made between the growth of buzzwords and
the compounding nature of data integration challenges. It's 2009,
and only a minuscule number of executives dare fantasize about
being anywhere within distance of the "relevant information at your
fingertips" vision.
Looking more holistically at data interaction in general,
whether you interact with data in the enterprise space (i.e., at
work) or on the Internet or Web, you ultimately are delving into a
mishmash of disparate computer systems, applications, services (Web
or SOA), and databases (of the RDBMS variety in a majority of
cases) associated with a plethora of disparate schemas. Yes, but
even today "rip and replace" is still the norm pushed by most
vendors; pitting one mono culture against another as exemplified by
irrelevances such as: FOSS/LAMP vs Commercial or Web vs.
Enterprise, none of which matters if the data access and
integration issues aren't recognized, let alone addressed (see:
Applications are Like Fish and Data Like
Wine ).
Like the current credit-crunch, exponential growth of data
originating from disparate application databases and associated
schemas, within shrinking processing time frames, has triggered a
rethinking of what defines data access and data management value
today en route to an inevitable RDBMS downgrade within the value
pyramid.
Technology
There have been many attempts to address real-world modeling
requirements across the broader DBMS community from Object
Databases to Object-Relational Databases, and more recently the
emergence of simple Entity -Attribute-Value model DBMS engines. In
all cases failure has come down to the existence of one or more of
the following deficiencies, across each potential alternative:
Query language standardization - nothing close to SQL
standardization
Data Access API standardization - nothing close to ODBC, JDBC,
OLE-DB, or ADO.NET
Wire protocol standardization - nothing close to HTTP
Distributed Identity infrastructure - nothing close to the
non-repudiatable digital Identity that foaf +ssl accords
Use of Identifiers as network based pointers to data sources -
nothing close to RDF based Linked Data
Negotiable data representation - nothing close to Mime and HTTP
based Content Negotiation
Scalability especially in the era of Internet & Web
scale.
Entity-Attribute-Value with Classes & Relationships
(EAV/CR) data models
A common characteristic shared by all post-relational DBMS
management systems (from Object Relational to pure Object) is an
orientation towards variations of EAV/CR based data models.
Unfortunately, all efforts in the EAV/CR realm have typically
suffered from at least one of the deficiencies listed above. In
addition, the same "one DBMS model fits all" approach that lies at
the heart of the RDBMS downgrade also exists in the EAV/CR
realm.
What Comes Next?
The RDBMS is not going away (ever), but its era of primacy -- by
virtue of its placement at the apex of the data access and data
management value pyramid -- is over! I make this bold claim for the
following reasons:
The Internet aided "Global Village" has brought "Open World " vs "Closed World " assumption issues to the fore
e.g., the current global economic crisis remains centered on the
inability to connect dots across "Open World" and "Closed World"
data frontiers
Entity-Attribute-Value with Classes & Relationships
(EAV/CR) based DBMS models are more effective when dealing with
disparate data associated with disparate schemas, across disparate
DBMS engines, host operating systems, and networks.
Based on the above, it is crystal clear that a different kind of
DBMS -- one with higher AVF relative to the RDBMS -- needs to sit
atop today's data access and data management value pyramid. The
characteristics of this DBMS must include the following:
Every item of data (Datum/Entity/Object/Resource) has
Identity
Identity is achieved via Identifiers that aren't locked at the
DBMS, OS, Network, or Application levels
Object Identifiers and Object values are independent
(extricably linked by association)
Object values should be de-referencable via Object
Identifier
Representation of de-referenced value graph (entity,
attributes, and values mesh) must be negotiable (i.e. content
negotiation)
Structured query language must provide mechanism for Creation,
Deletion, Updates, and Querying of data objects
Performance & Scalability across "Closed World"
(enterprise) and "Open World" (Internet & Web) realms.
Quick recap, I am not saying that RDBMS engine technology is
dead or obsolete. I am simply stating that the era of RDBMS primacy
within the data access and data management value pyramid is
over.
The problem domain (conceptual model views over heterogeneous
data sources) at the apex of the aforementioned pyramid has simply
evolved beyond the natural capabilities of the RDBMS which is
rooted in "Closed World" assumptions re., data definition, access,
and management. The need to maintain domain based conceptual
interaction with data is now palpable at every echelon within our
"Global Village" - Internet, Web, Enterprise, Government etc.
It is my personal view that an EAV/CR model based DBMS, with
support for the seven items enumerated above, can trigger the long
anticipated RDBMS downgrade. Such a DBMS would be inherently
multi-model because you would need the best of RDBMS and EAV/CR
model engines in a single product, with in-built support for HTTP
and other Internet protocols in order to effectively address data
representation and serialization issues.
EAV/CR Oriented Data Access & Management Technology
Examples of contemporary EAV/CR frameworks that provide concrete
conceptual layers for data access and data management currently
include:
The frameworks above provide the basis for a revised AVF
pyramid, as depicted below, that reflects today's data access and
management realities i.e., an Internet & Web driven global
village comprised of interlinked distributed data objects,
compatible with "Open World" assumptions.
See:
New EAV/CR Primacy Diagram.
Related
01/27/2009 19:19 GMT-0500
Modified: 03/17/2009 11:50
GMT-0500