Kingsley Idehen's Blog Data Space

Linked Data Workshop -- WWW2008

Thu, 10 Jan 2008 18:03:29 GMT

At the forth coming World Wide Web 2008 Conference there will be an entire workshop dedicated to the emerging Linked Data Web (aka Linked Data). The Linked Data Workshop will include: Presentations, Demonstrations, Tutorials, and Research Papers from a variety on organizations and individuals associated with this very exciting aspect of the Web.

The deadline for submitting papers, presentations, demo, and tutorial proposals is the 28th of January, 2008.

Linked Data, Meshups, Twitter, and Friendfeed

Fri, 01 Aug 2008 02:17:35 GMT

Here are some links from my Friendfeed and Twitter Data Spaces that expose a number of recent Linked Data "Meshup" examples:

Enjoy!

Business Of Linked Data: Data Quality Factors

Mon, 25 Oct 2010 21:09:02 GMT

Via my "context lenses" (i.e., my subjective view of the world) a unit of Data (or Datum) is like a cube of sugar, each side representing a value factor along the following dimensions:

Identity -- via Resolvable URIs based Names for everything
Data Representation Format Dexterity -- e.g., HTTP based Content Negotiation which loosens the coupling between Data Model Semantics and actual Data Representation (Syntax/Markup)
Platform Agnostic Data Access -- e.g. via ubiquitous HTTP
Change Sensitivity -- data warehouses are like real-world warehouses, goods rot and perish overtime
Provenance -- data about the data (metadata) that helps establish "Who", "What", "When", "Where", and at least approximate or guesstimate "Why"
Data Mesh Navigability -- delivered via inference rules.

The quality of service factors above nullify many of the typical concerns associated data driven business models, such as:

Wholesale Imports (crawls) - where your data is crawled and/or imported wholesale into a new data space with zero attribution to the source
Lossy Attribution -- attribution is delivered in literal form which doesn't deliver branding fidelity across many value chain layers or entire life cycle of a given data item
Service Provisioning -- effectively build any business model if you can align services with unambiguously identifiable consumers with actual data items or across entire data spaces.

OpenOffice.org, SPARQL, and the Linked Data Web

Tue, 05 Feb 2008 01:42:50 GMT

Question posed by Dan Brickley via a blog post: SQL, OpenOffice: would a JDBC driver for SPARQL protocol make sense?

Writing a JDBC Driver for SPARQL is a little overkill. OpenOffice.org simply needs to make XML or Web Data (HTML, XHTML, and XML) bonafide data sources within its "Pivot Table" functionality realm. Then all that would then be required is a SPARQL SELECT Query transported via the SPARQL Protocol with results sent back using the SPARQL XML results serialization format (all part of a single SPARQL Protocol URL).

Excel successfully consumes the following information resource URI: http://tinyurl.com/yvoccj (a tiny url for a SPARQL SELECT against my FOAF file).

Alternatively, and currently achievable, you could simply use SPASQL (SPARQL within SQL) using a DBMS engine that supports SQL, SPARQL, and SPARQL e.g. Virtuoso.

Virtuoso SPASQL support is exposed via it's ODBC and/or JDBC Drivers. Thus you can do things such as:

Use a SPARQL Query in the FROM CLAUSE of a SQL statement
Execute SPARQL via SQL processor by prepending SPARQL query text with the literals "sparql"

BTW - My News Years Resolution: get my act together and shrink the ever increasing list of "simple & practical Virtuoso use case demos" on my todo which now spans all the way back to 2006 :-(

Data Spaces

Tue, 01 Mar 2011 22:26:15 GMT

There is increasing coalescence around the idea that HTTP-based Linked Data adds a tangible dimension to the World Wide Web (Web). This Data Dimension grants end-users, power-users, integrators, and developers the ability to experience the Web not solely as a Information Space or Document Space, but now also as a Data Space.

Here is a simple What and Why guide covering the essence of Data Spaces.

What is a Data Space?

A Data Space is a point of presence on a network, where every Data Object (item or entity) is given a Name (e.g., a URI) by which it may be Referenced or Identified.

In a Data Space, every Representation of those Data Objects (i.e., every Object Representation) has an Address (e.g., a URL) from which it may be Retrieved (or "gotten").

In a Data Space, every Object Representation is a time variant (that is, it changes over time), streamable, and format-agnostic Resource.

An Object Representation is simply a Description of that Object. It takes the form of a graph, pictorially constructed from sets of 3 elements which are themselves named Subject, Predicate, and Object (or SPO); or Entity, Attribute, and Value (or EAV). Each Entity+Attribute+Value or Subject+Predicate+Object set (or triple), is one datum, one piece of data, one persisted observation about a given Subject or Entity.

The underlying Schema that defines and constrains the construction of Object Representations is based on Logic, specifically First-Order Logic. Each Object Representation is a collection of persisted observations (Data) about a given Subject, which aid observers in materializing their perception (Information), and ultimately comprehension (Knowledge), of that Subject.

Why are Data Spaces important?

In the real-world -- which is networked by nature -- data is heterogeneously (or "differently") shaped, and disparately located.

Data has been increasing at an alarming rate since the advent of computing; the interWeb simply provides context that makes this reality more palpable and more exploitable, and in the process virtuously ups the ante through increasingly exponential growth rates.

We can't stop data heterogeneity; it is endemic to the nature of its producers -- humans and/or human-directed machines. What we can do, though, is create a powerful Conceptual-level "bus" or "interface" for data integration, based on Data Description oriented Logic rather than Data Representation oriented Formats. Basically, it's possible for us to use a Common Logic as the basis for expressing and blending SPO- or EAV-based Object Representations in a variety of Formats (or "dialects").

The roadmap boils down to:

Assigning unambiguous Object Names to:
- Every record (or, in table terms, every row);
- Every record attribute (or, in table terms, every field or column);
- Every record relationship (that is, every relationship between one record and another);
- Every record container (e.g., every table or view in a relational database, every named graph, every spreadsheet, every text file, etc.);
Making each Object Name resolve to an Address through which Create, Read, Update, and Delete ("CRUD") operations can be performed against (can access) the associated Object Representation graph.

Linked Data -- Summing Up The Last 12 Months

Sun, 03 Feb 2008 22:17:17 GMT

Mike Bergman has just penned a post titled: Linked Data Comes of Age, that provides a nice 12 month summation of Linked Data and the Linking Open Data Community project's efforts to date.

Like most of us in the Linked Data community, he sees the upcoming Linked Data Conference by Jupiter as a watershed moment.

Shared Ontologies Linked Data Style!

Fri, 01 Jun 2007 23:54:05 GMT

As the Linked Data meme beams across the Web, it is important to note that Ontology / Schema sharing and reuse is critical to the overall vitality of the burgeoning Semantic Data Web.

The items that follow attempt to demonstrate the point by way of SIOC (Semantically-Interlinked Online Communities Ontology) and MO (Music Ontology) domain exploration:

Linked Data or Dynamic Data Web Pages:

Music Ontology Overview
SIOC Ontology Overview
SIOC Type Ontology Module (how you extend SIOC Concepts unobtrusively)
SIOC Services Ontology Module (how you extend SIOC in relation to Services Modeling).

Semantic Web Browser Sessions:

Music Ontology Overview via OpenLink RDF Browser
SIOC Ontology Overview via OpenLink RDF Browser
SIOC Type Ontology Module via OpenLink RDF Browser
SIOC Services Ontology Module via OpenLink RDF Browser.

Key point, if you are modeling People, Communities, Organizations, Documents, and other entities in the People, Organizations, Documents etc. Data Space, don't forget to : FOAF-FOAF-FOAF it Up! :-)

XTech Talks covering Linked Data

Mon, 05 May 2008 21:07:17 GMT

Courtesy a post by Chris Bizer to the LOD community mailing list, here is a list of Linked Data oriented talks at the upcoming XTech 2008 event (also see the XTech 2008 Schedule which is Linked Data friendly). Of course, I am posting this to my Blog Data Space with the sole purpose of adding data to the rapidly growing Giant Global Graph of Linked Data, basically adding to my collection of live Linked Data utility demos :-)

Here is the list:

Linked Data Deployment (Daniel Lewis, OpenLink Software)
The Programmes Ontology (Tom Scott, BBC and all)
SemWebbing the London Gazette (Jeni Tennison, The Stationery Office)
Searching, publishing and remixing a Web of Semantic Data (Richard Cyganiak, DERI Galway)
Building a Semantic Web Search Engine: Challenges and Solutions (Aidan Hogan, DERI Galway)
'That's not what you said yesterday!' - evolving your Web API (Ian Davis, Talis)
Representing, indexing and mining scientific data using XML and RDF: Golem and CrystalEye (Andrew Walkingshaw, University of Cambridge)

For the time challenged (i.e. those unable to view this post using it's permalink / URI as a data source via the OpenLink RDF Browser, Zitgist Data Viewer, DISCO Hyperdata Browser, or Tabulator), the benefits of this post are as follows:

automatic URI generation for all linked items in this post
automatic propagation of tags to del.icio.us, Technorati, and PingTheSemanticWeb
automatic association of formal meanings to my Tags using the MOAT Ontology
automatic collation and generation of statistical data about my tags using the SCOT Ontology (*missing link is a callout to SCOT Tag Ontology folks to sort the project's home page URL at the very least*)
explicit typing of my Tags as SKOS Concepts.

Put differently, I cost-effectively contribute to the GGG across all Web interaction dimensions (1.0, 2.0, 3.0) :-)

RDF Browser View of My Hyperdata & Linked Data Post

Thu, 20 Sep 2007 01:26:02 GMT

Bearing in mind we are all time challenged, here are links to OpenLink and Zitgist RDF Browser views of my earlier blog post re. Hyperdata & Linked Data.

OpenLink RDF Browser view of Hyperdata & Linked Data post

Zitgist Browser view of Hyperdata & Linked Data post

Both browsers should lead you to the posts from Danny, Nova, and Tim. In both cases the URI < xmlns="http" www.openlinksw.com="www.openlinksw.com" dataspace="dataspace" kidehen="kidehen" openlinksw.com="openlinksw.com" weblog="weblog" s="s" blog="blog" b127="b127" d="d"> is a pointer to structured data (in my Blog Data Space) if your user agent (browser or other Web Client) requests an RDF representation of this post via its HTTP request payload (what the Browser are doing via the "Accept:" headers).

As you can see the Data Web is actually here! Without RDF generation upheaval (or Tax).

My Talis Podcast re. Semantic Web, Linked Data, and OpenLink Software

Fri, 16 May 2008 16:53:49 GMT

My podcast interview with Paul Miller of Talis is out. As I listened to the podcast (naturally awkward affair) I got a first hand sense of Paul's mastery of the art of interviewing, even when dealing with a fast talking data blitzers like me. Personally, I think I still talk a little too fast (the Nigerian in me), especially when the subject matter hones right into the epicenter of my professional passions: Open Data Access and Heterogeneous Data Integration (aka. Virtual Database Technology) -- so you may need to rewind every now and then during the interview :-)

During this particular podcast interview, I deliberately wanted to have an conversation about the practical value of Linked Data, rather than the technical innards. The fundamental utility of Linked Data remains somewhat mercurial, and I am certainly hoping to do my bit at the upcoming Linked Data Planet conference re. demonstrating and articulating linked data value across the blurring realms of "the individual" and "the enterprise".

Note to my old schoolmates on Facebook: when you listen to this podcast you will at least reconcile "Uyi Idehen" with "Kingsley Idehen". Unfortunately, Facebook refuses to let me Identify myself in the manner I choose. Ideally, I would like to have the name: "Kingsley (Uyi) Idehen" associated with my Facebook ID since this is the Identifier known to my personal network of friends, family, and old schoolmates. This Identity predicament is a long running Identity case study in the making.

Metcalfe, Einstein, and Linked Data

Tue, 02 Sep 2008 17:03:01 GMT

Metcalfe’s law states that the value of a telecommunications network is proportional to the square of the number of users of the system (n²), where the linkages between users (nodes) exist by definition. For information bases, the data objects are the nodes. Linked Data works to add the connections between the nodes.

I would tweak of the law modification expressed in Mike Bergman's post which states:

the value of a Linked Data network is proportional to the square of the number of links between the data objects.

By simply injecting "Context" which is what a high fidelity linked data mesh facilitates i.e. a mesh of weighted links endowed with specifically typed links (as opposed to a single ambiguous type unspecific link), you end up with an even more insight into the power of a Linked Data Web.

Channeling Einstein

How about Einstein's famous equaton: E=mc²? I am talking Energy (vitality) and Mass equivalence, where "E" is for Energy, "m" for Network Mesh base Mass ( where each entity network node contains sub-particles that are themselves dense network meshes all endowed with typed links and weightings), and "c" is for computer processing speed (processing speed is growing exponentially!). When you beam queries down a context rich mesh (a giant global graph comprised of named and dereferencable data sources), especially a mesh to which we are all connected, what do you get? Infrastructure for generating an unbelievable amount of intellectual energy (the result of exploding the sub-data-graphs within graph nodes) that is much better equipped to handle current and future challenges. Even better, we end up making constructive use of Einstein's findings (remember, we built a bomb the first time around!). TimBL articulates this fundamental value of the Web in slightly different language, but at the core, this is the essence of the Web as I believe he envisioned; the ability to connect us all in such a way that we exploit our collective manpower and knowledge constructively and unobtrusively, en route to making the world a much better place :-)

Note: None of this in incongruent with being compensated (i.e. making money) for contributing tangible value into, or around, the Mesh we know as the Web :-)

Business at the Speed of Thought - by Bill Gates
Blink - by Malcolm Gladwell

A Simple Linked Data Guide for the Enterprise

Mon, 23 Jun 2008 20:54:29 GMT

Mike Bergman has just published a nice Linked Data FAQ aimed at Enterprise audiences. His post draws on a collection of questions collated from a plethora of interactions with Enterprise oriented folks during last week's Linked Data Planet conference.

Enjoy!

Master Data Management (MDM) & RDF based Linked Data

Wed, 05 Nov 2008 23:19:02 GMT

It is getting clearer by the second that Master Data Management and RDF based Linked data are two realms separated by a common desire to provide "Entity Oriented Data Access" to heterogeneous data sources (within the enterprise and/or across the World Wide Web).

Here is how I see Linked Data providing tangible value to MDM tools vendors and users:

Open access to Entities across MDM instances served up by different MDM solutions acting as Linked Data publishers (i.e., expose MDM Entities as RDF resources endowed with de-referencable URIs thereby enabling Hyperdata-style linking)
Use of RDF-ization middleware to hook disparate data sources (SQL, XML, and other data sources) into existing MDM packages (i.e., the MDM solutions become consumers of RDF Linked Data).

Of course Virtuoso was designed and developed to deliver the above from day one (circa. 1998 re. the core and 2005 re. the use of RDF for the final mile) as depicted below:

Other MDM related posts

Exploring FOAF Linked Data Style!

Fri, 25 May 2007 18:36:47 GMT

Over the last few hours the FOAF project received a wakeup call via Dan Brickley's FOAF 0.9 "touch" effort.

Naturally, this triggered an obvious opportunity to demonstrate the prowess of Linked Data on the Semantic Web. What follows is a quick dump of what I sent to the foaf-dev mailing list:

Here are variety of FOAF Views built using:

OpenLink RDF Browser

Interactive SPARQL QBE

Raw SPARQL Endpoint

Enabling you to explore the following lines:

FOAF Overview via a Linked Data Page

FOAF Overview by Term Status via Linked Data Page

FOAF Overview SPARQL Query (.rq File)

FOAF Overview by Term Status

FOAF Overview via OpenLink RDF Browser

Virtuoso Installation Screencasts

Sun, 02 Nov 2008 21:20:21 GMT

As promised in an earlier post titled: Virtuoso, PHP 3.5 Runtime Hosting, phpBB3, and Linked Data, here are direct links to the "silent movies" mentioned in the past:

Installing Virtuoso on Vista with PHP Hosting
Installing Virtuoso on Mac OS X (Leopard) with PHP Hosting
EC2 Installation Part 1 (*AMIs take about 5 minutes to get assembled*)
EC2 Installation Part 2 (*post AMI creation part*)

Virtuoso is an extremely compact product that is very easy to install. The ease of installation carries over to the PHP runtime when bound to Virtuoso.

Conversation with Jon Udell: Are We There Yet Re. Web++ ?

Mon, 01 Feb 2010 13:58:04 GMT

Personally, I believe that we've actually reached a watershed moment re. the evolution of the Web from a mesh of Linked Data Containers (Web of Linked Documents) to a mesh of Linked Data Items (entities or real world objects).

The journey towards this watershed moment started with the Semantic Web Project, gained focus and pragmatism via the Linked Data meme, attained substance & credibility via efforts such as DBpedia and the resulting cloud of Open Linked Data Spaces, and finally arrived at the most important destination of all: broad comprehension and coherence, via RDFa.

Over the years, I've chronicled the journey above via entries in this particular data space (my blog) and most recently, via my rapid-fire comments and debates on Twitter (basically hastag #linkeddata account: kidehen).

On a parallel front re. my chronicles, I've periodically had conversations with Jon Udell, who has always provided a coherent sounding board and reconciliation framework for my world views and open data access vision; naturally, this has a lot to do with his holistic grasp of the big picture issues, associated technical details, and special communication prowess :-)

Against this backdrop, I refer you to my most recent podcast conversation with Jon, which is about how the tandem of HTML+RDFa and the GoodRelations vocabulary deliver the critical missing links re. broad comprehension of the Semantic Web vision en route to mass exploitation.

My Linked Data Planet Keynote (Updated with missing link)

Thu, 19 Jun 2008 13:48:14 GMT

I've finally found a second to drop a note about my keynote.

The keynote: Creating, Deploying, and Exploiting Linked Data, sought to achieve the fundamental goal of: Demystify the concept of "Linked Data" using anecdotal material that resonates with enterprise decision makers.

To my pleasure, 90% of the audience members confirmed familiarization with the "Data Source Name" concept of Open Database Connectivity (ODBC). Thus, all I had to do was map "Linked Data" to ODBC, and then unveil the fundamental add-ons that "Linked Data" delivers:

The ability to give database records names (Identifiers)
The use of HTTP in the database record naming mechanism - which expands a named database record's reference scope via the expanse of the Web (i.e HTTP based Identifiers called URIs).

I believe a majority of attendees came to realize that the combination above injects a new Web interaction dynamic: access to "Subject matter Concepts" and Named Entities contained within a page via HTTP base Data Source Names (URIs).

BTW - My presentation is a Linked Data Space in it's own right courtesy of the Bibliographic Ontology (which provides slide show modeling) and RDFa that allows me to embed annotations into my Slidy based presentation :-)

PowerPoint version of Presentation
Slideshare hosted version
Authorstream hosted version
Google Docs hosted version

Linked Data Trip Report - Part 1 (Update 2)

Tue, 29 Apr 2008 15:07:43 GMT

Typo cleansed edition :-)

Objectives

Meet LOD Community Members
Participate in Workshop

Meeting LOD Community Members

Although the Web continues to shrink the planet by removing the restrictions of geopgrahic location, meeting people face-to-face remains invaluable (*priceless in Mastercard AD speak*). Naturally, meeting and chatting with as many LOD community members as possible was high up on my agenda.

Participate in Workshop

As one of the co-chairs of the Linking Open Data Workshop (LODW), I had a 5 minute workshop opening slot during which I spoke about the following:

Where we are today:

We have DBpedia as a major hub on the burgeoning Linked Data Web. When OpenLink offered to host DBpedia (a combination of Virtuoso DBMS Software and sizable backend Hardware infrastructure), it did so knowing that such an effort would emphatically address the "chicken and egg" conundrum that, prior to this undertaking, stifled the ability to demonstrate practical utility of HTTP based Linked Data.

Today, the Linked Data bootstrap mission has been accomplished.

Where we go next:

Although DBpedia is a hub (ground zero of Linked Data), we have to put it into perspective in relation to a new set of needs and expectations moving forward. Today, DBpedia is a Sun at the heart of a Solar System within the Linked Data Galaxy. But unlike Space as we know it, in Cyberspace we can have connectivity and collaboration across Solar Systems -- life exists elsewhere and we are part of a collaborative collective unimpeded by constraints of space travel etc. Thus, expect to see the emergence of other Solar Systems accessible to DBpedia and its collections of planets (see. LOD diagram). Examples underway include UMBEL which will serve the Linked Data planets from OpenCyc (Subject Matter Concepts), Yago (Named Entities), and Bio2RDF (which provides powerful Bio Informatics based Linked Data planet).

I urged the community to veer more aggressively towards developing and demonstrating practical Linked Data driven solutions that are aligned to well known problems. Of course, I encouraged all presenters to make this an integral part of their presentations :-)

Workshop Summary:

The workshop was well attended and I found all the presentations engaging and full of enthusiasm.

As the sessions progressed, it became clear during a number of accompanying Q&A sessions that a new Linked Data exploitation frontier is emerging. The frontier in question takes the form of a Linked Data substrate capable of addressing the taxonomic needs of solutions aimed at automated Named Entity Extraction, Disambiguation, Subject matter Concept alignment, transparently integrated with existing Web Content. Thus, we are moving beyond the minting and deployment of of dereferencable URIs and RDF data sets to automagically associating existing Web Content with Named Entities (People, Organizations, Places, Events etc..) and Subject matter Concepts (Politics, Music, Sports, and others) while remaining true to the Linking Open Data Community creed i.e. ensuring the Named Entity and Subject matter Concept URIs are available to user agents or users seeking to produce alternative data views (i.e. Mesh-ups).

I will get to part 2 of this report once the actual workshop sessions slides go live (*these are different from the pre-event PDFs links*).

Virtuoso, Linked Data, and Linq2Rdf (Update 1)

Wed, 27 Aug 2008 11:51:23 GMT

There are many challenges that have dogged attempts to mesh the DBMS & Object Technology realms for years, critical issues include:

data access & manipulation impedance arising from Model mismatches between Relational Databases and Object Oriented & Object based Languages
Record / Data Object Referencing by ID.

The big deal about LINQ has been the singular focus on addressing point 1, in particular.

I've already written about the Linq2Rdf effort that meshes the best of .NET with the virtues of the "Linked Data Web".

Here is an architecture diagram that seeks to illustrate the powerful data access and manipulation options that the combination of Linq2RDF and Linked Data deliver:

What may not have been obvious to most in the past, is the fact that Mapping from Object Models to Relational Models wasn't really the solution to the problem at hand. Instead, the mapping should have been the other way around i.e., Relational to Object Model mapping. The emergence of RDF and RDBMS to RDF mapping technology is what makes this age-old headache addressable in very novel ways.

RDBMS to RDF Mapping - W3C Workshop Presentation
Virtuoso RDBMS to RDF Mapping - W3C Rdb2Rdf Incubator Group Presentation
Creating RDF Views over SQL Data Sources - Technology Tutorial

Explaining the Granular Social Network

Tue, 15 Apr 2008 21:22:42 GMT

Courtesy of Thomas Vander Wal's interesting blog post titled: Explaining the Granular Social Network, I found a nice video that highlights the Who + What you know aspect of Social Networking ad the GGG in general.

As I can't quite remix Videos on the spur of the moment (yet), I would encourage you to watch the video and then click on the link to my FOAF Profile, then follow the "Linked Data" tab to see how Linked Data oriented platforms (in my case OpenLink Data Spaces) that exist today actually deliver what's explained in the video.

"What You Know" (Data & Friend Networks) ultimately trumps "Who You Know" (Friend only Networks). The exploitation power of this reality is enhanced exponentially via the Linked Data Web once the implications of beaming SPARQL queries down specific URIs (entry points to Linked Data graphs) become clearer :-)

Semantic Web Killer Application?

Tue, 05 Feb 2008 01:32:42 GMT

In response to the ReadWriteWeb piece titled: Semantic Web: What is the Killer App. by Alex Iskold:

Information overload and Data Portability are two of the most pressing and imminent challenges affecting every individual connected to the global village exposed by the Internet and World Wide Web. I wrote an earlier post titled: Why We Need Linked Data that shed light on frequently overlooked realities about the Document Web.

The real Killer application of the Semantic Web (imho) is Linked Data (or Hyperdata), just as the killer application of the Document Web was Linked Documents (Hyperlinks). Linked Data enables human users (indirectly) and software agents (directly in response to human instruction) to traverse Web Data Spaces (Linked Data enclaves within the Giant Global Graph).

Semantic Web applications (conduits between humans and agents) that take advantage of Linked Data include:

DBpedia - General Knowledge sourced from Wikipedia and a host of other Linked Data Spaces.

Various Linked Data Browsers: Zitgist Data Viewer, OpenLink RDF Browser, DISCO Browser, and TimBL's Tabulator.

zLknks - Linked Data Lookup technology for Web Content Publishing systems (note: more to come on this in a future post).

OpenLink Data Spaces - a solution for Data Portability via a Linked Data Junction Box for Web 1.0 ((X)HTML Document Webs), 2.0 (XML Web Services based Content Publishing, Content Syndication, and Aggregation), and 3.0 (Linked Data) Data Spaces. Thus, via my URI (when viewed through a Linked Data Browser/Viewer) you can traverse my Data Space (i.e my Linked Data Graph) generated by the following activities:

Blog Posts publishing

My RSS & Atom Content Subscriptions (what used to be called a "Blogroll")

My Bookmarks (from my Desktop and Del.icio.us)

and other things I choose to share with the public via the Web

Virtuoso - a Universal Server Platform that includes RDF Data Management, RDFization Middleware, SQL-RDF Mapping, RDF Linked Data Deployment, alongside a hybrid/multi-model, virtual/federated data service in a single product offering.

BTW - There is a Linked Data Workshop at this years World Wide Web conference. Also note the Healthcare & Life Science Workshop which is a related Linked Data technology and Semantic Web best practices realm.

CrunchBase gets hooked up with the Linked Data Web!

Wed, 30 Jul 2008 01:43:27 GMT

It's getting really hot in Linked Data land! Two days ago Benjamin Nowack pinged the LOD community about his RDFization of Crunchbase (sample (X)HTML view: http://cb.semsol.org/company/opera-software) courtesy of Crounchbase releasing an API. As you know, I've always equated Web Service API to Database CLIs (ODBC, JDBC, ADO.NET etc.) as both offer code level hooks into Data Spaces.

Naturally, we've decided to join the Crunchbase RDFization party, and have just completed a Virtuoso Sponger Cartridge (an RDFizer) for Crouncbase. What we add in our particular cartridge is additional meshing with DBpedia and Wikicompany Linked Data Spaces, plus RDFizaton of the Crunchbase (X)HTML pages :-)

As I've postulated for a while, Linked Data is about data "Meshing" and "Meshups". This isn't a buzzword play. I am pointing out an important distinction between "Mashups" and "Meshpus". Which goes as follows: "Mashups" are about code level joining devoid of structured modelling, hence the revelation of code as opposed to data when you look behind a "Mashup". "Meshups" on the other hand, are about joining disparate structured data sources across the Web. And when you look behind a "Meshup" you see structured data (preferably Linked Data) that enables further "Meshing".

I truly believe that we are now inches away from critical mass re. Linked Data, and because we are dealing with data, the network-effect will be sky-high! I shudder to think about the state of the Linked Data Web in 12 months time. Yes, I am giving the explosion 12 months (or less). These are very exciting times.

Demo Links:

For best experience I encourage you to look at the OpenLink Data Explorer extension for Firefox (2.x - 3.x). This enables you to go to Crunchbase (X)HTML pages (and other sites on the Web of course), and then simply use the "View | Linked Data Sources" main or context menu sequence to unveil the Linked Data Sources associated with any Web Page.

Of course there is much more to come!

Missing Bits from semanticweb.com Interview

Fri, 13 Jun 2008 13:01:40 GMT

Yikes! I've just discovered that the final part of the semanticweb.com's interview with Jim Hendler and I, includes critical paragraphs that omit my example links :-( As you can imagine, this is a quite excruciating, bearing in mind that "Literals" are of marginal value in a Linked Data world.

Anyway, thanks to the Blogosphere, I can attempt to fix this problem myself -- via this post :-)

Q. If you wanted to provide a bewildered but still curious novice a public example of Linked Data at work in their everyday life, what would it be?

Kingsley Idehen: Any one of the following:

My Linking Open Data community Profile Page - the Linked Data integration is exposed via the "Explore Data" Tab My Linked Data Space - viewed via OpenLink's AJAR (Asynchronous Javascript and RDF) based Linked Data Brower My Events Calendar Tag Cloud - a Linked Data view of my Calendar Space using an RDF-aware browser In all cases, you have the ability to explore my data spaces by simply clicking on the links, which on the surface appear to be standard hypertext links, although in reality you are dealing with hyperdata links (i.e., links to entities that result in the generation of entity description pages that expose entity properties via hyperdata links). Thus, you have a single page that describes me in a very rich way since it encompasses all data associated with me, covering: personal profile, blog posts, bookmarks, tag clouds, social networks etc.

Q. What would you show the CEO or CTO of a company outside the tech industry?

Kingsley Idehen: A link to the Entity ALFKI, from the popular Northwind Database associated with Microsoft Access and SQL Server database installations. This particular link exposes a typical enterprise data space (orders, customers, employees, suppliers ...) in a single page. The hyperdata links represent intricate data relationships common to most business systems that will ultimately seek to repurpose existing legacy data sources and SOA services as Linked Data. Alternatively, I would show the same links via the Zitgist Data Viewer (another Linked Data-aware browser). In both cases, I am exploiting direct access to entities via HTTP due to the protocols incorporation into the Data Source Naming scheme.

Important Things to Note about the World Wide Web

Thu, 23 Jul 2009 14:33:58 GMT

Based on the prevalence of confusion re. the Linked Data meme, here are a few important points to remember about the World Wide Web.

Its an HTTP based Network Cluster within the Internet (remember: Networks are about meshes of Nodes connected by Links)
Its underlying data model is that of a Network (we've had Network Data models for eons. EAV/CR is an example)
Links are facilitated via URIs
Until recently the granularity of Networking on the Web was scoped to Data Containers (documents) (due to prevalence of URL style links
The Linked Data meme adds Data Item (Datum) level granularity to World Wide Web networking via HTTP URIs
Data Items become Web Reference-able when you Identify/Name them using HTTP based URIs
An HTTP URI implicitly binds a Web Reference-able Data Item (Entity, Datum, Data Object, Resource) to its Web Accessible Metadata
Web Accessible Metadata resides within Data Containers (documents or information resources)
The representation of a Web Accessible Metadata container is negotiable
I am able to write and dispatch this blog post courtesy of the Web features listed above
You are able to explore the many dimensions to data exposed by this blog should you decide to explore the Linked Data mesh exposed by this post's HTTP URI (via its permalink permalink)

The HTTP URI is the secret sauce of the Web that is powerfully and unobtrusively reintroduced via the Linked Data meme (classic back to the future act). This powerful sauce possess a unique power courtesy of its inherent duality i.e., how it uniquely combines Data Item Identity (think keys in traditional DBMS parlance) with Data Access (e.g. access to negotiable representations of associated metadata).

As you can see, I've made no mention of RDF or SPARQL, and I can still articulate the inherent value of the "Linked Data" dimension that the "Linked Data" meme adds to the World Wide Web.

As per usual this post is a live demonstration of Linked Data (dog-food style) :-)

State of the Linked Data Web

Sun, 28 Mar 2010 22:25:19 GMT

The evolution of the Web into a federated database, information space, and knowledge-base hybrid continues at frenetic pace.

As more Linked Data is injected into the Web from the Linking Open Data community and other initiatives, it's important to note that "Linked Data" is available in a variety of forms such as:

Data Model Definition oriented Linked Data (aka. Data Dictionary)
Data Model Instance Data (aka. Instance Data)
Linked Data oriented solutions that leverage the smart data substrate that Models and Instance Data meshes deliver.

Note: The common glue across the different types of Linked Data remains the commitment to data object (entity) identification and access via de-referencable URIs (aka. record / entity level data source names).

As stated in my recent post titled: Semantic Web: Travails to Harmony Illustrated. Harmonious intersections of instance data, data dictionaries (schemas, ontologies, rules etc.) provide a powerful substrate (smart data) for the development and deployment of "People" and/or "Machine" oriented solutions. Of course, others have commented on these matters and expressed similar views (see related section below).

The clickable venn diagram below, provides a simple exploration path that exposes the linkage that already exists, across the different Linked Data types, within the burgeoning Linked Data Web.

Anant Jingran's insightful LDP Conference Trip report
Anant's recent post about the future of Data
Mike Bergman - A New Constellation in the Linking Open Data (LOD) Sky
Frederick Giasson - Exploding DBpedia Domain using UMBEL

Is Linked Data Always Relevant?

Wed, 31 Dec 2008 17:57:41 GMT

I pose the question above because I stumbled across an interesting claim about OpenLink Software and its representatives expressed in the ReadWriteWeb post titled: XBRL: Mashing Up Financial Statements, where the following claim is made:

"..There is evidence that they promote LINKED DATA at any expense without understanding the rationale behind other approaches...".

To answer the question above, Linked Data is always relevant as long as we are actually talking about "Data" which is simply the case all of the time, irrespective of interaction medium.

If XBRL can be disconnected in anyway from Linked Data, I desperately would like to be enlightened (as per my comments to the post). Why wouldn't anyone desire the ability to navigate the linked data inherent in any financial report? Every entity in an XBRL instance document is an entity, directly or indirectly related to other entities. Why "Mash" the data when you can harmonize XBRL data via a Generic Financial Dictionary (schema or ontology) such that descriptions of Balance Sheet, P&L, and other entities are navigable via their attributes and relationships? In short, why "Mash" (code based brute force joining across disparately shaped data) when you can "Mesh" (natural joining of structured data entities)?

"Linked Data" is about the ability to connect all our observations (data)? , perceptions (information), and inferences / conclusions (knowledge) across a spectrum of interaction media. And it just so happens that the RDF data model (Entity-Attribute-Vaue + Class Relationships + HTTP based Object Identifiers), a range of RDF data model serialization formats, and SPARQL (Query Language and Web Service combo) actually make this possible, in a manner consistent with the essence of the global space we know as the World Wide Web.

BBC's Britain from Above (core message: Data is Everything).

Semantic Web: Travails to Harmony Illustrated (Updated)

Sun, 28 Sep 2008 19:18:53 GMT

All about Data Dictionary issues

Over emphasis on Description Logics (RDFS, OWL, Inference & Reasoning etc) matters without any actual real-world instance data (e.g., lot's of reasoning over RDF in zip files or local drives).

All about Linking Openly accessible RDF Data Sets

Over emphasis on Instance Data without Data Dictionary appreciation and utilization (e.g., Linked Data instance level linkage via "owl:sameAs").

All about Applications & Frameworks

Here we are dealing with numerous applications and frameworks that inextricably bind Instance Data Management and Data Dictionaries. Basically, an all or nothing proposition, if you want to delve into the RDF Linked Data solutions realm.

Often overlooked, is the fact that the Linked Data Web - as an aspect of the Semantic Web innovation continuum - is fundamentally about designing and constructing an "Open World" compatible DBMS for the Internet. Thus, erstwhile "Closed World" DBMS components such as Data Dictionaries (handlers of Data Definition, Referential Integrity etc.) and actual Instance Data, are now distributed and loosely coupled. Thus, your data could be in one Data Space while the data dictionary resides in another. In actual fact, you could have several loosely bound data dictionaries that serve the specific Inference and Reasoning needs of a variety of applications, services, or agents.

Linked Data Illustrated and a Virtuoso Functionality Reminder

Mon, 28 Apr 2008 18:47:06 GMT

Daniel Lewis has put together a nice collection of Linked Data related posts that illustrate the fundamentals of the Linked Data Web and the vital role that Virtuoso plays as a deployment platform. Remember, Virtuoso was architected in 1998 (see Virtuoso History) in anticipation of the eventual Internet, Intranet, and Extranet level requirements for a different kind of Server. At the time of Virtuoso's inception, many thought our desire to build a multi-protocol, multi-model, and multi-purpose, virtual and native data server was sheer craziness, but we pressed on (courtesy of our vision and technical capabilities). Today, we have a very sophisticated Universal Server Platform (in Open Source and Commercial forms) that is naturally equipped to do the following via very simple interfaces:

- More...

Simple Explanation of RDF and Linked Data Dynamics

Fri, 24 Apr 2009 21:14:41 GMT

What is RDF?

The acronym stands for: Resource Description Framework. And that's just what it is.

RDF is comprised of a Data Model (EAV/CR Graph) and Data Representation Formats such as: N3, Turtle, RDF/XML etc.

RDF's essence is about: "Entities" and "Attributes" being URI based, while "Values" may be URI or Literals (typed or untyped) based.

URIs are Entity Identifiers.

What is Linked Data?

Short for "Web of Linked Data" or "Linked Data Web".

A term coined by TimBL that describes an HTTP based "data access by reference pattern" that uses a single pointer or handle for "referring to" and "obtaining actual data about" an entity.

Linked Data uses the deceptively simple messaging scheme of HTTP to deliver a granular entity reference and access mechanism that transcends traditional computing boundaries such as: operating system, application, database engines, and networks.

How are Linked Data & RDF Related?

Linked Data simply mandates the following re. RDF:

URIs should be HTTP based so that you can "refer to" (Reference) an Entity, its Attributes, or URI based Attribute values via the Web (infact any HTTP based network e.g., Intranets and Extranets)
URIs should also be HTTP based so that you can use them to de-reference resource descriptions via the Web (or Intranets and Extranets).

Note: by Entity I am also referring to: a resource (Web parlance), data item, data object, real-world object, or datum.

Linked Data is also about, using URIs and HTTP's content negotiation feature to separate: presentation, representation, access, and identity of data items. Even better, content negotiation can be driven by user agent and/or data server based quality of service algorithms (representation preference order schemes).

To conclude, Linked Data is ultimately about the realization that: Data is the new Electricity, and it's conductors are URIs :-)

Tip to governments of the world: we are in exponential times, the current downturn is but one side of the "exponential times ledger", the other side of the "exponential times ledger" is simply about unleashing "raw data" -- in structured form -- into the Web, so that "citizen analysts" can blossom and ultimately deliver the transparency desperately sought at every level of the economic value chain. Think: "raw data ready" whenever you ponder about "shovel ready" infrastructure projects!

Internet.com Interviews Jim Hendler & I

Thu, 12 Jun 2008 00:55:15 GMT

The build up to Linked Data Planet continues... Here is semanticweb.com's interview with Jim Hendler and *I* titled: Linked Data Leaders - The Semantic Web is Here.

The Linked Data Market via a BCG Matrix (Updated)

Fri, 26 Sep 2008 16:36:56 GMT

The sweet spot of Web 3.0 (or any other Web.vNext moniker) is all about providing Web Users with a structured and interlinked data substrate that facilitates serendipitous discovery of relevant "Things" i.e., a Linked Data Web -- a Web of Linkable Entities that goes beyond documents and other information resource (data containers) types.

Understanding potential Linked Data Web business models, relative to other Web based market segments, is best pursued via a BCG Matrix diagram, such as the one I've constructed below:

Notes:

Link Density

Web 1.0's collection of "Web Sites" have relatively low link density relative to Web 2.0's user-activity driven generation of semi-structured linked data spaces (e.g., Blogs, Wikis, Shared Bookmarks, RSS/Atom Feeds, Photo Galleries, Discussion Forums etc..)
Semantic Technologies (i.e. "Semantics Inside style solutions") which are primarily about "Semantic Meaning" culled from Web 1.0 Pages also have limited linked density relative to Web 2.0
The Linked Data Web, courtesy of the open-ended linking capacity of URIs, matches and ultimately exceeds Web 2.0 link density.

Relevance

Web 1.0 and 2.0 are low relevance realms driven by hyperlinks to information resources ((X)HTML, RSS, Atom, OPML, XML, Images, Audio files etc.) associated with Literal Labels and Tagging schemes devoid of explicit property based resource description thereby making the pursuit of relevance mercurial at best
Semantic Technologies offer more relevance than Web 1.0 and 2.0 based on the increased context that semantic analysis of Web pages accords
The Linked Data Web, courtesy of URIs that expose self-describing data entities, match the relevance levels attained by Semantic Technologies.

Serendipity Quotient (SDQ)

Web 1.0 has next to no serendipity, the closest thing is Google's "I'm Feeling Lucky" button
Web 2.0 possess higher potential for serendipitous discovery than Web 1.0, but such potential is neutralized by inherent subjectivity due to its human-interaction-focused literal foundation (e.g., tags, voting schemes, wiki editors etc.)
Semantic Technologies produce islands-of-relevance with little scope for serendipitous discovery due to URI invisibility, since the prime focus is delivering more context to Web search relative to traditional Web 1.0 search engines.
The Linked Data Web's use of URIs as the naming and resolution mechanism for exposing structured and interlinked resources provides the highest potential for serendipitous discovery of relevant "Things"

To conclude, the Linked Data Web's market opportunities are all about the evolution of the Web into a powerful substrate that offers a unique intersection of "Link Density" and "Relevance", exploitable across horizontal and vertical market segments to solutions providers. Put differently, SDQ is how you take "The Ad" out of "Advertising" when matching Web users to relevant things :-)

Exploring a Music Data Space via Linked Data

Tue, 05 Feb 2008 04:20:47 GMT

Frederick Giasson has put out a number of interesting posts (via his blog) about a conceptual Music Data Space (one of many Data Spaces that will ultimately permeate the Semantic Data Web). Anyway, While reading his initial post covering Music Domain URIs and Linked Data, it occurred to me that by only exposing the raw RDF instance data (RDF/XML format in this case) via URIs for: Diana Ross, Paul McCartney, The Beatles, and Madonna, the essence of the post may not be revealed to all, so I've knocked up a few demos to illustrate the core message:

Note: the enhanced hyperlink (typed data link) lookup presents options to perform an Explore (all data about subject across Domains in the data space i.e. data links to and from Subject), Dereference (specific data in the Subject's Domain i.e. data links originating from subject).

I built these Linked Data Pages by simply doing the following:

Open up our OAT based iSPARQL (Interactive SPARQL Query By Example) Tool
Paste a URI of Interest into the Data Source URI input field
Execute the Query (hitting the ">" button)
Saving the Query to WebDAV as a Linked Data Page (or what I initial called Dynamic Data Web pages in my Hello Data Web series of posts).
Share your Data, Information, Knowledge with others via URIs (as shown in the section above).

The Business Of Linked Data (BOLD) Discussion Space

Mon, 01 Feb 2010 14:02:27 GMT

I've created a new discussion space that's squarely focused on the business development and marketing aspects of "HTTP based Linked Data" (Linked Data). As its name indicates, It's a BOLD attempt to fill a VoiD. :-)

Background

A few months ago, Aldo Bucchi posted a message to the LOD mailing list seeking a discussion space for more business and marketing oriented topic, in relation to Linked Data. At the time, my assumption was that the existing LOD mailing list served that purpose absolutely fine, but in due course I came to realize that Aldo's request had a much lager foundation than I initially suspected.

Historic Oversight

Linked Data, like its umbrella Semantic Web Project, has suffered from an inadvertent oversight on the parts of many of its enthusiasts (myself included): 100% of the discussion spaces are created by, geared towards, or dominated by researchers (from Academia primarily) and/or developers. Thus, at the very least, we've been operating in an echo chamber that only feed the existing void between the core community and those who are more interested in discussing business and marketing related topics.

The new discussion space seeks to cover the following:

Brainstorming Value Proposition Articulation
War Story Exchanges
Case Studies and Use-cases
Market Research & Positioning (for instance Linked Data is killer technology that redefines Data Integration, but none of the major research firms currently make that connection)

How Do I Join The Conversation? Simply sign up on the Google hosted BOLD mailing list, introduce yourself (ideally), and then start conversing! :-)

The Business Of Linked Data (BOLD) Discussion Space

Sun, 31 Jan 2010 22:48:48 GMT

Background

Historic Oversight

The new discussion space seeks to cover the following:

Brainstorming Value Proposition Articulation
War Story Exchanges
Case Studies and Use-cases
Market Research & Positioning (for instance Linked Data is killer technology that redefines Data Integration, but none of the major research firms currently make that connection)

How Do I Join The Conversation? Simply sign up on the Google hosted BOLD mailing list, introduce yourself (ideally), and then start conversing! :-)

What is Linked Data, really?

Tue, 09 Nov 2010 18:53:01 GMT

Linked Data is simply hypermedia-based structured data.

Linked Data offers everyone a Web-scale, Enterprise-grade mechanism for platform-independent creation, curation, access, and integration of data.

The fundamental steps to creating Linked Data are as follows:

Choose a Name Reference Mechanism — i.e., URIs.
Choose a Data Model with which to Structure your Data — minimally, you need a model which clearly distinguishes
1. Subjects (also known as Entities)
2. Subject Attributes (also known as Entity Attributes), and
3. Attribute Values (also known as Subject Attribute Values or Entity Attribute Values).
Choose one or more Data Representation Syntaxes (also called Markup Languages or Data Formats) to use when creating Resources with Content based on your chosen Data Model. Some Syntaxes in common use today are HTML+RDFa, N3, Turtle, RDF/XML, TriX, XRDS, GData, OData, OpenGraph, and many others.
Choose a URI Scheme that facilitates binding Referenced Names to the Resources which will carry your Content -- your Structured Data.
Create Structured Data by using your chosen Name Reference Mechanism, your chosen Data Model, and your chosen Data Representation Syntax, as follows:
1. Identify Subject(s) using Resolvable URI(s).
2. Identify Subject Attribute(s) using Resolvable URI(s).
3. Assign Attribute Values to Subject Attributes. These Values may be either Literals (e.g., STRINGs, BLOBs) or Resolvable URIs.

You can create Linked Data (hypermedia-based data representations) Resources from or for many things. Examples include: personal profiles, calendars, address books, blogs, photo albums; there are many, many more.

Linked Data an Introduction -- simple introduction to Linked Data and its virtues
How Data Makes Corporations Dumb -- Jeff Jonas (IBM) interview
Hypermedia Types -- evolving information portal covering different aspects of Hypermedia resource types
URIBurner -- service that generates Linked Data from a plethora of heterogeneous data sources
Linked Data Meme -- TimbL design issues note about Linked Data
Data 3.0 Manifesto -- note about format agnostic Linked Data
DBpedia -- large Linked Data Hub
Linked Open Data Cloud -- collection of Linked Data Spaces
Linked Open Commerce Cloud -- commerce (clicks & mortar and/or clicks & clicks) oriented Linked Data Space
LOD Cloud Cache -- massive Linked Data Space hosting most of the LOD Cloud Datasets
LOD2 Initiative -- EU Co-Funded Project to develop global knowledge space from LOD

Linked Data Solution for Exposing OpenLink Product Portfolio

Mon, 25 Feb 2008 20:08:04 GMT

At OpenLink Software, we've had an immense problem explaining the depth and breadth of our product porfolio via traditional Document Web pages. Thanks to SPARQL and Linked Data, we are now able to use Web Data Object IDs (HTTP based URIs) to produce super SKUs for every item in our product portfolio. Even better, we are able to handle the additional challenge of exposing features and benefits which by their very nature are mercurial across an array of fronts (products releases, product formats, and supported platforms etc).

Now I can simply state the following using Linked Data (hyperdata) links:

OpenLink Software's product porfolio is comprised of the following product families:

Universal Data Access Drivers Suite (UDA) for ODBC, JDBC, ADO.NET, OLE-DB, and XMLA
OpenLink Data Spaces
Virtuoso

We no longer have to explain (repeatedly) why our drivers exist in Express, Lite, and Multi-Tier Edition formats, or why you ultimately need Multi-Tier Drivers over Single Tier Drivers (Express or Lite Editions) since you ultimately heed high-performance, data encryption, and policy based security across each of the data access driver formats.

Dog-fooding: Linked Data and OpenLink Product Portfolio

Fri, 24 Oct 2008 22:13:50 GMT

Thanks to RDF and Linked Data, it's becoming a lot easier for us to explain and reveal the depth of the OpenLink technology portfolio.

Here is a look at our offerings by product family:

As you explore the Linked Data graph exposed via our product portfolio, I expect you to experience, or at least spot, the virtuous potential of high SDQ (Serendipitous Discovery Quotient) courtesy of Linked Data, which is Web 3.0's answer to SEO. For instance, how Database, Operating System, and Processor family paths in the product portfolio graph (data network) unveil a lot more about OpenLink Software than meets the proverbial "eye" :-)

Personal and/or Service Specific Linked Data Spaces in the Cloud: DBpedia 3.4

Mon, 01 Feb 2010 13:58:14 GMT

We have just released an Amazon EC2 based public Snapshot of DBpedia 3.4. Thus, you can now instantiate a personal and/or service specific variant of the DBpedia 3.4 Linked Data Space. Basically, you can replicate what we host, within minutes (as opposed to days). In addition, you no longer need to squabble --on an unpredictable basis with others-- for the infrastructure resources behind DBpedia's public instance, when using the SPARQL Endpoint, Faceted Search & Find Services, or HTML Browser Pages etc.

How Does It work?

Instantiate a Virtuoso EC2 AMI (paid variety, which is aggressively priced at $49.99 for setup and $19.99 per month thereafter)
Mount the shared DBpedia 3.4 public snapshot
Start Virtuoso Server
Start exploiting the DBpedia Linked Data Space.

What Interfaces are exposed?

SPARQL Endpoint
Linked Data Viewer Pages (as you see in the public DBpedia instance)
Faceted Search & Find UI and Web Services (REST or SOAP)
All the inference rules for UMBEL, SUMO, YAGO, OpenCYC, and DBpedia-OWL data dictionaries
Type Correlations Between DBpedia and Freebase

Enjoy!

Personal and/or Service Specific Linked Data Spaces in the Cloud: DBpedia 3.4

Mon, 16 Nov 2009 18:30:20 GMT

How Does It work?

Instantiate a Virtuoso EC2 AMI (paid variety, which is aggressively priced at $49.99 for setup and $19.99 per month thereafter)
Mount the shared DBpedia 3.4 public snapshot
Start Virtuoso Server
Start exploiting the DBpedia Linked Data Space.

What Interfaces are exposed?

SPARQL Endpoint
Linked Data Viewer Pages (as you see in the public DBpedia instance)
Faceted Search & Find UI and Web Services (REST or SOAP)
All the inference rules for UMBEL, SUMO, YAGO, OpenCYC, and DBpedia-OWL data dictionaries
Type Correlations Between DBpedia and Freebase

Enjoy!

Linked Data enters state of Evoluation

Tue, 29 Apr 2008 20:25:47 GMT

During a brief chat with Michael Hausenblas about a new Linked Data project he is championing called: LForum, I made a freudian slip, in the form of the typo: Evoluation, which at the time was supposed to have been: Evolution. Anyway, we had a chuckle and realized we were on to something, so I proceeded to formalize the definition:

Evoluation is evolution devoid of the randomness of mutation. A state of being in which it is possible to evaluate and choose evolutionary paths.

Evoluation actually describes where we are today in relation to the World Wide Web; to the Linking Open Data community (LOD), it's taking the path towards becoming a Giant Global Graph of Linked Data; to the Web 2.0 community, it's simply a collection of Web Services and associated APIs; and to many others, it remains an opaque collection of interlinked documents.

The great thing about the Web is that it allows netizens to explore a plethora of paths without adversely affecting the paths of others. That said, controlling one's path may take mutation out of evolution, but we are still left with the requirement to adapt and eventually survive in a competitive environment. Thus, although we can evaluate and choose from the many paths the Web's evolution offers us, the path that delivers the most benefits ultimately dominates. :-)

Nice Quote about Information Architecture & World Wide Web

Wed, 27 Aug 2008 15:03:39 GMT

Even with the marginal degrees of serendipitous discovery that the current document oriented Web offers, it's still possible to stumble across poignant gems such as this statement from InspireUX :

The statement above resonates with a lot of my fundamental views about the essence of Web. It also drives right at the core of what we are trying to address with the OpenLink Data Explorer (ODE) which simply isn't about Linked Data visualization, but the combination of visualization, user interaction, and unobtrusive exposure and exploitation of Linked Data Entities culled from the existing Web of Linked Documents. ODE consumes and processes URIs or URLs. Thus, as long as the (X)HTML container / host document keeps URIs or URLs in "agent view", ODE will give you the option to interact with the-data-behind Web information resources (e.g., Web Pages, Images, Audio etc..)

Do remember, "mission-critical" is no longer a corporate / enterprise theme. The lines of demarcation between the individual and enterprise are blurring at warp speed.

Simple Compare & Contrast of Web 1.0, 2.0, and 3.0 (Update 1)

Wed, 29 Apr 2009 17:21:25 GMT

Here is a tabulated "compare and contrast" of Web usage patterns 1.0, 2.0, and 3.0.

	Web 1.0	Web 2.0	Web 3.0
Simple Definition	Interactive / Visual Web	Programmable Web	Linked Data Web
Unit of Presence	Web Page	Web Service Endpoint	Data Space (named structured data enclave)
Unit of Value Exchange	Page URL	Endpoint URL for API	Resource / Entity / Object URI
Data Granularity	Low (HTML)	Medium (XML)	High (RDF)
Defining Services	Search	Community (Blogs to Social Networks)	Find
Participation Quotient	Low	Medium	High
Serendipitous Discovery Quotient	Low	Medium	High
Data Referencability Quotient	Low (Documents)	Medium (Documents)	High (Documents and their constituent Data)
Subjectivity Quotient	High	Medium (from A-list bloggers to select source and partner lists)	Low (everything is discovered via URIs)
Transclusence	Low	Medium (Code driven Mashups)	HIgh (Data driven Meshups)
What You See Is What You Prefer (WYSIWYP)	Low	Medium	High (negotiated representation of resource descriptions)
Open Data Access (Data Accessibility)	Low	Medium (Silos)	High (no Silos)
Identity Issues Handling	Low	Medium (OpenID)	High (FOAF+SSL)
Solution Deployment Model	Centralized	Centralized with sprinklings of Federation	Federated with function specific Centralization (e.g. Lookup hubs like LOD Cloud or DBpedia)
Data Model Orientation	Logical (Tree based DOM)	Logical (Tree based XML)	Conceptual (Graph based RDF)
User Interface Issues	Dynamically generated static interfaces	Dyanically generated interafaces with semi-dynamic interfaces (courtesy of XSLT or XQuery/XPath)	Dynamic Interfaces (pre- and post-generation) courtesy of self-describing nature of RDF
Data Querying	Full Text Search	Full Text Search	Full Text Search + Structured Graph Pattern Query Language (SPARQL)
What Each Delivers	Democratized Publishing	Democratized Journalism & Commentary (Citizen Journalists & Commentators)	Democratized Analysis (Citizen Data Analysts)
Star Wars Edition Analogy	Star Wars (original fight for decentralization via rebellion)	Empire Strikes Back (centralization and data silos make comeback)	Return of the JEDI (FORCE emerges and facilitates decentralization from "Identity" all the way to "Open Data Access" and "Negotiable Descriptive Data Representation")

Naturally, I am not expecting everyone to agree with me. I am simply making my contribution to what will remain facinating discourse for a long time to come :-)

Web 3.0 The Best Official Definition Imaginable -- Nova Spivack's

BBC Linked Data Meshup In 3 Steps

Fri, 12 Jun 2009 20:38:34 GMT

Situation Analysis:

Dr. Dre is one of the artists in the Linked Data Space we host for the BBC. He is also referenced in music oriented data spaces such as DBpedia, MusicBrainz and Last.FM (to name a few).

Challenge:

How do I obtain a holistic view of the entity "Dr. Dre" across the BBC, MusicBrainz, and Last.FM data spaces? We know the BBC published Linked Data, but what about Last.FM and MusicBrainz? Both of these data spaces only expose XML or JSON data via REST APIs?

Solution:

Simple 3 step Linked Data Meshup courtesy of Virtuoso's in-built RDFizer Middleware "the Sponger" (think ODBC Driver Manager for the Linked Data Web) and its numerous Cartridges (think ODBC Drivers for the Linked Data Web).

Steps:

Go to Last.FM and search using pattern: Dr. Dre (you will end up with this URL: http://www.last.fm/music/Dr.+Dre)
Go to the Virtuoso powered BBC Linked Data Space home page and enter: http://bbc.openlinksw.com/about/html/http://www.last.fm/music/Dr.+Dre
Go to the BBC Linked Data Space home page and type full text pattern (using default tab): Dr. Dre, then view Dr. Dre's metadata via the Statistics Link.

What Happened?

The following took place:

Virtuoso Sponger sent an HTTP GET to Last.FM
Distilled the "Artist" entity "Dr. Dre" from the page, and made a Linked Data graph
Inverse Functional Property and sameAs reasoning handled the Meshup (augmented graph from a conjunctive query processing pipeline)
Links for "Dr. Dre" across BBC (sameAs), Last.FM (seeAlso), via DBpedia URI.

The new enhanced URI for Dr. Dre now provides a rich holistic view of the aforementioned "Artist" entity. This URI is usable anywhere on the Web for Linked Data Conduction :-)

Related (as in NearBy)

Augmenting Last.fm Data with BBC data on the Talis Platform

Welcoming Freebase to the Linked Data Web

Fri, 31 Oct 2008 15:23:35 GMT

Finally! That's all I can say re. Freebase :-) They've now plugged their database and their community driven data curation efforts into the burgeoning Linked Data Web.

Here are some examples of how we distill Entities (People, Places, Music, and other things) from Freebase (X)HTML pages (meaning: we don't have to start from RDF information resources as data sources for the eventual RDF Linked Data we generate):

Tip: Install our OpenLink Data Explorer extension for Firefox. Once installed, simply browse through Freebase, and whenever you encounter a page about something of interest, simply use the following sequences to distill (via the Page Description feature) the entities from the page you are reading:

CTRL-Click (Mac OS X)
Right+Click (Windows & Linux)

What is Linked Data, really?

Tue, 15 Feb 2011 22:28:06 GMT

Linked Data is simply hypermedia-based structured data.

Linked Data offers everyone a Web-scale, Enterprise-grade mechanism for platform-independent creation, curation, access, and integration of data.

The fundamental steps to creating Linked Data are as follows:

Choose a Name Reference Mechanism — i.e., URIs.
Choose a Data Model with which to Structure your Data — minimally, you need a model which clearly distinguishes
1. Subjects (also known as Entities)
2. Subject Attributes (also known as Entity Attributes), and
3. Attribute Values (also known as Subject Attribute Values or Entity Attribute Values).
Choose one or more Data Representation Syntaxes (also called Markup Languages or Data Formats) to use when creating Resources with Content based on your chosen Data Model. Some Syntaxes in common use today are HTML+RDFa, N3, Turtle, RDF/XML, TriX, XRDS, GData, and OData; there are many others.
Choose a URI Scheme that facilitates binding Referenced Names to the Resources which will carry your Content -- your Structured Data.
Create Structured Data by using your chosen Name Reference Mechanism, your chosen Data Model, and your chosen Data Representation Syntax, as follows:
1. Identify Subject(s) using Resolvable URI(s).
2. Identify Subject Attribute(s) using Resolvable URI(s).
3. Assign Attribute Values to Subject Attributes. These Values may be either Literals (e.g., STRINGs, BLOBs) or Resolvable URIs.

Hypermedia Types -- evolving information portal covering different aspects of Hypermedia resource types
URIBurner -- service that generates Linked Data from a plethora of heterogeneous data sources
Linked Data Meme -- TimbL design issues note about Linked Data
Data 3.0 Manifesto -- note about format agnostic Linked Data
DBpedia -- large Linked Data Hub
Linked Open Data Cloud -- collection of Linked Data Spaces
Linked Open Commerce Cloud -- commerce (clicks & mortar and/or clicks & clicks) oriented Linked Data Space
LOD Cloud Cache -- massive Linked Data Space hosting most of the LOD Cloud Datasets
LOD2 Initiative -- EU Co-Funded Project to develop global knowledge space from LOD

Time for Context Lenses (Update)

Mon, 04 Aug 2008 15:24:50 GMT

As the Linked Data meme continues on it's quest to unravel the mysteries of the Semantic Web vision, it's quite gratifying to see that data virtualization comprehension: creating "Conceptual Views" into logically organized "Disparate & Heterogeneous Data Sources" via "Context Lenses" is taking shape, as illustrated in the "note-to-self" post by David Provost.

Virtualization of heterogeneous data sources is only achievable if you have a dexterous data model based "Bus" into which the data sources are plugged. RDF has offered such a model for a long time.

When heterogeneous data sources are plugged into an RDF based integration bus e.g., customer records sourced from a variety of tables, across a plethora of databases, you can only end up with true value if the emergent entities from such an effort are coherently linked and (de)referencable; which is what Linked Data's fundamental preoccupation with dereferencable URIs is all about. Of course, Even when you have all of the above in place, you also need to be able to construct "Context Lenses" i.e., context driven views of the Linked Data Mesh (or Linked Data Spaces).

Additional Diagrams:

1. Clients of the RDF Bus
2. RDF Bus Server plugins: Scripts that emit RDF
3. RDF Bus Servers: RDF Data Managers (Triple or Quad Stores)
4. RDF Bus Servers: Relational to RDF Mappers (RDF Views, Semantic Covers etc.)
5. RDF Bus Server plugins: XML to RDF Mappers
6. RDF Bus Server plugins: GRDDL based XSLT stylesheets that emit RDF
7. RDF Bus Server plugins: Intelligent RDF Middleware

Comments about recent Semantic Gang Podcast

Tue, 06 May 2008 00:06:42 GMT

After listening to the latest Semantic Web Gang podcast, I found myself agreeing with some of the points made by Alex Iskold, specifically:

-- History is a great tutor, answers to many of today's problems always lie somewhere in plain sight of the past.

Of course, I also believe that Linked Data serves Web Data Integration across the Internet very well too, and the fact that it will be beneficial to businesses in a big way. No individual or organization is an island, I think the Internet and Web have done a good job of demonstrating that thus far :-) We're all data nodes in a Giant Global Graph.

Daniel lewis did shed light on the read-write aspects of the Linked Data Web, which is actually very close to the callout for a Wikipedia for Data. TimBL has been working on this via Tabulator (see Tabulator Editing Screencast), Bengamin Nowack also added similar functionality to ARC, and of course we support the same SPARQL UPDATE into an RDF information resource via the RDF Sink feature of our WebDAV and ODS-Briefcase implementations.

Revisiting HTTP based Linked Data (Update 1 - Demo Video Links Added)

Mon, 08 Mar 2010 14:59:37 GMT

Motivation for this post arose from a series of Twitter exchanges between Tony Hirst and I, in relation to his blog post titled: So What Is It About Linked Data that Makes it Linked Data™ ?

At the end of the marathon session, it was clear to me that a blog post was required for future reference, at the very least :-)

What is Linked Data?

"Data Access by Reference" mechanism for Data Objects (or Entities) on HTTP networks. It enables you to Identify a Data Object and Access its structured Data Representation via a single Generic HTTP scheme based Identifier (HTTP URI). Data Object representation formats may vary; but in all cases, they are hypermedia oriented, fully structured, and negotiable within the context of a client-server message exchange.

Why is it Important?

Information makes the world tick!

Information doesn't exist without data to contextualize.

Information is inaccessible without a projection (presentation) medium.

All information (without exception, when produced by humans) is subjective. Thus, to truly maximize the innate heterogeneity of collective human intelligence, loose coupling of our information and associated data sources is imperative.

How is Linked Data Delivered?

Linked Data is exposed to HTTP networks (e.g. World Wide Web) via hypermedia resources bearing structured representations of data object descriptions. Remember, you have a single Identifier abstraction (generic HTTP URI) that embodies: Data Object Name and Data Representation Location (aka URL).

How are Linked Data Object Representations Structured?

A structured representation of data exists when an Entity (Datum), its Attributes, and its Attribute Values are clearly discernible. In the case of a Linked Data Object, structured descriptions take the form of a hypermedia based Entity-Attribute-Value (EAV) graph pictorial -- where each Entity, its Attributes, and its Attribute Values (optionally) are identified using Generic HTTP URIs.

Examples of structured data representation formats (content types) associated with Linked Data Objects include:

text/html
text/turtle
text/n3
application/json
application/rdf+xml
Others

How Do I Create Linked Data oriented Hypermedia Resources?

You markup resources by expressing distinct entity-attribute-value statements (basically these a 3-tuple records) using a variety of notations:

(X)HTML+RDFa,
JSON,
Turtle,
N3,
TriX,
TriG,
RDF/XML, and
Others (for instance you can use Atom data format extensions to model EAV graph as per OData initiative from Microsoft).

You can achieve this task using any of the following approaches:

Notepad
WYSIWYG Editor
Transformation of Database Records via Middleware
Transformation of XML based Web Services output via Middleware
Transformation of other Hypermedia Resources via Middleware
Transformation of non Hypermedia Resources via Middleware
Use a platform that delivers all of the above.

Practical Examples of Linked Data Objects Enable

Describe Who You Are, What You Offer, and What You Need via your structured profile, then leave your HTTP network to perform the REST (serendipitous discovery of relevant things)
Identify (via map overlay) all items of interest based on a 2km+ radious of my current location (this could include vendor offerings or services sought by existing or future customers)
Share the latest and greatest family photos with family members *only* without forcing them to signup for Yet Another Web 2.0 service or Social Network
No repetitive signup and username and password based login sequences per Web 2.0 or Mobile Application combo
Going beyond imprecise Keyword Search to the new frontier of Precision Find - Example, Find Data Objects associated with the keywords: Tiger, while enabling the seeker disambiguate across the "Who", "What", "Where", "When" dimensions (with negation capability)
Determine how two Data Objects are Connected - person to person, person to subject matter etc. (LinkedIn outside the walled garden)
Use any resource address (e.g blog or bookmark URL) as the conduit into a Data Object mesh that exposes all associated Entities and their social network relationships
Apply patterns (social dimensions) above to traditional enterprise data sources in combination (optionally) with external data without compromising security etc.

How Do OpenLink Software Products Enable Linked Data Exploitation?

Our data access middleware heritage (which spans 16+ years) has enabled us to assemble a rich portfolio of coherently integrated products that enable cost-effective evaluation and utilization of Linked Data, without writing a single line of code, or exposing you to the hidden, but extensive admin and configuration costs. Post installation, the benefits of Linked Data simply materialize (along the lines described above).

Our main Linked Data oriented products include:

OpenLink Data Explorer -- visualizes Linked Data or Linked Data transformed "on the fly" from hypermedia and non hypermedia data sources
URIBurner -- a "deceptively simple" solution that enables the generation of Linked Data "on the fly" from a broad collection of data sources and resource types
OpenLink Data Spaces -- a platform for enterprises and individuals that enhances distributed collaboration via Linked Data driven virtualization of data across its native and/or 3rd party content manager for: Blogs, Wikis, Shared Bookmarks, Discussion Forums, Social Networks etc
OpenLink Virtuoso -- a secure and high-performance native hybrid data server (Relational, RDF-Graph, Document models) that includes in-built Linked Data transformation middleware (aka. Sponger).

Hypertext Transfer Protocol 1.1 RFC
Open Data Protocol Glossary
Simple Explanation of RDF and Linked Data Dynamics
Collection of post from the past about Linked Data
Are We There Yet Re. Web++? -- includes link to podcast conversation with Jon Udell
Web of Linked Data Pivoting Demo from TED -- by Microsoft's Gary Flake
Microsoft Pivot atop Virtuoso Quad Store's Faceted Browser Engine-- My Demonstration of EAV model transcending data representation variations (i.e., RDF's EAV data model data served up in Microsoft CXML data representation format).

Adding Wordpress Blogs into the Linked Data Web using Virtuoso

Thu, 10 Apr 2008 16:33:05 GMT

Wordpress is a Weblog platform comprised of the following:

User Interface - PHP
Application Logic - PHP
Data Storage (SQL RDBMS) - MySQL via PHP-MySQL
Application Server - Apache

In the form above (the norm), Wordpress data can be injected into the Linked Data Web via RDFization middleware such as theVirtuoso Sponger (built into all Virtuoso instances) and Triplr. The downside of this approach is that the blog owner doesn't necessary possess full control over their contributions to the emerging Giant Global Graph or Linked Data.

Another route to Linked Data exposure is via Virtuoso's Metaschema Language for producing RDF Views over ODBC/JDBC accessible Data Sources, that enables the following setup:

User Interface - PHP
Application Logic - PHP
Data Storage (SQL RDBMS) - MySQL via the PHP-MySQL data access interface
Virtual Database linkage of MySQL Tables into Virtuoso
RDF View generated over the Virtual SQL Tables
Application Server - Virtuoso which provides Linked Data Deployment such that RDF Linked Data is exposed when requested by Web User Agents.

Alternatively, you can also exploit Virtuoso as the SQL DBMS, RDF DBMS, Application Server, and Linked Data Deployment platform:

User Interface - PHP
Application Logic - PHP
Data Storage (SQL RDBMS) - Virtuoso via PHP-ODBC data access interface (* ODBC is Virtuoso's native SQL CLI/API *)
RDF View generated over the Native SQL Tables
Application Server - Virtuoso which provides Linked Data Deployment such that RDF Linked Data is exposed when requested by Web User Agents (e.g. OpenLink RDF Browser, Zitgist Data Viewer, DISCO Hyperdata Browser, and Tabulator).

Benefits?

Each user account gets a proper Linked Data URI (ID) that can me meshed/smushed with other IDs (so you add data from this new blog space to other linked data sources associated with you other URIs/IDs)
Each post gets a proper URI All data is now query-able via SPARQL Discoverability increases exponentially (without drop in relevance in either direction i.e. discovering or being discovered)

How Do I map the WordPress SQL Schema to RDF using Virtuoso?

Determine the RDF Schema or Ontologies that define the Classes for which you will be producing instance data (e.g. SIOC and FOAF)
Declare URI/IRI generator functions (*special Virtuoso functions*)
Use SPARQL Graph patterns to apply URI/IRI generator functions to Tables, Views, Table Values mode Stored Procedures, Query Resultsets as part of RDBMS to RDF mapping

Read the Meta Schema Language guide or simply apply our "WordPress SQL Schema to RDF" script to your Virtuoso hosted instance. Of course, there are other mappings that cover other PHP applications deployed via Virtuoso:

phpBB3 SQL Schema to RDF
Drupal SQL Schema to RDF
MediaWiki SQL Schema to RDF

Live Demos?

Important Movie and Ultimate Linked Data Documentary (Update 3)

Sun, 15 Mar 2009 14:35:49 GMT

If you are still grappling with the "Semantic Web Project" and one of its more distinguished deliverables: Linked Data Web, then please make time to watch and digest the imminence of this 1990 documentary about Hypermedia titled: Hyperland, by the late Douglas Adams.

Hyperland Documentary -- Youtube
Hyperworlds - Ted Nelson Presentation
The Invention of the World Wide Web
The Web's Secret Stories - TED Presentation (basically about using the Web reveal [connections] commonality via [dots] individuality pre. Twitter
Pattie Mae demonstrates 6th sense - an example of what will be done with Linked Data re. user interaction.
TimBL's TED 2009 Linked Data Presentation

Injecting Facebook Data into the Semantic Data Web

Wed, 11 Feb 2009 12:40:11 GMT

I now have the first cut of a Facebook application called: Dynamic Linked Data Pages.

What is a Dynamic Linked Data Page (DLD)?

A dynamically generated Web Page comprised of Semantic Data Web style data links (formally typed links) and traditional Document Web links (generic links lacking type specificity).

Linked Data Pages will ultimately enable Facebook users to inject their public data into the Semantic Data Web as RDF based Linked Data. For instance, my Facebook Profile & Photo albums data is now available as RDF, without paying a cent of RDF handcrafting tax, thanks to the Virtuoso Sponger (middleware for producing RDF from non RDF data sources) which is now equipped with a new RDFizer Cartridger for the Facebook Query Language (FQL) and RESTful Web Service.

Demo Notes:

When you click on a link in DLD pages, you will be presented with a lookup that exposes the different interaction options associated with a given URI. Examples include:

Explore - find attributes and relationships that apply to the clicked URI
Dereference (get the attributes of the clicked URI)
Bookmark - store the URI for subsequent use e.g meshing with other URIs from across the Web
(X)HTML Page Open - traditional Document Web link (i.e. just opens another Web document as per usual)

Remember, the facebook URLs (links to web pages) are being converted, on the fly, into RDF based Structured Data ( graph model database) i.e Entity Sets that possess formally defined characteristics (attributes) and associations (relationships).

Dynamic Linked Data Pages

Saved RDF Browser Sessions

Saved SPARQL Query Definitions

Clearing Up RDF misrepresentation once again!

Wed, 30 Apr 2008 16:07:58 GMT

Daniel Lewis has penned a post titled: Clearing up some misconceptions..again, in response to Ben Werdmuller's post titled: Introducing the Open Data Definition.

The great thing about the Linked Data Web is that it's much easier to discovery and respond to these points of view before the ink dries :-) Ben certainly needs to take a look at the Semantic Web FAQ pre or post assimilation of Daniel's response.

Linked Data in Action: Library of Congress

Wed, 11 Jun 2008 17:16:31 GMT

As I start my countdown to the upcoming Linked Data Planet conference, here is the first of a series of posts geared towards showcasing practical use of the burgeoning Linked Data Web.

First up, the Library of Congress, take a look at the following pages which are "Human" and machine based "User Agent" friendly:

Key point: The pages above are served up in line with Linked Data deployment and publishing tenets espoused by the Linking Open Data Community (LOD) which include (in my preferred terminology):

Giving "Names" to things you observe (aka Data Source Names or "DSNs" for short)
Use HTTP URLs in your data source naming scheme so that "access by reference" to your data sources exploits the expanse of the HTTP driven Web i.e make your DSNs "Linked Data Source Names" (LDNS)
Remember that Documents / Pages are compound in nature, and they aren't the only data sources we would want to name; a document's LDSN must be distinct from the LDSNs used for the subject matter concepts and/or named entities associated with a document
Use the RDF Data Model to express structure within your data source(s)
Use LDSNs when constructing statements/claims/assertions/records (triples) inside your structured data sources
When publishing Web Pages related to your data sources; use at least one of the following to methods to guide user agents to data sources associated with your published page; the HTML LINK tag, RDFa, GRDDL, or Content Negotiation.

The items above are features that users and decision makers should start to hone into when seeking, and evaluating, platforms that facilitate cost-effective exploitation of the Linked Data Web.

What is the Linked Data Meme about?

Wed, 29 Apr 2009 20:31:10 GMT

The act of using URIs to "refer to" (reference) Web addressable data objects. It's also the act of using the same URI to de-reference the description of a referenced data object; in this case, the representation of the description is negotiated by a Web client and/or Web server. Thus, you can access the description of a data object via data representation formats such as: JSON, XML, (X)HTML, RDF/XML, N3, Turtle, TriX etc.

Note: In proper Web parlance, a data object is referred to as a resource.

Simple example (using DBpedia)

In the Linked Data realm, If you want to make a reference to the Linked Data meme in a blog post, you are better off using the resource URI: http://dbpedia.org/resource/Linked_Data, instead of the Web page URL: http://dbpedia.org/page/Linked_Data, which is the address of a physical document (an information conveying artifact) that at best visually presents the negotiated representation of a resource description.

Why is this valuable?

In the simplest sense, you only have one focal point for referencing (referring to) and de-referencing (retrieving data about) a given Web resource. It protects you from the impact of Web document location changes (amongst many other things).

Remember, a single URI is a conduit into a realm where the identity, access, representation, presentation, and storage of a resource (data object) are completely distinct. It's the mechanism for conducting data across network, machine, operating system, dbms engine, application, and service (API) boundaries. Thus, without "linked data meme" prescribed URI referencing and de-referencing, we are simply back to "business as usual" re. the industry at large, where networks, operating systems, dbms engines, applications, and services (APIs) become the basis for "data lock-in" and silo construction.

Going forward

Take a second to think about the profound virtues of the ubiquitous Web of Linked Document URLs that we have today, and then apply that thinking to the burgeoning Web of Linked Data URIs, that has just turned corner and heading in everyone's direction at full blast.

Note to "Social Media" players: Who you know isn't the canonical object of sociality. What you are i.e., your description and the data objects it exposes, are real objects of your sociality :-)

Other post in this Blog Data Space associated with "Linked Data".

Exploring the Value Proposition of Linked Data

Fri, 24 Jul 2009 12:20:01 GMT

What is Linked Data?

The primary topic of a meme penned by TimBL in the form of a Design Issues Doc (note: this is how TimBL has shared his thoughts since the Beginning of the Web).

There are a number of dimensions to the meme, but its primary purpose is the reintroduction of the HTTP URI -- a vital component of the Web's core architecture.

What's Special about HTTP URIs?

They possess an intrinsic duality that combines persistent and unambiguous Data Identity with platform & representation format independent Data Access. Thus, you can use a string of characters that look like a contemporary Web URL to unambiguously achieve the following:

Identity or Name Anything of Interest
Describe Anything of Interest by associating the Description Subject's Identity with a constellation of Attribute and Value pairs (technically: an Entity-Attribute-Value or Subject-Predicate-Object graph)
Make the Description of Named Things of Interest discoverable on the Web by implicitly binding the aforementioned to Documents that hold their descriptions (technically: metadata documents or information resources)

What's the basic value proposition of the Linked Data meme?

Enabling more productive use of the Web by users and developers alike. All of which is achieved by tweaking the Web's Hyperlinking feature such that it now includes Hypertext and Hyperdata as link types.

Note: Hyperdata Linking is simply what an HTTP URI facilitates.

Examples problems solved by injecting Linked Data into the Web:

Federated Identity by enabling Individuals to unambiguously Identify themselves (Profiles++) courtesy of existing Internet and Web protocols (e.g., FOAF+SSL's WebIDs which combine Personal Identity with X.509 certificates and HTTPs based client side certification)
Security and Privacy challenge alleviation by delivering a mechanism for policy based data access that feeds off federated individual identity and social network (graph) traversal
Spam Busting via the above
Increasing the Serendipitous Discovery Quotient (SDQ) of Web accessible resources by embedding Rich Metadata into (X)HTML Documents e.g., structured descriptions of your "WishLists" and "OfferLists" via a common set of terms offered by vocabularies such as GoodRelations and SIOC
Coherent integration of disparate data across the Web and/or within the Enterprise via "Data Meshing" rather than "Data Mashing"
Moving beyond imprecise statistically driven "Keyword Search" (e.g. Page Rank) to "Precision Find" driven by typed link based Entity Rank plus Entity Type and Entity Property filters.

Conclusion

If all of the above still falls into the technical mumbo-jumbo realm, then simply consider Linked Data as delivering Open Data Access in granular form to Web accessible data -- that goes beyond data containers (documents or files).

The value proposition of Linked Data is inextricably linked to the value proposition of the World Wide Web. This is true, because the Linked Data meme is ultimately about an enhancement of the current Web; achieved by reintroducing its architectural essence -- in new context -- via a new level of link abstraction, courtesy of the Identity and Access duality of HTTP URIs.

As a result of Linked Data, you can now have Links on the Web for a Person, Document, Music, Consumer Electronics, Products & Services, Business Opening & Closing Hours, Personal "WishLists" and "OfferList", an Idea, etc.. in addition to links for Properties (Attributes & Values) of the aforementioned. Ultimately, all of these links will be indexed in a myriad of ways providing the substrate for the next major period of Internet & Web driven innovation, within our larger human-ingenuity driven innovation continuum.

Recipes for Describing Your Business and its Offerings using the GoodRelations Vocabulary / Schema
Solving Real Problems with RDF based Linked Data
Other Linked Data Posts from this Blog oriented Linked Data Space (goes back a few years!)
Various practical Linked Data demo links from my Del.icio.us Bookmark oriented Data Space
My personal WebID which is conduit to a Linked Data mesh covering vast variety of things I've opted to share with others via the Web (best viewed using a Linked Data aware User Agent like ODE).

DBpedia 3.1 is now Live!

Thu, 14 Aug 2008 12:15:47 GMT

DBpedia 3.1 is now live. The release highlights are as follows:

116,7 million triples (27% increase over prior release)
better YAGO mapping (instances associated with YAGO classes)
Geo extractor code has been improved and is now run for all 14 languages
New (X)HTML based Resource/Entity Description Page (Example: Linked Data)

Enjoy!

Virtuoso + DBpedia 3.6 Installation Guide (Update 1)

Tue, 25 Jan 2011 19:46:26 GMT

What is DBpedia?

DBpedia is a community effort to provide a contemporary deductive database derived from Wikipedia content. Project contributions can be partitioned as follows:

Ontology Construction and Maintenance
Dataset Generation via Wikipedia Content Extraction & Transformation
Live Database Maintenance & Administration -- includes actual Linked Data loading and publishing, provision of SPARQL endpoint, and traditional DBA activity
Internationalization.

Why is DBpedia important?

Comprising the nucleus of the Linked Open Data effort, DBpedia also serves as a fulcrum for the burgeoning Web of Linked Data by delivering a dense and highly-interlinked lookup database. In its most basic form, DBpedia is a great source of strong and resolvable identifiers for People, Places, Organizations, Subject Matter, and many other data items of interest. Naturally, it provides a fantastic starting point for comprehending the fundamental concepts underlying TimBL's initial Linked Data meme.

How do I use DBpedia?

Depending on your particular requirements, whether personal or service-specific, DBpedia offers the following:

Datasets that can be loaded on your deductive database (also known as triple or quad stores) platform of choice
Live browsable HTML+RDFa based entity description pages
A wide variety of data formats for importing entity description data into a broad range of existing applications and services
A SPARQL endpoint allowing ad-hoc querying over HTTP using the SPARQL query language, and delivering results serialized in a variety of formats
A broad variety of tools covering query by example, faceted browsing, full text search, entity name lookups, etc.

What is the DBpedia 3.6 + Virtuoso Cluster Edition Combo?

OpenLink Software has preloaded the DBpedia 3.6 datasets into a preconfigured Virtuoso Cluster Edition database, and made the package available for easy installation.

Why is the DBpedia+Virtuoso package important?

The DBpedia+Virtuoso package provides a cost-effective option for personal or service-specific incarnations of DBpedia.

For instance, you may have a service that isn't best-served by competing with the rest of the world for ad-hoc query time and resources on the live instance, which itself operates under various restrictions which enable this ad-hoc query service to be provided at Web Scale.

Now you can easily commission your own instance and quickly exploit DBpedia and Virtuoso's database feature set to the max, powered by your own hardware and network infrastructure.

How do I use the DBpedia+Virtuoso package?

Pre-requisites are simply:

Functional Virtuoso Cluster Edition installation.
Virtuoso Cluster Edition License.
90 GB of free disk space -- you ultimately only need 43 gigs, but this our recommended free disk space size pre installation completion.

To install the Virtuoso Cluster Edition simply perform the following steps:

Download Software.
Run installer
Set key environment variables and start the OpenLink License Manager, using command (this may vary depending on your shell):

. /opt/virtuoso/virtuoso-enterprise.sh
Run the mkcluster.sh script which defaults to a 4 node cluster
Set VIRTUOSO_HOME environment variable -- if you want to start cluster databases distinct from single server databases via distinct root directory for database files (one that isn't adjacent to single-server database directories)
Start Virtuoso Cluster Edition instances using command:
virtuoso-start.sh
Stop Virtuoso Cluster Edition instances using command:
virtuoso-stop.sh

To install your personal or service specific edition of DBpedia simply perform the following steps:

Navigate to your installation directory
Download Installer script (dbpedia-install.sh)
Set execution mode on script using command:
chmod 755 dbpedia-install.sh
Shutdown any Virtuoso instances that may be currently running
Set your VIRTUOSO_HOME environment variable, e.g., to the current directory, via command (this may vary depending on your shell):
export VIRTUOSO_HOME=`pwd`
Run script using command:
sh dbpedia-install.sh

Once the installation completes (approximately 1 hour and 30 minutes from start time), perform the following steps:

Verify that the Virtuoso Conductor (HTML based Admin UI) is in place via:
http://localhost:[port]/conductor
Verify that the Precision Search & Find UI is in place via:
http://localhost:[port]/fct
Verify that DBpedia's Green Entity Description Pages are in place via:
http://localhost:[port]/resource/DBpedia

The Virtuous Web of Linked Data -- Business Perspective (Updated)

Fri, 24 Oct 2008 18:49:18 GMT

Orri Erling (Program Manager: OpenLink Virtuoso) has dropped a well explained reiteration of the essence of the "Linked Data Web" or "Data Web" with an emphasis on the business value. His post is titled: State of the Semantic Web (Part 1) - Sociology, Business, and Messaging.

Typically, Orri's post are targeted at the hard core RDF and SQL DBMS audiences, but in this particular post, he shoots straight at the business community revealing "Opportunity Cost" containment as the invisible driver behind the business aspects of any market inflection.

Remember, the Web isn't ubiquitous because its users mastered the mechanics and virtues of HTML and/or HTTP. Web ubiquity is a function of the opportunity cost of not being on the Web, courtesy of the network effects of hyperlinked documents -- i.e., the instant gratification of traversing documents on the Web via a single click action. In similar fashion, the Linked Data Web's ubiquity will simply come down to the opportunity cost of not being "inside the Web", courtesy of the network effects of hyperlinked entities (documents, people, music, books, and other "Things").

Here are some excerpts from Orri's post:

Every time there is a major shift in technology, this shift needs to be motivated by addressing a new class of problem. This means doing something that could not be done before. The last time this happened was when the relational database became the dominant IT technology. At that time, the questions involved putting the enterprise in the database and building a cluster of line of business applications around the database. The argument for the RDBMS was that you did not have to constrain the set of queries that might later be made, when designing the database. In other words, it was making things more ad hoc. This was opposed then on grounds of being less efficient than the hierarchical and network databases which the relational eventually replaced. Today, the point of the Data Web is that you do not have to constrain what your data can join or integrate with, when you design your database. The counter-argument is that this is slow and geeky and not scalable. See the similarity? A difference is that we are not specifically aiming at replacing the RDBMS. In fact, if you know exactly what you will query and have a well defined workload, a relational representation optimized for the workload will give you about 10x the performance of the equivalent RDF warehouse. OLTP remains a relational-only domain. However, when we are talking about doing queries and analytics against the Web, or even against more than a handful of relational systems, the things which make RDBMS good become problematic.

If we think about Web 1.0 as a period where the distinguishing noun was: "Author", and Web 2.0 the noun: "Journalist", we should be able to see that what comes next is the noun: "Analyst". This new generation analyst would be equipped with de-referencable Web Identity courtesy of their Person Entity URI. The analyst's URI would also be the critical component of Web based low cost attribution ecosystem; one that ultimately turns the URI into the analyst's brand emblem / imprint.

Paul Downey

Vanity of Demanding Attribution

State of the Semantic Web Presentation

Fri, 23 May 2008 10:53:08 GMT

Unfortunately a number of Linking Open Data (LOD) community / Linked Data tribe members (myself included) aren't at the Semantic Web Technologies conference in San Jose (we are in a busy period for Semantic Web Technology related Conferences). But all isn't lost as Ivan Herman (W3C Semantic Web Activity Lead) , LOD member, and SWEO colleague has carried the banner with aplomb.

Ivan's presentation titled: State of the Semantic Web, is a must view for those who need a quick update on where things are re. the Semantic Web in general.

I also liked the fact that in proper "Lead by example" manner, his presentation isn't PDF or PPT based, it's a Web Document :-)

Hint: as per usual, this post contains a Linked Data demo nugget. This time around, it's in the form of a shared calendar covering a large number of Semantic Web Technology events. All I had to do was subscribe to a number of WebDAV accessible iCal files from my Calendar Data Space and the platform did the rest i.e. produce Linked Data Objects for events associated with a plethora of conferences.

If you assimilate Ivan's presentation properly, you will note I've just generated, and shared, a large number of URIs covering a range of conference events. Thus, you can extend my contributions (thereby enriching the GGG) by simply associating additional data from your Linked Data Space with mine. All you have to do is use my calendar data objects URIs in your statements.

Web 1.0, 2.0, and 3.0 (Yet Again)

Mon, 15 Sep 2008 17:48:15 GMT

If your Web presence doesn't extend beyond (X)HTML web pages, you are only participating in Web usage Dimension 1.0.

If your Web presence goes beyond (X)HTML pages, via the addition of REST or SOAP based Web Services, then you re participating in Web usage dimension 2.0.

If you Web presence includes all of the above, with the addition of structured data interlinked with structured data across other points of presence on the Web, then you are participating in Web usage dimension 3.0 i.e., "Linked Data Web" or "Web of Data" or "Data Web".

BTW - If you've already done all of the above, and you have started building intelligent agents that exploit the aforementioned structured interlinked data substrate, then you are already in Web usage dimension 4.0.

Web usage pattern evolution

Business Value of Linked Data (Enterprise Angle)?

Thu, 11 Sep 2008 19:52:48 GMT

All enterprises run IS/MIS/EIS systems that are supposed to enable optimized exploitation of data, information, and knowledge. Unfortunately, applications, services (SOAP or REST), database engines, middleware, operating systems, programming languages, development frameworks, network protocols, network topologies, or some other piece of infrastructure, eventually lay claim (possessively) to the data.

Courtesy of Linked Data, we are now able to extend the "document to document" linking mechanism of the Web (Hypertext Linking) to more granular "entity to entity" level linking. And in doing so, we have a layer of abstraction that in one swoop alleviates all of the infrastructure oriented data access impediments of yore. I know this sounds simplistic, but be rest assured, imbibing Linked Data's value proposition is really just that simple, once you engage solutions (e.g. Virtuoso) that enable you to deploy Linked Data across your enterprise.

Example:

Microsoft ACCESS, SQL Server, and Virtuoso all use the Northwind SQL DB Schema as the basis of the demonstration database shipped with each DBMS product. This schema is comprised of common IS/MIS entities that include: Customers, Contacts, Orders, Products, Employees etc.

What we all really want to do as data, information, and knowledge consumers and/or dispatchers, is be no more than a single "mouse click" away from relevant data/information/knowledge data access and/or exploration. Even better (but not always so obvious), we also want anyone in our network (company, division, department, cube-cluster) to inherit these data access efficiencies.

In this example, the Web Page about the Customer "ALKI" provides me with a myriad of exploration and data access paths e.g., when I click on the foaf:primarytopic property value link.

This simple example, via a single Web Page, should put to rest any doubts about the utility of Linked Data. Of course this is an old demo, but this time around the UI is minimalist as my prior attempts skipped a few steps i.e., starting from within a Linked Data explorer/browser.

Important note: I haven't exported SQL into an RDF data warehouse, I am converting the SQL into RDF Linked Data on the fly which has two fundamental benefits:

No vulnerability to changes in the source DBMS
Superior performance over the RDF warehouse since the source schema is SQL based and I can leverage the optimization of the underlying SQL engine when translating between SPARQL and SQL.

Enjoy!

Politics, Old Media, and Linked Data

Mon, 07 Jan 2008 17:22:15 GMT

According to current media:

Senator Barack Obama is a beacon of change within the democratic party while Senator Hillary Clinton is status quo.

According to the data in the GovtTrack.us data space:

Senator Barack Obama is a rank-and-file Democrat according to GovTrack's analysis of his track record in congress. Whereas, Senator Hillary Clinton is a radical democrat, according to the same Govt. Track analysis of her track record in congress.

Who do we believe? The GovtTrack.us performance data, old media pundits, or postulations of the candidates? GovtTrack.us is a new approach to candidate vetting. It provides data in traditional Document Web and Linked Data Web forms, placing analytic power in the hands of the citizen.

Here are insights into the track records of Senators Hillary Clinton and Barack Obama via the Zitgist Linked Data Viewer:

Note: I am not aligned to any political party or candidate, this is just a demonstration of Linked Data that has a high degree of poignancy relative to US primary elections etc..

Your Personal Edition of DBpedia in the Clouds

Tue, 25 Nov 2008 23:55:55 GMT

We are just about done with an end-to-end workflow pattern that enables reconstitution of DBpedia 3.2 instances in the Clouds courtesy of Virtuoso and EC2.

Basically this is how it works.

Instantiate a Virtuoso EC2 AMI (paid variety)
Install the special EC2 extensions (ec2ext_dav.vad) VAD via the Conductor UI or iSQL
Restore the Virtuoso+DBpedia backup from our S3 bucket
After approx. 1 hr, you will have a complete DBpedia replica in your own data space on the Linked Data Web.

DBpedia replica implies:

SPARQL Endpoint
Linked Data Viewer Pages (as you see in the public DBpedia instance)
All requisite re-write rules for URI de-referencing and attribution (i.e., low cost triples that links back to main DBpedia using terms from our little Attribution Ontology)
All the inference rules for UMBEL, YAGO, OpenCYC, and DBpedia-OWL data dictionaries
All Full Text Indexes
All Bitmap Indexes.

Tomorrow is the official go live day (due to last minute price changes), but you can instantiate a paid Virtuoso AMI starting now :-)

To be continued...

FOAF-ing Linked Data is quite SIOC-ing

Fri, 01 Feb 2008 23:20:34 GMT

The title of this post is a "Tongue in cheek" expression of euphoria now that I have FOAF and SIOC (pronounced SHOCK) based data spaces exposed via my FOAF and my SIOC information resource (RDF files) URIs.

If you want to explore who I know, what I read, and what I've tagged (amongst other things), all you have to do is:

Beam a SPARQL query down my data space URIs which expose FOAF or SIOC based interconnected Linked Data graphs.
Walkthrough using an RDF Browser until you reach a beachhead and then beam your SPARQL from there (remember you only need the URI of the RDF Data Source, and while in my Data Space every data item has a proper URI).

Some Tools that help you comprehend what I am saying:

Browsers

Query Tools

SPARQL Demo

iSPARQL QBE

The Trouble with Labels

Tue, 16 Sep 2008 14:07:49 GMT

Unfortunately our fixation with "Labels" and the artificial link that exist between "Labels" and so-called "first mover advantage" continue to impede our progress to clarity about matters such as a fully functional Web of interlinked data.

A while back I watched Kevin Kelly's 5,000 days presentation at TED. During the presentation, I kept on scratching my head, wondering why phrases like "Linked Data", "Semantic Web", "Web of Data", "Data Web" where so unnaturally disconnected from his session narrative.

Yesterday I watched IMINDI's TechCrunch 50 presentation, and once again I saw the aforementioned pattern repeat itself. This time around, the poor founders of this "Linked Data Web" oriented company (which is what they are in reality) took a totally undeserved pasting from a bunch of panelist incapable of seeing beyond today (Web 2.0) and yesterday (initial Web bootstrap).

Anyway, thanks to the Web, this post will make a small contribution towards re-connecting the missing phrases to these "Linked Data Web" presentations.

Simple Virtuoso Installation & Utilization Guide for SPARQL Users (Update 5)

Wed, 19 Jan 2011 15:43:35 GMT

What is SPARQL?

A declarative query language from the W3C for querying structured propositional data (in the form of 3-tuple [triples] or 4-tuple [quads] records) stored in a deductive database (colloquially referred to as triple or quad stores in Semantic Web and Linked Data parlance).

SPARQL is inherently platform independent. Like SQL, the query language and the backend database engine are distinct. Database clients capture SPARQL queries which are then passed on to compliant backend databases.

Why is it important?

Like SQL for relational databases, it provides a powerful mechanism for accessing and joining data across one or more data partitions (named graphs identified by IRIs). The aforementioned capability also enables the construction of sophisticated Views, Reports (HTML or those produced in native form by desktop productivity tools), and data streams for other services.

Unlike SQL, SPARQL includes result serialization formats and an HTTP based wire protocol. Thus, the ubiquity and sophistication of HTTP is integral to SPARQL i.e., client side applications (user agents) only need to be able to perform an HTTP GET against a URL en route to exploiting the power of SPARQL.

How do I use it, generally?

Locate a SPARQL endpoint (DBpedia, LOD Cloud Cache, Data.Gov, URIBurner, others), or;
Install a SPARQL compliant database server (quad or triple store) on your desktop, workgroup server, data center, or cloud (e.g., Amazon EC2 AMI)
Start the database server
Execute SPARQL Queries via the SPARQL endpoint.

How do I use SPARQL with Virtuoso?

What follows is a very simple guide for using SPARQL against your own instance of Virtuoso:

Software Download and Installation
Data Loading from Data Sources exposed at Network Addresses (e.g. HTTP URLs) using very simple methods
Actual SPARQL query execution via SPARQL endpoint.

Installation Steps

Download Virtuoso Open Source or Virtuoso Commercial Editions
Run installer (if using Commercial edition of Windows Open Source Edition, otherwise follow build guide)
Follow post-installation guide and verify installation by typing in the command: virtuoso -? (if this fails check you've followed installation and setup steps, then verify environment variables have been set)
Start the Virtuoso server using the command: virtuoso-start.sh
Verify you have a connection to the Virtuoso Server via the command: isql localhost (assuming you're using default DB settings) or the command: isql localhost:1112 (assuming demo database) or goto your browser and type in: http://:[port]/conductor (e.g. http://localhost:8889/conductor for default DB or http://localhost:8890/conductor if using Demo DB)
Go to SPARQL endpoint which is typically -- http://:[port]/sparql
Run a quick sample query (since the database always has system data in place): select distinct * where {?s ?p ?o} limit 50 .

Troubleshooting

Ensure environment settings are set and functional -- if using Mac OS X or Windows, so you don't have to worry about this, just start and stop your Virtuoso server using native OS services applets
If using the Open Source Edition, follow the getting started guide -- it covers PATH and startup directory location re. starting and stopping Virtuoso servers.
Sponging (HTTP GETs against external Data Sources) within SPARQL queries is disabled by default. You can enable this feature by assigning "SPARQL_SPONGE" privileges to user "SPARQL". Note, more sophisticated security exists via WebID based ACLs.

Data Loading Steps

Identify an RDF based structured data source of interest -- a file that contains 3-tuple / triples available at an address on a public or private HTTP based network
Determine the Address (URL) of the RDF data source
Go to your Virtuoso SPARQL endpoint and type in the following SPARQL query: DEFINE GET:SOFT "replace" SELECT DISTINCT * FROM WHERE {?s ?p ?o}
All the triples in the RDF resource (data source accessed via URL) will be loaded into the Virtuoso Quad Store (using RDF Data Source URL as the internal quad store Named Graph IRI) as part of the SPARQL query processing pipeline.

Note: the data source URL doesn't even have to be RDF based -- which is where the Virtuoso Sponger Middleware comes into play (download and install the VAD installer package first) since it delivers the following features to Virtuoso's SPARQL engine:

Transformation of data from non RDF data sources (file content, hypermedia resources, web services output etc..) into RDF based 3-tuples (triples)
Cache Invalidation Scheme Construction -- thus, subsequent queries (without the define get:soft "replace" pragma will not be required bar when you forcefully want to override cache).
If you have very large data sources like DBpedia etc. from CKAN, simply use our bulk loader .

SPARQL Endpoint Discovery

Public SPARQL endpoints are emerging at an ever increasing rate. Thus, we've setup up a DNS lookup service that provides access to a large number of SPARQL endpoints. Of course, this doesn't cover all existing endpoints, so if our endpoint is missing please ping me.

Here are a collection of commands for using DNS-SD to discover SPARQL endpoints:

dns-sd -B _sparql._tcp sparql.openlinksw.com -- browse for services instances
dns-sd -Z _sparql._tcp sparql.openlinksw.com -- output results in Zone File format

Using HTTP from Ruby -- you can just make SPARQL Protocol URLs re. SPARQL
Using SPARQL Endpoints via Ruby -- Ruby example using DBpedia endpoint
Interactive SPARQL Query By Example (QBE) tool -- provides a graphical user interface (as is common in SQL realm re. query building against RDBMS engines) that works with any SPARQL endpoint
Other methods of loading RDF data into Virtuoso
Virtuoso Sponger -- architecture and how it turns a wide variety of non RDF data sources into SPARQL accessible data
Using OpenLink Data Explorer (ODE) to populate Virtuoso -- locate a resource of interest; click on a bookmarklet or use context menus (if using ODE extensions for Firefox, Safari, or Chrome); and you'll have SPARQL accessible data automatically inserted into your Virtuoso instance.
W3C's SPARQLing Data Access Ingenuity -- an older generic SPARQL introduction post
Collection of SPARQL Query Examples -- GoodRelations (Product Offers), FOAF (Profiles), SIOC (Data Spaces -- Blogs, Wikis, Bookmarks, Feed Collections, Photo Galleries, Briefcase/DropBox, AddressBook, Calendars, Discussion Forums)
Collection of Live SPARQL Queries against LOD Cloud Cache -- simple and advanced queries.

Response to: Whole Data Post (Update 3)

Fri, 15 Aug 2008 22:31:48 GMT

This post is in response to Glenn McDonald's post titled: Whole Data, where he highlights a number of issues relating to "Semantic Web" marketing communications and overall messaging, from his perspective.

By coincidence, Glenn and I presented at this month's Cambridge Semantic Web Gathering.

I've provided a dump of Glenn's issues and my responses below:

Issue - RDF

Ingenious data decomposition idea, but:
too low-level; the assembly language of data, where we need Java or Ruby
"resource" is not the issue; there's no such thing as "metadata", it's all data; "meta" is a perspective
lists need to be effortless, not painful and obscure
nodes need to be represented, not just implied; they need types and literals in a more pervasive, integrated way.

Response:

RDF is a Graph based Data Model it stands for Resource Description Framework. The Metadata data angle comes from it's Meta Content Framework (MCF) origins. You can express and serialize data based on the RDF Data Model using: Turtle, N3, TriX, N-Triples, and RDF/XML.

Issue - SPARQL (and Freebase's MQL)

These are just appeasement:
- old query paradigm: fishing in dark water with superstitiously tied lures; only works well in carefully stocked lakes
- we don't ask questions by defining answer shapes and then hoping they're dredged up whole.

Response:

SPARQL, MQL, and Entity-SQL are Graph Model oriented Query Languages. Query Languages always accompany Database Engines. SQL is the Relational Model equivalent.

Issue - Linked Data

Noble attempt to ground the abstract, but:
- URI dereferencing/namespace/open-world issues focus too much technical attention on cross-source cases where the human issues dwarf the technical ones anyway
- FOAF query over the people in this room? forget it.
- link asymmetry doesn't scale
- identity doesn't scale
- generating RDF from non-graph sources: more appeasement, right where the win from actually converting could be biggest!

Response:

Innovative use of HTTP to deliver "Data Access by Reference" to the Linked Data Web.

When you have a Data Model, Database Engine, and Query Language, the next thing you need is a Data Access mechanism that provides "Data Access by Reference". ODBC and JDBC (amongst others) provide "Data Access by Reference" via Data Source Names. Linked Data is about the same thing (URIs are Data Source Names) with the following differences:

Naming is scoped to the entity level rather than container level
HTTP's use within the data source naming scheme expands the referencability of the Named Entity Descriptions beyond traditional confines such as applications, operating systems, and database engines.

Issue - Giant Global Graph

Hugely motivating and powerful idea, worthy of a superhero (Graphius!), but:
- giant and global parts are too hard, and starting global makes every problem harder
- local projects become unmanageable in global context (Cyc, Freebase data-modeling lists...). And my thus my plea, again. Forget "semantic" and "web", let's fix the database tech first:
- node/arc data-model, path-based exploratory query-model
- data-graph applications built easily on top of this common model; building them has to be easy, because if it's hard, they'll be bad
- given good database tech, good web data-publishing tech will be trivial!
- given good tools for graphs, the problems of uniting them will be only as hard as they have to be.

Response:

Giant Global Graph is just another moniker for a "Web of Linked Data" or "Linked Data Web".

Multi-Model Database technology that meshes the best of the Graph & Relational Models exist. In a nutshell, this is what Virtuoso is all about and it's existed for a very long time :-)

Virtuoso is also a Virtual DBMS engine (so you can see Heterogeneous Relational Data via Graph Model Context Lenses). Naturally, it is also a Linked Data Deployment platform (or Linked Data Sever).

The issue isn't the "Semantic Web" moniker per se., it's about how Linked Data (foundation layer of Semantic Web) gets introduced to users. As I said during the MIT Gathering: "The Web is experienced via Web Browsers primarily, so any enhancement to the Web must be exposed via traditional Web Browsers", which is why we've opted to simply add "View Linked Data Sources" to the existing set of common Browser options that includes:

View page in rendered form (default)
View page source (i.e., how you see the markup behind the page)

By exposing the Linked Data Web option as described above, you enable the Web user to knowingly transition from the traditional Rendered (X)HTML page view to the Linked Data View (i.e., structured data behind the page). This simple "User Interaction" tweak makes the notion of exploiting a Structured Web becomes somewhat clearer.

The Linked Data Web isn't a panacea. It's just an addition to the existing Web that enrichens the things you can do with the Web. It's predominance, like any application feature, will be subject to the degrees to which it delivers tangible value or matrializes internal and external opportunity costs.

Note: The Web isn't ubiquitous today becuase all it's users groked HTML Markup. It's ubquitity is a function of opportunity costs: there simply came a point in the Web boostrap when nobody could afford the opportunity costs associated with being off the Web. The same thing will play out with Linked Data and the broader Semantic Web vision.

Links:

Linked Data Journey part of my Linked Data Planet Presentation Remix(from slides 15 to 22 - which include bits from TimBL's presentation)
OpenLink Data Explorer
OpenLink Data Explorer Screenshots and examples.

Contd: Why we need Linked Data

Tue, 26 Feb 2008 13:16:43 GMT

Increasingly, I am encountering commentary from the ReadWriteWeb data space that highlights critical problems solved by a Linked Data Web. Unfortunately, most of the time, there is a disconnect between the problem and the solution. By this I mean: technology in the Semantic Web realm isn't seen as the solution.

A while back, I wrote a post titled:Why we need Linked Data. The aim of the post was to bring attention to the implications of exponential growth of User Generated Content (typically, semi-structured and unstructured data) on the Web. The growth in question is occurring within a fixed data & information processing timeframe (i.e. there will always be 24hrs in a day), which sets the stage for Information Overload as expressed in a recent post from ReadWriteWeb titled: Visualizing Social Media Fatigue.

The emerging "Web of Linked Data" augments the current "Web of Linked Documents", by providing a structured data corpus partitioned by containers I prefer to call: Data Spaces. These spaces enable Linked Data aware solutions to deliver immense value such as, complex data graph traversal, starting from document beachheads, that expose relevant data within a faction of the time it would take to achieve the same thing using traditional document web methods such as full text search patterns, scraping, and mashing etc.

Remember, our DNA based data & information system far exceeds that of any inorganic system when it comes to reasoning, but it remains immensely incapable of accurately and efficiently processing huge volumes of data & information -- irrespective of data model.

The Idea behind the Semantic Web has always been about an evolution of the Web into a structured data collective comprised of interlinked Data items and Data Containers (Data Spaces). Of course we can argue forever about the Semantics of the solution (ironically), but we can't shirk away from the impending challenges that "Information Overload" is about to unleash on our limited processing time and capabilities.

For those looking for a so called "killer application" for the Semantic Web, I would urge you to align this quest with the "Killer Problem" of our times, because when you do so you will that all routes lead to: Linked Data that leverages existing Web Architecture.

Once you understand the problem, you will hopefully understand that we all need some kind of "Data Junction Box" that provides a "Data Access Focal Point" for all of the data we splatter across the net as we sign up for the next greatest and latest Web X.X hosted service, or as we work on a daily basis with a variety of tools within enterprise Intranets.

BTW - these "Data Junction Boxes" will also need to be unobtrusively bound to our individual Identities.

Zero-based Cognition (Difference between Humans & Machines)

Fri, 17 Oct 2008 11:23:42 GMT

Human beings, courtesy of the gift of cognition, are capable of creating reusable data, information, knowledge from simple or complex observations in an abstract realm. A machine on the other hand can only discover and infere based on a substrate of structured and interlinked data, information, or knowledge in a concrete human created realm e.g., a Web of Linked Data.

As is quite common these days, Yihong Ding has written another great piece titled: A New Take on Internet-Based AI, that delves into this specific matter. Yihong expresses an vital insight as excerpted below:

"Artificial intelligence is supposed to let machines do things for people. The risk is that we may rely too much on them. Two months ago, for instance, writer Nicolas Carr asked whether Google is making us stupid. In my recent blog series "The Age of Google," I extended Carr’s discussion. Due to the success of Google, we are relying more on objective search than on active thinking to answer questions. In consequence, the more Google has advanced its service, the farther Google users have drifted from active thinking."

"But at least one form of human thinking cannot be replaced by machines. I am not talking about inference/discovery (which machines may be capable of doing) but about creation/generation-from-nothing (which I don’t believe machines may ever do)."

I tend to describe our ability to create/generate-from-nothing as "Zero-based Cognition", which is initially about "thought" and the eventually about "speed of thought dissemination" and "global thought meshing".

In a peculiar sense, Zero-based cognition is analogous to Zero-based budgeting from the accounting realm :-)

Data Spaces, User Identity, and Data Portability

Mon, 04 Feb 2008 15:06:43 GMT

If your Data Space was a Solar System, your personal Identity would be the Sun. I say this because your Identity is the conduit (access mechanism) to your data graph; the data you generate from various application interaction activities such as: Blogging, Bookmarking, Photo Sharing, Feed Aggregation etc.

Daniel Lewis has just published a nice blog post titled: The Data Space Philosophy, that puts the underlying Data Space concept in perspective.

The Linked Data Web is a Giant Global Graph of Data Spaces (meshes of data and identity exposed by graphs connecting data and identity)

Data Portability ultimately depends on platforms that provide unobtrusive generation of Linked Data (for data referencing) alongside support for a plethora of industry standard data formats -- which is what OpenLink Data Spaces has been about for a very long time :-)

Identity - Philosophy

Identity - Mathematics

Identity - Object Oriented Programming

Linked Data Rules Simplified

Sat, 27 Jun 2009 03:18:24 GMT

As a compliment to the most recent Linked Data Design Issues note by TimBL, I would like to add this subtle tweak to the enumerated rules:

Identify or Name things using HTTP URIs
Describe things using the RDF metadata model
Increase link data mesh density on the Web by linking (referring) to things in other data spaces using their HTTP URIs.

If you perform the steps above, on any HTTP network (e.g. World Wide Web), you implicitly bind the Names/Identifiers of things to negotiable representations of their metadata (description) bearing documents.

Also note, you can create and deploy the resulting RDF metadata using any of the following approaches:

RDFa within (X)HTML documents
N3, Turtle, TriX, RDF/XML etc. based documents
Programmatically generated variants of 1&2.

Linked Data Web Collaborators: Introducing Structured Dynamics

Sat, 03 Jan 2009 04:27:26 GMT

As indicated in posts from Fred Giasson and Mike Bergman, the Zitgist incubation effort that contributed to the delivery of vital Linked Data Web infrastructure components such as TalkDigger (discourse discovery and participation), PingTheSemanticWeb (ground-zero data source for most Semantic Web search engines), UMBEL (binding layer for Upper and Lower Ontologies amongst other things), Music Ontology (enabling meaningful description of Music), and Bibliographic Ontology (enabling meaningful description of Bibliographic content), is now ready to continue its business development and technology growth as a going concern known as Structured Dynamics.

With great joy and pride, I wish Structured Dynamics all the success they deserve. Naturally, the collaborations and close relationship between OpenLink Software and its latest technology partner will continue -- especially as we collectively work towards a more comprehendible and pragmatic Web of Linked Data for developers (across Web 1.0, 2.0, 3.0, and beyond), end-users (information- and knowledge-workers), and entrepreneurs (driven by quality and tangible value contribution).

Dynamic Linked Data Constellation

Fri, 17 Oct 2008 14:45:53 GMT

Now that the virtues of dynamic generation of RDF based Linked Data are becoming clearer, I guess it's time to unveil the Virtuoso Sponger driven Dynamic Linked Data constellation diagram.

Our diagram depicts the myriad of data sources from which RDF Linked Data is generated "on the fly" via our data source specific RDF-zation cartridges/drivers. It also unveils how the sponger leverages the Linked Data constellations of UMBEL, DBpedia, Bio2Rdf, and others for lookups.

Recent Data Portability, Linked Data, and Open Data Access Podcasts

Wed, 09 Apr 2008 17:22:23 GMT

I just listen to, and very much enjoyed (lots of chuckling) Dave Beckett's podcast interview on the Talis podcast network. Clearly Dave has a bent for funny project names etc.. He also introduced "Inter-Webs" (Web Data Spaces in my parlance) towards the end of the interview.

Trent Adams, Steve Greenberg, and I, also had a podcast chat about Web Data Portability and Accessibility (Linked Data). I also remixed Jon Breslin's "Data Portability & Me" presentation to produce: "Data Accessibility & Me".

The podcasts interviews and presentations provide contributions to the broadening discourse about Open Data Access / Connectivity on the Web.

Connecting Freebase, Wikipedia, DBpedia, and other Linked Data Spaces (Update 1)

Fri, 29 Aug 2008 18:57:02 GMT

Here are some demonstrations of (X)HTML based representations of resource descriptions from Freebase, DBpedia, BBC Music Beta, CrunchBase, OpenCyc, and UMBEL etc. What is really being demonstrated here is the use of Proxy / Wrapper URIs to expose powerful links across entities distilled from their container documents (or information resources). Of course, you see exactly the same technique in action whenever you visit DBpedia pages. Again, we are moving the concept of Linking from the document to document level, down to the document-entity to document-entity level. The evolution of network link focal points is illustrated in slides 15 to 22 of my Linked Data Planet presentation remix.

Live Examples

Abraham Lincoln - Freebase (note: link from Freebase to DBpedia via Wikipedia)
Amazon - CrunchBase (note: links from CruncBase to DBpedia)
Cold Play - BBC Music Beta (note: links to Musicbrainz)
Linked Data Planet Presentation - Also a Slidy, Bibo Ontology, and RDFa usage example
Music - OpenCyc Concept which exposes a Hyperdata link to its equivalent UMBEL Subject Concept and back

Virtuoso's RDFization Middleware & Linked Data Deployment Architecture Diagram

Note: You can substitute my examples using any Web resource URL. The underlying RDFization and Linked Data deployment functionality of the Virtuoso demo instance takes care of everything else. Also note that the HTML based resource description page capability is now deployed as part of the Virtuoso Sponger component of every Virtuoso installation starting with from version 5.0.8.

Virtuoso+Neurocommons EC2 AMI released! (Update - 1)

Thu, 11 Dec 2008 03:48:49 GMT

What is Neurocommons?

Excerpted from the project home page:

The NeuroCommons project seeks to make all scientific research materials - research articles, annotations, data, physical materials - as available and as useable as they can be. We do this by both fostering practices that render information in a form that promotes uniform access by computational agents - sometimes called "interoperability". We want knowledge sources to combine meaningfully, enabling semantically precise queries that span multiple information sources.

In a nutshell, a great project that makes practical use of Linked Data Web technology in the areas of computational biology and neuroscience.

What is Virtuoso and Neurocommons AMI for EC2?

A pre-installed and fully tuned edition of Virtuoso that includes a fully configured Neurocommons Knowledgebase (in RDF Linked Data form) on Amazon's EC2 Cloud platform.

Benefits?

Generally, it provides a no-hassles mechanism for instantiating personal-, organization-, or service-specific instances of a very powerful research knowledgebase within approximately 1.15 hours compared to a lengthy rebuild from RDF source data alternative that takes 14 hours or more, depending on machine hardware configuration and host operating system resources.

Features:

Neurocommons public instance functionality replica (re. RDF and (X)HTML resource description representations & SPARQL endpoint)
Local URI de-referencing (so no contention with public endpoint) as part of the RDF Linked Data Deployment
Fully tuned Virtuoso instance for neurocommons knowledgebase.

Installation Guide

Simply read the Virtuoso+NeuroCommons EC2 AMI installation guide.

Science Commons Video

What is Linked Data oriented RDF-ization?

Tue, 07 Oct 2008 21:35:24 GMT

RDF-ization is a term used by the Semantic Web community to describe the process of generating RDF from non RDF Data Sources such as (X)HTML, Weblogs, Shared Bookmark Collections, Photo Galleries, Calendars, Contact Managers, Feed Subscriptions, Wikis, and other information resource collections.

If the RDF generated, results in an entity-to-entity level network (graph) in which each entity is endowed with a de-referencable HTTP based ID (a URI), we end up with an enhancement to the Web that adds Hyperdata linking across extracted entities, to the existing Hypertext based Web of linked documents (pages, images, and other information resource types). Thus, I can use the same URL linking mechanism to reference a broader range of "Things" i.e., documents, things that documents are about, or things loosely associated with documents.

The Virtuoso Sponger is an example of an RDF Middleware solution from OpenLink Software. It's an in-built component of the Virtuoso Universal Server, and deployable in many forms e.g., Software as Service (SaaS) or traditional software installation. It delivers RDF-ization services via a collection of Web information resource specific Cartridges/Providers/Drivers covering Wikipedia, Freebase, CrunchBase, WikiCompany, OpenLibrary, Digg, eBay, Amazon, RSS/Atom/OPML feed sources, XBRL, and many more.

RDF-ization alone doesn't ensure valuable RDF based Linked Data on the Web. The process of producing RDF Linked Data is ultimately about the art of effectively describing resources with an eye for context.

RDF-ization Processing Steps

Entity Extraction
Vocabulary/Schema/Ontology (Data Dictionary) mapping
HTTP based Proxy URI generation
Linked Data Cloud Lookups (e.g., perform UMBEL lookup to add "isAbout" fidelity to graph and then lookup DBpedia and other LOD instance data enclaves for Identical individuals and connect via "owl:sameAs")
RDF Linked Data Graph projection that uses the description of the container information resource to expose the URIs of the distilled entities.

The animation that follows illustrates the process (5,000 feet view), from grabbing resources via HTTP GET, to injecting RDF Linked Data back into the Web cloud:

Note: the Shredder is a Generic Cartridge, so you would have one of these per data source type (information resource type).

Getting The Linked Data Value Pyramid Layers Right (Update #2)

Sun, 31 Jan 2010 22:47:04 GMT

One of the real problems that pervades all routes to Linked Data value prop. incomprehension stems from the layering of its value pyramid; especially when communicating with -initially detached- end-users.

Note to Web Programmers: Linked Data is about Data (Wine) and not about Code (Fish). Thus, it isn't a "programmer only zone", far from it. More than anything else, its inherently inclusive and spreads its participation net widely across: Data Architects, Data Integrators, Power Users, Knowledge Workers, Information Workers, Data Analysts, etc.. Basically, everyone that can "click on a link" is invited to this particular party; remember, it is about "Linked Data" not "Linked Code", after all. :-)

Problematic Value Pyramid Layering

Here is an example of a Linked Data value pyramid that I am stumbling across --with some frequency-- these days (note: 1 being the pyramid apex):

SPARQL Queries
RDF Data Stores
RDF Data Sets
HTTP scheme URIs

Basically, Linked Data deployment (assigning de-referencable HTTP URIs to DBMS records, their attributes, and attribute values [optionally] ) is occurring last. Even worse, this happens in the context of Linked Open Data oriented endeavors, resulting in nothing but confusion or inadvertent perpetuation of the overarching pragmatically challenged "Semantic Web" stereotype.

As you can imagine, hitting SPARQL as your introduction to Linked Data is akin to hitting SQL as your introduction to Relational Database Technology, neither is an elevator-style value prop. relay mechanism.

In the relational realm, killer demos always started with desktop productivity tools (spreadsheets, report-writers, SQL QBE tools etc.) accessing, relational data sources en route to unveiling the "Productivity" and "Agility" value prop. that such binding delivered i.e., the desktop application (clients) and the databases (servers) are distinct, but operating in a mutually beneficial manner to all, courtesy of a data access standards such as ODBC (Open Database Connectivity).

In the Linked Data realm, learning to embrace and extend best practices from the relational dbms realm remains a challenge, a lot of this has to do with hangovers from a misguided perception that RDF databases will somehow completely replace RDBMS engines, rather than compliment them. Thus, you have a counter productive variant of NIH (Not Invented Here) in play, taking us to the dreaded realm of: Break the Pot and You Own It (exemplified by the 11+ year Semantic Web Project comprehension and appreciation odyssey).

From my vantage point, here is how I believe the Linked Data value pyramid should be layered, especially when communicating the essential value prop.:

HTTP URLs -- LINKs to documents (Reports) that users already appreciate, across the public Web and/or Intranets
HTTP URIs -- typically not visually distinguishable from the URLs, so use the Data exposed by de-referencing a URL to show how each Data Item (Entity or Object) is uniquely identified by a Generic HTTP URI, and how clicking on the said URIs leads to more structured metadata bearing documents available in a variety of data representation formats, thereby enabling flexible data presentation (e.g., smarter HTML pages)
SPARQL -- when a user appreciates the data representation and presentation dexterity of a Generic HTTP URI, they will be more inclined to drill down an additional layer to unravel how HTTP URIs mechanically deliver such flexibility
RDF Data Stores -- at this stage the user is now interested data sources behind the Generic HTTP URIs, courtesy of natural desire to tweak the data presented in the report; thus, you now have an engaged user ready to absorb the "How Generic HTTP URIs Pull This Off" message
RDF Data Sets -- while attempting to make or tweak HTTP URIs, users become curious about the actual data loaded into the RDF Data Store, which is where data sets used to create powerful Lookup Data Spaces (e.g., DBpedia) come into play such as those from the LOD constellation as exemplified by DBpedia (extractions from Wikipedia).

Getting The Linked Data Value Pyramid Layers Right (Update #2)

Mon, 01 Feb 2010 14:02:14 GMT

Problematic Value Pyramid Layering

Here is an example of a Linked Data value pyramid that I am stumbling across --with some frequency-- these days (note: 1 being the pyramid apex):

SPARQL Queries
RDF Data Stores
RDF Data Sets
HTTP scheme URIs

From my vantage point, here is how I believe the Linked Data value pyramid should be layered, especially when communicating the essential value prop.:

HTTP URLs -- LINKs to documents (Reports) that users already appreciate, across the public Web and/or Intranets
HTTP URIs -- typically not visually distinguishable from the URLs, so use the Data exposed by de-referencing a URL to show how each Data Item (Entity or Object) is uniquely identified by a Generic HTTP URI, and how clicking on the said URIs leads to more structured metadata bearing documents available in a variety of data representation formats, thereby enabling flexible data presentation (e.g., smarter HTML pages)
SPARQL -- when a user appreciates the data representation and presentation dexterity of a Generic HTTP URI, they will be more inclined to drill down an additional layer to unravel how HTTP URIs mechanically deliver such flexibility
RDF Data Stores -- at this stage the user is now interested data sources behind the Generic HTTP URIs, courtesy of natural desire to tweak the data presented in the report; thus, you now have an engaged user ready to absorb the "How Generic HTTP URIs Pull This Off" message
RDF Data Sets -- while attempting to make or tweak HTTP URIs, users become curious about the actual data loaded into the RDF Data Store, which is where data sets used to create powerful Lookup Data Spaces (e.g., DBpedia) come into play such as those from the LOD constellation as exemplified by DBpedia (extractions from Wikipedia).

Virtuoso Universal Server 5.0.4 Release Details

Tue, 05 Feb 2008 01:30:43 GMT

We've just released version 5.0.4 of the Virtuoso Universal Server platform for SQL, XML, and RDF. The new release includes the following enhancements:

Web Server:

- HTTP 1.1 compliant Transparent content-negotiation in URL-rewrite rules for Linked Data Deployment.

RDF Data Management:

- New providers for the Jena, Sesame and Redland frameworks

- support for SPARQL INSERT and UPDATE via HTTP POST

- New SPARQL-BI extenstions that make Business Intelligence feasible via SPARQL

- new "rdf_sink" folder for handling HTTP PUTs into WebDAV that automatically sync with Quad Store.

- There are new Sponger (RDFizer) cartridges that map Amazon book-search results to the Biliographic Ontology, supports production of Linked Data from OAI, XBRL, and Yahoo finance data sources.

- HTTPS protocol support added to Sponger

- performance optimizations for SPARQL `DESCRIBE' and `CONSTRUCT', alongside general performance enhancements for RDF data set loading.

Core DBMS Engine:

- PHP hosting a module re-implemented as a Virtuoso plugin inline with otherlanguage hosting modules

- improved deadlock condtion management

- enhanced POP and FTP server side protocol implementations that allow larger data transfers.

Additional Information

DBpedia URI

Product Home Page

Wikipedia Page

Virtuoso 5.0.4 Press Release

The Future of the Desktop

Thu, 21 Aug 2008 19:59:25 GMT

Jason Kolb (who initially nudged me to chime in), and then ReadWriteWeb, and of course Nova's Twine about the topic, have collectively started an interesting discussion about Web.vNext (3.0 and beyond) under the heading: The Future of the Desktop.

My contribution to the developing discourse takes the form of a Q&A session. I've taken the questions posed and provided answers that express my particular points of view:

Q: Is the desktop of the future going to just be a web-hosted version of the same old-fashioned desktop metaphors we have today?

A: No, it's going to be a more Web Architecture aware and compliant variant exposed by appropriate metaphors.

Q: The desktop of the future is going to be a hosted web service

A: A vessel for exploiting the virtues of the Linked Data Web.

Q: The Browser is Going to Swallow Up the Desktop

A: Literally, of course not! Metaphorically, of course! And then the Browser metaphor will decomposes into function specific bits of Web interaction amenable to orchestration by its users.

Q: The focus of the desktop will shift from information to attention

A: No! Knowledge, Information, and Data sharing courtesy of Hyperdata & Hypertext Linking.

Q: Users are going to shift from acting as librarians to acting as daytraders

A: They were Librarians at Web 1.0, Journalist at Web 2.0, and Analysts in Web 3.0 (i.e, analyze structured and interlinked data), and CEOs in Web 4.0 (i.e. get Agents to do stuff intelligently en route to making decisions).

Q: The Webtop will be more social and will leverage and integrate collective intelligence

A: The Linked Data Web vessel will only require you to fill in your profile (once) and then serendipitous discovery and meshing of relevant data will simply happen (the serendipity quotient will grow in line with Linked Data Web density).

Q: The desktop of the future is going to have powerful semantic search and social search capabilities built-in

A: It is going to be able to "Find" rather than "Search" for stuff courtesy of the Linked Data Web.

Q: Interactive shared spaces will replace folders

A: Data Spaces and their URIs (Data Source Names) replace everything. You simply choose the exploration metaphor that best suits you space interaction needs.

Q: The Portable Desktop

A: Ubiquitous Desktop i.e. do the same thing (all answers above) on any device connected to the Web.

Q: The Smart Desktop

A: Vessels with access to Smart Data (Linked Data + Action driven Context sprinklings).

Q: Federated, open policies and permissions

A: More federation for sure, XMPP will become a lot more important, and OAuth will enable resurgence of the federated aspects of the Web and Internet.

Q: The personal cloud

A: Personal Data Spaces plugged into Clouds (Intranet, Extranet, Internet).

Q: The WebOS

A: An operating system endowed with traditional Database and Host Operating system functionality such as: RDF Data Model, SPARQL Query Language, URI based Pointer mechanism, and HTTP based message Bus.

Q: Who is most likely to own the future desktop?

A: You! And all you need is a URI (an ID or Data Source Name for "Entity You") and a Profile Page (a place where "Entity You" is Describe by You).

One Last Thing

You can get a feel for the future desktop by downloading and then installing the OpenLink Data Explorer plugin for Firefox, which allows you to switch viewing modes between Web Page and Linked Data behind the page. :-)

OpenLink Data Spaces
Get Yourself a URI in 5 Minutes or Less
Linked Data Spaces & Data Portability
Linked Data Conference Keynote (RDFa based remix edition that includes vital bits from TimBL's Linked Data Planet presentation).

Where Are All the RDF-based Semantic Web Applications?

Thu, 02 Oct 2008 19:27:41 GMT

In response to the "Semantic Web Technology" application classification scheme espoused by ReadWriteWeb (RWW), emphasized in the post titled: Where are all the RDF-based Semantic Web Apps?, here is my attempt to clarify and reintroduce what OpenLink Software offers (today) in relation to Semantic Web technology.

From the RWW Top-Down category, which I interpret as: technologies that produce RDF from non RDF data sources. Our product portfolio is comprised of the following; Virtuoso Universal Server, OpenLink Data Spaces, OpenLink Ajax Toolkit, and OpenLink Data Explorer (which includes ubiquity commands).

Virtuoso Universal Server functionality summary:

Generation of RDF Linked Data Views of SQL, XML, and Web Services in general
Deployment of RDF Linked Data
"On the Fly" generation of RDF Linked Data from Document Web information resources (i.e. distillation of entities from their containers e.g. Web pages) via Cartridges / Drivers
SPARQL query language support
SPARQL extensions that bring SPARQL closer to SQL e.g Aggregates, Update, Insert, Delete Named Graph support (i.e. use of logical names to partition RDF data within Virtuoso's multi-model dbms engine)
Inference Engine (currently in use re. DBpedia via Yago and UMBEL)
Host and exposes data from Drupal, Wordpress, MediaWiki, phpBB3 as RDF Linked Data via in-built support for PHP runtime
Available as an EC2 AMI
etc..

OpenLink Data Spaces functionality summary:

Simple mechanism for Linked Data Web enabling yourself by giving you an HTTP based User ID (a de-referencable URI) that is linked to a FOAF based Profile page and OpenID
Binds all your data sources (blogs, wikis, bookmarks, photos, calendar items etc. ) to your URI so can "Find" things by only remembering your URI
Makes your profile page and personal URI the focal point of Linked Data Web presence
Delivers Data Portability (using data access by value or data access by reference) across data silos (e.g. Web 2.0 style social networks)
Allows you make annotations about anything in your own Data Space(s) on the Web without exposure to RDF markup
A Briefcase feature that provides a WebDAV driven RDF Linked Data variant of functionality seen in Mac OS X Spotlight and WinFS with the addition of SPARQL compliance
Automatically generates RDFa in its (X)HTML pages
Blog, Wiki, WebDAV File Server, Shared Bookmarks, Calendar, and other applications that look and feel like Web 2.0 counterparts but emitt RDF Linked Data amongst a plethora of data exchange formats
Available as an EC2 AMI
etc..

OpenLink Ajax Toolkit functionality summary:

Provides binding to SQL, RDF, XML, and Web Services via Ajax Database Connectivity Layer (you only need an ODBC, JDBC, OLE-DB, ADO.NET, XMLA Driver, or Web Service on the backend for dynamic data access from Javascript)
All controls are Ajax Database Connectivity bound (widgets get their data from Ajax Database Connectivity data sources)
Bundled with Virtuoso and ODS installations.
etc.

OpenLink Data Explorer functionality summary

Distills entities associated with information resource style containers (e.g. Web Pages or files) as RDF Linked Data
Exposes the RDF based Linked Data graph associated with information resources (see the Linked Data behind Web pages)
Ubiquity commands for invoking the above
Available as a Hosted Service or Firefox Extension
Bundled with Virtuoso and ODS installations
etc.

Note:

Of course you could have simply looked up OpenLink Software's FOAF based Profile page (*note the Linked Data Explorer tab*), or simply passed the FOAF profile page URL to a Linked Data aware client application such as: OpenLink Data Explorer, Zitgist Data Viewer, Marbles, and Tabulator, and obtained information. Remember, OpenLink Software is an Entity of Type: foaf:Organization, on the burgeoning Linked Data Web :-)

Linked Data Planet Keynote (RDFa based remix edition)
On The Cusp: A Global Review of the Semantic Web Industry.

2008, Facebook Data Portability, and the Giant Global Graph of Linked Data

Mon, 07 Jan 2008 16:44:42 GMT

As 2007 came to a close I repeatedly mulled over the idea of putting together a usual "year in review" and a set of predictions for the coming year etc. Anyway, the more I pondered, the smaller the list became. While pondering (as 2008 rolled around), the Blogosphere was set ablaze with the Robert Scoble's announcement of his account suspension by Facebook. Of course, many chimed in expressing views either side of the ensuing debate: Who is right -- Scoble or Facebook. The more I assimilated the views expressed about this event, the more ironic I found the general discourse, for the following reasons:

Web 2.0 is fundamentally about Web Services as the prime vehicle for interactions across "points of Web presence"
Facebook is a Web 2.0 hosted service for social networking that provides Web Services APIs for accessing data in the Facebook data space. You have to do so "on the fly" within clearly defined constraints i.e you can interact with data across your social network via Facebook APIs, but you cannot cache the data (perform an export style dump of the data)
Facebook is a main driver of the term: "social graph", but their underlying data model is relational and the Web Services response (data you get back) doesn't return a data graph, instead it returns an tree (i.e XML)
Scoble's had a number of close encounters with Linked Data Web | Semantic Data Web | Web 3.0 aficionados in various forms throughout 2007, but still doesn't quite make the connection between Web Services APIs as part of a processing pipeline that includes structured data extraction from XML data en route to producing Data Graphs comprised of Data Objects (Entities) endowed with: Unique Identifiers, Classification or Categorization schemes, Attributes, and Relationships prescribed by one or more shared Data Dictionaries/Schemas/Ontologies
A global information bus that exposes a Linked Data mesh comprised of Data Objects, Object Attributes, and Object Relationships across "points of Web presence" is what TimBL described in 1998 (Semantic Web Roadmap) and more recently in 2007 (Giant Global Graph)
The Linked Data mesh (i.e Linked Data Web or GGG) is anchored by the use of HTTP to mint Location, Structure, and Value independent Object Identifiers called URIs or IRIs. In addition, the Linked Data Web is also equipped with a query language, protocol, and results serialization format for XML and JSON called: SPARQL.

So, unlike Scoble, I am able to make my Facebook Data portable without violating Facebook rules (no data caching outside Facebook realm) by doing the following:

Use an RDFizer for Facebook to convert XML response data from Facebook Web Services into RDF "on the fly" Ensure that my RDF is comprised of Object Identifiers that are HTTP based and thereby dereferencable (i.e. I can use SPARQL to unravel the Linked Data Graph in my Facebook data space)
The act of data dereferencing enables me to expose my Facebook Data as Linked Data associated with my Personal URI
This interaction only occurs via my data space and in all cases the interactions with data work via my RDFizer middleware (e.g the Virtuoso Sponger) that talks directly to Facebook Web Services.

In a nutshell, my Linked Data Space enables you to reference data in my data space via Object Identifiers (URIs), and some cases the Object IDs and Graphs are constructed on the fly via RDFization middleware.

Here are my URIs that provide different paths to my Facebook Data Space:

Personal URI

My Facebook Data Space

Linked Data Browser/Viewer

My Facebook Photo Gallery -- WWW2007 Photo Collection

Linked Data Browser/Viewer

To conclude, 2008 is clearly the inflection year during which we will final unshackle Data and Identity from the confines of "Web Data Silos" by leveraging the HTTP, SPARQL, and RDF induced virtues of Linked Data.

2008 and the Rise of Linked Data
Scoble Right, Wrong, and Beyond
Scoble interviewing TimBL (note to Scoble: re-watch your interview since he made some specific points about Linked Data and URIs that you need to grasp)
Prior Blog posts my this Blog Data Space that include the literal patterns: Scoble Semantic Web

7 Things Brought to You by HTTP-based Hypermedia

Mon, 08 Nov 2010 20:29:43 GMT

There are some very powerful benefits that accrue from the use of HTTP based Hypermedia. 7 that come to mind immediately include:

Structured & Platform Independent Enterprise Data Virtualization -- concrete conceptual level access and provisioning of abstract domain entities such as Customers, Orders, Employees, Products, Countries, Competitors etc.
Distributed Application State (REST) -- application state transitions via links
Structured Data Representation (Linked Data) -- whole data data representation via links
Structured Identity (WebID) -- verifiable distributed identity
Structured Profiles (FOAF) -- platform independent profiles for people and organizations
Articulation of Structured Value Propositions (GoodRelations) -- Product & Service Offers, Business Entities, Locations, Business Hours, etc.
Structured Collaboration Spaces (SIOC) -- Blogs, Wikis, File Sharing, Discussion Forums, Aggregated Feeds, Statuses, Photo Galleries, Polls etc.

Linked Data & Identity

Fri, 01 May 2009 16:25:49 GMT

A person, organization, place, idea, subject matter topic/heading, and other real world things possess "identity" -- that is, a constellation of characteristics that distinguish them from any other identity. Associated with this abstraction can be a label used as a reference, or "identifier". This is the distinction between a thing and the name of the thing.

section from IETF's Domain Keys spec. (paraphrased by me)
.

The Linked Data meme is based on the use of HTTP based URIs as reference / identifier labels associated with the "identity abstraction" referred to above. Thus, when you de-reference (request information about) an HTTP based URI you ultimately end up with a resource URL that exposes the "constellation of characteristics" mentioned above, in a representation negotiated at request time -- between an HTTP client and server e.g., (X)HTML, JSON, XML, RDF/XML, N3, Turtle, Trix, others :-)

What is the Linked Data meme About?
Simple Explanation of RDF & Linked Data Dynamics.
Handle -- Internet wide Identity Scheme and Resolution System

Virtuoso+DBpedia AMI for EC2 now Live!

Fri, 12 Dec 2008 16:22:27 GMT

What is Virtuoso+DBpedia AMI for EC2?

A pre-installed and fully tuned edition of Virtuoso that includes a fully configured DBpedia instance on Amazon's EC2 Cloud platform.

Benefits?

Generally, it provides a no hassles mechanism for instantiating personal, organization, or service specific instances of DBpedia within approximately 1.5 hours as opposed to a lengthy rebuild from RDF source data that takes between 8 - 22 hours depending on machine hardware configuration and host operating system resources.

From a Web Entrepreneur perspective it offers all of the generic benefits of a Virtuoso EC2 AMI plus the following:

Instant bootstrap of a dense Lookup Hub for Linked Data Web oriented solutions
No exposure to any of the complexities and nuances associated with deployment of dereferencable URIs (you have a DBpedia replica)
Predictable performance and scalability due localization of query processing (you aren't sharing the public DBpedia server with the rest of the world).

Features:

DBpedia public instance functionality replica (re. RDF and (X)HTML resource description representations & SPARQL endpoint)
Local URI de-referencing (so no contention with public endpoint) as part of the Linked Data Deployment
Fully tuned Virtuoso instance for DBpedia data set hosting.

How Do I Get Started?

Simply read the Virtuoso-DBpedia EC2 AMI installation guide.

Here are a few live examples of DBpedia resource URIs deployed and de-referencable via one of my EC2 based personal data spaces:

Linked Data
Entity-Attribute-Value (aka. Triples) Model
Hyperdata Linking (aka. Object Hyperlinking)
Barack Obama

Linked Data, Ubiquity Commands, and Resource Descriptions (Update 3)

Mon, 08 Sep 2008 13:00:51 GMT

Ubiquity from Mozilla Labs, provides an alternative entry point for experiencing the "Controller" aspect of the Web's natural compatibility with the MVC development pattern. As I've noted (in various posts) Web Services, as practiced by the REST oriented Web 2.0 community or SOAP oriented SOA community within the enterprise, is fundamentally about the ("Controller" aspect of MVC.

Ubiquity provides a commandline interface for direct invocation of Web Services. For instance, in our case, we can expose the Virtuoso's in-built RDF Middleware ("Sponger") and Linked Data deployment services via a single command of the form: describe-resource

To experience this neat addition to Firefox you need to do the following:

Download and install the Ubiquity Extension for Firefox
Subscribe to the OpenLink Command for Resource Description
Click on CTRL+Space (Windows / Linux) or Option+Space (Mac OS X)
Type in: describe-resource

How to unsubscribe

At the current time, you need to do this if you've installed commands using ubiquity 0.1.0 and seek to use newer versions of the same commands after upgrading to ubiquity 0.1.1.

To unsubscribe use type "about:ubiquity" into browser
Click on unsubscribe links associated with you command subscription list

Enjoy!

Crunchbase & Semantic Web Interview (Remix - Update 1)

Thu, 28 Aug 2008 00:35:15 GMT

After reading Bengee's interview with CrunchBase, I decided to knock up a quick interview remix as part of my usual attempt to add to the developing discourse.

CrunchBase: When we released the CrunchBase API, you were one of the first developers to step up and quickly released a CrunchBase Sponger Cartridge. Can you explain what a CrunchBase Sponger Cartridge is?

Me: A Sponger Cartridge is a data access driver for Web Resources that plugs into our Virtuoso Universal Server (DBMS and Linked Data Web Server combo amongst other things). It uses the internal structure of a resource and/or a web service associated with a resource, to materialize an RDF based Linked Data graph that essentially describes the resource via its properties (Attributes & Relationships).

CrunchBase: And what inspired you to create it?

Me: Bengee built a new space with your data, and we've built a space on the fly from your data which still resides in your domain. Either solution extols the virtues of Linked Data i.e. the ability to explore relationships across data items with high degrees of serendipity (also colloquially known as: following-your-nose pattern in Semantic Web circles).

Bengee posted a notice to the Linking Open Data Community's public mailing list announcing his effort. Bearing in mind the fact that we've been using middleware to mesh the realms of Web 2.0 and the Linked Data Web for a while, it was a no-brainer to knock something up based on the conceptual similarities between Wikicompany and CrunchBase. In a sense, a quadrant of orthogonality is what immediately came to mind re. Wikicompany, CrunchBase, Bengee's RDFization efforts, and ours.

Bengee created an RDF based Linked Data warehouse based on the data exposed by your API, which is exposed via the Semantic CrunchBase data space. In our case we've taken the "RDFization on the fly" approach which produces a transient Linked Data View of the CrunchBase data exposed by your APIs. Our approach is in line with our world view: all resources on the Web are data sources, and the Linked Data Web is about incorporating HTTP into the naming scheme of these data sources so that the conventional URL based hyperlinking mechanism can be used to access a structured description of a resource, which is then transmitted using a range negotiable representation formats. In addition, based on the fact that we house and publish a lot of Linked Data on the Web (e.g. DBpedia, PingTheSemanticWeb, and others), we've also automatically meshed Crunchbase data with related data in DBpedia and Wikicompany data.

CrunchBase: Do you know of any apps that are using CrunchBase Cartridge to enhance their functionality?

Me: Yes, the OpenLink Data Explorer which provides CrunchBase site visitors with the option to explore the Linked Data in the CrunchBase data space. It also allows them to "Mesh" (rather than "Mash") CrunchBase data with other Linked Data sources on the Web without writing a single line of code.

CrunchBase: You have been immersed in the Semantic Web movement for a while now. How did you first get interested in the Semantic Web?

Me: We saw the Semantic Web as a vehicle for standardizing conceptual views of heterogeneous data sources via context lenses (URIs). In 1998 as part of our strategy to expand our business beyond the development and deployment of ODBC, JDBC, and OLE-DB data providers, we decided to build a Virtual Database Engine (see: Virtuoso History), and in doing so we sought a standards based mechanism for the conceptual output of the data virtualization effort. As of the time of the seminal unveiling of the Semantic Web in 1998 we were clear about two things, in relation to the effects of the Web and Internet data management infrastructure inflections: 1) Existing DBMS technology had reached it limits 2) Web Servers would ultimately hit their functional limits. These fundamental realities compelled us to develop Virtuoso with an eye to leveraging the Semantic Web as a vehicle from completing its technical roadmap.

CrunchBase: Can you put into layman’s terms exactly what RDF and SPARQL are and why they are important? Do they only matter for developers or will they extend past developers at some point and be used by website visitors as well?

Me: RDF (Resource Description Framework) is a Graph based Data Model that facilitates resource description using the Subject, Predicate, and Object principle. Associated with the core data model, as part of the overall framework, are a number of markup languages for expressing your descriptions (just as you express presentation markup semantics in HTML or document structure semantics in XML) that include: RDFa (simple extension of HTML markup for embedding descriptions of things in a page), N3 (a human friendly markup for describing resources), RDF/XML (a machine friendly markup for describing resources).

SPARQL is the query language associated with the RDF Data Model, just as SQL is a query language associated with the Relational Database Model. Thus, when you have RDF based structured and linked data on the Web, you can query against Web using SPARQL just as you would against an Oracle/SQL Server/DB2/Informix/Ingres/MySQL/etc.. DBMS using SQL. That's it in a nutshell.

CrunchBase: On your website you wrote that “RDF and SPARQL as productivity boosters in everyday web development”. Can you elaborate on why you believe that to be true?

Me: I think the ability to discern a formal description of anything via its discrete properties is of immense value re. productivity, especially when the capability in question results in a graph of Linked Data that isn't confined to a specific host operating system, database engine, application or service, programming language, or development framework. RDF Linked Data is about infrastructure for the true materialization of the "Information at Your Fingertips" vision of yore. Even though it's taken the emergence of RDF Linked Data to make the aforementioned vision tractable, the comprehension of the vision's intrinsic value have been clear for a very long time. Most organizations and/or individuals are quite familiar with the adage: Knowledge is Power, well there isn't any knowledge without accessible Information, and there isn't any accessible Information without accessible Data. The Web has always be grounded in accessibility to data (albeit via compound container documents called Web Pages).

Bottom line, RDF based Linked Data is about Open Data access by reference using URIs (HTTP based Entity IDs / Data Object IDs / Data Source Names), and as I said earlier, the intrinsic value is pretty obvious bearing in mind the costs associated with integrating disparate and heterogeneous data sources -- across intranets, extranets, and the Internet.

CrunchBase: In his definition of Web 3.0, Nova Spivack proposes that the Semantic Web, or Semantic Web technologies, will be force behind much of the innovation that will occur during Web 3.0. Do you agree with Nova Spivack? What role, if any, do you feel the Semantic Web will play in Web 3.0?

Me: I agree with Nova. But I see Web 3.0 as a phase within the Semantic Web innovation continuum. Web 3.0 exists because Web 2.0 exists. Both of these Web versions express usage and technology focus patterns. Web 2.0 is about the use of Open Source technologies to fashion Web Services that are ultimately used to drive proprietary Software as Service (SaaS) style solutions. Web 3.0 is about the use of "Smart Data Access" to fashion a new generation of Linked Data aware Web Services and solutions that exploit the federated nature of the Web to maximum effect; proprietary branding will simply be conveyed via quality of data (cleanliness, context fidelity, and comprehension of privacy) exposed by URIs.

Here are some examples of the CrunchBase Linked Data Space, as projected via our CruncBase Sponger Cartridge:

Virtuoso 5.0.2 Released!

Mon, 08 Oct 2007 14:27:27 GMT

A new release of Virtuoso is now available in both Open Source and Commercial variants. The main features and Enhancements associated with this release include:

* 64-bit Integer Support

* RDF Sink Folders for WebDAV - enabling RDF Quad Store population by simply dropping RDF files into WebDAV or via HTTP (meaning you can use CURL as an RDF in put mechanism for instance)

* Additional Sponger Cartridges from Audio binary files (i.e ID3 tag extraction and Music Ontology mapping which exposes the fine details of music as RDF based Structured Data; one for the DJs & Remixers out there!)

* New Sponger Cartridges for Facebook, Freebase, Wikipedia, GRDDL, RDFa, eRDF and more

* Support for PHP 5.2 runtime hosting (Virtuoso is a bona fide deployment platform for: Wordpress, MediaWiki, phpBB, Drupal etc.)

RDF Linked Data

SQL-RDF Views

* Tutorial Application includes Linked Data style SQL-RDF Views for the Northwind SQL DBMS schema (which is the same as the standard Virtuoso demo atabase schema)

* SQL-RDF Views implementation of the TPC-D benchmark (Yes, we can run this grueling SQL benchmark via RDF views of SQL Data!)

OpenLink Data Spaces

OpenLink Ajax Toolkit

Download Lnks:

Open Source Edition

Commercial Edition

So, What Does "HREF" Stand For, Anyway

Thu, 10 Apr 2008 20:13:50 GMT

As per usual I am writing this post with the aim of killing a number of meme-birds with a single post in relation to the emerging Linked Data Web.

*On* the ubiquitous Web of "Linked Documents", HREF means (by definition and usage): Hypertext Reference to an HTTP accessible Data Object of Type: "Document" (an information resource). Of course we don't make the formal connection of Object Type when dealing with the Web on a daily basis, but whenever you encounter the "resource not found" condition notice the message: HTTP/1.0 404 Object Not Found, from the HTTP Server tasked with retrieving and returning the resource.

*In* the Web of "Linked Data", a complimentary addition to the current Web of "Linked Documents", HREF is used to reference Data Objects that are of a variety of "Types", not just "Documents". And the way this is achieved, is by using Data Object Identifiers (URIs / IRIs that are generated by the Linked Data deployment platform) in the strict sense i.e. Data Identity (URI) is separated from Data Address (URL). Thus, you can reference a Person Data Object (aka an instance of a Person Class) in your HREF and the HTTP Server returns a Description of the Data Object via a Document (again, an information resource). A document containing the Description of a Data Object typically contains HREFs to other Data Objects that expose the Attributes and Relationships of the initial Person Data Object, and it this collection of Data Objects that is technically called a "Graph" -- which is what RDF models.

What I describe above is basic stuff for anyone that's familiar with Object Database or Distributed Objects technology and concepts.

URI and URL confusion

The Linked Document Web is a collection of physical resources that traverse the Web Information Bus in palatable format i.e documents. Thus, Document Object Identity and Document Object Data Address can be the same thing i.e. a URL can serve as the ID/URI of a Document Data Object.

The Linked Data Web on the other hand, is a Distributed Object Database, and each Data Object must be uniquely defined, otherwise we introduce ambiguity that ultimately taints the Database itself (making incomprehensible to reasoning challenged machines). Thus we must have unique Object IDs (URIs / IRIs) for People, Places, Events, and other things that aren't Documents. Once we follow the time tested rules of Identity, People can then be associated with the things they create (blog posts, web pages, bookmarks, wikiwords etc). RDF is about expressing these graph model relationships while RDF serialization formats enables the information resources to transport these data object link ladden information resources to requesting User Agents.

Put in more succinct terms, all documents on the Web are compound documents in reality (e.g. mast contain a least an image these days). The Linked Data Web is about a Web where Data Object IDs (URIs) enable us to distill source data from the information contained in a compound document.

Examples:

- the ID (URI minted from URL via addition of #this) of a Data Object of Type Person that Identifies me. The Person definition I use comes from the FOAF vocabulary/schema/ontology/data dictionary
- the URI (also a URL) of a FOAF file that contains a description of the Data Object ID: (me)
As an information resource can be dispatched from an HTTP server to a User Agent in (X)HTML, RDF/XML, N3/Turtle representations via HTTP Content Negotiation (note: Look at the "Linked Data" tab to see one example of what Data Links facilitate re. Data Discovery and Exploration)
If I choose an Object ID of instead of then the HTTP Server should not return an information resource (i.e provide 200 OK response) when a User Agent requests a resource via HTTP using the URI: , because a Data Object ID (URI) and the Data Object Address (URL) cannot be the same when my Data Object isn't of Type Document; the sever has to use response code 303 to redirect the user agent to the URL of an information resource that matches the Content-type designated in the HTTP Request or determine representation based on it's own quality of service rules for the information resource associated with the Object ID (URI).

The degree of unobtrusiveness of new technology, concepts, or new applications of existing technology, is what ultimately determines eventual uptake and meme virulence (network effects). For a while, the Semantic Web meme was mired in confusion and general misunderstanding due to a shortage of practical use case scenario demos.

The emergence of the SPARQL Query Language has provided critical infrastructure for a number of products, projects, and demos, that now make the utility of the Semantic Web vision mush clearly via the simplicity of Linked Data, as exemplified by the following:

Linking Open Data Community - collection of People and Linked Data Spaces (across a variety of domains)
DBpedia - Ground zero for experiencing and comprehending Linked Data
OpenLink Data Spaces - a simple solution for creating Linked Data Web presence via from existing Web Data Sources (Blogs, Wikis, Shared Bookmarks, Tag Spaces, Web Sites, Social Networking Services, Web Services, Discussion Forums etc..)
OpenLink Virtuoso - a Universal Server for generating, managing, and deploying RDF Linked Data from SQL, XML, Web Services based data sources

Why Is This Post a Linked Data Demo, Again? Place the permalink of this post in a Linked Data aware user agent (OpenLink RDF Browser1, OpenLink RDF Browser2, Zitgist, DISCO, Tabulator), and the you can see the universal of interlinked data exposed by this post. The Title of this post should not be the sole mechanism for determining that it is Linked to other posts about the same topic.

Ryan Tomayko

So, What Does "HREF" Stand For, Anyway

Elias Torre

The Web FTW

Cool URIs for the Semantic Web.

Cool URIs, Fish, and Wine

Fri, 23 Jan 2009 22:22:00 GMT

I've just read James Governor's insightful post titled: Why Applications Are Like Fish and Data is Like Wine, where he sums up the comparative value of applications (code containers) and data as follows:

"Only one improves with age. With apologies to the originator of the phrase - “Hardware is like fish, operating systems are like wine.”

Yes! Applications are like Fish and Data like Wine, which is basically what Linked Data is fundamentally about, especially when you inject memes such as "Cool URIs" into the mix. Remember, the essence of Linked Data is all about a Web of Linked Data Objects endowed with Identifiers that don't change i.e., they occupy one place in public (e.g. World Wide Web) or private (your corporate Intranet or Extranet) networks, keeping the data that they expose relevant (as in fresh), accessible, and usable in many forms courtesy of the data access & representation dexterity that HTTP facilitates, when incorporated into object identifiers.

Here is another excerpt from his post that rings true (amongst many others):

What am I talking about? Processes change, and need to change. Baking data into the application is a bad idea because the data can’t then be extended in useful, and “unexpected ways”. But not expecting corporate data to be used in new ways is kind of like not expecting the Spanish Inquisition. But… “NOBODY expects the Spanish Inquisition! Amongst our weaponry are such diverse elements as: fear, surprise, ruthless efficiency, an almost fanatical devotion to the Pope.” (sounds like Enterprise Architecture ...).

Master Data Management & RDF based Linked Data

Another Paper Discussing RDF Data Publishing

Wed, 25 Jul 2007 02:02:56 GMT

I stumbled across an article titled: Thoughts on Compound Documents, from the Open Archives initiative (OAI). The article discusses the increasingly popular topic of deploying structured data containers on the Web.

This article, like the one from Mike, and our soon to be released Linked Data Deployment white paper, collectively address the main topic without inadvertent distraction by the misnomer: non-information resource. For instance, the OAI article uses the term: Generic Resource instead of Non-informaton Resource.

The Semantic Data Web is here, but we need to diffuse this reality across a broader spectrum of Web communities, so as to avoid unnecessary uptake inertia that can arise due basic incomprehension of key concepts such as Linked Data deployment.

My Hopes for Linked Data in 2009 (Update #2)

Wed, 07 Jan 2009 02:35:19 GMT

Happy New Year!

In 2009 I hope the following happens re. "Linked Data":

We realize it's a Meme
We collectively connect the Meme to the concept of granular hyperlinks between data entities/objects (datum to datum linkage aka. Hyperdata Linking)
We generally connect the Meme to technology ancestry such as the Entity-Attribute-Value with Classes & Relationships (EAV/CR) data model (then broader commonality with erstwhile unrelated realms will be unveiled e.g., Entity Frameworks from Microsoft, Core Data from Apple, SimpleDB from Amazon, and the Freebase Graph Model DB amongst others)
We instinctively connect the Meme to the concept of Entity Oriented Data Access and Management (RDF based Linked Data is basically EAV/CR scheme that uses HTTP based Pointers for Entity, Attribute, and Relationship Identifiers)
We naturally connect the Meme with the notion that an identifier for a unit of data (aka. Datum) should be the conduit to a negotiable representation of said Datum's description (i.e., it's attribute and relationship properties in HTML, XHTML, RDFa, Turtle, N3, RDF/XML etc., for example)
We ultimately connect the Meme with a conceptual-level approach to data integration across disparate data sources (also known as Master Data Management (MDM) ).

2009 is about a reboot on a monumental scale. We need new thinking, new technology, new approaches, and new solutions. No matter what route we take, we can't negate the importance of "Data". When dealing with organic or inorganic computers systems -- Data is simply everything!

The ability of individuals and enterprises to access, mesh, and disseminate data to relevant nodes across public and private networks will ultimately determine the winners and losers in the new frontier, ushered in by 2009.

Do not take data access and data management technology for granted. User interfaces come and ago, application logic comes and goes, but your data stays with you forever. If you are mystified by data access technology then make 2009 the year of data access technology demystification :-)

Meshups Demonstrating How SPARQL-GEO Enhances Linked Data Exploitation (Update 2)

Wed, 24 Mar 2010 15:44:24 GMT

Deceptively simple demonstrations of how Virtuoso's SPARQL-GEO extensions to SPARQL lay critical foundation for Geo Spatial solutions that seek to leverage the burgeoning Web of Linked Data.

Setup Information

SPARQL Endpoint: Linked Open Data Cache (8.5 Billion+ Quad Store which includes data from Geonames and the Linked GeoData Project Data Sets) .

Live Linked Data Meshup Links:

LinkedGeoData things within 2km ORDER BY Dist LIMIT 10 (Use from iPhone only since its an iPhone oriented Linked Data driven application)
LinkedGeoData things within 2km of Trafalgar Square | ORDER By Distance - closest first | ORDER By Distance - most distant first .

One Technology That Will Rock 2010 (Update 1)

Mon, 01 Feb 2010 14:02:41 GMT

Thanks to the TechCrunch post titled: Ten Technologies That Will Rock 2010, I've been able to quickly construct a derivative post that condenses the ten item list down to a Single Technology That Will Rock 2010 :-)

Sticking with the TechCrunch layout, here is why all roads simply lead to Linked Data come 2010 and beyond:

The Tablet: a new form factor addition re. Internet and Web application hosts which is just another way of saying: Linked Data will be accessible from Tablet applications.
Geo: GPS chips are now standard features of mobile phones, so geolocation is increasingly becoming a necessary feature for any killer app. Thus, GeoSpatial Linked Data and GeopSpatial Queries are going to be a critical success factor for any endeavor that seeks to engage mobile applications developers and ultimately their end-users. Basiacally, you want to be able to perform Esoteric Search from these devices of the form: Find Vendors of a Camcorder (e.g., with a Zoom Factor: Weight Ratio of X) within a 2km Radius of my current location. Or how many items from my WishList are available from a Vendor within a 2km radius of my current location. Conversely, provide Vendors with the ability to spot potential Customers within a 2km of a given "clicks & mortar" location (e.g. BestBuy store).
Realtime Search: Rich Structured Profiles that leverage standards such as FOAF and FOAF+SSL will enable Highly Personalized Realtime Search (HPRS) without compromisng privacy. Tecnically, this is about WebIDs securely bound to X.509 Certificates, providing access to verifiable and highly navigable Personal Profile Data Spaces that also double as personal search index entry points.
Chrome OS: Just another operating system for exploiting the burgeoning Web of Linked Data
HTML5: Courtesy of RDFa, just another mechanism for exposing Linked Data by making HTML+RDFa a bona fide markup for metadata (i.e., format for describing real world objects via their attribute-value graphs)
Mobile Video: Simplifies the production and sharing of Video annotations (comments, reviews etc.) en route to creating rich Linked Discourse Data Spaces.
Augmented Reality: Ditto
Mobile Transactions: As per points 1&2 above, Vendor Discovery and Transaction Conusmation will increasingly be driven by high SDQ applications. The "Funnel Effect" (more choices based on individual preferences) will be a critical success factor for any one operating in the Mobile Transaction realm. Note, without Linked Data you cannot deliver scalable solutions that handle the combined requirements of: SDQ, "Funnel Effect", and Mobile Device form factor, will simply maginify the importance of Web accessible Linked Data.
Android: An additional platform for items 1-8; basically, 2010 isn't going to be an iPhone only zone. Personally, this reminds me of a battle from the past i.e., Microsoft vs Apple, re. desktop computing dominance. Google has studied history very well :-)
Social CRM: this is simply about applying points 1-9 alongide the construction of Linked Data from eCRM Data Spaces.

As I've stated in the past (across a variety of mediums), you cannot build applications that have long term value without addressing the following issues:

Data Item or Object Identity
Data Structure -- Data Models
Data Representation -- Data Model Entity & Relationships Representation mechanism (as delivered by metadata oriented markup)
Data Storage -- Database Management Systems
Data Access -- Data Access Protocols
Data Presentation -- How you present Views and Reports from Structured Data Sources
Data Security -- Data Access Policies

The items above basically showcase the very essence of the HTTP URI abstraction that drives HTTP based Linked Data; which is also the basic payload unit that underlies REST.

Conclusion

I simply hope that the next decade marks a period of broad appreciation and comprehension of Data Access, Integration, and Management issues on the parts of: application developers, integrators, analysts, end-users, and decision makers. Remember, without structured Data we cannot produce or share Information, and without Information, we cannot produce of share Knowledge.

The Cost of doing the Right Thing

Sat, 29 Mar 2008 04:50:07 GMT

One of the biggest impediments to the adoption of technology is the cost burden typically associated with doing the right thing. For instance, requirements for making the Linked Data Web (GGG) buzz would include the following (paraphrasing TimBL's original Linked Data meme):

-- Link to other Web accessible things using their URIs.

The list is nice, but actual execution can be challenging. For instance, when writing a blog post, or constructing a WikiWord, would you have enough disposable time to go searching for these URIs? Or would you compromise and continue to inject "Literal" values into the Web, leaving it to the reasoning endowed human reader to connect the dots?

Anyway, OpenLink Data Spaces is now equipped with a Glossary system that allows me to manage terms, meaning of terms, and hyper-linking of phrases and words matching associated with my terms. The great thing about all of this is that everything I do is scoped to my Data Space (my universe of discourse), I don't break or impede the other meanings of these terms outside my Data Space. The Glossary system can be shared with anyone I choose to share it with, and even better, it makes my upstreaming (rules based replication) style of blogging even more productive :-)

Remember, on the Linked Data Web, who you know doesn't matter as much as what your are connected to, directly or indirectly. Jason Kolb covers this issue in his post: People as Data Connectors, and so doesFrederick Giasson via a recent post titled: Networks are everywhere. For instance, this blog post (or the entire Blog) is a bona fide RDF Linked Data Source, you can use it as the Data Source of a SPARQL Query to find things that aren't even mentioned in this post, since all you are doing is beaming a query through my Data Space (a container of Linked Data Graphs). On that note, let's re-watch Jon Udell's "On-Demand-Blogosphere" screencast from 2006 :-)

Context, Tagging, Semantic Web, and Linked Data (Updated)

Tue, 27 May 2008 22:36:37 GMT

Courtesy of Nova Spivack's post titled: Tagging and the Semantic Web: Tags as Objects, I stumbled across a related post by John Clarke titled: Tagging and the Semantic Web. Both of these posts use the common practice of tagging to shed light on the increasing realization that "The Pursuit of Context" is the fusion point between the current Web and its evolution into a structured Web of Linked Data.

How Semantic Tagging Works (from a 1000 feet)

When tagging a document, the semantic tagging service passes the content of a target document through a processing pipeline (a distillation process of sorts) that results in automagic extraction of the following:

Named Entities

Subject matter Entities

Subject matter Concepts

Once the extraction phase is completed, a user is presented with a list of "suggested tags" using a variety of user interaction techniques. The literal values of elected Tags are then associated with one or more Tag and Tag Meaning Data Objects, with each Object type endowed with a unique Identifier.

Issues to Note

Broad acceptance that: "Context is king", is gradually taking shape. That said, "Context" landlocked within Literal values offers little over what we have right now (e.g. at Del.icio.us or Technorati), long term. By this I mean: if the end product of semantically enhanced tagging leaves us with: Literal Tag values only, Tags associated with Tag Data Objects endowed with platform specific Identifiers, or Tag Data Objects with any other Identity scheme that excludes HTTP, the ability of Web users to discern or derive multiple perspectives from the base Context (exposed by semantically enhanced Tags) will be lost, or severely impeded at best.

The shape, form, and quality of the lookup substrate that underlies semantic tagging services, ultimately affects "context fidelity" matters such as Entity Disambiguation. The importance of quality lookup infrastructure on the burgeoning Linked Data Web is the reason why OpenLink Software is intimately involved with the DBpedia and UMBEL projects.

Conclusions

I am immensely happy to see that the Web 2.0 and Semantic Web communities are beginning to coalesce around the issue of "Context". This was the case at the WWW2008 Linked Data Workshop, I am feeling a similar vibe emerging from the Semantic Web Technologies conference currently nearing completion in San Jose. Of course, I will be talking about, and demonstrating practical utility of all of this, at the upcoming Linked Data Planet conference.

My Data Space Tag Cloud

New W3C Incubator Group: Relational Database to RDF Mapping

Tue, 11 Mar 2008 17:58:24 GMT

The new RDB2RDF Incubator Group is now official. The group is sponsored by Oracle, HP, PartnersHealth, and OpenLink Software.

Goals

The goal of this effort is standardization of approaches (syntax and methodology) for mapping Relational Data Model instance data to RDF (Graph Data Model).

Benefits

Every record in a relational table/view/stored procedure (Table Valued Functions/Procedures) is declaratively morphed into an Entity (instance of a Class associated with a Schema/Ontology). The derived entities become part of a graph that exposes relationships and relationship traversal paths that have lower JOIN Costs than attempting the same thing directly via SQL. In a nutshell, you end up with a conceptual interface atop a logical data layer that enables a much more productive mechanism for exploring homogeneous and/or heterogeneous data without confinement at the DB instance, SQL DBMS type, host operating system, local area network, or wide area network levels.

Just as we have to mesh the Linked Data and Document Webs, unobtrusively. It's also important that the same principles to apply to exposure of RDBMS hosted data as RDF based Linked Data.

We all know that a large amount of data driving the IT engines of most enterprises resides in Relational Databases. And contrary to recent RDBMS vs RDF database misunderstandings espoused (hopefully inadvertently) by some commentators, Relational Database engines aren't going away anytime soon. Meshing Relational (logical) and Graph (conceptual) data models a natural progression along an evolutionary path towards: Analysis for All. By the way, there is a parallel evolution occurring in others realms such as Microsoft's ADO.NET's Entity Framework.

How would I use RDB2RDF Mapping?

To Unobtrusively expose existing data sources as RDF Linked Data. The links that follow provide examples:

Zitgist View

OpenLink RDF Browser View

DISCO Browser View

Tabulator View

Drupal

Blog Posts as Linked Data

Wordpress

Blog Posts as Linked Data

MediaWiki

Wikiwords as Linked Data

Virtuoso's Meta Schema Language for Declaratively generating RDF Views of SQL Data (Presentation, White Paper, Tutorial, and Online Docs)
ESW Wiki's Collection of SQL-RDF Mapping Tools
What the Semantic Web means for your Business

Exploring The Semantic Web & SPARQL FAQs, Linked Data Style!

Thu, 31 May 2007 21:43:47 GMT

The recently released Semantic Web FAQ (authored by Ivan Herman) has some neat Rich Internet and Semantic Data Web embellishments contributed by Ivan and Lee Feigenbaum. As a result, we not only have a great Semantic Web FAQ document, we also inherit a coherent piece of "demo fodder" that aids the general (S)emantic (W)eb (E)ducation and (O)reach (SWEO) that is clearly in full swing.

Of course, this also enables me to provide yet another Semantic Data Web demo in the form of additional viewing perspectives for the aforementioned FAQ (just click to see):

Lee also embarked on a similar embellishment effort re. the SPARQL Query Language FAQ thereby enabling me to also offer alternative viewing perspectives along similar lines:

Semantic Web Advocate of Tribe Linked Data! (Updated)

Thu, 20 Mar 2008 20:29:47 GMT

These days I increasingly qualify myself and my Semantic Web advocacy as falling under the realm Linked Data. Thus, I tend to use the following introduction: I am Kingsley Idehen, of the Tribe Linked Data.

The aforementioned qualification is increasingly necessary for the following reasons:

The Semantic Web vision is broad and comprised of many layers
A new era of confusion is taking shape just as we thought we had quelled the prior AI dominated realm of confusion
None of the Semantic Web vision layers are comprehensible in practical ways without a basic foundation
Open Data Access is the foundation of the Semantic Web (in prior post I used the term: Semantic Web Layer 1)
URIs units of Open Data Access in Semantic Web parlance i.e.. each datum on the Web must have an ID (minted by the host Data Space).

The terms GGG, Linked Data, Data Web, Web of Data, and Web 3.0 (when I use this term) all imply URI driven Open Data Access for the Web Database (maybe call this ODBC for the Web) -- ability to point to records across data spaces without any adverse effect to the remote data spaces. It's really important to note that none of the aforementioned terms have nothing to do with the "Linguistic Meaning of blurb". Building a smarter document exposed via a URL without exposing descriptive data links doesn't provide open access to information data sources.

As human beings we are all endowed with reasoning capability. But we can't reason without access to data. Dearth of openly accessible structured data is the source of many ills in cyberspace and across society in general. Today we still have Subjectivity reigning over Objectivity due to the prohibitive costs of open data access.

We can't cost-effectively pursue objectivity without cost-effective infrastructure for creating alternative views of the data behind information sources (e.g. Web Pages). More Objectivity and less Subjectivity is what the next Web Frontier is about. At OpenLink we simply use the moniker: Analysis for All! Everyone becomes a data analyst in some form, and even better, the analysis are easily accessible to anyone connected to the Web. Of course, you will be able to share special analysis with your private network of friends and family, or if you so choose, not at all :-)

Recap, it's important to note that Linked Data is the foundation layer of the Semantic Web vision. It's not only facilitates open data access, it also enables data integration (Meshing as opposed to Mashing) across disparate data schemas

As demonstrated by DBpedia and the Linked Data Solar system emerging around it, if you URI everything, then everything is Cool.

Linked Data and Information Silos are mutually exclusive concepts. Thus, you cannot produce a web accessible Information Silo and then refer to it as "Semantic Web" technology. Of course, it might be very Semantic, but it's fundamentally devoid of critical "Semantic Web" essence (DNA).

My acid test for any Semantic Web solution is simply this (using a Web User Agent or Client):

go to the profile page of the service
ask for an RDF representation of my profile (by this I mean "get me the raw data in structured form")
attempt to traverse the structured data graph (RDF) that the service provides via live de-referncable URIs.

Here is the Acid test against my Data Space:

My Profile Page (HTML representation dispatched via an instance of OpenLink Data Spaces)
Click on the "Linked Data Tab" (HTML representation endowed with Data Links the link to information resources containing other structured descriptions of things).

In Perpetual Pursuit of Context

Sat, 03 May 2008 19:07:32 GMT

I've always been of the opinion that concise value proposition articulation shouldn't be the achilles of the Semantic Web. As the Linked Data wave climbs up the "value Appreciation and Comprehension chain", it's getting clearer by the second that "Context" is a point of confluence for Semantic Web Technologies and easy to comprehend value, from the perspectives of those outside the core community.

In today's primarily Document centric Web, the pursuit of Context is akin to pursuing a mirage in a desert of user generated content. The quest is labor intensive, and you ultimaely end up without water at the end of the pursuit :-)

Listening to the Christine Connor's podcast interview with Talis simply reinforces my strong belief that "Context, Context, Context" is the Semantic Web's equivalent of Real Estate's "Location, Location, Location" (ignore the subprime loans mess for now). The critical thing to note is that you cannot unravel "Context" from existing Web content without incorporating powerful disambiguation technology into an "Entity Extraction" process. Of course, you cannot even consider seriously pursing any entity extraction and disambiguation endeavor without a lookup backbone that exposes "Named Entities" and their relationships to "Subject matter Concepts" (BTW - this is what UMBEL is all about). Thus, when looking at the broad subject of the Semantic Web, we can also look at "Context" as the vital point of confluence for the Data oriented (Linked Data) and the "Linguistic Meaning" oriented perspectives.

I am even inclined to state publicly that "Context" may ultimately be the foundation for 4th "Web Interaction Dimension" where practical use of AI leverages a Linked Data Web substrate en route to exposing new kinds of value :-)

"Context" may also be the focal point of concise value proposition articulation to VCs as in: "My solution offers the ability to discover and exploit "Context" iteratively, at the rate of $X.XX per iteration, across a variety of market segments :-)

What do people have against URLs or URIs? (Updated)

Mon, 23 Jun 2008 13:37:57 GMT

Stumbled across a nice post titled: What do people have against URLs?. My answer: Everything, if they don't understand the inherent power of URLs when incorporated into the "Data Source Naming" mechanism of the Web called: URIs :-)

URIs are simple to use i.e you simply click on them via a user agents UI. However, URLs when incorporated into Data Source Naming en route to constructing HTTP based Identifiers, that deliver HTTP based pointers to the location / address of a Resource Descriptions, another matter.

I touched on this issue in my Linked Data Planet keynote last week, and I must say, it did set off a light.

I believe, we can only get the broader Web community to comprehend the utility of URIs (Web Data Source Names) by exposing said utility via the Web's Universal Client (Web Browser). For instance, how do URN based Identity / Naming schemes help in a world dominated by Web Browsers that only grok "http://"? From my vantage point, the practical solution is for data providers who already have "doi", "lsid" and other Handle based Identifiers in place, to embark upon http-to-native-naming-scheme-proxying.

In my usual "dog-fooding" and "practice what you preach" fashion, this is exactly what we do in the new Linked Data Web extension that we've decided to reveal to the public (albeit late beta). Thus, when you use an existing browser to view pages with "lsid" or "doi" URNs, you still enjoy the utility of getting at the "Raw Linked Data Sources" that these names expose.

Why Do We Put Stuff On The Web, Really?

Sat, 25 Jul 2009 01:00:21 GMT

As espoused by the Ubuntu philosophy, no Human is an Island. Thus, although the objects of our sociality are vast and varied; that said, the basic foundation still centers on the pursuit and/or delivery of products and services.

Today, the we put stuff on the Web because we want it do be discovered as part of a "sharing act". Likewise, we make regular use of Search Engine Services because we want to "Find" stuff in a productive manner.

Putting, the above in context, you don't need to be Einstein to figure out that to date the Web hasn't enabled vendors to describe their products and services clearly. Likewise, it hasn't enabled us to describe what we want, when we want it, and how much we are willing to pay etc. Basically, the SDQ of Web Content is excruciatingly low!

The Linked Data meme is about using the essence of the Web -- HTTP URIs -- as the mechanism for conducting data across the Web that unambiguously unveils basic things like:

Using a personal profile to describe exactly who I am, my interests, favorite things, what I want (wishlist), what I have to offer (offerlist) etc.
Using an company profile to describe my entire product catalog, inventory levels, store locations, distributor and reseller networks, feature specs, price specs, deal terms and duration, and even opening and closing hours.

Conclusions

A Web of Linked Data enables a complete redefinition of eCommerce, and that's just for starters :-)

The Essence of the Matter re. Information Overload

Thu, 28 Aug 2008 19:56:20 GMT

The title of this post is an expression of my gut reaction to the quotes below, which originate from Leo Sauermann's post about the Nepomuk Semantic Desktop for KDE:

Ansgar Bernardi, deputy head of the Knowledge Management Department at Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI, or the German Research Center for Artificial Intelligence) and Nepomuk's coordinator, explains, "The basic problem that we all face nowadays is how to handle vast amounts of information at a sensible rate." According to Bernardi, Nepomuk takes a traditional approach by creating a meta-data layer with well-defined elements that services can be built upon to create and manipulate the information.

The comment above echoes my sentiments about the imminence of "information overload" due to the vast amounts of user generated content on the Internet as a whole. We are going to need to process more an more data within a fixed 24 hour timeframe, while attempting to balance our professional and personal lives. Be rest assured, this is a very serious issue, and you cannot event begin to address it without a Web of Linked Data.

"The first idea of building the semantic desktop arose from the fact that one of our colleagues could not remember the girlfriends of his friends," Bernard says, more than half-seriously. "Because they kept changing -- you know how it is. The point is, you have a vast amount of information on your desktop, hidden in files, hidden in emails, hidden in the names and structures of your folders. Nepomuk gives a standard way to handle such information."

If you get a personal URI for Entity "You", via a Linked Data aware platform (e.g. OpenLink Data Spaces) that virtualizes data across your existing Web data spaces (blogs, feed subscriptions, wikis, shared bookmarks, photo galleries, calendars, etc.), you then only have to remember your URI whenever you need to "Find" something, imagine that!

To conclude, "information overload" is the imminent challenge of our time, and the keys to challenge alleviation lie in our ability to construct and maintain (via solutions) few context lenses (URIs) that provide coherent conduits into the dense mesh of structured Linked Data on the Web.

Geonames marches foward with ontology v1.2

Mon, 23 Oct 2006 13:02:33 GMT

Geonames marches foward with ontology v1.2: "

Geonames announced the release of its Geonames ontology v1.2. The new ontology has few enhancements. It introduced the notion of linked data and made clear distinction between URI that intended for linking documents and for linking ontology concepts.

Different types of geospatial data are of different spatial granularity. Data of different spatial granularity may relate to each other by the containment relation. For example, countries contain states, states contains cities and so on. Some geospatial data are of the similar spatial granularity (e.g., two cities that are nearby each other, or two countries that are neighboring each other). To support the knowledge representation of these relationships, the ontology introduced three new properties: childreanFeatures, nearbyFeatures and neighbouringFeatures.

In the Semantic Web, both ontology concepts and physical web documents are linked by URI. Sometimes in applications, it’s useful to make clear whether the use of a URI is intended for linking documents or for linking ontology concepts. The new Geonames ontology introduced a URI convention for identifying the intended usage of a URI. This convention also simplifies the discovering of geospatial data using Geonames web services.

Here is an example:

URI for linking to the concept city Berlin: http://sws.geonames.org/2950159/
URI for linking to the descriptions about the city Berlin: http://sws.geonames.org/2950159/about.rdf
URI for linking to the descriptions of places that are nearby Berlin: http://sws.geonames.org/2950159/nearby.rdf

Other interesting ontology properties include wikipediaArticle and locationMap. The former links a Feature instance to a Web article on Wikipedia, and the latter links a Feature instance to a digital map Web page.

For additional information about Geonames ontology v1.2, see Marc’s post at the Geonames blog.

(Via Geospatial Semantic Web Blog.)

The URI, URL, and Linked Data Meme's Generic HTTP URI (Updated)

Sun, 28 Mar 2010 16:19:00 GMT

Situation Analysis

As the "Linked Data" meme has gained momentum you've more than likely been on the receiving end of dialog with Linked Open Data community members (myself included) that goes something like this:

"Do you have a URI", "Get yourself a URI", "Give me a de-referencable URI" etc..

And each time, you respond with a URL -- which to the best of your Web knowledge is a bona fide URI. But to your utter confusion you are told: Nah! You gave me a Document URI instead of the URI of a real-world thing or object etc..

What's up with that?

Well our everyday use of the Web is an unfortunate conflation of two distinct things, which have Identity: Real World Objects (RWOs) & Address/Location of Documents (Information bearing Resources).

The "Linked Data" meme is about enhancing the Web by unobtrusively reintroducing its core essence: the generic HTTP URI, a vital piece of Web Architecture DNA. Basically, its about so realizing the full capabilities of the Web as a platform for Open Data Identification, Definition, Access, Storage, Representation, Presentation, and Integration.

What is a Real World Object?

People, Places, Music, Books, Cars, Ideas, Emotions etc..

What is a URI?

A Uniform Resource Identifier. A global identifier mechanism for network addressable data items. Its sole function is Name oriented Identification.

URI Generic Syntax

The constituent parts of a URI (from URI Generic Syntax RFC) are depicted below:

What is a URL?

A location oriented HTTP scheme based URI. The HTTP scheme introduces a powerful and inherent duality that delivers:

Resource Address/Location Identifier
Data Access mechanism for an Information bearing Resource (Document, File etc..)

So far so good!

What is an HTTP based URI?

The kind of URI Linked Data aficionados mean when they use the term: URI.

An HTTP URI is an HTTP scheme based URI. Unlike a URL, this kind of HTTP scheme URI is devoid of any Web Location orientation or specificity. Thus, Its inherent duality provides a more powerful level of abstraction. Hence, you can use this form of URI to assign Names/Identifiers to Real World Objects (RWO). Even better, courtesy of the Identity/Address duality of the HTTP scheme, a single URI can deliver the following:

RWO Identfier/Name
RWO Metadata document Locator (courtesy of URL aspect)
Negotiable Representation of the Located Document (courtesy of HTTP's content negotiation feature).

What is Metadata?

Data about Data. Put differently, data that describes other data in a structured manner.

How Do we Model Metadata?

The predominant model for metadata is the Entity-Attribute-Value + Classes & Relationships model (EAV/CR). A model that's been with us since the inception of modern computing (long before the Web).

What about RDF?

The Resource Description Framework (RDF) is a framework for describing Web addressable resources. In a nutshell, its a framework for adding Metadata bearing Information Resources to the current Web. Its comprised of:

Entity-Attribute-Value (aka. Subject-Predictate-Object) plus Classes & Relationships (Data Dictionaries e.g., OWL) metadata model
A plethora of instance data representation formats that include: RDFa (when doing so within (X)HTML docs), Turtle, N3, TriX, RDF/XML etc.

What's the Problem Today?

The ubiquitous use of the Web is primarily focused on a Linked Mesh of Information bearing Documents. URLs rather than generic HTTP URIs are the prime mechanism for Web tapestry; basically, we use URLs to conduct Information -- which is inherently subjective -- instead of using HTTP URIs to conduct "Raw Data" -- which is inherently objective.

Note: Information is "data in context", it isn't the same thing as "Raw Data". Thus, if we can link to Information via the Web, why shouldn't we be able to do the same for "Raw Data"?

How Does the Link Data meme solve the problem?

The meme simply provides a set of guidelines (best practices) for producing Web architecture friendly metadata. Meaning: when producing EAV/CR model based metadata, endow Subjects, their Attributes, and Attribute Values (optionally) with HTTP URIs. By doing so, a new level of Link Abstraction on the Web is possible i.e., "Data Item to Data Item" level links (aka hyperdata links). Even better, when you de-reference a RWO hyperdata link you end up with a negotiated representations of its metadata.

Conclusion

Linked Data is ultimately about an HTTP URI for each item in the Data Organization Hierarchy :-)

History of how "Resource" became part of URI - historic account by TimBL
Linked Data Design Issues Document - TimBL's initial Linked Data Guide
Linked Data Rules Simplified - My attempt at simplifying the Linked Data Meme without SPARQL & RDF distraction
Linked Data & Identity - another related post
The Linked Data Meme's Value Proposition
So What Does "HREF" stand for anyway?
My Del.icio.us hosted Bookmark Data Space for Identity Schemes
TimBL's Ted Talk re. "Raw Linked Data"
Resource Oriented Architecture
More Famous Than Simon Cowell .

W3C's SPARQLing Data Access Ingenuity

Thu, 17 Jan 2008 20:41:04 GMT

The W3C officially unveiled the SPARQL Query Language today via a press release titled: W3C Opens Data on the Web with SPARQL.

What is SPARQL?

A query language for the burgeoning Structured & Linked Data Web (aka Semantic Web / Giant Global Graph). Like SQL, for the Relational Data Model, it provides a query language for the Graph based RDF Data Model.

It's also a REST or SOAP based Web Service that exposes SPARQL access to RDF Data via an endpoint.

In addition, it's also a Query Results Serialization format that includes XML and JSON support.

Why is it Important?

It brings important clarity to the notion of the "Web as a Database" by transforming existing Web Sites, Portals, and Web Services into bona fide corpus of Mesh-able (rather than Mash-able) Data Sources. For instance, you can perform queries that join one or more of the aforementioned data sources in exactly the same manner (albeit different syntax) as you would one or more SQL Tables.

Example:

-- SPARQL equivalent of SQL SELECT * against my personal data space hosted FOAF file

SELECT DISTINCT ?s ?p ?o FROM WHERE {?s ?p ?o}

-- SPARQL against my social network -- Note: My SPARQL will be beamed across all of contacts in the social networks of my contacts as long as they are all HTTP URI based within each data space

PREFIX foaf: SELECT DISTINCT ?Person FROM WHERE {?s a foaf:Person; foaf:knows ?Person}

Note: you can use the basic SPARQL Endpoint, SPARQL Query By Example, or SPARQL Query Builder Demo tool to experiment with the demonstration queries above.

How Do I use It?

SPARQL is implemented by RDF Data Management Systems (Triple or Quad Stores) just as SQL is implemented by Relational Database Management Systems. The aforementioned data management systems will typically expose SPARQL access via a SPARQL endpoint.

Where are it's implementations?

A SPARQL implementors Testimonial page accompanies the SPARQL press release. In addition the is a growing collection of implementations on the ESW Wiki Page for SPARQL compliant RDF Triple & Quad Stores.

Is this really a big deal?

Yes! SPARQL facilitates an unobtrusive manifestation of a Linked Data Web by way of natural extension of the existing Document Web i.e these Web enclaves co-exist in symbiotic fashion.

As DBpedia very clearly demonstrates, Linked Data makes the Semantic Web demonstrable and much easier to comprehend. Without SPARQL there would be no mechanism for Linked Data deployment, and without Linked Data there is no mechanism for Beaming Queries (directly or indirectly) across the Giant Global Graph of data hosted by Social Networks, Shard Bookmarks Services, Weblogs, Wikis, RSS/Atom/OPML feeds, Photo Galleries and other Web accessible Data Sources (Data Spaces).

Related items

Cool URIs

Publishing Linked Data Tutorial

Detailed SPARQL Query Examples using SIOC Data Spaces

Detailed SPARQL Query Examples using FOAF Data Spaces

Virtuoso Linked Data Deployment In 3 Simple Steps

Tue, 02 Nov 2010 15:55:31 GMT

Injecting Linked Data into the Web has been a major pain point for those who seek personal, service, or organization-specific variants of DBpedia. Basically, the sequence goes something like this:

You encounter DBpedia or the LOD Cloud Pictorial.
You look around (typically following your nose from link to link).
You attempt to publish your own stuff.
You get stuck.

The problems typically take the following form:

Functionality confusion about the complementary Name and Address functionality of a single URI abstraction
Terminology confusion due to conflation and over-loading of terms such as Resource, URL, Representation, Document, etc.
Inability to find robust tools with which to generate Linked Data from existing data sources such as relational databases, CSV files, XML, Web Services, etc.

To start addressing these problems, here is a simple guide for generating and publishing Linked Data using Virtuoso.

Step 1 - RDF Data Generation

Existing RDF data can be added to the Virtuoso RDF Quad Store via a variety of built-in data loader utilities.

Many options allow you to easily and quickly generate RDF data from other data sources:

Install the Sponger Bookmarklet for the URIBurner service. Bind this to your own SPARQL-compliant backend RDF database (in this scenario, your local Virtuoso instance), and then Sponge some HTTP-accessible resources.
Convert relational DBMS data to RDF using the Virtuoso RDF Views Wizard.
Starting with CSV files, you can
- Place them at an HTTP-accessible location, and use the Virtuoso Sponger to convert them to RDF or;
- Use the CVS import feature to import their content into Virtuoso's relational data engine; then use the built-in RDF Views Wizard as with other RDBMS data.
Starting from XML files, you can
- Use Virtuoso's inbuilt XSLT-Processor for manual XML to RDF/XML transformation or;
- Leverage the Sponger Cartridge for GRDDL, if there is a transformation service associated with your XML data source, or;
- Let the Sponger analyze the XML data source and make a best-effort transformation to RDF.

Step 2 - Linked Data Deployment

Install the Faceted Browser VAD package (fct_dav.vad) which delivers the following:

Faceted Browser Engine UI
Dynamic Hypermedia Resource Generator
- delivers descriptor resources for every entity (data object) in the Native or Virtual Quad Stores
- supports a broad array of output formats, including HTML+RDFa, RDF/XML, N3/Turtle, NTriples, RDF-JSON, OData+Atom, and OData+JSON.

Step 3 - Linked Data Consumption & Exploitation

Three simple steps allow you, your enterprise, and your customers to consume and exploit your newly deployed Linked Data --

Load a page like this in your browser: http://[:]/describe/?uri=
- [:] gets replaced by the host and port of your Virtuoso instance
- gets replaced by the URI you want to see described -- for instance, the URI of one of the resources you let the Sponger handle.
Follow the links presented in the descriptor page.
If you ever see a blank page with a hyperlink subject name in the About: section at the top of the page, simply add the parameter "&sp=1" to the URL in the browser's Address box, and hit [ENTER]. This will result in an "on the fly" resource retrieval, transformation, and descriptor page generation.
Use the navigator controls to page up and down the data associated with the "in scope" resource descriptor.

Sample Descriptor Page (what you see post completion of the steps in this post)
What is Linked Data, really?
Painless Linked Data Generation via URIBurner
How To Load RDF Data Into Virtuoso (various methods)
Virtuoso Bulk Loader Script for RDF
Bulk Loader Script for CSV
Wizard based generation of RDF based Linked Data from ODBC accessible Relational Databases

Web of Linked Data & Hyperdata

Tue, 05 Feb 2008 01:43:55 GMT

I've just read the extensive post by Nova Spivack titled: The Semantic Web, Collective Intelligence and Hyperdata, courtesy of a post by Danny Ayres titled: Confused about the Semantic Web , in response to a post by Tim O'Reilly titled: Economist Confused About the Semantic Web? .

My Comments:

Hyperdata is short for HyperLinked Data :-) The same applies to Linked Data. Thus, we have two literal labels for the same core Concept. HTTP is the enabling protocol for "Hyper-linking" Documents and associated Structured Data via the World Wide Web (Web for short). Data Links associated with Structured Data contained in, or hosted by, Documents on the Web.

RDFa, eRDF, GRDDL, SPARQL Query Language, SPARQL Protocol (SOAP or REST service), SPARQL Results Serializations (XML or JSON) collectively provide a myriad of unobtrusive routes to structured data embedded within, or associated with, existing Web Documents.

As Danny already states, ontologies are not prerequisites for producing structured data using the RDF Data Model. They simply aid the ability to express one's self clearly (i.e. no repetition or ambiguity) across a broad audience of machines (directly) and their human masters (indirectly).

Using the crux of this post as the anecdote: The Semantic Data Web would simplify the process of claiming and/or proving that Linked Data and Hyperdata describe the same concept. It achieves this by using Triples (Subject, Predicate, Object) expressed in various forms (N3, Turtle, RDF/XML etc.) to formalize claims in a form palatable to electronic agents (machines) operating on behalf of Humans. In a nutshell, this increases human productive by completely obliterates the erstwhile exponential costs of discovering data, information, and knowledge.

BTW - for full effect, view this post (i.e. cut and paste the Permalink URI of this post, below) into an RDF Browser such as:

Bio2Rdf EC2 AMI is now Ready! (Updated)

Wed, 24 Dec 2008 16:05:13 GMT

Adding to the collection of Amazon EC2 AMI based knowledgebases already unveiled for DBpedia and NeuroCommons, we now have a Bio2Rdf knowledgebase AMI.

What is Bio2Rdf?

A community developed knowledgebase comprised of Bio Informatics data from across 30 or so public data sources. The standard deployment of Bio2Rdf includes a a federation of SPARQL endpoints provided by project members and collaborators.

What is the Bio2Rdf EC2 AMI?

An Amazon EC2 hosted variant of the Bio2Rdf knowledgebase. In addition to providing a SPARQL endpoint, the data exposed by the Amazon AMI is published in compliance with Linked Data publishing best practices espoused by the Linking Open Data community (LOD).

Benefits?

The ability to instantiate a personal or service-specific variant of this powerful knowledgebase via the Amazon EC2 Cloud. Instead of a 22+ hour error prone odyssey - you simply get down to the task of data analysis and integration within 1.5 hrs (when setting up you AMI for the first time).

How do I get going?

Just follow the instructions in the Bio2Rdf EC2 AMI installation guide.

Virtuoso, PHP Runtime Hosting: phpBB, Wordpress, Drupal, MediaWiki, and Linked Data

Fri, 26 Mar 2010 01:19:59 GMT

Runtime hosting is functionality realm of Virtuoso that is sometimes easily overlooked. In this post I want to provide a simple no-hassles HOWTO guide for installing Virtuoso on Windows (32 or 64 Bit), Mac OS X (Universal or Native 64 Bit), and Linux (32 or 64 Bit). The installation guide also covers the instantiation of phpBB3 as verification of the Virtuoso hosted PHP 3.5 runtime.

What are the benefits of PHP Runtime Hosting?

Like Apache, Virtuoso is a bona-fide Web Application Server for PHP based applications. Unlike Apache, Virtuoso is also the following:

a Hybrid Native DBMS Engine (Relational, RDF-Graph, and Document models) that is accessible via industry standard interfaces (solely)
a Virtual DBMS or Master Data Manager (MDM) that virtualizes heterogeneous data sources (ODBC, JDBC, Web Services, Hypermedia Resources, Non Hypermedia Resources)
an RDF Middleware solution for RDF-zation of non RDF resources across the Web and enterprise Intranets and/or Extranets (in the form of Cartridges for data exposed via REST or SOA oriented SOAP interfaces)
an RDF Linked Data Server (meaning it can deploy RDF Linked Data based on its native and/or virtualized data)

As result of the above, when you deploy a PHP application using Virtuoso, you inherit the following benefits:

Use of PHP-iODBC for in-process communication with Virtuoso
Easy generation of RDF Linked Data Views atop the SQL schemas of PHP applications
Easy deployment of RDF Linked Data from virtualized data sources
Less LAMP monoculture (*there is no such thing as virtuous monoculture*) when dealing with PHP based Web applications.

As indicated in prior posts, producing RDF Linked Data from the existing Web, where a lot of content is deployed by PHP based content managers, should simply come down to RDF Views over the SQL Schemas and deployment / publishing of the RDF Views in RDF Linked data form. In a nutshell, this is what Virtuoso delivers via its PHP runtime hosting and pre packaged VADs (Virtuoso Application Distribution packages), for popular PHP based applications such as: phpBB3, Drupal, WordPress, and MediaWiki.

In addition, to the RDF Linked Data deployment, we've also taken the traditional LAMP installation tedium out of the typical PHP application deployment process. For instance, you don't have to rebuild PHP 3.5 (32 or 64 Bit) on Windows, Mac OS X, or Linux to get going, simply install Virtuoso, and then select a VAD package for the relevant application and you're set. If the application of choice isn't pre packaged by us, simply install as you would when using Apache, which comes dow to situating the PHP files in your Web structure under the Web Application's root directory.

Installation Guide

Download the Virtuoso installer for Windows (32 Bit msi file or 64 Bit msi file), Mac OS X (Universal Binary dmg file), or instantiate the Virtuoso EC2 AMI (*search for pattern: "Virtuoso when using the Firefox extension for EC2 as the AMI ID is currently: ami-7c31d515 and name: virtuoso-test/virtuoso-cloud-beta-9-i386.manifest.xml, for latest cut*)
Run the installer (or download the movies using the links in the related section below)
Go to the Virtuoso Conductor (*which will show up at the end of the installation process* or go to http://localhost:8890/conductor)
Go to the "Admin" tab within the (X)HTML based UI and select the "Packages" sub-menu item (a Tab)
Pick phpBB3 (or any other pre-packaged PHP app) and then click on "Install/Upgrase"
The watch one of my silent movies or read the initial startup guides for Virtuoso hosted phpBB3, Drupal, Wordpress, MediaWiki.

At the current time, I've only provided links to ZIP files containing the Virtuoso installation "silent movies". This approach is a short-term solution to some of my current movie publishing challenges re. YouTube and Vimeo -- where the compressed output hasn't been of acceptable visual quality. Once resolved, I will publish much more "Multimedia Web" friendly movies :-)

6 Things That Must Remain Distinct re. Data

Thu, 04 Nov 2010 15:01:39 GMT

Conflation is the tech industry's equivalent of macroeconomic inflation. Whenever it rears it head, we lose value courtesy of diminishing productivity.

Looking retrospectively at any technology failure -- enterprises or industry at large -- you will eventually discover -- at the core -- messy conflation of at least one of the following:

Data Model (Semantics)
Data Object (Entity) Names (Identifiers)
Data Representation Syntax (Markup)
Data Access Protocol
Data Presentation Syntax (Markup)
Data Presentation Media.

The Internet & World Wide Web (InterWeb) are massive successes because their respective architectural cores embody the critical separation outlined above.

The Web of Linked Data is going to become a global reality, and massive success, because it leverages inherently sound architecture -- bar conflationary distractions of RDF. :-)

Introducing Virtuoso Universal Server (Cloud Edition) for Amazon EC2

Fri, 28 Nov 2008 21:06:02 GMT

What is it?

A pre-installed edition of Virtuoso for Amazon's EC2 Cloud platform.

What does it offer?

From a Web Entrepreneur perspective it offers:

Low cost entry point to a game-changing Web 3.0+ (and beyond) platform that combines SQL, RDF, XML, and Web Services functionality
Flexible variable cost model (courtesy of EC2 DevPay) tightly bound to revenue generated by your services
Delivers federated and/or centralized model flexibility for you SaaS based solutions
Simple entry point for developing and deploying sophisticated database driven applications (SQL or RDF Linked Data Web oriented)
Complete framework for exploiting OpenID, OAuth (including Role enhancements) that simplifies exploitation of these vital Identity and Data Access technologies
Easily implement RDF Linked Data based Mail, Blogging, Wikis, Bookmarks, Calendaring, Discussion Forums, Tagging, Social-Networking as Data Space (data containers) features of your application or service offering
Instant alleviation of challenges (e.g. service costs and agility) associated with Data Portability and Open Data Access across Web 2.0 data silos
LDAP integration for Intranet / Extranet style applications.

From the DBMS engine perspective it provides you with one or more pre-configured instances of Virtuoso that enable immediate exploitation of the following services:

RDF Database (a Quad Store with SPARQL & SPARUL Language & Protocol support)
SQL Database (with ODBC, JDBC, OLE-DB, ADO.NET, and XMLA driver access)
XML Database (XML Schema, XQuery/Xpath, XSLT, Full Text Indexing)
Full Text Indexing.

From a Middleware perspective it provides:

RDF Views (Wrappers / Semantic Covers) over SQL, XML, and other data sources accessible via SOAP or REST style Web Services
Sponger Service for converting non RDF information resources into RDF Linked Data "on the fly" via a large collection of pre-installed RDFizer Cartridges.

From the Web Server Platform perspective it provides an alternative to LAMP stack components such as MySQL and Apace by offering

HTTP Web Server
WebDAV Server
Web Application Server (includes PHP runtime hosting)
SOAP or REST style Web Services Deployment
RDF Linked Data Deployment
SPARQL (SPARQL Query Language) and SPARUL (SPARQL Update Language) endpoints
Virtuoso Hosted PHP packages for MediaWiki, Drupal, Wordpress, and phpBB3 (just install the relevant Virtuoso Distro. Package).

From the general System Administrator's perspective it provides:

Online Backups (Backup Set dispatched to S3 buckets, FTP, or HTTP/WebDAV server locations)
Synchronized Incremental Backups to Backup Set locations
Backup Restore from Backup Set location (without exiting to EC2 shell).

Higher level user oriented offerings include:

OpenLink Data Explorer front-end for exploring the burgeoning Linked Data Web
Ajax based SPARQL Query Builder (iSPARQL) that enables SPARQL Query construction by Example
Ajax based SQL Query Builder (QBE) that enables SQL Query construction by Example.

For Web 2.0 / 3.0 users, developers, and entrepreneurs it offers it includes Distributed Collaboration Tools & Social Media realm functionality courtesy of ODS that includes:

Point of presence on the Linked Data Web that meshes your Identity and your Data via URIs
System generated Social Network Profile & Contact Data via FOAF?
System generated SIOC (Semantically Interconnected Online Community) Data Space (that includes a Social Graph) exposing all your Web data in RDF Linked Data form
System generated OpenID and automatic integration with FOAF
Transparent Data Integration across Facebook, Digg, LinkedIn, FriendFeed, Twitter, and any other Web 2.0 data space equipped with RSS / Atom support and/or REST style Web Services
In-built support for SyncML which enables data synchronization with Mobile Phones.

How Do I Get Going with It?

A Linked Data Web Approach To Semantic "Search" & "Find" (Updated)

Sat, 10 Jan 2009 18:55:56 GMT

The first salvo of what we've been hinting about re. server side faceted browsing over Unlimited Data within configurable Interactive Time-frames is now available for experimentation at: http://b3s.openlinksw.com/fct/facet.vsp.

Simple example / demo:

Enter search pattern: Microsoft

You will get the usual result from a full text pattern search i.e., hits and text excerpts with matching patterns in boldface. This first step is akin to throwing your net out to sea while fishing.

Now you have your catch, what next? Basically, this is where traditional text search value ends since regex or xpath/xquery offer little when the structure of literal text is the key to filtering or categorization based analysis of real-world entities. Naturally, this is where the value of structured querying of linked data starts, as you seek to use entity descriptions (combination of attribute and relationship properties) to "Find relevant things".

Continuing with the demo.

Click on "Properties" link within the Navigation section of the browser page which results in a distillation and aggregation of the properties of the entities associated with the search results. Then use the "Next" link to page through the properties until to find the properties that best match what you seek. Note, this particular step is akin to using the properties of the catch (using fishing analogy) for query filtering, with each subsequent property link click narrowing your selection further.

Using property based filtering is just one perspective on the data corpus associated with the text search pattern; thus, you can alter perspectives by clicking on the "Class" link so that you can filter you search results by entity type. Of course, in a number of scenarios you would use a combination of entity types and entity properties filters to locate the entities of interest to you.

A Few Notes about this demo instance of Virtuoso:

Lookup Data Size (Local Linked Data Corpus): 2 Billion+ Triples (entity-attribute-value tuples)
This is a *temporary* teaser / precursor to the LOD (Linking Open Data Cloud) variant of our Linked Data driven "Search" & "Find" service; we decided to implement this functionality prior to commissioning a larger and more up to date instance based on the entire LOD Cloud
The browser is simply using a Virtuoso PL function that also exists in Web Service form for loose binding by 3rd parties that have a UI orientation and focus (our UI is deliberately bare boned).
The properties and entity types (classes) links expose formal definitions and dictionary provenance information materialized in an HTML page (of course your browser or any other HTTP user agent can negotiation alternative representations of this descriptive information)
UMBEL based inference rules are enabled, giving you a live and simple demonstration of the virtues of Linked Data Dictionaries for example: click on the description link of any property or class from the foaf (friend-of-a-friend vocabulary), sioc (semantically-interlinked-online-communities ontology), mo (music ontology), bibo (bibliographic data ontology) namespaces to see how the data between these lower level vocabularies or ontologies are meshed with OpenCyc's upper level ontology.

Terminology & Specificity

Tue, 05 Feb 2008 01:47:01 GMT

Terminology is a pain to construct, and an even bigger pain to diffuse effectively, when dealing with large collections of superficially heterogeneous, and factually homogeneous, interlinked individuals.

In my "Linked Data & Web Information BUS" post (plus a few LOD mailing list posts), I had the delight and displeasure (on the brain primarily) of attempting to get terminology right with regards to Information- and Non-Information Web Resources. I eventually settled for Data Sources instead of the simpler and more obvious term: Data Resources :-)

Thus, I redefine the URIs from earlier past as follows:

http://demo.openlinksw.com/Northwind/Customer/ALFKI (Information Resource)

http://demo.openlinksw.com/Northwind/Customer/ALFKI#this (Data Resource)

Thanks to today's internet connectivity, it took a simple Skype ping from Mike Bergman, and a 30 minute (or so) session that followed for us to arrive at "Data Resource" as a clearer term for Non Information Resources.

Mike has promised to write a detailed post covering our Linked Data and the Structured Web terminology meshing odyssey.

YODA & the Data FORCE

Tue, 20 Jul 2010 17:53:06 GMT

The original design document (by TimBL) that lead to the WWW (*an important read*) was very clear about the need to create an "information space" that connects heterogeneous data sources. Unfortunately, in trying to create a moniker to distinguish one aspect of the Web (the Linked Document Web) from the part that was overlooked (the Linked Data Web), we ended up with a project code name that's fundamentally a misnomer in the form of: "The Semantic Web".

If we could just take "The Semantic Web" moniker for what it was -- a code name for an aspect of the Web -- and move on, things will get much clearer, fast!

Basically, what is/was the "Semantic Web" should really have been code named: ("You" Oriented Data Access) as a play on: Yoda's appreciation of the FORCE (Fact ORiented Connected Entities) -- the power of inter galactic, interlinked, structured data, fashioned by the World Wide Web courtesy of the HTTP protocol.

As stated in a earlier post, the next phase of the Web is all about the magic of entity "You". The single most important item of reference to every Web user would be the Person Entity ID (URI). Just by remembering your Entity ID, you will have intelligent pathways across, and into, the FORCE that the Linked Data Web delivers. The quality of the pathways and increased density of the FORCE are the keys to high SDQ (tomorrows SEO). Thus, the SDQ of URIs will ultimately be the unit determinant of value to Web Users, along the following personal lines, hence the critical platform questions:

Does your platform give me Identity (a URI) with high SDQ?
Do the Data Source Names (URIs) in your Data Spaces deliver high SDQ?

While most industry commentators continue to ponder and pontificate about what "The Semantic Web" is (unfortunately), the real thing (the "FORCE") is already here, and self-enhancing rapidly.

Assuming we now accept the FORCE is simply an RDF based Linked Data moniker, and that RDF Linked Data is all about the Web as a structured database, we should start to move our attention over to practical exploitation of this burgeoning global database, and in doing so we should not discard knowledge from the past such as the many great examples available gratis from the Relational Database realm. For instance, we should start paying attention to the discovery, development, and deployment of high level tools such as query builders, report writers, and intelligence oriented analytic tools, none of which should -- at first point of interaction -- expose raw RDF or the SPARQL query language. Along similar lines of thinking, we also need development environments and frameworks that are counterparts to Visual Studio, ACCESS, File Maker, and the like.

Numerati & The Magic of You!

What is the DBpedia Project? (Updated)

Sun, 31 Jan 2010 22:46:10 GMT

The recent Wikipedia imbroglio centered around DBpedia is the fundamental driver for this particular blog post. At time of writing this blog post, the DBpedia project definition in Wikipedia remains unsatisfactory due to the following shortcomings:

inaccurate and incomplete definition of the Project's What, Why, Who, Where, When, and How
inaccurate reflection of project essence, by skewing focus towards data extraction and data set dump production, which is at best a quarter of the project.

Here are some insights on DBpedia, from the perspective of someone intimately involved with the other three-quarters of the project.

What is DBpedia?

A live Web accessible RDF model database (Quad Store) derived from Wikipedia content snapshots, taken periodically. The RDF database underlies a Linked Data Space comprised of: HTML (and most recently HTML+RDFa) based data browser pages and a SPARQL endpoint.

Note: DBpedia 3.4 now exists in snapshot (warehouse) and Live Editions (currently being hot-staged). This post is about the snapshot (warehouse) edition, I'll drop a different post about the DBpedia Live Edition where a new Delta-Engine covers both extraction and database record replacement, in realtime.

When was it Created?

As an idea under the moniker "DBpedia" it was conceptualized in late 2006 by researchers at University of Leipzig (lead by Soren Auer) and Freie University, Berlin (lead by Chris Bizer). The first public instance of DBpedia (as described above) was released in February 2007. The official DBpedia coming out party occurred at WWW2007, Banff, during the inaugural Linked Data gathering, where it showcased the virtues and immense potential of TimBL's Linked Data meme.

Who's Behind It?

OpenLink Software (developers of OpenLink Virtuoso and providers of Web Hosting infrastructure), University of Leipzig, and Freie Univerity, Berlin. In addition, there is a burgeoning community of collaborators and contributors responsible DBpedia based applications, cross-linked data sets, ontologies (OpenCyc, SUMO, UMBEL, and YAGO) and other utilities. Finally, DBpedia wouldn't be possible without the global content contribution and curation efforts of Wikipedians, a point typically overlooked (albeit inadvertently).

How is it Constructed?

The steps are as follows:

RDF data set dump preparation via Wikipedia content extraction and transformation to RDF model data, using the N3 data representation format - Java and PHP extraction code produced and maintained by the teams at Leipzig and Berlin
Deployment of Linked Data that enables Data browsing and exploration using any HTTP aware user agent (e.g. basic Web Browsers) - handled by OpenLink Virtuoso (handled by Berlin via the Pubby Linked Data Server during the early months of the DBpedia project)
SPARQL compliant Quad Store, enabling direct access to database records via SPARQL (Query language, REST or SOAP Web Service, plus a variety of query results serialization formats) - OpenLink Virtuoso since first public release of DBpedia

In a nutshell, there are four distinct and vital components to DBpedia. Thus, DBpedia doesn't exist if all the project offered was a collection of RDF data dumps. Likewise, it doesn't exist if you have a SPARQL compliant Quad Store without loaded data sets, and of course it doesn't exist if you have a fully loaded SPARQL compliant Quad Store is up to the cocktail of challenges presented by live Web accessibility.

Why is it Important?

It remains a live exemplar for any individual or organization seeking to publishing or exploit HTTP based Linked Data on the World Wide Web. Its existence continues to stimulate growth in both density and quality of the burgeoning Web of Linked Data.

How Do I Use it?

In the most basic sense, simply browse the HTML pages en route to discovery erstwhile relationships that exist across named entities and subject matter concepts / headings. Beyond that, simply look at DBpedia as a master lookup table in a Web hosted distributed database setup; enabling you to mesh your local domain specific details with DBpedia records via structured relations (triples or 3-tuples records) comprised of HTTP URIs from both realms e.g., owl:sameAs relations.

What Can I Use it For?

Expanding on the Master-Details point above, you can use its rich URI corpus to alleviate tedium associated with activities such as:

List maintenance - e.g., Countries, States, Companies, Units of Measurement, Subject Headings etc.
Tagging - as a compliment to existing practices
Analytical Research - you're only a LINK (URI) away from erstwhile difficult to attain research data spread across a broad range of topics
Closed Vocabulary Construction - rather than commence the futile quest of building your own closed vocabulary, simply leverage Wikipedia's human curated vocabulary as our common base.

Pre-loaded and Pre-configured instances of DBpedia 3.4 - via publicly shared Amazon Elastic Block Storage Snapshots
Virtuoso & DBpedia Tunning Guide
What's In a Name & The Linked Data Police.

What is the DBpedia Project? (Updated)

Wed, 15 Sep 2010 22:10:51 GMT

inaccurate and incomplete definition of the Project's What, Why, Who, Where, When, and How
inaccurate reflection of project essence, by skewing focus towards data extraction and data set dump production, which is at best a quarter of the project.

Here are some insights on DBpedia, from the perspective of someone intimately involved with the other three-quarters of the project.

What is DBpedia?

When was it Created?

Who's Behind It?

How is it Constructed?

The steps are as follows:

RDF data set dump preparation via Wikipedia content extraction and transformation to RDF model data, using the N3 data representation format - Java and PHP extraction code produced and maintained by the teams at Leipzig and Berlin
Deployment of Linked Data that enables Data browsing and exploration using any HTTP aware user agent (e.g. basic Web Browsers) - handled by OpenLink Virtuoso (handled by Berlin via the Pubby Linked Data Server during the early months of the DBpedia project)
SPARQL compliant Quad Store, enabling direct access to database records via SPARQL (Query language, REST or SOAP Web Service, plus a variety of query results serialization formats) - OpenLink Virtuoso since first public release of DBpedia

In a nutshell, there are four distinct and vital components to DBpedia. Thus, DBpedia doesn't exist if all the project offered was a collection of RDF data dumps. Likewise, it doesn't exist without a fully populated SPARQL compliant Quad Store. Last but not least, it doesn't exist if you have a fully loaded SPARQL compliant Quad Store isn't up to the cocktail of challenges (query load and complexity) presented by live Web database accessibility.

Why is it Important?

How Do I Use it?

In the most basic sense, simply browse the HTML based resource decriptor pages en route to discovering erstwhile undiscovered relationships that exist across named entities and subject matter concepts / headings. Beyond that, simply look at DBpedia as a master lookup table in a Web hosted distributed database setup; enabling you to mesh your local domain specific details with DBpedia records via structured relations (triples or 3-tuples records), comprised of HTTP URIs from both realms e.g., via owl:sameAs relations.

What Can I Use it For?

Expanding on the Master-Details point above, you can use its rich URI corpus to alleviate tedium associated with activities such as:

List maintenance - e.g., Countries, States, Companies, Units of Measurement, Subject Headings etc.
Tagging - as a compliment to existing practices
Analytical Research - you're only a LINK (URI) away from erstwhile difficult to attain research data spread across a broad range of topics
Closed Vocabulary Construction - rather than commence the futile quest of building your own closed vocabulary, simply leverage Wikipedia's human curated vocabulary as our common base.

Pre-loaded and Pre-configured instances of DBpedia 3.4 - via publicly shared Amazon Elastic Block Storage Snapshots
Virtuoso & DBpedia Tunning Guide
What's In a Name & The Linked Data Police.

ODBC & WODBC Comparison

Tue, 20 May 2008 19:46:11 GMT

ODBC delivers open data access (by reference) to a broad range of enterprise databases via a 'C' based API. Thanks to the iODBC and unixODBC projects, ODBC is available across broad range of platforms beyond Windows.

ODBC identifies data sources using Data Source Names (DSNs).

WODBC (Web Open Database Connectivity) delivers open data access to Web Databases / Data Spaces. The Data Source Naming scheme: URI or IRI, is HTTP based thereby enabling data access by reference via the Web.

ODBC DSNs bind ODBC client applications to Tables, Views, Stored Procedures.

WODBC DSNs bind you to a Data Space (e.g. my FOAF based Profile Page where you can use the "Explore Data Tab" to look around if you are a human visitor) or a specific Entity within a Data Space (i.e Person Entity Me).

ODBC Drivers are built using APIs (DBMS Call Level Interfaces) provided by DBMS vendors. Thus, a DBMS vendor can chose not to release an API, or do so selectivity, for competitive advantage or market disruption purposes (it's happened!).

WODBC Drivers are also built using APIs (Web Services associated with a Web Data Space). These drivers are also referred to as RDF Middleware or RDFizers. The "Web" component of WODBC ensures openness, you publish Data with URIs from your Linked Data Server and that's it; your data space or specific data entities are live and accessible (by reference) over the Web!

So we have come full circle (or cycle), the Web is becoming more of a structured database everyday! What's new is old, and what's old is new!

Data Access is everything, without "Data" there is no information or knowledge. Without "Data" there's not notion of vitality, purpose, or value.

URIs make or break everything in the Linked Data Web just as ODBC DSNs do within the enterprise.

I've deliberately left JDBC, ADO.NET, and OLE-DB out of this piece due to their respective programming languages and frameworks specificity. None of these mechanisms match the platform availability breadth of ODBC.

The Web as a true M-V-C pattern is now crystalizing. The "M" (Model) component of M-V-C is finally rising to the realm of broad attention courtesy of the "Linked Data" meme and "Semantic Web" vision.

By the way, M-V-C lines up nicely with Web 1.0 (Web Forms / Pages), Web 2.0 (Web Services based APIs), and Web 3.0 (Data Web, Web of Data, or Linked Data Web) :-)

Data 3.0 (a Manifesto for Platform Agnostic Structured Data) Update 5

Tue, 25 May 2010 21:10:28 GMT

After a long period of trying to demystify and unravel the wonders of standards compliant structured data access, combined with protocols (e.g., HTTP) that separate:

Identity,
Access,
Storage,
Representation, and
Presentation.

I ended up with what I can best describe as the Data 3.0 Manifesto. A manifesto for standards complaint access to structured data object (or entity) descriptors.

Some Related Work

Alex James (Program Manager Entity Frameworks at Microsoft), put together something quite similar to this via his Base4 blog (around the Web 2.0 bootstrap time), sadly -- quoting Alex -- that post has gone where discontinued blogs and their host platforms go (deep deep irony here).

It's also important to note that this manifesto is also a variant of the TimBL's Linked Data Design Issues meme re. Linked Data, but totally decoupled from RDF (data representation formats aspect) and SPARQL which -- in my world view -- remain implementation details.

Data 3.0 manifesto

An "Entity" is the "Referent" of an "Identifier."
An "Identifier" SHOULD provide a global, unambiguous, and unchanging (though it MAY be opaque!) "Name" for its "Referent".
A "Referent" MAY have many "Identifiers" (Names), but each "Identifier" MUST have only one "Referent".
Structured Entity Descriptions SHOULD be based on the Entity-Attribute-Value (EAV) Data Model, and SHOULD therefore take the form of one or more 3-tuples (triples), each comprised of:
- an "Identifier" that names an "Entity" (i.e., Entity Name),
- an "Identifier" that names an "Attribute" (i.e., Attribute Name), and
- an "Attribute Value", which may be an "Identifier" or a "Literal".
Structured Descriptions SHOULD be CARRIED by "Descriptor Documents" (i.e., purpose specific documents where Entity Identifiers, Attribute Identifiers, and Attribute Values are clearly discernible by the document's intended consumers, e.g., humans or machines).
Structured Descriptor Documents can contain (carry) several Structured Entity Descriptions
Stuctured Descriptor Documents SHOULD be network accessible via network addresses (e.g., HTTP URLs when dealing with HTTP-based Networks).
An Identifier SHOULD resolve (de-reference) to a Structured Representation of the Referent's Structured Description.

Referent, Identifier, and Descriptor/Sense (The Data Perception Trinity) illustration
Referent, Identifier, and Descriptor/Sense Trinity (as exploited in FOAF+SSL based Secure WebIDs) illustration
Demystifying Linked Data via EAV Model based Structured Descriptions
What do people have against URIs and URLs?
The URI, URL, and Linked Data Meme's Generic HTTP URI
Simple Explanation of RDF and Linked Data Dynamics
Linked Data and Identity
FOAF+SSL FAQ
LOD Community Thread (showing evolution of this manifesto based on feedback from members such as Richard Cyganiak).
Googlebase Data API Docs
Google Data Protocol (GData)
Microsoft's OData Protocol
Magic of De-referencable Names and actual Data via Binky Video
Social Objects Presentation (aka. Social Linked Data Objects) - by Jyri Engeström
What's a Reference?

How Linked Data will change Advertising

Wed, 25 Mar 2009 12:30:58 GMT

This post is a reply to Jason Kolb's post titled: Using Advertising to Take Over the World. Jason's post is a response to Robert Scoble's post titled: Why Facebook has never listened and why it definitely won’t start now.

Jason:

Scoble is sensing what comes next, but in my opinion, describes it using an old obtrusive advertising model anecdote.

I've penned a post or two about the "Magic of You" which is all about the new Web power broker (Entity: "You").

Personally, I've long envisaged a complete overhaul of advertising where obtrusive advertising simply withers away; ultimately replaced by an unobtrusive model that is driven by individualized relevance and high doses of serendipity. Basically, this is ultimately about "taking the Ad out of item placement in Web pages".

The fundamental ingredients of an unobtrusive advertising landscape would include the following Human facts:

We are social beings and need stuff from time to time
We know what we need and would like to "Find stuff" when we are in "I Need Stuff" mode.

Ideally, we would like to be able to simply state the following, via a Web accessible profile:

Here are my "Wants" or "Needs" (my Wish-List)
Here are the products and services that I "Offer" (my Offer-List).

Now put the above into the context of an evolving Web where data items are becoming more visible by the second, courtesy of the "Linked Data" meme. Thus, things that weren't discernable via the Web: "People", "Places", "Music", "Books", "Products", etc., become much easier to identify and describe.

Assuming the comments above hold true re. the Web's evolution into a collection of Linked Data Spaces, and the following occur:

Structured profile pages become the basic units of Web presence
Wish-Lists and Offer-Lists are exposed by profile pages

Wish-Lists and Offer-Lists will gradually start bonding with increasing degrees of serendipity courtesy of exponential growth in Linked Data Web density.

So based on what I've stated so far, Scoble would simply browse the Web or visit his profile page, and in either scenario enjoy a "minority report" style of experience albeit all under his control (since he is the one driving his Web user agent).

What I describe above simply comes down to "Wish-lists" and associated recommendations becoming the norm outside the confines of Amazon's data space on the Web. Serendipitous discovery, intelligent lookups, and linkages are going to be the fundamental essence of Linked Data Web oriented applications, services, agents.

Beyond Scoble, it's also important to note that access to data will be controlled by entity "You". Your data space on the Web will be something you will controll access to in a myriad of ways, and it will include the option to provide licensed access to commercial entities on your terms. Naturally, you will also determine the currency that facilitates the value exchange :-)

What's Up with Chrome?

Thu, 04 Sep 2008 12:39:02 GMT

Here are a few descriptions of pages covering Google's Chrome browser:

Daniel Lewis - Comparative Analysis
CrunchBase - Product Page
GigaOM - Industry Analysis
Wikipedia
CNET - Privacy Issues Analysis
ReadWriteWeb - Security Issues Analysis
OakLeaf - SaaS Terms Analysis

As per usual, this is part post and part Linked Data demo. This time around, I am showcasing Proxy/Wrapper based dereferencable URIs and a new "Page Description" feature that showcases the capabilities of Virtuoso's in-built RDFization Middleware. Also note, the resource descriptions (RDF) are presented using an HTML page.

Linked Data enabling PHP Applications

Thu, 10 Apr 2008 18:12:47 GMT

Daniel lewis has penned a variation of post about Linked Data enabling PHP applications such as: Wordpress, phpBB3, MediaWiki etc.

Daniel simplifies my post by using diagrams to depict the different paths for PHP based applications exposing Linked Data - especially those that already provide a significant amount of the content that drives Web 2.0.

If all the content in Web 2.0 information resources are distillable into discrete data objects endowed with HTTP based IDs (URIs), with zero "RDF handcrafting Tax", what do we end up with? A Giant Global Graph of Linked Data; the Web as a Database.

So, what used to apply exclusively, within enterprise settings re. Oracle, DB2, Informix, Ingres, Sybase, Microsoft SQL Server, MySQL, PostrgeSQL, Progress Open Edge, Firebird, and others, now applies to the Web. The Web becomes the "Distributed Database Bus" that connects database records across disparate databases (or Data Spaces). These databases manage and expose records that are remotely accessible "by reference" via HTTP.

As I've stated at every opportunity in the past, Web 2.0 is the greatest thing that every happened to the Semantic Web vision :-) Without the "Web 2.0 Data Silo Conundrum" we wouldn't have the cry for "Data Portability" that brings a lot of clarity to some fundamental Web 2.0 limitations that end-users ultimately find unacceptable.

In the late '80s, the SQL Access Group (now part of X/Open) addressed a similar problem with RDBMS silos within the enterprise that lead to the SAG CLI which is exists today as Open Database Connectivity.

In a sense we now have WODBC (Web Open Database Connectivity), comprised of Web Services based CLIs and/or traditional back-end DBMS CLIs (ODBC, JDBC, ADO.NET, OLE-DB, or Native), Query Language (SPARQL Query Language), and a Wire Protocol (HTTP based SPARQL Protocol) delivering Web infrastructure equivalents of SQL and RDA, but much better, and with much broader scope for delivering profound value due to the Web's inherent openness. Today's PHP, Python, Ruby, Tcl, Perl, ASP.NET developer is the enterprise 4GL developer of yore, without enterprise confinement. We could even be talking about 5GL development once the Linked Data interaction is meshed with dynamic languages (delivering higher levels of abstraction at the language and data interaction levels). Even the underlying schemas and basic design will evolve from Closed World (solely) to a mesh of Closed & Open World view schemas.

5 Very Important Things to Note about HTTP based Linked Data

Mon, 01 Feb 2010 14:00:56 GMT

It isn't World Wide Web Specific (HTTP != World Wide Web)
It isn't Open Data Specific
It isn't about "Free" (Beer or Speech)
It isn't about Markup (so don't expect to grok it via "markup first" approach)
It's about Hyperdata - the use of HTTP and REST to deliver a powerful platform agnostic mechanism for Data Reference, Access, and Integration.

When trying to understand HTTP based Linked Data, especially if you're well versed in DBMS technology use (User, Power User, Architect, Analyst, DBA, or Programmer) think:

Open Database Connectivity (ODBC) without operating system, data model, or wire-protocol specificity or lock-in potential
Java Database Connectivity (JDBC) without programming language specificity
ADO.NET without .NET runtime specificity and .NET bound language specificity
OLE-DB without Windows operating system & programming language specificity
XMLA without XML format specificity - with Tabular and Multidimensional results formats expressible in a variety of data representation formats.
All of the above scoped to the Record rather than Container level, with Generic HTTP scheme URIs associated with each Record, Field, and Field value (optionally)

Remember the need for Data Access & Integration technology is the by product of the following realities:

Human curated data is ultimately dirty, because:
- our thick thumbs, inattention, distractions, and general discomfort with typing, make typos prevalent
- database engines exist for a variety of data models - Graph, Relational, Hierarchical;
- within databases you have different record container/partition names e.g. Table Names;
- within a database record container you have records that are really aspects of the same thing (different keys exist in a plethora of operational / line of business systems that expose aspects of the same entity e.g., customer data that spans Accounts, CRM, ERP application databases);
- different field names (one database has "EMP" while another has "Employee") for the same record
Units of measurement is driven by locale, the UK office wants to see sales in Pounds Sterling while the French office prefers Euros etc.
All of the above is subject to context halos which can be quite granular re. sensitivity e.g. staff travel between locations that alter locales and their roles; basically, profiles matters a lot.

WUPnP Cheatsheet

Tue, 29 Jul 2008 17:06:40 GMT

WUPnP Cheatsheet: "

The Web Universal Plug and Play (WUPnP) Cheatsheet:

Essentially, if you build an application and use the technologies suggested in the ‘glue section’ then your web application/service (whether it’s front-end or back-end) will fit into many many other web applications/services… and therefore also more manageable for the future! This is WUPnP.

Key technologies for making your services/applications as sticky as possible:

Dereferenceable URI’s (which indicate HTTP networking)
OpenID
OAuth
SPARQL
Linked Data RDF (or RDFa) and OWL

Web-based plug and play fun!

(Via Daniel Lewis.)

.NET, LINQ, and RDF based Linked Data (Update 2)

Fri, 08 Aug 2008 12:54:01 GMT

At OpenLink, we've been investigating LinqToRdf, an exciting project from Andrew Matthews that seeks to expose the Semantic Web technology space to the large community of .NET developers.

The LinqToRdf project is about binding LINQ to RDF. It sits atop Joshua Tauberer's C# based Semantic Web/RDF library which has been out there for a while and works across Microsoft .NET and it's open source variant "Mono".

Historically, the Semantic Web realm has been dominated by RDF frameworks such as Sesame, Jena and Redland; which by their Open Source orientation, predominantly favor non-Windows platforms (Java and Linux). Conversely, Microsoft's .NET frameworks have sought to offer Conceptualization technology for heterogeneous Logical Data Sources via .NET's Entity Frameworks and ADO.NET, but without any actual bindings to RDF.

Interestingly, believe it or not, .NET already has a data query language that shares a number of similarities with SPARQL, called Entity-SQL, and a very innovative programming language called LINQ; that offers a blend of constructs for natural data access and manipulation across relational (SQL), hierarchical (XML), and graph (Object) models without the traditional object language->database impedance tensions of the past.

With regards to all of the above, we've just released a mini white paper that covers the exploitation of RDF-based Linked Data using .NET via LINQ. The paper offers a an overview of LinqToRdf, plus enhancements we've contributed to the project (available in LinqToRdf v0.8.). The paper includes real-world examples that tap into a MusicBrainz powered Linked Data Space, the Music Ontology, the Virtuoso RDF Quad Store, Virtuoso Sponger Middleware, and our RDfization Cartridges for Musicbrainz.

Enjoy!

DBpedia + BBC (combined) Linked Data Space Installation Guide

Tue, 29 Mar 2011 14:09:45 GMT

What?

The DBpedia + BBC Combo Linked Dataset is a preconfigured Virtuoso Cluster (4 Virtuoso Cluster Nodes, each comprised of one Virtuoso Instance; initial deployment is to a single Cluster Host, but license may be converted for physically distributed deployment), available via the Amazon EC2 Cloud, preloaded with the following datasets:

Why?

The BBC has been publishing Linked Data from its Web Data Space for a number of years. In line with best practices for injecting Linked Data into the World Wide Web (Web), the BBC datasets are interlinked with other datasets such as DBpedia and MusicBrainz.

Typical follow-your-nose exploration using a Web Browser (or even via sophisticated SPARQL query crawls) isn't always practical once you get past the initial euphoria that comes from comprehending the Linked Data concept. As your queries get more complex, the overhead of remote sub-queries increases its impact, until query results take so long to return that you simply give up.

Thus, maximizing the effects of the BBC's efforts requires Linked Data that shares locality in a Web-accessible Data Space — i.e., where all Linked Data sets have been loaded into the same data store or warehouse. This holds true even when leveraging SPARQL-FED style virtualization — there's always a need to localize data as part of any marginally-decent locality-aware cost-optimization algorithm.

This DBpedia + BBC dataset, exposed via a preloaded and preconfigured Virtuoso Cluster, delivers a practical point of presence on the Web for immediate and cost-effective exploitation of Linked Data at the individual and/or service specific levels.

How?

To work through this guide, you'll need to start with 90 GB of free disk space. (Only 41 GB will be consumed after you delete the installer archives, but starting with 90+ GB ensures enough work space for the installation.)

Install Virtuoso

Download Virtuoso installer archive(s). You must deploy the Personal or Enterprise Edition; the Open Source Edition does not support Shared-Nothing Cluster Deployment.
Obtain a Virtuoso Cluster license.
Install Virtuoso.
Set key environment variables and start the OpenLink License Manager, using command (this may vary depending on your shell and install directory):

. /opt/virtuoso/virtuoso-enterprise.sh
Optional: To keep the default single-server configuration file and demo database intact, set the VIRTUOSO_HOME environment variable to a different directory, e.g.,

export VIRTUOSO_HOME=/opt/virtuoso/cluster-home/

Note: You will have to adjust this setting every time you shift between this cluster setup and your single-server setup. Either may be made your environment's default through the virtuoso-enterprise.sh and related scripts.
Set up your cluster by running the mkcluster.sh script. Note that initial deployment of the DBpedia + BBC Combo requires a 4 node cluster, which is the default for this script.
Start the Virtuoso Cluster with this command:

virtuoso-start.sh
Stop the Virtuoso Cluster with this command:

virtuoso-stop.sh

Using the DBpedia + BBC Combo dataset

Navigate to your installation directory.
Download the combo dataset installer script — bbc-dbpedia-install.sh.
For best results, set the downloaded script to fully executable using this command:

chmod 755 bbc-dbpedia-install.sh
Shut down any Virtuoso instances that may be currently running.
Optional: As above, if you have decided to keep the default single-server configuration file and demo database intact, set the VIRTUOSO_HOME environment variable appropriately, e.g.,

export VIRTUOSO_HOME=/opt/virtuoso/cluster-home/
Run the combo dataset installer script with this command:

sh bbc-dbpedia-install.sh

Verify installation

The combo dataset typically deploys to EC2 virtual machines in under 90 minutes; your time will vary depending on your network connection speed, machine speed, and other variables.

Once the script completes, perform the following steps:

Verify that the Virtuoso Conductor (HTTP-based Admin UI) is in place via:

http://localhost:[port]/conductor
Verify that the Virtuoso SPARQL endpoint is in place via:

http://localhost:[port]/sparql
Verify that the Precision Search & Find UI is in place via:

http://localhost:[port]/fct
Verify that the Virtuoso hosted PivotViewer is in place via:

http://localhost:[port]/PivotViewer

BBC Linked Data Spaces Presentation
BBC Music Linked Dataset Snapshot -- PivotViewer Page Screenshot
BBC Programmes Linked Dataset Snapshot -- -- PivotViewer Page Screenshot
BBC Nature Linked Dataset Snapshot -- PivotViewer Page Screenshot
BBC Food Recipes Snapshot -- PivotViewer Page Screenshot
My Del.icio.us bookmark collection re. BBC Linked Data Demos
Amazon EC2 Snapshots for DBpedia 3.6 + BBC combo -- delivers the BBC and DBpedia dataset combo via a mountable Elastic Block Storage (EBS) device usable with an Amazon Machine Image (AMI)
Amazon EC2 Snapshots for DBpedia 3.6 & 3.5
Virtuoso Commercial Edition Download Page
Virtuoso Cluster Edition Guide

New Preconfigured Virtuoso AMI for Amazon EC2 Cloud comprised of Linked Data from BBC & DBpedia

Tue, 29 Mar 2011 13:52:17 GMT

What?

Introducing a new preloaded and preconfigured Virtuoso (Cluster Edition) AMI for the Amazon EC2 Cloud that hosts combined Linked Datasets from:

Why?

Predictably instantiate a powerful database with high quality data and cross links within minutes, for personal or service specific use.

How?

Simply follow the instructions in our Amazon EC2 guide for the BBC + DBpedia 3.6 Linked Dataset guide.

Your installation steps are as follows:

Instantiate a Virtuoso EC2 AMI
Mount the Amazon Elastic Block Storage (EBS) snapshot that hosts the preloaded Virtuoso Database.

BBC Linked Data Spaces Presentation
BBC Music Linked Dataset Snapshot -- PivotViewer Page Screenshot
BBC Programmes Linked Dataset Snapshot -- -- PivotViewer Page Screenshot
BBC Nature Linked Dataset Snapshot -- PivotViewer Page Screenshot
BBC Food Recipes Snapshot -- PivotViewer Page Screenshot
My Del.icio.us bookmark collection re. BBC Linked Data Demos
Amazon EC2 Snapshots for DBpedia 3.6 + BBC combo -- delivers the BBC and DBpedia dataset combo via a mountable Elastic Block Storage (EBS) device usable with an Amazon Machine Image (AMI)
Amazon EC2 Snapshots for DBpedia 3.6 & 3.5
Virtuoso Commercial Edition Download Page
Virtuoso Cluster Edition Guide

URIBurner: Painless Generation & Exploitation of Linked Data (Update 1 - Demo Links Added)

Thu, 11 Mar 2010 15:16:34 GMT

What is URIBurner?

A service from OpenLink Software, available at: http://uriburner.com, that enables anyone to generate structured descriptions -on the fly- for resources that are already published to HTTP based networks. These descriptions exist as hypermedia resource representations where links are used to identify:

the entity (data object or datum) being described,
each of its attributes, and
each of its attributes values (optionally).

The hypermedia resource representation outlined above is what is commonly known as an Entity-Attribute-Value (EAV) Graph. The use of generic HTTP scheme based Identifiers is what distinguishes this type of hypermedia resource from others.

Why is it Important?

The virtues (dual pronged serendipitous discovery) of publishing HTTP based Linked Data across public (World Wide Web) or private (Intranets and/or Extranets) is rapidly becoming clearer to everyone. That said, the nuance laced nature of Linked Data publishing presents significant challenges to most. Thus, for Linked Data to really blossom the process of publishing needs to be simplified i.e., "just click and go" (for human interaction) or REST-ful orchestration of HTTP CRUD (Create, Read, Update, Delete) operations between Client Applications and Linked Data Servers.

How Do I Use It?

In similar vane to the role played by FeedBurner with regards to Atom and RSS feed generation, during the early stages of the Blogosphere, it enables anyone to publish Linked Data bearing hypermedia resources on an HTTP network. Thus, its usage covers two profiles: Content Publisher and Content Consumer.

Content Publisher

The steps that follow cover all you need to do:

place a tag within your HTTP based hypermedia resource (e.g. within section for HTML )
use a URL via the @href attribute value to identify the location of the structured description of your resource, in this case it takes the form: http://linkeddata.uriburner.com/about/id/{scheme-or-protocol}/{your-hostname-or-authority}/{your-local-resource}
for human visibility you may consider adding associating a button (as you do with Atom and RSS) with the URL above.

That's it! The discoverability (SDQ) of your content has just multiplied significantly, its structured description is now part of the Linked Data Cloud with a reference back to your site (which is now a bona fide HTTP based Linked Data Space).

Examples

HTML+RDFa based representation of a structured resource description:

JSON based representation of a structured resource description:

N3 based representation of a structured resource description:

RDF/XML based representations of a structured resource description:

Content Consumer

As an end-user, obtaining a structured description of any resource published to an HTTP network boils down to the following steps:

go to: http://uriburner.com
drag the Page Metadata Bookmarklet link to your Browser's toolbar
whenever you encounter a resource of interest (e.g. an HTML page) simply click on the Bookmarklet
you will be presented with an HTML representation of a structured resource description (i.e., identifier of the entity being described, its attributes, and its attribute values will be clearly presented).

Examples

Description of a Book culled from an Amazon web page
Description of a product offering culled from a BestBuy web page
Description of a product (a camera) culled from a CNET web page
Description of the same CNET product as an Offer on eBay (exposed by the description above via seeAlso property value).

If you are a developer, you can simply perform an HTTP operation request (from your development environment of choice) using any of the URL patterns presented below:

HTML:

curl -I -H "Accept: text/html" http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}

JSON:

curl -I -H "Accept: application/json" http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}
curl http://linkeddata.uriburner.com/about/data/json/{scheme}/{authority}/{local-path}

Notation 3 (N3):

curl -I -H "Accept: text/n3" http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}
curl http://linkeddata.uriburner.com/about/data/n3/{scheme}/{authority}/{local-path}

curl -I -H "Accept: text/turtle" http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}
curl http://linkeddata.uriburner.com/about/data/ttl/{scheme}/{authority}/{local-path}

RDF/XML:

curl -I -H "Accept: application/rdf+xml" http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}
curl http://linkeddata.uriburner.com/about/data/xml/{scheme}/{authority}/{local-path}

Conclusion

URIBurner is a "deceptively simple" solution for cost-effective exploitation of HTTP based Linked Data meshes. It doesn't require any programming or customization en route to immediately realizing its virtues.

If you like what URIBurner offers, but prefer to leverage its capabilities within your domain -- such that resource description URLs reside in your domain, all you have to do is perform the following steps:

download a copy of Virtuoso (for local desktop, workgroup, or data center installation) or
instantiate Virtuoso via the Amazon EC2 Cloud
enable the Sponger Middleware component via the RDF Mapper VAD package (which includes cartridges for over 30 different resources types)

When you install your own URIBurner instances, you also have the ability to perform customizations that increase resource description fidelity in line with your specific needs. All you need to do is develop a custom extractor cartridge and/or meta cartridge.

Virtuoso Sponger Middleware -- (technology behind URIBurner Service)
Animation demonstrating how the Virtuoso Sponger works.

5 Game Changing Things about the OpenLink Virtuoso + AWS Cloud Combo

Mon, 01 Feb 2010 13:59:36 GMT

Here are 5 powerful benefits you can immediately derive from the combination of Virtuoso and Amazon's AWS services (specifically the EC2 and EBS components):

Acquire your own personal or service specific data space in the Cloud. Think DBase, Paradox, FoxPRO, Access of yore, but with the power of Oracle, Informix, Microsoft SQL Server etc.. using a Conceptual, as opposed to solely Logical, model based DBMS (i.e., a Hybrid DBMS Engine for: SQL, RDF, XML, and Full Text)
Ability to share and control access to your resources using innovations like FOAF+SSL, OpenID, and OAuth, all from one place
Construction of personal or organization based FOAF profiles in a matter of minutes; by simply creating a basic DBMS (or ODS application layer) account; and then using this profile to create strong links (references) to all your Data silos (esp. those from the Web 2.0 realm)
Load data sets from the LOD cloud or Sponge existing Web resources (i.e., on the fly data transformation to RDF model based Linked Data) and then use the combination to build powerful lookup services that enrich the value of URLs (think: Web addressable reports holding query results) that you publish
Bind all of the above to a domain that you own (e.g. a .Name domain) so that you have an attribution-friendly "authority" component for resource URLs and Entity URIs published from your Personal Linked Data Space on the Web (or private HTTP network).

In a nutshell, the AWS Cloud infrastructure simplifies the process of generating Federated presence on the Internet and/or World Wide Web. Remember, centralized networking models always end up creating data silos, in some context, ultimately! :-)

Library of Congress & Reasonable Linked Data

Wed, 06 May 2009 18:26:15 GMT

While exploring the Subject Headings Linked Data Space (LCSH) recently unveiled by the Library of Congress, I noticed that the URI for the subject heading: World Wide Web, exposes an "owl:sameAs" link to resource URI: "info:lc/authorities/sh95000541" -- in fact, a URI.URN that isn't HTTP protocol scheme based.

The observations above triggered a discussion thread on Twitter that involved: @edsu, @iand, and moi. Naturally, it morphed into a live demonstration of: human vs machine, interpretation of claims expressed in the RDF graph.

What makes this whole thing interesting?

It showcases (in Man vs Machine style) the issue of unambiguously discerning the meaning of the owl:sameAs claim expressed in the LCSH Linked Data Space.

Perspectives & Potential Confusion

From the Linked Data perspective, it may spook a few people to see owl:sameAs values such as: "info:lc/authorities/sh95000541", that cannot be de-referenced using HTTP.

It may confuse a few people or user agents that see URI de-referencing as not necessarily HTTP specific, thereby attempting to de-reference the URI.URN on the assumption that it's associated with a "handle system", for instance.

It may even confuse RDFizer / RDFization middleware that use owl:sameAs as a data provider attribution mechanism via hint/nudge URI values derived from original content / data URI.URLs that de-reference to nothing e.g., an original resource URI.URL plus "#this" which produces URI.URN-URL -- think of this pattern as "owl:shameAs" in a sense :-)

Unambiguously Discerning Meaning

Simply bring OWL reasoning (inference rules and reasoners) into the mix, thereby negating human dialogue about interpretation which ultimately unveils a mesh of orthogonal view points. Remember, OWL is all about infrastructure that ultimately enables you to express yourself clearly i.e., say what you mean, and mean what you say.

Path to Clarity (using Virtuoso, its in-built Sponger Middleware, and Inference Engine):

GET the data into the Virtuoso Quad store -- what the sponger does via its URIBurner Service (while following designated predicates such as owl:sameAs in case they point to other mesh-able data sources)
Query the data in Quad Store with "owl:sameAs" inference rules enabled
Repeat the last step with the inference rules excluded.

Actual SPARQL Queries:

SPARQL Query against the HTTP based Subject Heading URI for WWW
SPARQL Query (with reasoning via inference rule for owl:sameAs) against the URN based Subject Heading URI for WWW
SPARQL Query (*without* reasoning via inference rule for owl:sameAs) against the URN based Subject Heading URI for WWW

Observations:

The SPARQL queries against the Graph generated and automatically populated by the Sponger reveal -- without human intervention-- that: "info:lc/authorities/sh95000541", is just an alternative name for < xmlns="http" id.loc.gov="id.loc.gov" authorities="authorities" sh95000541="sh95000541" concept="concept">, and that the graph produced by LCSH is self-describing enough for an OWL reasoner to figure this all out courtesy of the owl:sameAs property :-).

Hopefully, this post also provides a simple example of how OWL facilitates "Reasonable Linked Data".

Linked Data & Socially Enhanced Collaboration (Enterprise or Individual) -- Update 1

Thu, 04 Mar 2010 00:50:37 GMT

Socially enhanced enterprise and invididual collaboration is becoming a focal point for a variety of solutions that offer erswhile distinct content managment features across the realms of Blogging, Wikis, Shared Bookmarks, Discussion Forums etc.. as part of an integrated platform suite. Recently, Socialtext has caught my attention courtesy of its nice features and benefits page . In addition, I've also found the Mike 2.0 portal immensely interesting and valuable, for those with an enterprise collaboration bent.

Anyway, Socialtext and Mike 2.0 (they aren't identical and juxtaposition isn't seeking to imply this) provide nice demonstrations of socially enhanced collaboration for individuals and/or enterprises is all about:

Identifying Yourself
Identifying Others (key contributors, peers, collaborators)
Serendipitous Discovery of key contributors, peers, and collaborators
Serendipitous Discovery by key contributors, peers, and collaborators
Develop and sustain relationships via socially enhanced professional network hybrid
Utilize your new "trusted network" (which you've personally indexed) when seeking help or propagating a meme.

As is typically the case in this emerging realm, the critical issue of discrete "identifiers" (record keys in sense) for data items, data containers, and data creators (individuals and groups) is overlooked albeit unintentionally.

How HTTP based Linked Data Addresses the Identifier Issue

Rather than using platform constrained identifiers such as:

email address (a "mailto" scheme identifier),
a dbms user account,
application specific account, or
OpenID.

It enables you to leverage the platform independence of HTTP scheme Identifiers (Generic URIs) such that Identifiers for:

You,
Your Peers,
Your Groups, and
Your Activity Generated Data,

simply become conduits into a mesh of HTTP -- referencable and accessible -- Linked Data Objects endowed with High SDQ (Serendipitious Discovery Quotient). For example my Personal WebID is all anyone needs to know if they want to explore:

My Profile (which includes references to data objects associated with my interests, social-network, calendar, bookmarks etc.)
Data generated by my activities across various data spaces (via data objects associated with my online accounts e.g. Del.icio.us, Twitter, Last.FM)
Linked Data Meshups via URIBurner (or any other Virtuoso instance) that provide an extend view of my profile

How FOAF+SSL adds Socially aware Security

Even when you reach a point of equilibrium where: your daily activities trigger orchestratestration of CRUD (Create, Read, Update, Delete) operations against Linked Data Objects within your socially enhanced collaboration network, you still have to deal with the thorny issues of security, that includes the following:

Single Sign On,
Authentication, and
Data Access Policies.

FOAF+SSL, an application of HTTP based Linked Data, enables you to enhance your Personal HTTP scheme based Identifer (or WebID) via the following steps (peformed by a FOAF+SSL compliant platform):

Imprint WebID within a self-signed x.509 based public key (certificate) associated with your private key (generated by FOAF+SSL platform or manually via OpenSSL)
Store public key components (modulous and exponent) into your FOAF based profile document which references your Personal HTTP Identifier as its primary topic
Leverage HTTP URL component of WebID for making public key components (modulous and exponent) available for x.509 certificate based authentication challenges posed by systems secured by FOAF+SSL (directly) or OpenID (indirectly via FOAF+SSL to OpenID proxy services).

Contrary to conventional experiences with all things PKI (Public Key Infrastructure) related, FOAF+SSL compliant platforms typically handle the PKI issues as part of the protocol implementation; thereby protecting you from any administrative tedium without compromising security.

Conclusions

Understanding how new technology innovations address long standing problems, or understanding how new solutions inadvertently fail to address old problems, provides time tested mechanisms for product selection and value proposition comprehension that ultimately save scarce resources such as time and money.

If you want to understand real world problem solution #1 with regards to HTTP based Linked Data look no further than the issues of secure, socially aware, and platform independent identifiers for data objects, that build bridges across erstwhile data silos.

If you want to cost-effectively experience what I've outlined in this post, take a look at OpenLink Data Spaces (ODS) which is a distributed collaboration engine (enterprise of individual) built around the Virtuoso database engines. It simply enhances existing collaboration tools via the following capabilities:

Addition of Social Dimensions via HTTP based Data Object Identifiers for all Data Items (if missing)

Ability to integrate across a myriad of Data Source Types rather than a select few across RDBM Engines, LDAP, Web Services, and various HTTP accessible Resources (Hypermedia or Non Hypermedia content types)
Addition of FOAF+SSL based authentication
Addition of FOAF+SSL based Access Control Lists (ACLs) for policy based data access.

Get Yourself A WebID in 5 Minutes or Less via OpenLink Data Spaces (an application layer built atop Virtuoso)
How To Share Resources Securely Using FOAF+SSL
FOAF+SSL & WebID Demonstration
OpenLink Data Spaces & Data Portability.

My 5 Favorite Things about Linked Data on the Web

Sun, 09 Mar 2008 15:48:35 GMT

End to Buzzword Blur - how buzzwords are used to obscure comprehension of core concepts. Let SKOS, MOAT, SCOT reign!
End of Data Silos - you don't own me, my data, my data's mobility (import/export), or accessibility (by reference) just because I signed up for Yet Another Software as Service (ySaaS)
End of Misinformation - Sins of omission will no longer go unpunished the era of self induced amnesia due to competitive concerns is over, Co-opetition shall reign (Ray Noorda always envisoned this reality)
Serendipitous information and data discovery gets cheaper by the second - you're only a link away for a universe of relevant and accessible data
Rise of Quality - Contrary to historic president (due to all of the above) well engineered solutions will no longer be sure indicators of commercial failure

BTW - Benjamin Nowack penned an interesting post titled: Semantic Web Aliases, that covers a variety of labels used to describe the Semantic Web. The great thing about this post is that it provides yet another demonstration-in-the-making for the virtues of Linked Data :-)

Labels are harmless when their sole purpose is the creation of routes of comprehension for concepts. Unfortunately, Labels aren't always constructed with concept comprehension in mind, most of the time they are artificial inflectors and deflectors servicing marketing communications goals.

Anyway, irrespective of actual intent, I've endowed all of the labels from Bengee's post with URIs as my contribution important disambiguation effort re. the Semantic Web:

Semantic Web (timbl)
Web of Data (timbl)
lowercase semantic [wW]eb (tantek)
Semantic Web 2.0 (by stefandecker, IIRC)
Web 3.0 (by nova and others)
Semantic Graph (by nova and others)
Hyperdata (by danja) Linked Data (by timbl, and implemented by the Chris Bizer and Richard Cyganiak inspired, Linking Open Data Community and it's poster project DBpedia)
Linked Data Web (by kidehen)
Structured Web (by mkbergman)
Semantic Data Web (by kidehen)
SemWeb (by the developer community)
GGG - The Giant Global Graph (by timbl) Web 3G (by iand)

As per usual this post is best appreciated when processed via an Linked Data aware user agent.

View Plurality Deficiency & Programming Language Autism

Wed, 17 Sep 2008 14:54:48 GMT

I've just read a really nice post by Henry Story titled: Are OO Languages Autistic?

In typical style, Henry walks you through his point of view using simple but powerful illustrations. Here is a key statement in his post that really struck me:

"In order to be able to have a mental theory one needs to be able to understand that other people may have a different view of the world. On a narrow three dimensional understanding of 'view', this reveals itself in that people at different locations in a room will see different things. One person may be able to see a cat behind a tree that will be hidden to another. In some sense though these two views can easily be merged into a coherent description."

Opaque Web pages (e.g., generated by Semantic Technology inside offerings that will not expose or share data entity URIs), irrespective of how smart the underlying page generation and visualization technology may be, a fundamentally autistic and counter intuitive as we move toward a Web of Linked Data.

Preoccupation with the "V" aspect of the M-V-C trinity is inadvertently compounding and the problem of digital autism on the Web. Unbeknownst to the purveyors of data silos and proprietary service lock-in, digital autism on the Web ultimately implies Web business model autism.

In Response to: This is Not the Future (Update #3)

Thu, 22 Jan 2009 00:02:47 GMT

As I cannot post directly to Glenn's blog titled: This is Not the Near Future (Either), I have to basically respond to him here, in blog post form :-(

What is our "Search" and "Find" demonstration about? It is about how you use the "Description" of "Things" to unambiguously locate things in a database at Web Scale.

To our perpetual chagrin, we are trying to demonstrate an engine -- not UI prowess -- but the immediate response is to jump to the UI aesthetics.

Google, Yahoo etc.. offer a simple input form for full text search patterns, they have a processing window for completing full text searches across Web Content indexed on their servers. Once the search patterns are processed, you get a page ranked result set (collection of Web pages basically that claim/state: we found N pages out of a document corpus of about M indexed pages).

Note: the estimate aspect of traditional search results in like "advertising small print" the user lives with the illusion that all possible documents on the Web (or even Internet) have been searched whereas in reality: 25% of the possible total is a major stretch; since the Web and Internet are fractal networks and scale-free, inherently growing at exponential rates "ad infinitum" across boundless dimensions of human comprehension.

The power of Linked Data ultimately comes down to the fact that the user constructs the path to what they seek via the properties of the "Things" in question. The routes are not hardwired since URI de-referencing (follow your nose pattern) is available to Linked Data aware query engines and crawlers.

We are simply trying to demonstrate how you can combine the best of full text search with the best of structured querying while reusing familiar interaction patterns from Google/Yahoo. Thus, you start with full text search, find get all the entities associated with the pattern, then use the entity types or entity properties to find what you seek.

You state in your post:

"To state the obvious caveat, the claim OpenLink is making about this demo is not that it delivers better search-term relevance, therefore the ranking of searching results is not the main criteria on which it is intended to be assessed."

Correct.

"On the other hand, one of the things they are bragging about is that their server will automatically cut off long-running queries. So how do you like your first page of results?".

Not exactly correct. We are performing aggregates using a configurable interactive time factor. Example: tell me how many entities of type: Person, with interest: Semantic Web, exist in this database within 2 seconds. Also understand that you could retry the same query and get different numbers within the same interactive time factor. It isn't your basic "query cut-off".

"And on the other other hand, the big claim OpenLink is making about this demo is that the aggregate experience of using it is better than the aggregate experience of using "traditional" search. So go ahead, use it. If you can."

Yes, "Microsoft" was a poor example for sure, the example could have been pattern: "glenn mcdonald", which should demonstrate the fundamental utility of what we are trying to demonstrate i.e., entity disambiguation courtesy of entity properties and/or entity type filtering.

Compare Googles results for: Glenn McDonald with those from our demo (which dissambiguate "Glenn McDonald" via associated properties and/or types), assuming we both agree that your Web Site or Blog Home isn't the center of your entity graph or personal data space (i.e., data about you); so getting your home page at the top of the Google page rank offers limited value, in reality.

What are we bragging about? A little more than what you attempt to explain. Yes, we are showing that we can find stuff within a processing window, but understand the following:

Processing Time Window (or interactive time) is configurable
Data Corpus is a Billion+ Triples (from Billion Triples Challenge Data Set)
SPARQL doesn't have Aggregation capabilities by default (we have implemented SPARQL-BI to deliver aggregates for analytics against large data sets, we even handle the TPC-H industry standard benchmark with SPARQL-BI)
Paging isn't possible without aggregates, and doing aggregates on a Billion+ triples as part of a query processing cycle isn't trivial stuff (otherwise it would be everywhere due to inherent and obvious necessity).

I hope I've clarified what's going on with our demo? If not, pose your challenge via examples and I will respond with solutions or simply cry out loud: "no mas!".

As for your "Mac OX X Leopard" comments, I can only say this: I emphasized that this is a demo, the data is pretty old, and the input data has issues (i.e. some of the input data is bad as your example shows). The purpose of this demo is not about the text per se., it's about the size of the data corpus and faceted querying. We are going to have the entire LOD Cloud loaded into the real thing, and in addition to that our Sponger Middleware will be enabled, and then you can take issue with data quality as per your reference to "Cyndi Lauper" (btw - it takes one property filter to find information about her quickly using "dbpprop:name" after filtering for properties with text values).

Of all things, this demo had nothing to do with UI and Information presentation aesthetics. It was all about combining full text search and structured queries (sparql behind the scenes) against a huge data corpus en route to solving challenges associated with faceted browsing over large data sets. We have built a service that resides inside Virtuoso. The Service is naturally of the "Web Service" variety and can be used from any consumer / client environment that speaks HTTP (directly or indirectly).

To be continued ...

Commercializing the Semantic Web

Fri, 16 May 2008 20:15:29 GMT

Unfortunately, I could only spend 4 days at the recent WWW2008 event in Beijing (I departed the morning following the Linked Data Workshop), so I couldn't take my slot on the "Commercializing the Semantic Web panel" etc.. Anyway, thanks to the Web I can still inject my points of view in the broad Web based discourse. Well so I hoped, when I attempted to post a comment to Paul Miller's ZDNet domain hosted blog thread titled: Commercialising the Semantic Web.

Unfortunately, the cost of completing ZDNet's unwieldy signup process simply exceeded the benefits of dropping my comments in their particular space :-( Thus, I'll settle for a trackback ping instead.

What follows is the cut and paste of my intended comment contributions to Paul's post.

Paul,

As discussed earlier this week during our podcast session, commercialization of Semantic Web technology shouldn't be a mercurial matter at this stage in the game :-) It's all about looking at how it provides value :-)

From the Linked Data angle, the ability to produce, dispatch, and exploit "Context" across an array of "Perspectives" from a plethora of disparate data sources on the Web and/or behind corporate firewalls, offers immense commercial value.

Yahoo's Searchmonkey effort will certainly bring clarity to some of the points I made during the podcast re. the role of URIs as "value consumption tickets" (Data Services are exposed via URIs). There has to be a trigger (in user space) that compels Web users to seek broader, or simply varied, perspectives as a response to data encountered on the Web. Yahoo! is about to put this light on in a big way (imho).

The "self annotating" nature of the Web is what ultimately drives the manifestation of the long awaited Semantic Web. I believe I postulated about "Self Annotation & the Semantic Web" in a number of prior posts which, by the way, should be DataRSS compatible right now due to Yahoo's support of OpenSearch Data Providers (which this Blog Space has been for eons).

Today, have many communities adding strucuture to the Web (via their respective tools of preference) without explicitly realizing what they are contributing. Every RSS/Atom feed, Tag, Weblog, Shared Bookmark, Wikiword, Microformat, Microformat++ (eRDF or RDFa), GRDDL stylesheet, and RDFizer etc.. is a piece of structured data.

Finally, the different communities are all finding ways to work together (thank heavens!) and the results are going to be cataclysmic when it all plays out :-)

Data, Structure, and Extraction are the keys to the Semantic Life! First you get the Data in a container (information resource), and then you add Structure to the information resource (RSS, Atom, microformats, RDFa, eRDF, SIOC, FOAF, etc.), once you have Structure RDFization (i.e. transformation to Linked Data) is a synch thanks to RDF Middleware (as per earlier RDF middleware posts).

Commercializing the Semantic Web

Sun, 18 May 2008 14:58:26 GMT

What follows is the cut and paste of my intended comment contributions to Paul's post.

Paul,

Finally, the different communities are all finding ways to work together (thank heavens!) and the results are going to be cataclysmic when it all plays out :-)

Linked Data & The Web Information BUS

Wed, 08 Aug 2007 22:26:55 GMT

Chris Bizer, Richard Cyganiak, and Tom Heath have just published a Linked Data Publishing Tutorial that provides a guide to the mechanics of Linked Data injection into the Semantic Data Web.

On different, but related, thread, Mike Bergman recently penned a post titled: What is the Structured Web?. Both of these public contributions shed light on the "Information BUS" essence of the World Wide Web by describing the evolving nature of the payload shuttled by the BUS.

What is an Information BUS?

Middleware infrastructure for shuttling "Information" between endpoints using a messaging protocol.

The Web is the dominant Information BUS within the Network Computer we know as the "Internet". It uses HTTP to shuttle information payloads between "Data Sources" and "Information Consumers" - what happens when we interact with Web via User Agents / Clients (e.g Browsers).

What are Web Information Payloads?

HTTP transported streams of contextualized data. Hence the terms: "Information Resource" and "Non Information" when reading material related to http-range-14 and Web Architecture. For example, an (X)HTML document is a specific data context (representation) that enables us to perceive, or comprehend, a data stream originating from a Web Server as a Web Page. On the other hand, if the payload lacks contextualized data, a fundamental Web requirement, then the resource is referred to as a "Non Information" resource. Of course, there is really no such thing as a "Non Information" resource, but with regards to Web Architecture, it's the short way of saying: "the Web Transmits Information only". That said, I prefer to refer to these "Non Information" resources as "Data Sources", are term well understood in the world of Data Access Middleware (ODBC, JDBC, OLEDB, ADO.NET etc.) and Database Management Systems (Relational, Objec-Relational, Object etc).

Examples of Information Resource and Data Source URIs:

http://demo.openlinksw.com/Northwind/Customer/ALFKI

http://demo.openlinksw.com/Northwind/Customer/ALFKI#this

Explanation: The Information Resource is a conduit to the Entity identified by Data Source (an entity in my RDF Data Space that is the Subject or Object of one of more Triple based Statements. The triples in question can that can be represented as an RDF resource when transmitted over the Web via an Information Resource that takes the form of a SPARQL REST Service URL or a Physical RDF based Information Resource URL).

What about Structured Data?

Prior to the emergence of the Semantic Data Web, the payloads shuttled across the Web Information BUS comprised primarily of the following:

HTML - Web Resource with presentation focused structure (Web 1.0 dominant payload form)
XML - Web Resource with structure that separates presentation and data (Web 2.0's dominant payload form).

The Semantic Data Web simply adds RDF to the payload formats that shuttle the Web Information BUS. RDF addresses formal data structure which XML doesn't cover since it is semi-structured (distinct data entities aren't formally discernible). In a nutshell, an RDF payload is basically a conceptual model database packaged as an Information Resource. It's comprised of granular data items called "Entities", that expose fine grained properties values, individual and/or group characteristics (attributes), and relationships (associations) with other Entities.

Where is this all headed?

The Web is in the final stages of the 3rd phase of it's evolution. A phase characterized by the shuttling of structured data payloads (RDF) alongside less data oriented payloads (HTML, XHTML, XML etc.). As you can see, Linked Data and Structured Data are both terms used to describe the addition of more data centric payloads to the Web. Thus, you could view the process of creating a Structured Web of Linked Data as follows:

Identify or Create Structured Data Sources
Name these Data Sources using Data Source URIs
Expose Structured Data Sources to the Web as Linked Data using Information Resource (conduit) URIs

Conclusions

The Semantic Data Web is an evolution of the current Web (an Information Space) that adds structured data payloads (RDF) to current, less data oriented, structured payloads (HTML, XHTML, XML, and others).

The Semantic Data Web is increasingly seen as an inevitability because it's rapidly reaching the point of critical mass (i.e. network effect kick-in). As a result, Data Web emphasis is moving away from: "What is the Semantic Data Web?" To: "How will Semantic Data Web make our globally interconnected village an even better place?", relative to the contributions accrued from the Web thus far. Remember, the initial "Document Web" (Web 1.0) bootstrapped because of the benefits it delivered to blurb-style content publishing (remember the term electronic brochure-ware?). Likewise, in the case of the "Services Web" (Web 2.0), the bootstrap occurred because it delivered platform independence to Web Application Developers - enabling them to expose application logic behind Web Services. It is my expectation that the Data Integration prowess of the Data Web will create a value exchange realm for data architects and other practitioners from the database and data access realms.

Live Virtuoso instance hosting Linked Open Data (LOD) Cloud

Wed, 01 Apr 2009 18:26:22 GMT

We have reached a beachead re. the Virtuoso instance hosting the Linked Open Data (LOD) Cloud; meaning, we are not going to be performing any major updates and deletions short-term, bar incorporation of fresh data sets from the Freebase and Bio2RDF projects (both communities a prepping new RDF data sets).

At the current time we have loaded 100% of all the very large data sets from the LOD Cloud. As result, we can start the process of exposing Linked Data virtues in a manner that's palatable to users, developers, and database professionals across the Web 1.0, 2.0, and 3.0 spectrums.

What does this mean?

You can use the "Search & Find" or"URI Lookup" or SPARQL endpoint associated with the LOD cloud hosting instance to perform the following tasks:

Find entities associated with full text search patterns -- Google Style, but with Entity & Text proximity Rank instead of Page Rank, since we are dealing with Entities rather than documents about entities
Find and Lookup entities by Identifier (URI) -- which is helpful when locating URIs to use for identify entities in your own linked data spaces on the Web
View entity descriptions via a variety of representation formats (HTML, RDFa, RDF/XML, N3, Turtle etc.)
Determine uses of entity identifiers across the LOD cloud -- which helps you select preferred URIs based on usage statistics.

What does it offer Web 1.0 and 2.0 developers?

If you don't want to use the SPARQL based Web Service, or other Linked Data Web oriented APIs for interacting with the LOD cloud programmatically, you can simply use the powerful REST style Web Service that provides URL parameters for performing full text oriented "Search", entity oriented "Find" queries, and faceted navigation over the huge data corpus with results data returned in JSON and XML formats.

Next Steps:

Amazon have agreed to add all the LOD Cloud data sets to their existing public data sets collective. Thus, the data sets we are loading will be available in "raw data" (RDF) format on the public data sets page via Named Elastic Block Storage (EBS) Snapshots); meaning, you can make an EC2 AMI (e.g. a Linux, Windows, Solaris) and install an RDF quad or triple store of choice into your AMI, then simply load data from the LOD cloud based on your needs.

In addition to the above, we are also going to offer a Virtuoso 6.0 Cluster Edition based LOD Cloud AMI (as we've already done with DBpedia, MusicBrainz, NeuroCommons, and Bio2Rdf) that will enable you to simply instantiate a personal and service specific edition of Virtuoso with all the LOD data in place and fully tuned for performance and scalability; basically, you will simply press "Instantiate AMI" and a LOD cloud data space, in true Linked Data from, will be at your disposal within minutes (i.e. the time it takes the DB to start).

Work on the migration of the LOD data to EC2 starts this week. Thus, if you are interested in contributing an RDF based data set to the LOD cloud now is the time to get your archive links in place on the (see: ESW Wiki page for LOD Data Sets).

Response to: What is Web 3.0 and Why Should I Care?

Thu, 29 Jan 2009 18:45:11 GMT

Another post done in response to lost comments. This time, the comments relate to Robin Bloor's article titled: What is Web 3.0 and Why Should I Care?

Robin:

Web 3.0 is fundamentally about the World Wid Web becoming a structured database equipped with a formal data model (RDF which is a moniker for Entity-Attribute-Value with Classes & Relationships based Graph Model), query language, and a protocol for handling divrerse data representational requirements via negotiation

Web 3.0 is about a Web that facilitates serendipitous discovery of relevant things; thereby making serendipitous discovery quotient (SDQ), rather than search engine optimization (SEO), the critical success factor that drives how resources get published on the Web.

Personally, I believe we are on the cusp of a major industry inflection re. how we interact with data hosted in computing spaces. In a nutshell, the conceptual model interaction based on real-world entities such as people, places, and other things (including abstract subject matter) will usurp traditional logical model interaction based on rows and columns of typed and/or untyped literal values exemplified by relational data access and management systems.

Labels such as "Web 3.0", "Linked Data", and "Semantic Web", are simply about the aforementioned model transition playing out on the World Wide Web and across private Linked Data Webs such as Intranets & Extranets, as exemplified emergence of the "Master Data Management" label/buzzword.

What's the critical infrastructure supporting Web 3.0?

As was the case with Web Services re. Web 2.0, there is a critical piece of infrastructure driving the evolution in question, and in this case it comes down to the evolution of Hyperlinking.

We now have a new and complimentary variant of Hyperlinking commonly referred to as "Hyperdata" that now sits alongside "Hypertext". Hyperdata when used in conjunction with HTTP based URIs as Data Source Names (or Identifiers), delivers a potent and granular data access mechanism scoped down to the datum (object or record) level; which is much different from the document (record or entity container) level linkage that Hypertext accords.

In addition, the incorporation of HTTP into this new and enhanced granular Data Source Naming mechanism also addresses past challenges relating to separation of data, data representation, and data transmission protocols -- remember XDR woes familiar to all sockets level programmers -- courtesy of in-built content negotiation. Hence, via a simple HTTP GET --against a Data Source Name exposed by a Hyperdata link -- I can negotiate (from client or server sides) the exact representation of the description (entity-attribute-value graph) of an Entity / Data Object / Resource, dispatched by a data server.

For example, this is how a description of entity "Me" ends up being available in (X)HTML or RDF document representations (as you will observe when you click on that link to my Personal URI).

The foundation of what I describe above comes from:

Entity-Attribute-Value & Class Relationship Data Model (originating from LISP era with detours via the Object Database era. into the Triples approach in RDF)
Use of HTTP based Identifiers in the Entity ID construction process
SPARQL query language for the Data Model.

Some live examples from DBpedia:

http://dbpedia.org/resource/Linked_Data
http://dbpedia.org/resource/Hyperdata
http://dbpedia.org/resource/Entity-attribute-value_model
http://dbpedia.org/resource/Benjamin_Franklin

Entity Oriented Data Access

Tue, 04 Nov 2008 03:51:48 GMT

Recent perturbations in Data Access and Data Management technology realms are clear signs of an imminent inflection. In a nutshell, the focus of data access is moving from the "Logical Level" (what you see if you've ever looked at a DBMS schema derived from an Entity Data Model) to the "Conceptual Level" (i.e., the Entity Model becoming concrete).

In recent times I've stumbled across Master Data Management (MDM) which is all about entities that provide holistic views of enterprise data (or what I call: Context Lenses). I've also stumbled across emerging tensions in the .NET realm between Linq to Entities and Linq to SQL, where in either case the fundamental issues comes down to the optimal paths "Conceptual Level Access" over the "Logical Logical Level" when dealing with data access in the .NET realm.

Strangely, the emerging realm of RDF Linked Data, MDM, and .NET's Entity Frameworks, remain strangely disconnected.

Another oddity is the obvious, but barely acknowledged, blurring of the lines between the "traditional enterprise employee" and the "individual Web netizen". The fusion between these entities is one of the most defining characteristics of how the Web is reshaping the data landscape.

At the current time, I tend to crystalize my data access world view under the moniker: YODA ("You" Oriented Data Access), based on the following:

Entities are the new focal point of data access, management, and integration
"You" are the entry point (Data Source Name) into this new realm of inter connected Entities that the Web exposes
"You" the "Person" Entity is associated with many other "Things" such as "Organizations", "Other People", "Books", "Music", "Subject Matter" etc.
"You" the "Person" needs Identity in this new global database, which is why "You" need to Identify "Yourself" using an an HTTP based Entity ID (aka. URI)
When "You" have an ID for "Yourself" it becomes much easier for the essence of "You" to be discovered via the Web
When "Others" have IDs for "Themselves" on the Web it becomes much easier for "You" to serendipitously discover or explicitly "Find" things on the Web.

Is the Semantic Web necessary (and feasible)?

Fri, 29 Aug 2008 15:08:12 GMT

Here is another "Linked Discourse" effort via a blog post that attempts to add perspective to a developing Web based conversation. In this case, the conversation originates from Juan Sequeda's recent interview with Jana Thompson titled: Is the Semantic Web necessary (and feasible)?

Jana: What are the benefits you see to the business community in adopting semantic technology?

Me: Exposure, exploitation, of untapped treasure trove of interlinked data, information, and knowledge across disparate IT infrastructure via conceptual entry points (Entity IDs / URIs / Data Source Names) that refer to as "Context Lenses".

Jana: Do you think these benefits are great enough for businesses to adopt the changes?

Me: Yes, infrastructural heterogeneity is a fact of corporate life (growth, mergers, acquisitions etc). Any technology that addresses these challenges is extremely important and valuable. Put differently, the opportunity costs associated with IT infrastructural heterogeneity remains high!

Jana: How large do you think this impact will actually be?

Me: Huge, enterprise have been aware of their data, information, and knowledge treasure troves etc. for eons. Tapping into these via a materialization of the "information at your fingertips" vision is something they've simply been waiting to pursue without any platform lock-in, for as long as I've been in this industry.

Jana: I’ve heard, from contacts in the Bay Area, that they are skeptical of how large this impact of semantic technology will actually be on the web itself, but that the best uses of the technology are for fields such as medical information, or as you mentioned, geo-spatial data.

Me: Unfortunately, those people aren't connecting the Semantic Web and open access to heterogeneous data sources, or the intrinsic value of holistic exploration location of entity based data networks (aka Linked Data).

Jana: Are semantic technologies going to be part of the web because of people championing the cause or because it is actually a necessary step?

Me: Linked Data technology on the Web is a vital extension of the current Web. Semantic Technology without the "Web" component, or what I refer to as "Semantics Inside only" solutions, simply offer little or no value as Web enhancements based on their incongruence with the essence of the Web i.e., "Open Linkage" and no Silos! A nice looking Silo is still a Silo.

Jana: In the early days of the web, there was an explosion of new websites, due to the ease of learning HTML, from a business to a person to some crackpot talking about aliens. Even today, CSS and XHTML are not so difficult to learn that a determined person can’t learn them from W3C or other tutorials easily. If OWL becomes the norm for websites, what do you think the effects will be on the web? Do you think it is easy enough to learn that it will be readily adopted as part of the standard toolkit for web developers for businesses?

Me: Correction, learning HTML had nothing to do with the Web's success. The value proposition of the Web simply reached critical mass and you simply couldn't afford to not be part of it. The easiest route to joining the Web juggernaut was a Web Page hosted on a Web Site. The question right now is: what's the equivalent driver for the Linked Data Web bearing in mind the initial Web bootstrap. My answer is simply this: Open Data Access i.e., getting beyond the data silos that have inadvertently emerged from Web 2.0.

Jana: Following the same theme, do you think this will lead to an internet full of corporate-controlled websites, with sites only written by developers rather than individuals?

Me: Not at all, we will have an Internet owned by it's participants i.e., You and the agents that work on your behalf.

Jana: So, you are imagining technologies such as Drupal or Wordpress, that allow users to manage sites without a great deal of knowledge of the nuts and bolts of current web technologies?

Me: Not at all! I envisage simple forms that provide conduits to powerful meshes of interlinked data spaces associated with Web users.

Jana: Given all of the buzz, and my own familiarity with ontology, I am just very curious if the semantic web is truly necessary?

Me:This question is no different than saying: I hear the Web is becoming a Database, and I wonder if a Data Dictionary is necessary, or even if access to structured data is necessary. It's also akin to saying: I accept "Search" as my only mechanism for Web interaction even though in reality, I really want to be able to "Find" and "Process" relevant things at a quicker rate than I do today, relative to the amount of information, and information processing time, at my disposal.

Jana: Will it be worth it to most people to go away from the web in its current form, with keyword searches on sites like Google, to a richer and more interconnected internet with potentially better search technology?

Me: As stated above, we need to add "Find" to the portfolio of functions we seek to perform against the Web. "Finding" and "Searching" are mutually inclusive pursuits at different ends of an activity spectrum.

Jana: For our more technical readers, I have a few additional questions: If no standardization comes about for mapping relational databases to domain ontologies, how do you see that as influencing the decisions about adoption of semantic technology by businesses? After all, the success of technology often lives or dies on its ease of adoption.

Me: Standardization of RDBMS to RDF Mapping is not the critical success factor here (of course it would be nice). As stated earlier, the issue of data integration that arises from IT infrastructural heterogeneity has been with decision makers in the enterprise for ever. The problem is now seeping into the broader consumer realm via Web ubiquity. The mistakes made in the enterprise realm are now playing out in the consumer Web realm. In both realms the critical success factors are:

Scalable productivity relative to exponential growth of data generated across Intranets, Extranets, and the Internet
Concept based Context Lenses that transcend logical and physical data heterogeneity by putting dereferencable URIs in front of the Line of Business Application Data and/or Web Data Spaces such as Blogs, Wikis, Discussion Forums etc.).

Discussion: OpenLink Data Spaces

Sat, 01 Dec 2007 20:26:12 GMT

I've been a little busier than usual, of late. So busy, that even minimal blog based discourse participation has been a challenge. Anyway, during this quiet period, a number of interesting data streams have come my way that relate to OpenLink Data Spaces (ODS). Thus, in typical fashion, I'll use this post (via URIs) to contribute a few nodes to the Giant Global Graph that is the Web of Structured Linked Data, also known as the Data Web, Semantic Data Web, or Web of Data (also see prior Data Web posts).

Here goes:

Alan Wilensky recalls his early encounters with OpenLink Data Spaces (circa. 2004)
Daniel Lewis shares his "state of the Semantic Data Web" findings
Daniel Lewis experiences OpenLink Data Space first hand en route to creating Data Spaces in the Clouds (the Fourth Platform).

In addition, in one week, courtesy of the Web, UK Semnantic Web Gatherings in Bristol and Oxford, I discover, interview, and employ Daniel :-) Imagine how long this would have taken to pull off via the Document Web, assuming I would even discover Daniel.

As with all things these days, the Web and Internet change everything, which includes talent discovery and recruitment.

A Global Social graph that is a mesh of Linked Data enables the process of recruitment, marketing, and other elements of busines management to be condensed down to a sending powerful beams across the aforementioned Graph :-) The only variable pieces are the traversal paths exposed to your beam via the beam's entry point URI. In my case, I have a single URI that exposes a Graph of critical paths for the Blogosphere (i.e data spaces of RSS Atom Feeds). Thus, I can discover if your profile matches the requirements associated with an opening at OpenLink Software (most of the time) before you do :-)

BTW - I just noticed that John Breslin described ODS as social-graph++ in his recent post, titled: Tales from the SIOC-o-sphere, part 6. In a funny way, this reminds of a post from the early blogosphere days about platforms and Weblog APIs (circa. 2003) about ODS (then exposed via the Blog Platform realm of Virtuoso).

Re-introducing the Virtuoso Virtual Database Engine

Wed, 17 Feb 2010 21:46:53 GMT

In recent times a lot of the commentary and focus re. Virtuoso has centered on the RDF Quad Store and Linked Data. What sometimes gets overlooked is the sophisticated Virtual Database Engine that provides the foundation for all of Virtuoso's data integration capabilities.

In this post I provide a brief re-introduction to this essential aspect of Virtuoso.

What is it?

This component of Virtuoso is known as the Virtual Database Engine (VDBMS). It provides transparent high-performance and secure access to disparate data sources that are external to Virtuoso. It enables federated access and integration of data hosted by any ODBC- or JDBC-accessible RDBMS, RDF Store, XML database, or Document (Free Text)-oriented Content Management System. In addition, it facilitates integration with Web Services (SOAP-based SOA RPCs or REST-fully accessible Web Resources).

Why is it important?

In the most basic sense, you shouldn't need to upgrade your existing database engine version simply because your current DBMS and Data Access Driver combo isn't compatible with ODBC-compliant desktop tools such as Microsoft Access, Crystal Reports, BusinessObjects, Impromptu, or other of ODBC, JDBC, ADO.NET, or OLE DB-compliant applications. Simply place Virtuoso in front of your so-called "legacy database," and let it deliver the compliance levels sought by these tools

In addition, it's important to note that today's enterprise, through application evolution, company mergers, or acquisitions, is often faced with disparately-structured data residing in any number of line-of-business-oriented data silos. Compounding the problem is the exponential growth of user-generated data via new social media-oriented collaboration tools and platforms. For companies to cost-effectively harness the opportunities accorded by the increasing intersection between line-of-business applications and social media, virtualization of data silos must be achieved, and this virtualization must be delivered in a manner that doesn't prohibitively compromise performance or completely undermine security at either the enterprise or personal level. Again, this is what you get by simply installing Virtuoso.

How do I use it?

The VDBMS may be used in a variety of ways, depending on the data access and integration task at hand. Examples include:

Relational Database Federation

You can make a single ODBC, JDBC, ADO.NET, OLE DB, or XMLA connection to multiple ODBC- or JDBC-accessible RDBMS data sources, concurrently, with the ability to perform intelligent distributed joins against externally-hosted database tables. For instance, you can join internal human resources data against internal sales and external stock market data, even when the HR team uses Oracle, the Sales team uses Informix, and the Stock Market figures come from Ingres!

Conceptual Level Data Access using the RDF Model

You can construct RDF Model-based Conceptual Views atop Relational Data Sources. This is about generating HTTP-based Entity-Attribute-Value (E-A-V) graphs using data culled "on the fly" from native or external data sources (Relational Tables/Views, XML-based Web Services, or User Defined Types).

You can also derive RDF Model-based Conceptual Views from Web Resource transformations "on the fly" -- the Virtuoso Sponger (RDFizing middleware component) enables you to generate RDF Model Linked Data via a RESTful Web Service or within the process pipeline of the SPARQL query engine (i.e., you simply use the URL of a Web Resource in the FROM clause of a SPARQL query).

It's important to note that Views take the form of HTTP links that serve as both Data Source Names and Data Source Addresses. This enables you to query and explore relationships across entities (i.e., People, Places, and other Real World Things) via HTTP clients (e.g., Web Browsers) or directly via SPARQL Query Language constructs transmitted over HTTP.

Conceptual Level Data Access using ADO.NET Entity Frameworks

As an alternative to RDF, Virtuoso can expose ADO.NET Entity Frameworks-based Conceptual Views over Relational Data Sources. It achieves this by generating Entity Relationship graphs via its native ADO.NET Provider, exposing all externally attached ODBC- and JDBC-accessible data sources. In addition, the ADO.NET Provider supports direct access to Virtuoso's native RDF database engine, eliminating the need for resource intensive Entity Frameworks model transformations.

1995

Fri, 06 Jun 2008 11:54:33 GMT

1995: "

1995 (and the early 90’s) must have been a visionaries time of dreaming… most of their dreams are happening today.

Watch Steve Jobs (then of NeXT) discuss what he thinks will be popular in 1996 and beyond at OpenStep Days 1995:

Heres a spoiler:

There is static web document publishing
There is dynamic web document publishing
People will want to buy things off the web: e-commerce

The thing that OpenStep propose is:

WebObjects: an Object Oriented representation of Data available in distributed form over the web

What Steve was suggesting was one of the beginnings of the Data Web! Yep, Portable Distributed Objects and Enterprise Objects Framework was one of the influences of the Semantic Web / Linked Data Web…. not surprising as Tim Berners-Lee designed the initial web stack on a NeXT computer!

I’m going to spend a little time this evening figuring out how much ‘distributed objects’ stuff has been taken from the OpenStep stuff into the Objective-C + Cocoa environment. (<- I guess I must be quite geeky ;-))

(Via Daniel Lewis.)

The Trouble with Labels (Contd.): Data Integration & SOA

Sun, 12 Oct 2008 22:54:22 GMT

I just stumbled across an post from ITBusines Edge titled: How Semantic Technology Can Help Companies with Integration. While reading the post I encountered the term: Master Data Manager (MDM), and wondered to myself, "what's that?" only to realize it's the very same thing I described as a Data Virtualization or Virtual Database technology (circa. 1998).

Now, if re-labeling can confuse me when applied to a realm I've been intimately involved with for eons (internet time). I don't want to imagine what it does for others who aren't that intimately involved with the important data access and data integration realms.

On the more refreshing side, the article does shed some light on the potency of RDF and OWL when applied to the construction of conceptual views of heterogeneous data sources.

"How do you know that data coming from one place calculates net revenue the same way that data coming from another place does? You’ve got people using the same term for different things and different terms for the same things. How do you reconcile all of that? That’s really what semantic integration is about."

BTW - I discovered this article via another titled: Understanding Integration And How It Can Help with SOA, that covers SOA and Integration matters. Again, in this piece I feel the gradual realization of the virtues that RDF, OWL, and RDF Linked Data bring to bear in the vital realm of data integration across heterogeneous data silos.

Conclusion

A number of events, at the micro and macro economic levels, are forcing attention back to the issue of productive use of existing IT resources. The trouble with the aforementioned quest is that it ultimately unveils the global IT affliction known as: heterogeneous data silos, and the challenges of pain alleviation, that have been ignored forever or approached inadequately as clearly shown by the rapid build up of SOA horror stories in the data integration realm.

Data Integration via conceptualization of heterogenous data sources, that result in concrete conceptual layer data access and management, remains the greatest and most potent application of technologies associated with the "Semantic Web" and/or "Linked Data" monikers.

InforWorld 2003 Innovator article
2006 Podcast Interview with Jon Udell
Enterprise Information Integration
One of several posts about our Virtuoso Universal Server and Conceptual Model based data integration
History of Virtuoso
Mike Bergman's post titled: WOA: A New Enterprise Partner for Linked Data

Semantic Web Patterns: A Guide to Semantic Technologies (Update 1)

Thu, 17 Jul 2008 01:43:04 GMT

ReadWriteWeb via Alex Iskold have delivered another iteration of their "Guide to Semantic Technologies".

If you look at the title of this post (and their article) they seem to be accurately providing a guide to Semantic Technologies, so no qualms there. If on the other hand, this is supposed to he a guide to the "Semantic Web" as prescribed by TimBL then they are completely missing the essence of the whole subject, and demonstrably so I may add, since the entities: "ReadWriteWeb" and "Alex Iskold" are only describable today via the attributes of the documents they publish i.e their respective blogs and hosted blog posts.

Preoccupation with Literal objects as describe above, implies we can only take what "ReadWriteWeb" and "Alex Iskold" say "Literally" (grep, regex, and XPath/Xquery are the only tools for searching deeper in this Literal realm), we have no sense of what makes them tick or where they come from, no history (bar "About Page" blurb), no data connections beyond anchored text (more pointers to opaque data sources) in post and blogrolls. The only connection between this post and them is the my deliberate use of the same literal text in the Title of this post.

TimBL's vision as espoused via the "Semantic Web" vision is about the production, consumption, and sharing of Data Objects via HTTP based Identifiers called URIs/IRIs (Hyperdata Links / Linked Data). It's how we use the Web as a Distributed Database where (as Jim Hendler once stated with immense clarity): I can point to records (entity instances) in your database (aka Data Space) from mine. Which is to say that if we can all point to data entities/objects (not just data entities of type "Document") using these Location, Value, and Structure independent Object Identifiers (courtesy of HTTP) we end up with a much more powerful Web, and one that is closer to the "Federated and Open" nature of the Web.

As I stated in a prior post, if you or your platform of choice aren't producing de-referencable URIs for your data objects, you may be Semantic (this data model predates the Web), but there is no "World Wide Web" in what you are doing.

What are the Benefits of the Semantic Web?

Consumer

Enterprise

Simple demo:

I am a Kingsley Idehen, a Person who authors this weblog. I also share bookmarks gathered over the years across an array of subjects via my bookmark data space. I also subscribe to a number of RSS/Atom/RDF feeds, which I share via my feeds subscription data space. Of course, all of these data sources have Tags which are collectively exposed via my weblog tag-cloud, feeds subscriptions tag-cloud, and bookmarks tag-cloud data spaces.

As I don't like repeating myself, and I hate wasting my time or the time of others, I simply share my Data Space (a collection of all of my purpose specific data spaces) via the Web so that others (friends, family, employees, partners, customers, project collaborators, competitors, co-opetitors etc.) can can intentionally or serendipitously discover relevant data en route to creating new information (perspectives) that is hopefully exposed others via the Web.

Bottom-line, the Semantic Web is about adding the missing "Open Data Access & Connectivity" feature to the current Document Web (we have to beyond regex, grep, xpath, xquery, full text search, and other literal scrapping approaches). The Linked Data Web of de-referencable data object URIs is the critical foundation layer that makes this feasible.

Remember, It's not about "Applications" it's about Data and actually freeing Data from the "tyranny of Applications". Unfortunately, application inadvertently always create silos (esp. on the Web) since entity data modeling, open data access, and other database technology realm matters, remain of secondary interest to many application developers.

Final comment, RDF facilitates Linked Data on the Web, but all RDF isn't endowed with de-referencable URIs (a major source of confusion and misunderstanding). Thus, you can have RDF Data Source Providers that simply project RDF data silos via Web Services APIs if RDF output emanating from a Web Service doesn't provide out-bound pathways to other data via de-referencable URIs. Of course the same also applies to Widgets that present you with all the things they've discovered without exposing de-referencable URIs for each item.

BTW - my final comments above aren't in anyway incongruent with devising successful business models for the Web. As you may or may not know, OpenLink is not only a major platform provider for the Semantic Web (expressed in our UDA, Virtuoso, OpenLink Data Spaces, and OAT products), we are also actively seeding Semantic Web (tribe: Linked Data of course) startups. For instance, Zitgist, which now has Mike Bergman as it's CEO alongside Frederick Giasson as CTO. Of course, I cannot do Zitgist justice via a footnote in a blog post, so I will expand further in a separate post.

Additional information about this blog post:

I didn't spent hours looking for URIs used in my hyperlinks
The post is best viewed via an RDF Linked Data aware user agents (OpenLink RDF Browser, Zitgist Data Viewer, DISCO Hyperdata Browser, Tabulator).

Reminder: Why We Need Linked Data!

Fri, 02 Nov 2007 22:52:34 GMT

"The phrase Open Social implies portability of personal and social data. That would be exciting but there are entirely different protocols underway to deal with those ideas. As some people have told me tonight, it may have been more accurate to call this "OpenWidget" - though the press wouldn't have been as good. We've been waiting for data and identity portability - is this all we get?"
[Source: Read/Write Web's Commentary & Analysis of Google's OpenSocial API]

..Perhaps the world will read the terms of use of the API, and realize this is not an open API; this is a free API, owned and controlled by one company only: Google. Hopefully, the world will remember another time when Google offered a free API and then pulled it. Maybe the world will also take a deeper look and realize that the functionality is dependent on Google hosted technology, which has its own terms of service (including adding ads at the discretion of Google), and that building an OpenSocial application ties Google into your application, and Google into every social networking site that buys into the Dream. Hopefully the world will remember. Unlikely, though, as such memories are typically filtered in the Great Noise....
[Source: Poignant commentary excerpt from Shelly Power's Blog (as always)]

The "Semantic Data Web" vision has always been about "Data & Identity" portability across the Web. Its been that and more from day one.

In a nutshell, we continue to exhibit varying degrees of Cognitive Dissonance re the following realities:

The Network is the Computer (Internet/Intranet/Extranet depending on your TCP/IP usage scenarios)
The Web is the OS (ditto) and it provides a communications subsystem (Information BUS) comprised of

HTTP

URI

HTTP based Interprocess (i.e Web Apps are processes when you discard the HTML UI and interact with the application logic containers called "Web Services" behind the pages) ultimately hit data
Web Data is best Modeled as a Graph (RDF, Containers/Items/Item Types, Property & Value Pairs associated with something, and other labels)
Network are Graphs and vice versa
Social Networks are graphs where nodes are connected via social connectors ( [x]--knows-->[y] )
The Web is a Graph that exposes a People and Data Network (to the degree we allude to humans not being data containers i.e. just nodes in a network, otherwise we are talking about a Data Network)
Data access and manipulation depends inherently on canonical Data Access mechanisms such as Data Source Identifiers / Names (time-tested practice in various DBMS realms)
Data is forever, it is the basis of Information, and it is increasing exponentially due to proliferation of Web Services induced user activities (User Generated Content)
Survival, Vitality, Longevity, Efficiency, Productivity etc.. are all depend on our ability to process data effectively in a shrinking time continuum where Data and/or Information overload is the alternative.

The Data Web is about Presence over Eyeballs due to the following realities:

Eyeballs are input devices for a DNA based processing system (Humans). The aforementioned processing system can reason very well, but simply cannot effectively process masses of data or information
Widgets offer little value long term re. the imminent data and information overload dilemma, ditto Web pages (however pretty), and any other Eyeballs-only centric Web Apps
Computers (machines) are equipped with inorganic (non DNA) based processing power, they are equipped to process huge volumes of data and/or information, but they cannot reason
To be effective in the emerging frontier comprised of a Network Computer and a Web OS, we need an effective mechanism that makes best use of the capabilities possessed by humans and machines, by shifting the focus to creation and interaction with points of "Data Web Presence" that openly expose "Structured Linked Data".

This is why we need to inject a mesh of Linked Data into the existing Web. This is what the often misunderstood vision of the "Semantic Data Web" or "Web of Data" or "Web or Structured Data" is all about.

As stated earlier (point 10 above), "Data is forever" and there is only more of it to come! Sociality and associated Social Networking oriented solutions are at best a spec in the Web's ocean of data once you comprehend this reality.

Note: I am writing this post as an early implementor of GData and an implementor of RDF Linked Data technology and a "Web Purist".

OpenSocial implementation and support across our relevant product families: Virtuoso (i.e the Sponger Middleware for RDF component), OpenLink Data Spaces (Data Space Controller / Services), and the OpenLink Ajaxt Toolkit (i.e OAT Widgets and Libraries), is a triviality now that the OpenSocial APIs are public.

The concern I have, and the problem that remains mangled in the vast realms of Web Architecture incomprehension, is the fact that GData and GData based APIs cannot deliver Structured Linked Data in line with the essence of the Web without introducing "lock-in" that ultimately compromises the "Open Purity" of the Web. Facebook and Google's OpenSocial response to the Facebook juggernaut (i.e. open variant of the Facebook Activity Dashboard and Social Network functionality realms, primarily), are at best icebergs in the ocean we know as the "World Wide Web". The nice and predictable thing about icebergs is that they ultimately melt into the larger ocean :-)

On a related note, I had the pleasure of attending the W3C's RDF and DBMS Integration Workshop, last week. The event was well attended by organizations with knowledge, experience, and a vested interested in addressing the issues associated with exposing none RDF data (e.g. SQL) as RDF, and the imminence of data and/or information overload covered in different ways via the following presentations:

RDF Views of SQL Data

Orri Erling

Computer Science 2.0

Experiences re. solving SPARQL Access to Distributed Data Sources

Semantic Web Patterns: A Guide to Semantic Technologies (Update 2)

Thu, 17 Jul 2008 01:43:36 GMT

For all the one-way feed consumers and aggregators, and readers of the original post, here is a variant equipped hyperlinked phrases as opposed to words. As I stated in the prior post, the post (like most of my posts) was part experiment / dog-fodding of automatic tagging and hyper-linking functionality in OpenLink Data Spaces.

ReadWriteWeb via Alex Iskold's post have delivered another iteration of their "Guide to Semantic Technologies".

Preoccupation with Literal objects as describe above, implies we can only take what "ReadWriteWeb" and "Alex Iskold" say "Literally" (grep, regex, and XPath/Xquery are the only tools for searching deeper in this Literal realm), we have no sense of what makes them tick or where they come from, no history (bar "About Page" blurb), no data connections beyond anchored text (more pointers to opaque data sources) in post and blogrolls. The only connection between this post and them is the my deliberate use of the same literal text in the Title of this post.

What are the Benefits of the Semantic Web?

Consumer

Enterprise

Simple demo:

I am a Kingsley Idehen, a Person who authors this weblog. I also share bookmarks gathered over the years across an array of subjects via my bookmark data space. I also subscribe to a number of RSS/Atom/RDF feeds, which I share via my feeds subscription data space. Of course, all of these data sources have Tags which are collectively exposed via my weblog tag-cloud, feeds subscriptions tag-cloud, and bookmarks tag-cloud data spaces.

As I don't like repeating myself, and I hate wasting my time or the time of others, I simply share my Data Space (a collection of all of my purpose specific data spaces) via the Web so that others (friends, family, employees, partners, customers, project collaborators, competitors, co-opetitors etc.) can can intentionally or serendipitously discover relevant data en route to creating new information (perspectives) that is hopefully exposed others via the Web.

Additional information about this blog post:

I didn't spent hours looking for URIs used in my hyperlinks
The post is best viewed via an RDF Linked Data aware user agents (OpenLink RDF Browser, Zitgist Data Viewer, DISCO Hyperdata Browser, Tabulator).

10 Reasons to use OpenLink Data Spaces (ODS)

Fri, 08 Feb 2008 22:08:43 GMT

Via post by Daniel Lewis, titled:10 Reasons to use OpenLink Data Spaces

There are quite a few reasons to use OpenLink Data Spaces (ODS). Here are 10 of the reasons why I use ODS:

Its native support of DataPortability Recommendations such as RSS, Atom, APML, Yadis, OPML, Microformats, FOAF, SIOC, OpenID and OAuth.

Its native support of Semantic Web Technologies such as: RDF and SPARQL/SPARUL for querying.

Everything in ODS is an Object with its own URI, this is due to the underlying Object-Relational Architecture provided by Virtuoso.

It has all the social media components that you could need, including: blogs, wikis, social networks, feed readers, CRM and a calendar.

It is expandable by installing pre-configured components (called VADs), or by re-configuring a LAMP application to use Virtuoso. Some examples of current VADs include: MediaWiki, Wordpress and Drupal.

It works with external webservices such as: Facebook, del.icio.us and Flickr.

Everything within OpenLink Data Spaces is Linked Data, which provides more meaningful information than just plain structural information. This meaningful information could be used for complex inferencing systems, as ODS can be seen as a Knowledge Base.

ODS builds bridges between the existing static-document based web (aka ‘Web 1.0‘), the more dynamic, services-oriented, social and/or user-orientated webs (aka ‘Web 2.0‘) and the web which we are just going into, which is more data-orientated (aka ‘Web 3.0’ or ‘Linked Data Web’).

It is fully supportive of Cloud Computing, and can be installed on Amazon EC2.

Its released free under the GNU General Public License (GPL). [note]However, it is technically dual licensed as it lays on top of the Virtuoso Universal Server which has both Commercial and GPL licensing[/note]

The features above collectively provide users with a Linked Data Junction Box that may reside with corporate intranets or "out in the clouds" (Internet). You can consume, share, and publish data in a myriad of formats using a plethora of protocols, without any programming. ODS is simply about exposing the data from your Web 1.0, 2.0, 3.0 application interactions in structured from, with Linking, Sharing, and ultimately Meshing (not Mashing) in mind.

Note: Although ODS is equipped with a broad array of Web 2.0 style Applications, you do not need to use native ODS apps in order to exploit it's power. It binds to anything that supports the relevant protocols and data formats.

Enterprise 0.0, Linked Data, and Semantic Data Web

Tue, 05 Feb 2008 04:19:26 GMT

Last week we officially released Virtuoso 5.0.1 (in Commercial and Open Source Editions). The press release provided us with an official mechanism and timestamp for the current Virtuoso feature set.

A vital component of the new Virtuoso release is the finalization of our SQL to RDF mapping functionality -- enabling the declarative mapping of SQL Data to RDF. Additional technical insight covering other new features (delivered and pending) is provided by Orri Erling, as part of a series of post-Banff posts.

Why is SQL to RDF Mapping a Big Deal?

A majority of the world's data (especially in the enterprise realm) resides in SQL Databases. In addition, Open Access to the data residing in said databases remains the biggest challenge to enterprises for the following reasons:

SQL Data Sources are inherently heterogeneous because they are acquired with business applications that are in many cases inextricably bound to a particular DBMS engine
Data is predictably dirty
DBMS vendors ultimately hold the data captive and have traditionally resisted data access standards such as ODBC (*trust me they have, just look at the unprecedented bad press associated with ODBC the only truly platform independent data access API. Then look at how this bad press arose..*)

Enterprises have known from the beginning of modern corporate times that data access, discovery, and manipulation capabilities are inextricably linked to the "Real-time Enterprise" nirvana (hence my use of 0.0 before this becomes 3.0).

In my experience, as someone whose operated in the data access and data integration realms since the late '80s, I've painfully observed enterprises pursue, but unsuccessfully attain, full control over enterprise data (the prized asset of any organization) such that data-, information-, knowledge-workers are just a click away from commencing coherent platform and database independent data drill-downs and/or discovery that transcend intranet, internet, and extranet boundaries -- serendipitous interaction with relevant data, without compromise!

Okay, situation analysis done, we move on..

At our most recent (12th June) monthly Semantic Web Gathering, I unveiled to TimBL and a host of other attendees a simple, but powerful, demonstration of how Linked Data, as an aspect of the Semantic Data Web, can be applied to enterprise data integration challenges.

Actual SQL to RDF Mapping Demo / Experiment

Hypothesis

A SQL Schema can be effectively mapped declaratively to RDF such that SQL Rows morph into RDF Instance Data (Entity Sets) based on the Concepts & Properties defined in a Concrete Conceptual Data Model oriented Data Dictionary (RDF Schema and/or OWL Ontology). In addition, the solution must demonstrate how "Linked Data in the Web" is completely different from "Data on the Web" or "Linked Data on the Web" (btw - Tom Heath eloquently unleashed this point in his recent podcast interview with Talis).

Apparatus

An Ontology - in this case we simply derived the Northwind Ontology from the XML Schema based CSDL (Conceptual Schema Definition Language) used by Microsoft's public Astoria demo (specifically the Northwind Data Services demo). SQL Database Schema - Northwind (comes bundled with ACCESS, SQL Server, and Virtuoso) comprised of tables such as: Customer, Employee, Product, Category, Supplier, Shipper etc. OpenLink Virtuoso - SQL DBMS Engine (although this could have been any ODBC or JDBC accessible Database), SQL-RDF Metaschema Language, HTTP URL-rewriter, WebDAV Engine, and DBMS hosted XSLT processor Client Tools - iSPARQL Query Builder, RDF Browser (which could also have been Tabulator or DISCO or a standard Web Browser)

Experiment / Demo

Declaratively map the Northwind SQL Schema to RDF using the Virtuoso Meta Schema Language (see: Virtuoso PL based Northwind_SQL_RDF script)
Start browsing the data by clicking on the URIs that represent the RDF Data Model Entities resulting from the SQL to RDF Mapping

Observations

Via a single Data Link click I was able to obtain specific information about the Customer represented by the URI "ALFKI" (act of URI Dereferencing as you would an Object ID in an Object or Object-Relational Database)
Via a Dynamic Data Page I was able to explore all the entity relationships or specific entity data (i.e Exploratory or Entity specific dereferencing) in the Northwind Data Space
I was able to perform similar exploration (as per item 2) using our OpenLink Browser.

Conclusions

The vision of data, information, or knowledge at your fingertips is nigh! Thanks to the infrastructure provided by the Semantic Data Web (URIs, RDF Data Model, variety of RDF Serialization Formats[1][2][3], and Shared Data Dictionaries / Schemas / Ontologies [1][2][3][4][5]) it's now possible to Virtualize enterprise data from the Physical Storage Level, through the Logical Data Management Levels (Relational), up to a Concrete Conceptual Model (Graph) without operating system, development environment or framework, or database engine lock-in.

Next Steps

We produce a shared ontology for the CRM and Business Reporting Domains. I hope this experiment clarifies how this is quite achievable by converting XML Schemas to RDF Data Dictionaries (RDF Schemas or Ontologies). Stay tuned :-)

Also watch TimBL amplify and articulate Linked Data value in a recent interview.

Other Related Matters

To deliver a mechanism that facilitates the crystallization of this reality is a contribution of boundless magnitude (as we shall all see in due course). Thus, it is easy to understand why even "her majesty", the queen of England, simply had to get in on the act and appoint TimBL to the "British Order of Merit" :-)

Note: All of the demos above now work with IE & Safari (a "remember what Virtuoso is epiphany") by simply putting Virtuoso's DBMS hosted XSLT engine to use :-) This also applies to my earlier collection of demos from the Hello Data Web and other Data Web & Linked Data related demo style posts.

Linked Data is vital to Enterprise Integration driven Agility

Sat, 22 Mar 2008 18:13:41 GMT

John Schmidt, from Informatica, penned an interesting post titled: IT Doesn't Matter - Integration Does.

Yes, integration is hard, but I do profoundly believe that what's been happening on the Web over the last 10 or so years also applies to the Enterprise, and by this I absolutely do not mean "Enterprise 2.0" since "2.0" and productive agility do not compute in my realm of discourse.

large collections of RSS feeds, Wikiwords, Shared Bookmarks, Discussion Forums etc.. when disconnected at the data level (i.e. hosted in pages with no access to the "data behind") simply offer information deluge and inertia (there are only so many hours for processing opaque information sources in a given day).

Enterprises fundamentally need to process information efficiently as part of a perpetual assessment of their relative competitive Strengths, Weaknesses, Opportunities, and Threats (SWOT), in existing and/or future markets. Historically, IT acquisitions have run counter intuitively to the aforementioned quest for "Ability" due to the predominance of "rip and replace" approach technology acquisition that repeatedly creates and perpetuates information silos across Application, Database, Operating System, Development Environment boundaries. The sequence of events typically occurs as follows:

applications are acquired on a problem by problem basis
back-end application databases are discovered once ad-hoc information views are sought by information workers
back-end database disparity across applications is discovered once holistic views are sought by knowledge workers (typically domain experts).

In the early to mid 90's (pre ubiquitous Web), operating system, programming language, operating system, and development framework independence inside the enterprise was technically achievable via ODBC (due to it's platform independence). That said, DBMS specific ODBC channels alone couldn't address the holistic requirements associated with Conceptual Views of disparate data sources, hence the need for Data Access Virtualization via Virtual Database Engine technology.

Just as is the case on the Web today, with the emergence of the "Linked Data" meme, enterprises now have a powerful mechanism for exploiting the Data Integration benefits associated with generating Data Objects from disparate data sources, endowed with HTTP based IDs (URIs).

Conceptualizing access to data exposed Databases APIs, SOA based Web Services (SOAP style Web Services), Web 2.0 APIs (REST style Web Services), XML Views of SQL Data (SQLX), pure XML etc.. is problem area addressed by RDF aware middleware (RDFizers e.g Virtuoso Sponger).

Here are examples of what SQL Rows exposed as RDF Data Objects (identified using HTTP based URIs) would look like outside or behind a corporate firewall:

What's Good for the Web Goose (Personal Data Space URIs) is good for the Enterprise Gander (Enterprise Data Space URIs).

Data Access - A Cultural or Technical Challenge?

Programming the Universe

Wed, 03 Sep 2008 11:56:50 GMT

I continue to be intrigued by Yihong Ding's shared insights as expressed in part 2 of his blog series titled: Programming the Universe. The blog series shares Yihong's thoughts and reflections stimulated by the book, also titled: Programming the Universe.

What strikes me the most, is how sharing his findings act as serendipitous connectors to related insights and points of view, that ultimately create deeper shared knowledge about the core subject matter, courtesy of the Web hosted Blogosphere.

Metcalfe, Einstein, and Linked Data.

Semantic Data Web Epiphanies: One Node at a Time

Fri, 18 Jan 2008 07:27:27 GMT

In 2006, I stumbled across Jason Kolb (online) via a 4-part series of posts titled: Reinventing the Internet. At the time, I realized that Jason was postulating about what is popularly known today as "Data Portability", so I made contact with him (blogosphere style) via a post of my own titled: Data Spaces, Internet Reinvention, and the Semantic Web. Naturally, I tried to unveil to Jason the connection between his vision and the essence of the Semantic Web. Of course, he was skeptical :-)

Jason recently moved to Massachusetts which lead to me pinging him about our earlier blogosphere encounter and the emergence of a Data Portability Community. I also informed him about the fact that TimBL, myself, and a number of other Semantic Web technology enthusiasts, frequently meet on the 2nd Tuesday of each month at the MIT hosted Cambridge Semantic Web Gatherings, to discuss, demonstrate, debate all aspects of the Semantic Web. Luckily (for both of us), Jason attended the last event, and we got to meet each other in person.

Following our face to face meeting in Cambridge, a number of follow-on conversations ensued covering, Linked Data and practical applications of the Semantic Web vision. Jason writes about our exchanges a recent post titled: The Semantic Web. His passion for Data Portability enabled me to use OpenID and FOAF integration to connect the Semantic Web and Data Portability via the Linked Data concept.

During our conversations, Jason also eluded to the fact that he had already encountered OpenLink Software while working with our ODBC Drivers (part of or UDA product family) for IBM Informix (Single-Tier or Multi-Tier Editions) a few years ago (interesting random connection).

As I've stated in the past, I've always felt that the Semantic Web vision will materialize by way of a global epiphany. The count down to this inevitable event started at the birth of the blogosphere, ironically. And accelerated more recently, through the emergence of Web 2.0 and Social Networking, even more ironically :-)

The blogosphere started the process of Data Space coalescence via RSS/Atom based semi-strucutured data enclaves, Web 2.0 RDFpropagated Web Service usage en route to creating service provider controlled, data and information silosRDF, Social NetworkingRDF brought attention to the fact that User Generated Data wasn't actually owned or controlled by the Data Creators etc.

The emergence of "Data Portability" has created a palatable moniker for a clearly defined, and slightly easier to understand, problem: the meshing of Data and Identity in cyberspace i.e. individual points of presence in cyberspace, in the form of "Personal Data Spaces in the Clouds" (think: doing really powerful stuff with .name domains). In a sense, this is the critical inflection point between the document centric "Web of Linked Documents" and the data centric "Web or Linked Data". There is absolutely no other way solve this problem in a manner that alleviates the imminent challenges presented by information overload -- resulting from the exponential growth of user generated data across the Internet and enterprise Intranets.

Additional OpenLink Data Spaces Features

Mon, 11 Feb 2008 16:38:03 GMT

Daniel Lewis has published another post about OpenLink Data Spaces (ODS) functionality titled:A few new features in OpenLink Data Spaces, that exposes additional features (some hot out the oven).

OpenLink Data Spaces (ODS) now officially supports:

Attention Profiling Markup Language (APML).

Meaning of a Tag (MOAT) in conjunction with Simple Knowledge Organisation System (SKOS) and Social-Semantic Cloud of Tags (SCOT).

OAuth - an Open Authentication Protocol

Which means that OpenLink Data Spaces support all of the main standards being discussed in the DataPortability Interest Group!

APML Example:

All users of ODS automatically get a dynamically created APML file, for example: APML profile for Kingsley Idehen

The URI for an APML profile is: http://myopenlink.net/dataspace//apml.xml

Meaning of a Tag Example:

All users of ODS automatically have tag cloud information embedded inside their SIOC file, for example: SIOC for Kingsley Idehen on the Myopenlink.net installation of ODS.

But even better, MOAT has been implemented in the ODS Tagging System. This has been demonstrated in a recent test blog post by my colleague Mitko Iliev, the blog post comes up on the tag search: http://myopenlink.net/dataspace/imitko/weblog/Mitko%27s%20Weblog/tag/paris

Which can be put through the OpenLink Data Browser:

OpenLink Data Browser with Mitko Iliev’s Paris Blog Tag

OAuth Example:

OAuth Tokens and Secrets can be created for any ODS application. To do this:

you can log in to MyOpenlink.net beta service, the Live Demo ODS installation, an EC2 instance, or your local installation

then go to ‘Settings’

and then you will see ‘OAuth Keys’

you will then be able to choose the applications that you have instantiated and generate the token and secret for that app.

Related Document (Human) Links

OpenLink Data Spaces Official Page

OpenLink Software Page

OpenLink Data Spaces Wikipedia Page

Attention Profiling Markup Language Project Website

Meaning of a Tag Project Website

Simple Knowledge Organisation Systems Project Website

Social-Semantic Cloud of Tags Project Website

OAuth Protocol Website

DataPortability.org Website

Semantically Interlinked Online Communities Project Website

Remember (as per my most recent post about ODS), ODS is about unobtrusive fusion of Web 1.0, 2.0, and 3.0+ usage and interaction patterns. Thanks to a lot of recent standardization in the Semantic Web realm (e.g SPARQL), we are now employ the MOAT, SKOS, and SCOT ontologies as vehicles for Structured Tagging.

Structured Tagging?

This is how we take a key Web 2.0 feature (think 2D in a sense), bend it over, to create a Linked Data Web (Web 3.0) experience unobtrusively (see earlier posts re. Dimensions of Web). Thus, nobody has to change how they tag or where they tag, just expose ODS to the URLs of your Web 2.0 tagged content and it will produce URIs (Structured Data Object Identifiers) and a lnked data graph for your Tags Data Space (nee. Tag Cloud). ODS will construct a graph which exposes tag subject association, tag concept alignment / intended meaning, and tag frequencies, that ultimately deliver "relative disambiguation" of intended Tag Meaning (i.e. you can easily discern the taggers meaning via the Tags actual Data Space which is associated with the tagger). In a nutshell, the dynamics of relevance matching, ranking, and the like, change immensely without futile timeless debates about matters such as:

What's the Linked Data value proposition?

What's the Linked Data business model?

What's the Semantic Web Killer application?

We can just get on with demonstrating Linked Data value using what exists on the Web today. This is the approach we are deliberately taking with ODS.

Related Items

Stefano Mazzocch

response to Clay Shirky's 2005 talk

Ontology is Overrated: Links, Tags and Post-hoc Metadata

Tom Gruber

Ontology of Folksonomy: A Mash-up of Apples and Oranges

Tip: This post is best viewed via an RDF aware User Agent (e.g. a Browser or Data Viewer). I say this because the permalink of this post is a URI in a Linked Data Space (My Blog) comprised of more data than meets the eye (i.e. what you see when you read this post via a Document Web Browser) :-)

SPARQL Guide for the Javascript Developer

Wed, 26 Jan 2011 23:10:28 GMT

What?

A simple guide usable by any Javascript developer seeking to exploit SPARQL without hassles.

Why?

SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.

How?

SPARQL queries are actually HTTP payloads (typically). Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL compliant data server and receive a payload for local processing.

Steps:

Determine which SPARQL endpoint you want to access e.g. DBpedia or a local Virtuoso instance (typically: http://localhost:8890/sparql).
If using Virtuoso, and you want to populate its quad store using SPARQL, assign "SPARQL_SPONGE" privileges to user "SPARQL" (this is basic control, more sophisticated WebID based ACLs are available for controlling SPARQL access).

Script:

/*
Demonstrating use of a single query to populate a # Virtuoso Quad Store via Javascript. 
*/

/* 
HTTP URL is constructed accordingly with JSON query results format as the default via mime type.
*/

function sparqlQuery(query, baseURL, format) {
	if(!format)
		format="application/json";
	var params={
		"default-graph": "", "should-sponge": "soft", "query": query,
		"debug": "on", "timeout": "", "format": format,
		"save": "display", "fname": ""
	};
	
	var querypart="";
	for(var k in params) {
		querypart+=k+"="+encodeURIComponent(params[k])+"&";
	}
	var queryURL=baseURL + '?' + querypart;
	if (window.XMLHttpRequest) {
  	xmlhttp=new XMLHttpRequest();
  }
  else {
  	xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
  }
  xmlhttp.open("GET",queryURL,false);
  xmlhttp.send();
  return JSON.parse(xmlhttp.responseText);
}

/*
setting Data Source Name (DSN)
*/

var dsn="http://dbpedia.org/resource/DBpedia";

/*
Virtuoso pragma "DEFINE get:soft "replace" instructs Virtuoso SPARQL engine to perform an HTTP GET using the IRI in FROM clause as Data Source URL with regards to 
DBMS record inserts
*/

var query="DEFINE get:soft \"replace\"\nSELECT DISTINCT * FROM <"+dsn+"> WHERE {?s ?p ?o}"; 
var data=sparqlQuery(query, "/sparql/");

Output

Place the snippet above into the