Kingsley Idehen's Blog Data Space

A Structured Web of Data Picture....

Sun, 22 Jul 2007 23:18:25 GMT

As the saying goes, "A picture speaks a thousand words..". In this post I simply provide a Data Web view of Mike Bergman's post titled: More Structure, More Terminology and (hopefully) More Clarity. I am hoping the OpenLink RDF Browser view of Mike's post aids in the understanding of the following terms:

Structured Data
Structured Data Resources
Information Resources

Note: I make no reference to "non information" resource, since a non-information resource is a data resource that may or may not contain 100% structured data. Also note that even when structured, the format may not be RDF.

Data Web and Major League Baseball

Fri, 02 Mar 2007 00:13:27 GMT

Using Solvent to extract data from structured pages: "

I’ve put together a short tutorial on Solvent, a very nice web page parsing utility. It is still a little rough around the edges, but I wanted to throw it out there and continue working on it since there isn’t a whole lot of existing documentation.
"
(Via Wing Yung.)

After reading the interesting post above I quickly (and quite easily) knocked together a "Dynamic Data Web Page for Major League Baseball" using data from the Virtuoso hosted edition of dbpedia. Just click on the "Explore" option whenever you click on a URI of interest. Enjoy!

Structured Data vs. Unstructured Data

Tue, 27 Jun 2006 05:39:09 GMT

There is an interesting article at regdeveloper.com titled: Structured data is boring and useless.. This article provides insight into a serious point of confusion about what exactly is structured vs. unstructured data. Here is a key excerpt:

"We all know that structured data is boring and useless; while unstructured data is sexy and chock full of value. Well, only up to a point, Lord Copper. Genuinely unstructured data can be a real nuisance - imagine extracting the return address from an unstructured letter, without letterhead and any of the formatting usually applied to letters. A letter may be thought of as unstructured data, but most business letters are, in fact, highly-structured." ....

Duncan Pauly, founder and chief technology officer of Coppereye add's eloquent insight to the conversation:

"The labels "structured data" and "unstructured data" are often used ambiguously by different interest groups; and often used lazily to cover multiple distinct aspects of the issue. In reality, there are at least three orthogonal aspects to structure:
* The structure of the data itself.

* The structure of the container that hosts the data.

* The structure of the access method used to access the data.
These three dimensions are largely independent and one does not need to imply another. For example, it is absolutely feasible and reasonable to store unstructured data in a structured database container and access it by unstructured search mechanisms."

Data understanding and appreciation is dwindling at a time when the reverse should be happening. We are supposed to be in the throws of the "Information Age", but for some reason this appears to have no correlation with data and "data access" in the minds of many -- as reflected in the broad contradictory positions taken re. unstructured data vs structured data, structured is boring and useless while unstructured is useful and sexy....

The difference between "Structured Containers" and "Structured Data" are clearly misunderstood by most (an unfortunate fact).

For instance all DBMS products are "Structured Containers" aligned to one or more data models (typically one). These products have been limited by proprietary data access APIs and underlying data model specificity when used in the "Open-world" model that is at the core of the World Wide Web. This confusion also carries over to the misconception that Web 2.0 and the Semantic/Data Web are mutually exclusive.

But things are changing fast, and the concept of multi-model DBMS products is beginning to crystalize. On our part, we have finally released the long promised "OpenLink Data Spaces" application layer that has been developed using our Virtuoso Universal Server. We have structured unified storage containment exposed to the data web cloud via endpoints for querying or accessing data using a variety of mechanisms that include; GData, OpenSearch, SPARQL, XQuery/XPath, SQL etc..

To be continued....

7 Things Brought to You by HTTP-based Hypermedia

Mon, 08 Nov 2010 20:29:43 GMT

There are some very powerful benefits that accrue from the use of HTTP based Hypermedia. 7 that come to mind immediately include:

Structured & Platform Independent Enterprise Data Virtualization -- concrete conceptual level access and provisioning of abstract domain entities such as Customers, Orders, Employees, Products, Countries, Competitors etc.
Distributed Application State (REST) -- application state transitions via links
Structured Data Representation (Linked Data) -- whole data data representation via links
Structured Identity (WebID) -- verifiable distributed identity
Structured Profiles (FOAF) -- platform independent profiles for people and organizations
Articulation of Structured Value Propositions (GoodRelations) -- Product & Service Offers, Business Entities, Locations, Business Hours, etc.
Structured Collaboration Spaces (SIOC) -- Blogs, Wikis, File Sharing, Discussion Forums, Aggregated Feeds, Statuses, Photo Galleries, Polls etc.

Data 3.0 (a Manifesto for Platform Agnostic Structured Data) Update 5

Tue, 25 May 2010 21:10:28 GMT

After a long period of trying to demystify and unravel the wonders of standards compliant structured data access, combined with protocols (e.g., HTTP) that separate:

Identity,
Access,
Storage,
Representation, and
Presentation.

I ended up with what I can best describe as the Data 3.0 Manifesto. A manifesto for standards complaint access to structured data object (or entity) descriptors.

Some Related Work

Alex James (Program Manager Entity Frameworks at Microsoft), put together something quite similar to this via his Base4 blog (around the Web 2.0 bootstrap time), sadly -- quoting Alex -- that post has gone where discontinued blogs and their host platforms go (deep deep irony here).

It's also important to note that this manifesto is also a variant of the TimBL's Linked Data Design Issues meme re. Linked Data, but totally decoupled from RDF (data representation formats aspect) and SPARQL which -- in my world view -- remain implementation details.

Data 3.0 manifesto

An "Entity" is the "Referent" of an "Identifier."
An "Identifier" SHOULD provide a global, unambiguous, and unchanging (though it MAY be opaque!) "Name" for its "Referent".
A "Referent" MAY have many "Identifiers" (Names), but each "Identifier" MUST have only one "Referent".
Structured Entity Descriptions SHOULD be based on the Entity-Attribute-Value (EAV) Data Model, and SHOULD therefore take the form of one or more 3-tuples (triples), each comprised of:
- an "Identifier" that names an "Entity" (i.e., Entity Name),
- an "Identifier" that names an "Attribute" (i.e., Attribute Name), and
- an "Attribute Value", which may be an "Identifier" or a "Literal".
Structured Descriptions SHOULD be CARRIED by "Descriptor Documents" (i.e., purpose specific documents where Entity Identifiers, Attribute Identifiers, and Attribute Values are clearly discernible by the document's intended consumers, e.g., humans or machines).
Structured Descriptor Documents can contain (carry) several Structured Entity Descriptions
Stuctured Descriptor Documents SHOULD be network accessible via network addresses (e.g., HTTP URLs when dealing with HTTP-based Networks).
An Identifier SHOULD resolve (de-reference) to a Structured Representation of the Referent's Structured Description.

Referent, Identifier, and Descriptor/Sense (The Data Perception Trinity) illustration
Referent, Identifier, and Descriptor/Sense Trinity (as exploited in FOAF+SSL based Secure WebIDs) illustration
Demystifying Linked Data via EAV Model based Structured Descriptions
What do people have against URIs and URLs?
The URI, URL, and Linked Data Meme's Generic HTTP URI
Simple Explanation of RDF and Linked Data Dynamics
Linked Data and Identity
FOAF+SSL FAQ
LOD Community Thread (showing evolution of this manifesto based on feedback from members such as Richard Cyganiak).
Googlebase Data API Docs
Google Data Protocol (GData)
Microsoft's OData Protocol
Magic of De-referencable Names and actual Data via Binky Video
Social Objects Presentation (aka. Social Linked Data Objects) - by Jyri Engeström
What's a Reference?

Web 1.0, 2.0, and 3.0 (Yet Again)

Mon, 15 Sep 2008 17:48:15 GMT

If your Web presence doesn't extend beyond (X)HTML web pages, you are only participating in Web usage Dimension 1.0.

If your Web presence goes beyond (X)HTML pages, via the addition of REST or SOAP based Web Services, then you re participating in Web usage dimension 2.0.

If you Web presence includes all of the above, with the addition of structured data interlinked with structured data across other points of presence on the Web, then you are participating in Web usage dimension 3.0 i.e., "Linked Data Web" or "Web of Data" or "Data Web".

BTW - If you've already done all of the above, and you have started building intelligent agents that exploit the aforementioned structured interlinked data substrate, then you are already in Web usage dimension 4.0.

Web usage pattern evolution

XBRL Ontology Project

Tue, 05 Feb 2008 04:20:04 GMT

Introducing the XBRL Ontology Project.

The XBRL Ontology Project seeks to address the obvious need to bring structured financial data into the emerging Semantic Data Web as articulated in this excerpt from the inaugural mailing list post:

The parallel evolution of XBRL and the Semantic Web is one of the more puzzling current day technology misnomers:
The Semantic Web expresses a vision about a Web of Data connected by formal meaning (Context). Congruently, XBRL espouses a vision whereby by formally defined Financial Data is accessible via the Web (and other networks). Sadly, we have an abundance of XBRL Taxonomies, pretty wide adoption of the XBRL standard globally, but not a single RDFS Schema or OWL Ontology, derived from said taxonomies, in sight!

Read on..."

(Via XBRL Ontology Specification Group Google Group.)

Amazon.com RSS Feeds

Thu, 22 Jun 2006 12:56:58 GMT

Amazon RSS Feeds

RSS feeds are everywhere, and they are changing the Web landscape fast. The Web is shifting from distributed freeform database, to distributed semi-structured database.

Amazon.com RSS Feeds They never got around to it, so we set up 160+ separate RSS channels for darn near every type of product on Amazon.com for you. If you have any feedback for this new (free) service, please let us know immediately! We're looking to make it an outstanding and permanent part to your collection. Enjoy! (Chris) [via Lockergnome's Bits and Bytes]

Your Web Site is gradually becoming a database (what?). Yes, your Web Site needs to be driven by database software that can rapidly create RSS feeds for your organizations non XML and XML data sources. Your web site needs to provide direct data access to users, bots, Web Services.

Here is my blog database for instance, you can query the XML data in this database using XQuery, XPath, and Web Services (if I decide to publish any of my XML Query Templates as Web Services).

Note the teaser here, each XML document is zero bytes! This is becuase these are live Virtuoso SQL-XML documents that are producing a variety of XML documents on the fly, which means that they retain a high degree of sensitivity to changes in the underlying databases supplying the data. I could have chosen to make these persistent XML docs with interval based synchronization with the backen data sources (but I chose not to for maximum effect).

As you can see SQL and XML (Relational and Hierarchical Models) engines can co-exist in a single server, ditto Object-Relational (which might be hidden from view but could be used in the SQL that serves the SQL-XML docs), ditto Full Text (see the search feature of this blog) and finally, ditto directed graph model for accessing my RDF data.(more on this as the RDF data pool increases).

Linked Data Web Collaborators: Introducing Structured Dynamics

Sat, 03 Jan 2009 04:27:26 GMT

As indicated in posts from Fred Giasson and Mike Bergman, the Zitgist incubation effort that contributed to the delivery of vital Linked Data Web infrastructure components such as TalkDigger (discourse discovery and participation), PingTheSemanticWeb (ground-zero data source for most Semantic Web search engines), UMBEL (binding layer for Upper and Lower Ontologies amongst other things), Music Ontology (enabling meaningful description of Music), and Bibliographic Ontology (enabling meaningful description of Bibliographic content), is now ready to continue its business development and technology growth as a going concern known as Structured Dynamics.

With great joy and pride, I wish Structured Dynamics all the success they deserve. Naturally, the collaborations and close relationship between OpenLink Software and its latest technology partner will continue -- especially as we collectively work towards a more comprehendible and pragmatic Web of Linked Data for developers (across Web 1.0, 2.0, 3.0, and beyond), end-users (information- and knowledge-workers), and entrepreneurs (driven by quality and tangible value contribution).

Web of Linked Data & Hyperdata

Tue, 05 Feb 2008 01:43:55 GMT

I've just read the extensive post by Nova Spivack titled: The Semantic Web, Collective Intelligence and Hyperdata, courtesy of a post by Danny Ayres titled: Confused about the Semantic Web , in response to a post by Tim O'Reilly titled: Economist Confused About the Semantic Web? .

My Comments:

Hyperdata is short for HyperLinked Data :-) The same applies to Linked Data. Thus, we have two literal labels for the same core Concept. HTTP is the enabling protocol for "Hyper-linking" Documents and associated Structured Data via the World Wide Web (Web for short). Data Links associated with Structured Data contained in, or hosted by, Documents on the Web.

RDFa, eRDF, GRDDL, SPARQL Query Language, SPARQL Protocol (SOAP or REST service), SPARQL Results Serializations (XML or JSON) collectively provide a myriad of unobtrusive routes to structured data embedded within, or associated with, existing Web Documents.

As Danny already states, ontologies are not prerequisites for producing structured data using the RDF Data Model. They simply aid the ability to express one's self clearly (i.e. no repetition or ambiguity) across a broad audience of machines (directly) and their human masters (indirectly).

Using the crux of this post as the anecdote: The Semantic Data Web would simplify the process of claiming and/or proving that Linked Data and Hyperdata describe the same concept. It achieves this by using Triples (Subject, Predicate, Object) expressed in various forms (N3, Turtle, RDF/XML etc.) to formalize claims in a form palatable to electronic agents (machines) operating on behalf of Humans. In a nutshell, this increases human productive by completely obliterates the erstwhile exponential costs of discovering data, information, and knowledge.

BTW - for full effect, view this post (i.e. cut and paste the Permalink URI of this post, below) into an RDF Browser such as:

Another Paper Discussing RDF Data Publishing

Wed, 25 Jul 2007 02:02:56 GMT

I stumbled across an article titled: Thoughts on Compound Documents, from the Open Archives initiative (OAI). The article discusses the increasingly popular topic of deploying structured data containers on the Web.

This article, like the one from Mike, and our soon to be released Linked Data Deployment white paper, collectively address the main topic without inadvertent distraction by the misnomer: non-information resource. For instance, the OAI article uses the term: Generic Resource instead of Non-informaton Resource.

The Semantic Data Web is here, but we need to diffuse this reality across a broader spectrum of Web communities, so as to avoid unnecessary uptake inertia that can arise due basic incomprehension of key concepts such as Linked Data deployment.

Linked Data & The Web Information BUS

Wed, 08 Aug 2007 22:26:55 GMT

Chris Bizer, Richard Cyganiak, and Tom Heath have just published a Linked Data Publishing Tutorial that provides a guide to the mechanics of Linked Data injection into the Semantic Data Web.

On different, but related, thread, Mike Bergman recently penned a post titled: What is the Structured Web?. Both of these public contributions shed light on the "Information BUS" essence of the World Wide Web by describing the evolving nature of the payload shuttled by the BUS.

What is an Information BUS?

Middleware infrastructure for shuttling "Information" between endpoints using a messaging protocol.

The Web is the dominant Information BUS within the Network Computer we know as the "Internet". It uses HTTP to shuttle information payloads between "Data Sources" and "Information Consumers" - what happens when we interact with Web via User Agents / Clients (e.g Browsers).

What are Web Information Payloads?

HTTP transported streams of contextualized data. Hence the terms: "Information Resource" and "Non Information" when reading material related to http-range-14 and Web Architecture. For example, an (X)HTML document is a specific data context (representation) that enables us to perceive, or comprehend, a data stream originating from a Web Server as a Web Page. On the other hand, if the payload lacks contextualized data, a fundamental Web requirement, then the resource is referred to as a "Non Information" resource. Of course, there is really no such thing as a "Non Information" resource, but with regards to Web Architecture, it's the short way of saying: "the Web Transmits Information only". That said, I prefer to refer to these "Non Information" resources as "Data Sources", are term well understood in the world of Data Access Middleware (ODBC, JDBC, OLEDB, ADO.NET etc.) and Database Management Systems (Relational, Objec-Relational, Object etc).

Examples of Information Resource and Data Source URIs:

http://demo.openlinksw.com/Northwind/Customer/ALFKI

http://demo.openlinksw.com/Northwind/Customer/ALFKI#this

Explanation: The Information Resource is a conduit to the Entity identified by Data Source (an entity in my RDF Data Space that is the Subject or Object of one of more Triple based Statements. The triples in question can that can be represented as an RDF resource when transmitted over the Web via an Information Resource that takes the form of a SPARQL REST Service URL or a Physical RDF based Information Resource URL).

What about Structured Data?

Prior to the emergence of the Semantic Data Web, the payloads shuttled across the Web Information BUS comprised primarily of the following:

HTML - Web Resource with presentation focused structure (Web 1.0 dominant payload form)
XML - Web Resource with structure that separates presentation and data (Web 2.0's dominant payload form).

The Semantic Data Web simply adds RDF to the payload formats that shuttle the Web Information BUS. RDF addresses formal data structure which XML doesn't cover since it is semi-structured (distinct data entities aren't formally discernible). In a nutshell, an RDF payload is basically a conceptual model database packaged as an Information Resource. It's comprised of granular data items called "Entities", that expose fine grained properties values, individual and/or group characteristics (attributes), and relationships (associations) with other Entities.

Where is this all headed?

The Web is in the final stages of the 3rd phase of it's evolution. A phase characterized by the shuttling of structured data payloads (RDF) alongside less data oriented payloads (HTML, XHTML, XML etc.). As you can see, Linked Data and Structured Data are both terms used to describe the addition of more data centric payloads to the Web. Thus, you could view the process of creating a Structured Web of Linked Data as follows:

Identify or Create Structured Data Sources
Name these Data Sources using Data Source URIs
Expose Structured Data Sources to the Web as Linked Data using Information Resource (conduit) URIs

Conclusions

The Semantic Data Web is an evolution of the current Web (an Information Space) that adds structured data payloads (RDF) to current, less data oriented, structured payloads (HTML, XHTML, XML, and others).

The Semantic Data Web is increasingly seen as an inevitability because it's rapidly reaching the point of critical mass (i.e. network effect kick-in). As a result, Data Web emphasis is moving away from: "What is the Semantic Data Web?" To: "How will Semantic Data Web make our globally interconnected village an even better place?", relative to the contributions accrued from the Web thus far. Remember, the initial "Document Web" (Web 1.0) bootstrapped because of the benefits it delivered to blurb-style content publishing (remember the term electronic brochure-ware?). Likewise, in the case of the "Services Web" (Web 2.0), the bootstrap occurred because it delivered platform independence to Web Application Developers - enabling them to expose application logic behind Web Services. It is my expectation that the Data Integration prowess of the Data Web will create a value exchange realm for data architects and other practitioners from the database and data access realms.

URIBurner: Painless Generation & Exploitation of Linked Data (Update 1 - Demo Links Added)

Thu, 11 Mar 2010 15:16:34 GMT

What is URIBurner?

A service from OpenLink Software, available at: http://uriburner.com, that enables anyone to generate structured descriptions -on the fly- for resources that are already published to HTTP based networks. These descriptions exist as hypermedia resource representations where links are used to identify:

the entity (data object or datum) being described,
each of its attributes, and
each of its attributes values (optionally).

The hypermedia resource representation outlined above is what is commonly known as an Entity-Attribute-Value (EAV) Graph. The use of generic HTTP scheme based Identifiers is what distinguishes this type of hypermedia resource from others.

Why is it Important?

The virtues (dual pronged serendipitous discovery) of publishing HTTP based Linked Data across public (World Wide Web) or private (Intranets and/or Extranets) is rapidly becoming clearer to everyone. That said, the nuance laced nature of Linked Data publishing presents significant challenges to most. Thus, for Linked Data to really blossom the process of publishing needs to be simplified i.e., "just click and go" (for human interaction) or REST-ful orchestration of HTTP CRUD (Create, Read, Update, Delete) operations between Client Applications and Linked Data Servers.

How Do I Use It?

In similar vane to the role played by FeedBurner with regards to Atom and RSS feed generation, during the early stages of the Blogosphere, it enables anyone to publish Linked Data bearing hypermedia resources on an HTTP network. Thus, its usage covers two profiles: Content Publisher and Content Consumer.

Content Publisher

The steps that follow cover all you need to do:

place a tag within your HTTP based hypermedia resource (e.g. within section for HTML )
use a URL via the @href attribute value to identify the location of the structured description of your resource, in this case it takes the form: http://linkeddata.uriburner.com/about/id/{scheme-or-protocol}/{your-hostname-or-authority}/{your-local-resource}
for human visibility you may consider adding associating a button (as you do with Atom and RSS) with the URL above.

That's it! The discoverability (SDQ) of your content has just multiplied significantly, its structured description is now part of the Linked Data Cloud with a reference back to your site (which is now a bona fide HTTP based Linked Data Space).

Examples

HTML+RDFa based representation of a structured resource description:

JSON based representation of a structured resource description:

N3 based representation of a structured resource description:

RDF/XML based representations of a structured resource description:

Content Consumer

As an end-user, obtaining a structured description of any resource published to an HTTP network boils down to the following steps:

go to: http://uriburner.com
drag the Page Metadata Bookmarklet link to your Browser's toolbar
whenever you encounter a resource of interest (e.g. an HTML page) simply click on the Bookmarklet
you will be presented with an HTML representation of a structured resource description (i.e., identifier of the entity being described, its attributes, and its attribute values will be clearly presented).

Examples

Description of a Book culled from an Amazon web page
Description of a product offering culled from a BestBuy web page
Description of a product (a camera) culled from a CNET web page
Description of the same CNET product as an Offer on eBay (exposed by the description above via seeAlso property value).

If you are a developer, you can simply perform an HTTP operation request (from your development environment of choice) using any of the URL patterns presented below:

HTML:

curl -I -H "Accept: text/html" http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}

JSON:

curl -I -H "Accept: application/json" http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}
curl http://linkeddata.uriburner.com/about/data/json/{scheme}/{authority}/{local-path}

Notation 3 (N3):

curl -I -H "Accept: text/n3" http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}
curl http://linkeddata.uriburner.com/about/data/n3/{scheme}/{authority}/{local-path}

curl -I -H "Accept: text/turtle" http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}
curl http://linkeddata.uriburner.com/about/data/ttl/{scheme}/{authority}/{local-path}

RDF/XML:

curl -I -H "Accept: application/rdf+xml" http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}
curl http://linkeddata.uriburner.com/about/data/xml/{scheme}/{authority}/{local-path}

Conclusion

URIBurner is a "deceptively simple" solution for cost-effective exploitation of HTTP based Linked Data meshes. It doesn't require any programming or customization en route to immediately realizing its virtues.

If you like what URIBurner offers, but prefer to leverage its capabilities within your domain -- such that resource description URLs reside in your domain, all you have to do is perform the following steps:

download a copy of Virtuoso (for local desktop, workgroup, or data center installation) or
instantiate Virtuoso via the Amazon EC2 Cloud
enable the Sponger Middleware component via the RDF Mapper VAD package (which includes cartridges for over 30 different resources types)

When you install your own URIBurner instances, you also have the ability to perform customizations that increase resource description fidelity in line with your specific needs. All you need to do is develop a custom extractor cartridge and/or meta cartridge.

Virtuoso Sponger Middleware -- (technology behind URIBurner Service)
Animation demonstrating how the Virtuoso Sponger works.

Revisiting HTTP based Linked Data (Update 1 - Demo Video Links Added)

Mon, 08 Mar 2010 14:59:37 GMT

Motivation for this post arose from a series of Twitter exchanges between Tony Hirst and I, in relation to his blog post titled: So What Is It About Linked Data that Makes it Linked Data™ ?

At the end of the marathon session, it was clear to me that a blog post was required for future reference, at the very least :-)

What is Linked Data?

"Data Access by Reference" mechanism for Data Objects (or Entities) on HTTP networks. It enables you to Identify a Data Object and Access its structured Data Representation via a single Generic HTTP scheme based Identifier (HTTP URI). Data Object representation formats may vary; but in all cases, they are hypermedia oriented, fully structured, and negotiable within the context of a client-server message exchange.

Why is it Important?

Information makes the world tick!

Information doesn't exist without data to contextualize.

Information is inaccessible without a projection (presentation) medium.

All information (without exception, when produced by humans) is subjective. Thus, to truly maximize the innate heterogeneity of collective human intelligence, loose coupling of our information and associated data sources is imperative.

How is Linked Data Delivered?

Linked Data is exposed to HTTP networks (e.g. World Wide Web) via hypermedia resources bearing structured representations of data object descriptions. Remember, you have a single Identifier abstraction (generic HTTP URI) that embodies: Data Object Name and Data Representation Location (aka URL).

How are Linked Data Object Representations Structured?

A structured representation of data exists when an Entity (Datum), its Attributes, and its Attribute Values are clearly discernible. In the case of a Linked Data Object, structured descriptions take the form of a hypermedia based Entity-Attribute-Value (EAV) graph pictorial -- where each Entity, its Attributes, and its Attribute Values (optionally) are identified using Generic HTTP URIs.

Examples of structured data representation formats (content types) associated with Linked Data Objects include:

text/html
text/turtle
text/n3
application/json
application/rdf+xml
Others

How Do I Create Linked Data oriented Hypermedia Resources?

You markup resources by expressing distinct entity-attribute-value statements (basically these a 3-tuple records) using a variety of notations:

(X)HTML+RDFa,
JSON,
Turtle,
N3,
TriX,
TriG,
RDF/XML, and
Others (for instance you can use Atom data format extensions to model EAV graph as per OData initiative from Microsoft).

You can achieve this task using any of the following approaches:

Notepad
WYSIWYG Editor
Transformation of Database Records via Middleware
Transformation of XML based Web Services output via Middleware
Transformation of other Hypermedia Resources via Middleware
Transformation of non Hypermedia Resources via Middleware
Use a platform that delivers all of the above.

Practical Examples of Linked Data Objects Enable

Describe Who You Are, What You Offer, and What You Need via your structured profile, then leave your HTTP network to perform the REST (serendipitous discovery of relevant things)
Identify (via map overlay) all items of interest based on a 2km+ radious of my current location (this could include vendor offerings or services sought by existing or future customers)
Share the latest and greatest family photos with family members *only* without forcing them to signup for Yet Another Web 2.0 service or Social Network
No repetitive signup and username and password based login sequences per Web 2.0 or Mobile Application combo
Going beyond imprecise Keyword Search to the new frontier of Precision Find - Example, Find Data Objects associated with the keywords: Tiger, while enabling the seeker disambiguate across the "Who", "What", "Where", "When" dimensions (with negation capability)
Determine how two Data Objects are Connected - person to person, person to subject matter etc. (LinkedIn outside the walled garden)
Use any resource address (e.g blog or bookmark URL) as the conduit into a Data Object mesh that exposes all associated Entities and their social network relationships
Apply patterns (social dimensions) above to traditional enterprise data sources in combination (optionally) with external data without compromising security etc.

How Do OpenLink Software Products Enable Linked Data Exploitation?

Our data access middleware heritage (which spans 16+ years) has enabled us to assemble a rich portfolio of coherently integrated products that enable cost-effective evaluation and utilization of Linked Data, without writing a single line of code, or exposing you to the hidden, but extensive admin and configuration costs. Post installation, the benefits of Linked Data simply materialize (along the lines described above).

Our main Linked Data oriented products include:

OpenLink Data Explorer -- visualizes Linked Data or Linked Data transformed "on the fly" from hypermedia and non hypermedia data sources
URIBurner -- a "deceptively simple" solution that enables the generation of Linked Data "on the fly" from a broad collection of data sources and resource types
OpenLink Data Spaces -- a platform for enterprises and individuals that enhances distributed collaboration via Linked Data driven virtualization of data across its native and/or 3rd party content manager for: Blogs, Wikis, Shared Bookmarks, Discussion Forums, Social Networks etc
OpenLink Virtuoso -- a secure and high-performance native hybrid data server (Relational, RDF-Graph, Document models) that includes in-built Linked Data transformation middleware (aka. Sponger).

Hypertext Transfer Protocol 1.1 RFC
Open Data Protocol Glossary
Simple Explanation of RDF and Linked Data Dynamics
Collection of post from the past about Linked Data
Are We There Yet Re. Web++? -- includes link to podcast conversation with Jon Udell
Web of Linked Data Pivoting Demo from TED -- by Microsoft's Gary Flake
Microsoft Pivot atop Virtuoso Quad Store's Faceted Browser Engine-- My Demonstration of EAV model transcending data representation variations (i.e., RDF's EAV data model data served up in Microsoft CXML data representation format).

Response to: What is Web 3.0 and Why Should I Care?

Thu, 29 Jan 2009 18:45:11 GMT

Another post done in response to lost comments. This time, the comments relate to Robin Bloor's article titled: What is Web 3.0 and Why Should I Care?

Robin:

Web 3.0 is fundamentally about the World Wid Web becoming a structured database equipped with a formal data model (RDF which is a moniker for Entity-Attribute-Value with Classes & Relationships based Graph Model), query language, and a protocol for handling divrerse data representational requirements via negotiation

Web 3.0 is about a Web that facilitates serendipitous discovery of relevant things; thereby making serendipitous discovery quotient (SDQ), rather than search engine optimization (SEO), the critical success factor that drives how resources get published on the Web.

Personally, I believe we are on the cusp of a major industry inflection re. how we interact with data hosted in computing spaces. In a nutshell, the conceptual model interaction based on real-world entities such as people, places, and other things (including abstract subject matter) will usurp traditional logical model interaction based on rows and columns of typed and/or untyped literal values exemplified by relational data access and management systems.

Labels such as "Web 3.0", "Linked Data", and "Semantic Web", are simply about the aforementioned model transition playing out on the World Wide Web and across private Linked Data Webs such as Intranets & Extranets, as exemplified emergence of the "Master Data Management" label/buzzword.

What's the critical infrastructure supporting Web 3.0?

As was the case with Web Services re. Web 2.0, there is a critical piece of infrastructure driving the evolution in question, and in this case it comes down to the evolution of Hyperlinking.

We now have a new and complimentary variant of Hyperlinking commonly referred to as "Hyperdata" that now sits alongside "Hypertext". Hyperdata when used in conjunction with HTTP based URIs as Data Source Names (or Identifiers), delivers a potent and granular data access mechanism scoped down to the datum (object or record) level; which is much different from the document (record or entity container) level linkage that Hypertext accords.

In addition, the incorporation of HTTP into this new and enhanced granular Data Source Naming mechanism also addresses past challenges relating to separation of data, data representation, and data transmission protocols -- remember XDR woes familiar to all sockets level programmers -- courtesy of in-built content negotiation. Hence, via a simple HTTP GET --against a Data Source Name exposed by a Hyperdata link -- I can negotiate (from client or server sides) the exact representation of the description (entity-attribute-value graph) of an Entity / Data Object / Resource, dispatched by a data server.

For example, this is how a description of entity "Me" ends up being available in (X)HTML or RDF document representations (as you will observe when you click on that link to my Personal URI).

The foundation of what I describe above comes from:

Entity-Attribute-Value & Class Relationship Data Model (originating from LISP era with detours via the Object Database era. into the Triples approach in RDF)
Use of HTTP based Identifiers in the Entity ID construction process
SPARQL query language for the Data Model.

Some live examples from DBpedia:

http://dbpedia.org/resource/Linked_Data
http://dbpedia.org/resource/Hyperdata
http://dbpedia.org/resource/Entity-attribute-value_model
http://dbpedia.org/resource/Benjamin_Franklin

Zero-based Cognition (Difference between Humans & Machines)

Fri, 17 Oct 2008 11:23:42 GMT

Human beings, courtesy of the gift of cognition, are capable of creating reusable data, information, knowledge from simple or complex observations in an abstract realm. A machine on the other hand can only discover and infere based on a substrate of structured and interlinked data, information, or knowledge in a concrete human created realm e.g., a Web of Linked Data.

As is quite common these days, Yihong Ding has written another great piece titled: A New Take on Internet-Based AI, that delves into this specific matter. Yihong expresses an vital insight as excerpted below:

"Artificial intelligence is supposed to let machines do things for people. The risk is that we may rely too much on them. Two months ago, for instance, writer Nicolas Carr asked whether Google is making us stupid. In my recent blog series "The Age of Google," I extended Carr’s discussion. Due to the success of Google, we are relying more on objective search than on active thinking to answer questions. In consequence, the more Google has advanced its service, the farther Google users have drifted from active thinking."

"But at least one form of human thinking cannot be replaced by machines. I am not talking about inference/discovery (which machines may be capable of doing) but about creation/generation-from-nothing (which I don’t believe machines may ever do)."

I tend to describe our ability to create/generate-from-nothing as "Zero-based Cognition", which is initially about "thought" and the eventually about "speed of thought dissemination" and "global thought meshing".

In a peculiar sense, Zero-based cognition is analogous to Zero-based budgeting from the accounting realm :-)

W3C's SPARQLing Data Access Ingenuity

Thu, 17 Jan 2008 20:41:04 GMT

The W3C officially unveiled the SPARQL Query Language today via a press release titled: W3C Opens Data on the Web with SPARQL.

What is SPARQL?

A query language for the burgeoning Structured & Linked Data Web (aka Semantic Web / Giant Global Graph). Like SQL, for the Relational Data Model, it provides a query language for the Graph based RDF Data Model.

It's also a REST or SOAP based Web Service that exposes SPARQL access to RDF Data via an endpoint.

In addition, it's also a Query Results Serialization format that includes XML and JSON support.

Why is it Important?

It brings important clarity to the notion of the "Web as a Database" by transforming existing Web Sites, Portals, and Web Services into bona fide corpus of Mesh-able (rather than Mash-able) Data Sources. For instance, you can perform queries that join one or more of the aforementioned data sources in exactly the same manner (albeit different syntax) as you would one or more SQL Tables.

Example:

-- SPARQL equivalent of SQL SELECT * against my personal data space hosted FOAF file

SELECT DISTINCT ?s ?p ?o FROM WHERE {?s ?p ?o}

-- SPARQL against my social network -- Note: My SPARQL will be beamed across all of contacts in the social networks of my contacts as long as they are all HTTP URI based within each data space

PREFIX foaf: SELECT DISTINCT ?Person FROM WHERE {?s a foaf:Person; foaf:knows ?Person}

Note: you can use the basic SPARQL Endpoint, SPARQL Query By Example, or SPARQL Query Builder Demo tool to experiment with the demonstration queries above.

How Do I use It?

SPARQL is implemented by RDF Data Management Systems (Triple or Quad Stores) just as SQL is implemented by Relational Database Management Systems. The aforementioned data management systems will typically expose SPARQL access via a SPARQL endpoint.

Where are it's implementations?

A SPARQL implementors Testimonial page accompanies the SPARQL press release. In addition the is a growing collection of implementations on the ESW Wiki Page for SPARQL compliant RDF Triple & Quad Stores.

Is this really a big deal?

Yes! SPARQL facilitates an unobtrusive manifestation of a Linked Data Web by way of natural extension of the existing Document Web i.e these Web enclaves co-exist in symbiotic fashion.

As DBpedia very clearly demonstrates, Linked Data makes the Semantic Web demonstrable and much easier to comprehend. Without SPARQL there would be no mechanism for Linked Data deployment, and without Linked Data there is no mechanism for Beaming Queries (directly or indirectly) across the Giant Global Graph of data hosted by Social Networks, Shard Bookmarks Services, Weblogs, Wikis, RSS/Atom/OPML feeds, Photo Galleries and other Web accessible Data Sources (Data Spaces).

Related items

Cool URIs

Publishing Linked Data Tutorial

Detailed SPARQL Query Examples using SIOC Data Spaces

Detailed SPARQL Query Examples using FOAF Data Spaces

Structured writing, structured search

Thu, 22 Jun 2006 12:56:58 GMT

Structured writing, structured search

From a user's point of view, XPath query strings are pretty darned geeky. I'm hopeless with them myself unless I have examples in front of me. I find that having a list of examples available in the context of my own live data, and synchronizing it to an input box in which examples can be modified, leads me to discover and record more useful patterns. A subtler thing happens too. As you're writing the XHTML, the search possibilities begin to guide your choices. [Full story at O'Reilly Network]

I always think that my latest invention is the coolest one ever, so you should take this with a grain of salt, but I can't stop thinking about the implications of this one. First, because of the cross-browser, cross-OS angle introduced by Mozilla. Second, because it strikes me that XPath really could be packaged up for use by civilians (i.e., non-geeks). Third, because the availability of structured search -- during the writing process -- can have a profound effect on how (and why) we structure what we write. ... [via Jon's Radio]

Semantic Web Value Proposition

Fri, 21 Sep 2007 12:05:07 GMT

The motivation behind this post is a response to the Read/WriteWeb post titled: Semantic Web: Difficulties with the Classic Approach.

First off, I am going to focus on the Semantic Data Web aspect of the overall Semantic Web vision (a continuum) as this is what we have now. I am also writing this post as a deliberate contribution to the discourse swirling around the real topic: Semantic Web Value Proposition.

Situation Analysis

We are in the early stages of the long anticipated Knowledge Economy. That being the case, it would be safe to assume that information access, processing, and dissemination are of utmost importance to individuals and organizations alike. You don't produce knowledge in a vacum! Likewise, you can produce Information in a vacum, you need Data.

The Semantic Data Web's value to Individuals

Problem:

Increasingly, Blogs, Wikis, Shared Bookmarks, Photo Galleries, Discussion Forums, Shared Calendars and the like, have become invaluable tools for individual and organizational participation in Web enabled global discourse (where a lot of knowledge is discovered). These tools, are typically associated with Web 2.0, implying Read-Write access via Web Services, centralized application hosting, and data lock-in (silos).

The reality expressed above is a recipe for "Information Overload" and complete annihilation of ones effective pursuit and exploitation of knowledge due "Time Scarcity" (note: disconnecting is not an option). Information abundance is inversely related to available processing time (for humans in particular). In my case for instance, I was actively subscribed to over 500+ RSS feeds in 2003. As of today, I've simply stopped counting, and that's just my Weblog Data Space. Then add to that, all of the Discussions I track across Blogs, wikis, message boards, mailing lists, traditional usnet discussion forumns, and the like, and I think you get the picture.

Beyond information overload, Web 2.0 data is "Semi-Structured" by way of it's dominant data containers ((X)HTML, RSS, Atom documents and data streams etc.) lacking semantics that formally expose individual data items as distinct entities, endowed with unambiguous naming / identification, descriptive attributes (a type of property/predicate), and relationships (a type of property/predicate).

Solution:

Devise a standard for Structured Data Semantics that is compatible with the Web Information BUS.

Produce structured data (entities, entity types, entity relationships) from Web 1.0 and Web 2.0 resources that already exists on the Web such that individual entities, their attributes, and relationships are accessible and discernible to software agents (machines).

Once the entities are individually exposed, the next requirement is a mechanism for selective access to these entities i.e. a query language.

Semantic Data Web Technologies that facilitate the solution described above include:

Structured Data Standards:

RDF

RDF/XML - A serialization format for RDF based structured data

Turtle

Entity Exposure & Generation:

. Entity Naming & Identification:

Use of URIs or IRIs for uniquely identifying physical (HTML Documents, Image Files, Multimedia Files etc..) and abstract (People, Places, Music, and other abstract things).

Entity Access & Querying:

SPARQL Query Language - the SQL analog of the Semantic Data Web that enables query constructs that target named entities, entity attributes, and entity relationships

SPARQL Protocol

REST

SOAP

SPARQL Results Serialization Formats

The Semantic Data Web's value to Organizations

Problem:

Organizations are rife with a plethora of business systems that are built atop a myriad of database engines, sourced from a variety of DBMS vendors. A typical organization would have a different database engine, from a specific DBMS vendor, underlying critical business applications such as: Human Resource Management (HR), Customer Relationship Management (CRM), Accounting, Supply Chain Management etc. In a nutshell, you have DBMS Engines, and DBMS Schema heterogeneity permeating the IT infrastructure of organizations on a global scale, making Data & Information Integration the biggest headache across all IT driven organizations.

Solution:

Alleviation of the pain (costs) associated with Data & Information Integration.

Semantic Data Web offerings:

A dexterous data model (RDF) that enables the construction of conceptual views of disparate data sources across an organization based on existing web architecture components such as HTTP and URIs.

Existing middleware solutions that facilitate the exposure of SQL DBMS data as RDF based Structured Data include:

Virtuoso's Meta Schema Language for RDF Views of SQL Data

Virtuoso SQL-RDF Technical White Paper

D2RQ

DataGrid

Others

BTW - There is an upcoming W3C Workshop covering the integration of SQL and RDF data.

Conclusion

The Semantic Data Web is here, it's value delivery vehicle is the URI. The URI is a conduit to Interlinked Structured Data (RDF based Linked Data) derived from existing data sources on the World Wide Web alongside data continuously injected into the Web by organizations world wide. Ironically, the Semantic Data Web only platform that crystallizes the: Information at Your Fingertips vision, without development environment, operating system, application, or database lock-in. You simply click on a Linked Data URI and the serendipitous exploration and discovery of data commences.

The unobtrusive emergence of the Semantic Data Web is a reflection of the soundness of the underlying Semantic Web vision.

If you are excited about Mash-ups then your are a Semantic Web enthusiast and benefactor in the making, because you only "Mash" (brute force data extraction and interlinking) because you can't "Mesh" (natural data extraction and interlinking). Likewise, if you are a social-networking, open social-graph, or portable social-network enthusiast, then you are also a Semantic Data Web benefactor and enthusiasts, because your "values" (yes, the values associated with the properties that define you e.g your interests etc) are the fundamental basis for portable, open, social-networking, which is what the Semantic Data Web hands to you on a platter without compromise (i.e. data lock-in or loss of data ownership).

Some practical examples of Semantic Data Web prowess:

Read/WriteWeb via the OpenLink Data Web Browser

Read/WriteWeb via the Zitgist Data Web Browser

DBpedia

Zitgist zLinks

Mike Bergman's Blog Post also demonstrating zLinks

The Power of Structured Data Exposure via RDFa

Tue, 05 Feb 2008 01:45:02 GMT

I regularly check announcement from Ben Adida re. RDFa as part of a perpetual certification process for my ODS based Weblog. The most recent post from Ben contains a link to an "RDFa in the Wild" portal (in the making).

One I installed Opertaor 0.8 and then scanned a few of the pages from the RDFa portal. Operator 0.8 didn't do much for me i.e. if the RDFa didn't express RDF aligned in some form to a microformat that it understood, it simply routed it's findings to a generic "resource" category :-( Of course, it is possible to enhance this aspect of Operator (and I may get round to that some day). Anyway, I pressed on, and took one of the more interesting URIs from the RDFa page and pasted that into the OpenLink RDF Browser instead. Here are the links:

1. Semantically annotated publication database using Ajax (a page containing structured data expressed in RDF and exposed via RDFa)

2. Same Page via OpenLink RDF Browser

The RDF Browser uses the Virtuoso Sponger to extract the embedded RDF from RDFa embedded in the page.

Search Engine Challenges Posed by the Semantic Web

Thu, 22 Jun 2006 12:56:58 GMT

Search Engine Challenges Posed by the Semantic Web: "

A pre-print from Tim Finin and Li Deng entitled, Search Engines for Semantic Web Knowledge,¹ presents a thoughtful and experienced overview of the challenges posed to conventional search by semantic Web constructs.' The authors’ base much of their observations on their experience with the Swoogle semantic Web search engine over the past two years.' They also used Swoogle, whose index contains information on over 1.3M RDF documents, to generate statistics on the semantic Web size and growth in the paper.

Among other points, the authors note these key differences and challenges from conventional search engines:

Harvesting — the need to discriminantly discover semantic Web documents and to accurately index their semi-structured components
Search - the need for search to cover a broader range than documents in a repository, going from the universal to the atomic granularity of a triple.' Path tracing and provenance of the information may also be important
Rank — results ranking needs to account for the contribution of the semi-structured data, and
Archive — more versioning and tracking is needed since undelrying ontologies will surely grow and evolve.

The authors particularly note the challenge of indexing as repositories grow to actual Internet scales.

Though not noted, I would add to this list the challenge of user interfaces. Only a small percentage of users, for example, use Google’s more complicated advanced search form.' In its full-blown implementation, semantic Web search variations could make the advanced Google form look like child’s play.

¹Tim Finin and Li Ding, 'Search Engines for Semantic Web Knowledge,' a pre-print to be published in the Proceedings of XTech 2006: Building Web 2.0, May 16, 2006, 19 pp.' A PDF of the paper is available for download.

(Via AI3 - Adaptive Information:::.)

Breaking the Web Wide Open!

Thu, 22 Jun 2006 12:56:58 GMT

Marc Canter's Breaking the Web Wide Open! article is something I found pretty late (by my normal discovery standards). This was partly due to the pre- and post- Web 2.0 event noise levels that have dumped the description of an important industry inflection into the "Bozo Bin" of many. Personally, I think we shouldn't confuse the Web 2.0 traditional-pitch-fest conference with an attempt to identify an important industry inflection).

Anyway, Marc's article is a very refreshing read because it provides a really good insight into the general landscape of a rapidly evolving Web alongside genuine appreciation of our broader timeless pursuit of "Openness".

To really help this document provide additional value have scrapped the content of the original post and dumped it below so that we can appreciate the value of the links embedded within the article (note: thanks to Virtuoso I only had to paste the content into my blog, the extraction to my Linkblog and Blog Summary Pages are simply features of my Virtuoso based Blog Engine):

Breaking the Web Wide Open! (complete story)
Even the web giants like AOL, Google, MSN, and Yahoo need to observe these open standards, or they'll risk becoming the "walled gardens" of the new web and be coolio no more.
Marc Canter [Broadband Mechanics, Inc.] | POSTED: 09.26.05 @12:00
Editorial Note: Several months ago, AlwaysOn got a personal invitation from Yahoo founder Jerry Yang "to see and give us feedback on our new social media product, y!360." We were happy to oblige and dutifully showed up, joining a conference room full of hard-core bloggers and new, new media types. The geeks gave Yahoo 360 an overwhelming thumbs down, with comments like, "So the only services I can use within this new network are Yahoo services? What if I don't use Yahoo IM?" In essence, the Yahoo team was booed for being "closed web," and we heartily agreed. With Yahoo 360, Yahoo continues building its own "walled garden" to control its 135 million customersan accusation also hurled at AOL in the early 1990s, before AOL migrated its private network service onto the web. As the Economist recently noted, "Yahoo, in short, has old media plans for the new-media era."

The irony to our view here is, of course, that today's AO Network is also a "closed web." In the end, Mr. Yang's thoughtful invitation and our ensuing disappointment in his new service led to the assignment of this article. It also confirmed our existing plan to completely revamp the AO Network around open standards. To tie it all together, we recruited the chief architect of our new site, the notorious Marc Canter, to pen this piece. We look forward to our reader feedback.

Breaking the Web Wide Open!
By Marc Canter

For decades, "walled gardens" of proprietary standards and content have been the strategy of dominant players in mainframe computer software, wireless telecommunications services, and the World Wide Webit was their successful lock-in strategy of keeping their customers theirs. But like it or not, those walls are tumbling down. Open web standards are being adopted so widely, with such value and impact, that the web giantsAmazon, AOL, eBay, Google, Microsoft, and Yahooare facing the difficult decision of opening up to what they don't control.

The online world is evolving into a new open web (sometimes called the Web 2.0), which is all about being personalized and customized for each user. Not only open source software, but open standards are becoming an essential component.

Many of the web giants have been using open source software for years. Most of them use at least parts of the LAMP (Linux, Apache, MySQL, Perl/Python/PHP) stack, even if they aren't well-known for giving back to the open source community. For these incumbents that grew big on proprietary web services, the methods, practices, and applications of open source software development are difficult to fully adopt. And the next open source movementswhich will be as much about open standards as about codewill be a lot harder for the incumbents to exploit.

While the incumbents use cheap open source software to run their back-ends systems, their business models largely depend on proprietary software and algorithms. But our view a new slew of open software, open protocols, and open standards will confront the incumbents with the classic Innovator's Dilemma. Should they adopt these tools and standards, painfully cannibalizing their existing revenue for a new unproven concept, or should they stick with their currently lucrative model with the risk that eventually a bunch of upstarts eat their lunch?

Credit should go to several of the web giants who have been making efforts to "open up." Google, Yahoo, eBay, and Amazon all have Open APIs (Application Programming Interfaces) built into their data and systems. Any software developer can access and use them for whatever creative purposes they wish. This means that the API provider becomes an open platform for everyone to use and build on top of. This notion has expanded like wildfire throughout the blogosphere, so nowadays, Open APIs are pretty much required.

Other incumbents also have open strategies. AOL has got the RSS religion, providing a feedreader and RSS search in order to escape the "walled garden of content" stigma. Apple now incorporates podcasts, the "personal radio shows" that are latest rage in audio narrowcasting, into iTunes. Even Microsoft is supporting open standards, for example by endorsing SIP (Session Initiation Protocol) for internet telephony and conferencing over Skype's proprietary format or one of its own devising.

But new open standards and protocols are in use, under construction, or being proposed every day, pushing the envelope of where we are right now. Many of these standards are coming from startup companies and small groups of developers, not from the giants. Together with the Open APIs, those new standards will contribute to a new, open infrastructure. Tens of thousands of developers will use and improve this open infrastructure to create new kinds of web-based applications and services, to offer web users a highly personalized online experience.

A Brief History of Openness

At this point, I have to admit that I am not just a passive observer, full-time journalist or "just some blogger"but an active evangelist and developer of these standards. It's the vision of "open infrastructure" that's driving my company and the reason why I'm writing this article. This article will give you some of the background behind on these standards, and what the evolution of the next generation of open standards will look like.

Starting back in the 1980s, establishing a software standard was a key strategy for any software company. My former company, MacroMind (which became Macromedia), achieved this goal early on with Director. As Director evolved into Flash, the world saw that other companies besides Microsoft, Adobe, and Apple could establish true cross-platform, independent media standards.

Then Tim Berners-Lee and Marc Andreessen came along, and changed the rules of the software business and of entrepreneurialism. No matter how entrenched and "standardized" software was, the rug could still get pulled out from under it. Netscape did it to Microsoft, and then Microsoft did it back to Netscape. The web evolved, and lots of standards evolved with it. The leading open source standards (such as the LAMP stack) became widely used alternatives to proprietary closed-source offerings.

Open standards are more than just technology. Open standards mean sharing, empowering, and community support. Someone floats a new idea (or meme) and the community runs with it – with each person making their own contributions to the standard – evolving it without a moment's hesitation about "giving away their intellectual property."

One good example of this was Dave Sifry, who built the Technorati blog-tracking technology inspired by the Blogging Ecosystem, a weekend project by young hacker Phil Pearson. Dave liked what he saw and he ran with itturning Technorati into what it is today.

Dave Winer has contributed enormously to this area of open standards. He defined and personally created several open standards and protocolssuch as RSS, OPML, and XML-RPC. Dave has also helped build the blogosphere through his enthusiasm and passion.

By 2003, hundreds of programmers were working on creating and establishing new standards for almost everything. The best of these new standards have evolved into compelling web services platforms – such as del.icio.us, Webjay, or Flickr. Some have even spun off formal standards – like XSPF (a standard for playlists) or instant messaging standard XMPP (also known as Jabber).

Today's Open APIs are complemented by standardized Schemasthe structure of the data itself and its associated meta-data. Take for example a podcasting feed. It consists of: a) the radio show itself, b) information on who is on the show, what the show is about and how long the show is (the meta-data) and also c) API calls to retrieve a show (a single feed item) and play it from a specified server.

The combination of Open APIs, standardized schemas for handling meta-data, and an industry which agrees on these standards are breaking the web wide open right now. So what new open standards should the web incumbentsand yoube watching? Keep an eye on the following developments:

Identity
Attention
Open Media
Microcontent Publishing
Open Social Networks
Tags
Pinging
Routing
Open Communications
Device Management and Control

1. Identity

Right now, you don't really control your own online identity. At the core of just about every online piece of software is a membership system. Some systems allow you to browse a site anonymouslybut unless you register with the site you can't do things like search for an article, post a comment, buy something, or review it. The problem is that each and every site has its own membership system. So you constantly have to register with new systems, which cannot share dataeven you'd want them to. By establishing a "single sign-on" standard, disparate sites can allow users to freely move from site to site, and let them control the movement of their personal profile data, as well as any other data they've created.

With Passport, Microsoft unsuccessfully attempted to force its proprietary standard on the industry. Instead, a world is evolving where most people assume that users want to control their own data, whether that data is their profile, their blog posts and photos, or some collection of their past interactions, purchases, and recommendations. As long as users can control their digital identity, any kind of service or interaction can be layered on top of it.

Identity 2.0 is all about users controlling their own profile data and becoming their own agents. This way the users themselves, rather than other intermediaries, will profit from their ID info. Once developers start offering single sign-on to their users, and users have trusted places to store their datawhich respect the limits and provide access controls over that data, users will be able to access personalized services which will understand and use their personal data.

Identity 2.0 may seem like some geeky, visionary future standard that isn't defined yet, but by putting each user's digital identity at the core of all their online experiences, Identity 2.0 is becoming the cornerstone of the new open web.

The Initiatives:
Right now, Identity 2.0 is under construction through various efforts from Microsoft (the "InfoCard" component built into the Vista operating system and its "Identity Metasystem"), Sxip Identity, Identity Commons, Liberty Alliance, LID (NetMesh's Lightweight ID), and SixApart's OpenID.

More Movers and Shakers:
Identity Commons and Kaliya Hamlin, Sxip Identity and Dick Hardt, the Identity Gang and Doc Searls, Microsoft's Kim Cameron, Craig Burton, Phil Windley, and Brad Fitzpatrick, to name a few.

2. Attention

How many readers know what their online attention is worth? If you don't, Google and Yahoo dothey make their living off our attention. They know what we're searching for, happily turn it into a keyword, and sell that keyword to advertisers. They make money off our attention. We don't.

Technorati and friends proposed an attention standard, Attention.xml, designed to "help you keep track of what you've read, what you're spending time on, and what you should be paying attention to." AttentionTrust is an effort by Steve Gillmor and Seth Goldstein to standardize on how captured end-user performance, browsing, and interest data are used.

Blogger Peter Caputa gives a good summary of AttentionTrust:
"As we use the web, we reveal lots of information about ourselves by what we pay attention to. Imagine if all of that information could be stored in a nice neat little xml file. And when we travel around the web, we can optionally share it with websites or other people. We can make them pay for it, lease it ... we get to decide who has access to it, how long they have access to it, and what we want in return. And they have to tell us what they are going to do with our Attention data."

So when you give your attention to sites that adhere to the AttentionTrust, your attention rights (you own your attention, you can move your attention, you can pay attention and be paid for it, and you can see how your attention is used) are guaranteed. Attention data is crucial to the future of the open web, and Steve and Seth are making sure that no one entity or oligopoly controls it.

Movers and Shakers:
Steve Gillmor, Seth Goldstein, Dave Sifry and the other Attention.xml folks.

3. Open Media

Proprietary media standardsFlash, Windows Media, and QuickTime, to name a few helped liven up the web. But they are proprietary standards that try to keep us locked in, and they weren't created from scratch to handle today's online content. That's why, for many of us, an Open Media standard has been a holy grail. Yahoo's new Media RSS standard brings us one step closer to achieving open media, as do Ogg Vorbis audio codecs, XSPF playlists, or MusicBrainz. And several sites offer digital creators not only a place to store their content, but also to sell it.

Media RSS (being developed by Yahoo with help from the community) extends RSS and combines it with "RSS enclosures" adds metadata to any media itemto create a comprehensive solution for media "narrowcasters." To gain acceptance for Media RSS, Yahoo knows it has to work with the community. As an active member of this community, I can tell you that we'll create Media RSS equivalents for rdf (an alternative subscription format) and Atom (yet another subscription format), so no one will be able to complain that Yahoo is picking sides in format wars.

When Yahoo announced the purchase of Flickr, Yahoo founder Jerry Yang insinuated that Yahoo is acquiring "open DNA" to turn Yahoo into an open standards player. Yahoo is showing what happens when you take a multi-billion dollar company and make openness one of its core valuesso Google, beware, even if Google does have more research fellows and Ph.D.s.

The open media landscape is far and wide, reaching from game machine hacks and mobile phone downloads to PC-driven bookmarklets, players, and editors, and it includes many other standardization efforts. XSPF is an open standard for playlists, and MusicBrainz is an alternative to the proprietary (and originally effectively stolen) database that Gracenote licenses.

Ourmedia.org is a community front-end to Brewster Kahle's Internet Archive. Brewster has promised free bandwidth and free storage forever to any content creators who choose to share their content via the Internet Archive. Ourmedia.org is providing an easy-to-use interface and community to get content in and out of the Internet Archive, giving ourmedia.org users the ability to share their media anywhere they wish, without being locked into a particular service or tool. Ourmedia plans to offer open APIs and an open media registry that interconnects other open media repositories into a DNS-like registry (just like the www domain system), so folks can browse and discover open content across many open media services. Systems like Brightcove and Odeo support the concept of an open registry, and hope to work with digital creators to sell their work to fulfill the financial aspect of the "Long Tail."

More Movers and Shakers:
Creative Commons, the Open Media Network, Jay Dedman, Ryanne Hodson, Michael Verdi, Eli Chapman, Kenyatta Cheese, Doug Kaye, Brad Horowitz, Lucas Gonze, Robert Kaye, Christopher Allen, Brewster Kahle, JD Lasica, and indeed, Marc Canter, among others.

4. Microcontent Publishing

Unstructured content is cheap to create, but hard to search through. Structured content is expensive to create, but easy to search. Microformats resolve the dilemma with simple structures that are cheap to use and easy to search.

The first kind of widely adopted microcontent is blogging. Every post is an encapsulated idea, addressable via a URL called a permalink. You can syndicate or subscribe to this microcontent using RSS or an RSS equivalent, and news or blog aggregators can then display these feeds in a convenient readable fashion. But a blog post is just a block of unstructured text—not a bad thing, but just a first step for microcontent. When it comes tostructured data, such as personal identity profiles, product reviews, or calendar-type event data, RSS was not designed to maintain the integrity of the structures.

Right now, blogging doesn't have the underlying structure necessary for full-fledged microcontent publishing. But that will change. Think of local information services (such as movie listings, event guides, or restaurant reviews) that any college kid can access and use in her weekend programming project to create new services and tools.

Today's blogging tools will evolve into microcontent publishing systems, and will help spread the notion of structured data across the blogosphere. New ways to store, represent and produce microcontent will create new standards, such as Structured Blogging and Microformats. Microformats differ from RSS feeds in that you can't subscribe to them. Instead, Microformats are embedded into webpages and discovered by search engines like Google or Technorati. Microformats are creating common definitions for "What is a review or event? What are the specific fields in the data structure?" They can also specify what we can do with all this information.OPML (Outline Processor Markup Language) is a hierarchical file format for storing microcontent and structured data. It was developed by Dave Winer of RSS and podcast fame.

Events are one popular type of microcontent. OpenEvents is already working to create shared databases of standardized events, which would get used by a new generation of event portals—such as Eventful/EVDB, Upcoming.org, and WhizSpark. The idea of OpenEvents is that event-oriented systems and services can work together to establish shared events databases (and associated APIs) that any developer could then use to create and offer their own new service or application. OpenReviews is still in the conceptual stage, but it would make it possible to provide open alternatives to closed systems like Epinions, and establish a shared database of local and global reviews. Its shared open servers would be filled with all sorts of reviews for anyone to access.

Why is this important? Because I predict that in the future, 10 times more people will be writing reviews than maintaining their own blog. The list of possible microcontent standards goes on: OpenJobpostings, OpenRecipes, and even OpenLists. Microsoft recently revealed that it has been working on an important new kind of microcontent: Lists—so OpenLists will attempt to establish standards for the kind of lists we all use, such as lists of Links, lists of To Do Items, lists of People, Wish Lists, etc.

Movers and Shakers:
Tantek Çelik and Kevin Marks of Technorati, Danny Ayers, Eric Meyer, Matt Mullenweg, Rohit Khare, Adam Rifkin, Arnaud Leene, Seb Paquet, Alf Eaton, Phil Pearson, Joe Reger, Bob Wyman among others.

5. Open Social Networks

I'll never forget the first time I met Jonathan Abrams, the founder of Friendster. He was arrogant and brash and he claimed he "owned" all his users, and that he was going to monetize them and make a fortune off them. This attitude robbed Friendster of its momentum, letting MySpace, Facebook, and other social networks take Friendster's place.

Jonathan's notion of social networks as a way to control users is typical of the Web 1.0 business model and its attitude towards users in general. Social networks have become one of the battlegrounds between old and new ways of thinking. Open standards for Social Networking will define those sides very clearly. Since meeting Jonathan, I have been working towards finding and establishing open standards for social networks. Instead of closed, centralized social networks with 10 million people in them, the goal is making it possible to have 10 million social networks that each have 10 people in them.

FOAF (which stands for Friend Of A Friend, and describes people and relationships in a way that computers can parse) is a schema to represent not only your personal profile's meta-data, but your social network as well. Thousands of researchers use the FOAF schema in their "Semantic Web" projects to connect people in all sorts of new ways. XFN is a microformat standard for representing your social network, while vCard (long familiar to users of contact manager programs like Outlook) is a microformat that contains your profile information. Microformats are baked into any xHTML webpage, which means thatany blog, social network page, or any webpage in general can "contain" your social network in itand be used byany compatible tool, service or application.

PeopleAggregator is an earlier project now being integrated into open content management framework Drupal. The PeopleAggregator APIs will make it possible to establish relationships, send messages, create or join groups, and post between different social networks. (Sneak preview: this technology will be available in the upcoming GoingOn Network.)

All of these open social networking standards mean that inter-connected social networks will form a mesh that will parallel the blogosphere. This vibrant, distributed, decentralized world will be driven by open standards: personalized online experiences are what the new open web will be all aboutand what could be more personalized than people's networks?

Movers and Shakers:
Eric Sigler, Joel De Gan, Chris Schmidt, Julian Bond, Paul Martino, Mary Hodder, Drummond Reed, Dan Brickley, Randy Farmer, and Kaliya Hamlin, to name a few.

6. Tags

Nowadays, no self-respecting tool or service can ship without tags. Tags are keywords or phrases attached to photos, blog posts, URLs, or even video clips. These user- and creator-generated tags are an open alternative to what used to be the domain of librarians and information scientists: categorizing information and content using taxonomies. Tags are instead creating "folksonomies."

The recently proposed OpenTags concept would be an open, community-owned version of the popular Technorati Tags service. It would aggregate the usage of tags across a wide range of services, sites, and content tools. In addition to Technorati's current tag features, OpenTags would let groups of people share their tags in "TagClouds." Open tagging is likely to include some of the open identity features discussed above, to create a tag system that is resilient to spam, and yet trustable across sites all over the web.

OpenTags owes a debt to earlier versions of shared tagging systems, which include Topic Exchange and something called the k-collectora knowledge management tag aggregatorfrom Italian company eVectors.

Movers & Shakers:
Phil Pearson, Matt Mower , Paolo Valdemarin, and Mary Hodder and Drummond Reed again, among others.

7. Pinging

Websites used to be mostly static. Search engines that crawled (or "spidered") them every so often did a good enough job to show reasonably current versions of your cousin's homepage or even Time magazine's weekly headlines. But when blogging took off, it became hard for search engines to keep up. (Google has only just managed to offer blog-search functionality, despite buying Blogger back in early 2003.)

To know what was new in the blogosphere, users couldn't depend on services that spidered webpages once in a while. The solution: a way for blogs themselves to automatically notify blog-tracking sites that they'd been updated. Weblogs.com was the first blog "ping service": it displayed the name of a blog whenever that blog was updated. Pinging sites helped the blogosphere grow, and more tools, services, and portals started using pinging in new and different ways. Dozens of pinging services and sitesmost of which can't talk to each othersprang up.

Matt Mullenweg (the creator of open source blogging software WordPress) decided that a one-stop service for pinging was needed. He created Ping-o-Maticwhich aggregates ping services and simplifies the pinging process for bloggers and tool developers. With Ping-o-Matic, any developer can alert all of the industry's blogging tools and tracking sites at once. This new kind of open standard, with shared infrastructure, is a critical to the scalability of Web 2.0 services.

As Matt said:
There are a number of services designed specifically for tracking and connecting blogs. However it would be expensive for all the services to crawl all the blogs in the world all the time. By sending a small ping to each service you let them know you've updated so they can come check you out. They get the freshest data possible, you don't get a thousand robots spidering your site all the time. Everybody wins.

Movers and Shakers:
Matt Mullenweg, Jim Winstead, Dave Winer

8. Routing

Bloggers used to have to manually enter the links and content snippets of blog posts or news items they wanted to blog. Today, some RSS aggregators can send a specified post directly into an associated blogging tool: as bloggers browse through the feeds they subscribe to, they can easily specify and send any post they wish to "reblog" from their news aggregator or feed reader into their blogging tool. (This is usually referred to as "BlogThis.") As structured blogging comes into its own (see the section on Microcontent Publishing), it will be increasingly important to maintain the structural integrity of these pieces of microcontent when reblogging them.

Promising standard RedirectThis will combine a "BlogThis"-like capability while maintaining the integrity of the microcontent. RedirectThis will let bloggers and content developers attach a simple "PostThis" button to their posts. Clicking on that button will send that post to the reader/blogger's favorite blogging tool. This favorite tool is specified at the RedirectThis web service, where users register their blogging tool of choice. RedirectThis also helps maintain the integrity and structure of microcontentthen it's just up to the user to prefer a blogging tool that also attains that lofty goal of microcontent integrity.

OutputThis is another nascent web services standard, to let bloggers specify what "destinations" they'd like to have as options in their blogging tool. As new destinations are added to the service, more checkboxes would get added to their blogging toolallowing them to route their published microcontent to additional destinations.

Movers and Shakers:
Michael Migurski, Lucas Gonze

9. Open Communications

Likely, you've experienced the joys of finding friends on AIM or Yahoo Messenger, or the convenience of Skyping with someone overseas. Not that you're about to throw away your mobile phone or BlackBerry, but for many, also having access to Instant Messaging (IM) and Voice over IP (VoIP) is crucial.

IM and VoIP are mainstream technologies that already enjoy the benefits of open standards. Entire industries are bornright this secondbased around these open standards. Jabber has been an open IM technology for yearsin fact, as XMPP, it was officially dubbed a standard by the IETF. Although becoming an official IETF standard is usually the kiss of death, Jabber looks like it'll be around for a while, as entire generations of collaborative, work-group applications and services have been built on top of its messaging protocol. For VoIP, Skype is clearly the leading standard todaythough one could argue just how "open" it is (and defenders of the IETF's SIP standard often do). But it is free and user-friendly, so there won't be much argument from users about it being insufficiently open. Yet there may be a cloud on Skype's horizon: web behemoth Google recently released a beta of Google Talk, an IM client committed to open standards. It currently supports XMPP, and will support SIP for VoIP calls.

Movers and Shakers:
Jeremie Miller, Henning Schulzrinne, Jon Peterson, Jeff Pulver

10. Device Management and Control

To access online content, we're using more and more devices. BlackBerrys, iPods, Treos, you name it. As the web evolves, more and more different devices will have to communicate with each other to give us the content we want when and where we want it. No-one wants to be dependent on one vendor anymorelike, say, Sonyfor their laptop, phone, MP3 player, PDA, and digital camera, so that it all works together. We need fully interoperable devices, and the standards to make that work. And to fully make use of how content is moving online content and innovative web services, those standards need to be open.

MIDI (musical instrument digital interface), one of the very first open standards in music, connected disparate vendors' instruments, post-production equipment, and recording devices. But MIDI is limited, and MIDI II has been very slow to arrive. Now a new standard for controlling musical devices has emerged: OSC (Open SoundControl). This protocol is optimized for modern networking technology and inter-connects music, video and controller devices with "other multimedia devices." OSC is used by a wide range of developers, and is being taken up in the mainstream MIDI marketplace.

Another open-standards-based device management technology is ZigBee, for building wireless intelligence and network monitoring into all kinds of devices. ZigBee is supported by many networking, consumer electronics, and mobile device companies.

· · · · · ·

The Change to Openness

The rise of open source software and its "architecture of participation" are completely shaking up the old proprietary-web-services-and-standards approach. Sun Microsystemswhose proprietary Java standard helped define the Web 1.0is opening its Solaris OS and has even announced the apparent paradox of an open-source Digital Rights Management system.

Today's incumbents will have to adapt to the new openness of the Web 2.0. If they stick to their proprietary standards, code, and content, they'll become the new walled gardensplaces users visit briefly to retrieve data and content from enclosed data silos, but not where users "live." The incumbents' revenue models will have to change. Instead of "owning" their users, users will know they own themselves, and will expect a return on their valuable identity and attention. Instead of being locked into incompatible media formats, users will expect easy access to digital content across many platforms.

Yesterday's web giants and tomorrow's users will need to find a mutually beneficial new balancebetween open and proprietary, developer and user, hierarchical and horizontal, owned and shared, and compatible and closed.

Marc Canter is an active evangelist and developer of open standards. Early in his career, Marc founded MacroMind, which became Macromedia. These days, he is CEO of Broadband Mechanics, a founding member of the Identity Gang and of ourmedia.org. Broadband Mechanics is currently developing the GoingOn Network (with the AlwaysOn Network), as well as an open platform for social networking called the PeopleAggregator.

A version of the above post appears in the Fall 2005 issue of AlwaysOn's quarterly print blogozine, and ran as a four-part series on the AlwaysOn Network website.

(Via Marc's Voice.)

The Difference Between Information and Knowledge

Thu, 22 Jun 2006 12:56:58 GMT

Earlier this week, Jon Udell (view here) and Dare Obasanjo (view here) both contributed great articles covering the effect of networks. As I read these posts it got me thinking (once again) about the issue of differentiating data, information, and knowledge. I also realized during my musings that this would actually bring some clarity to technology areas that are oftenly completely misunderstood as a result of value proposition misconceptions or misunderstandings.

A quick head to blog dispatch of these thoughts (while they remain fresh):

Data is an expression of feedback; a statement (rightly or wrongly so) about an observation. If you think about it, didn't we used to capture observed data on paper in tabular form (row and columns which are analogous to Relational Database Tables and Columns)?

Information is data in context, or as I would prefer to say: contextualized data. Thus, information provides an understanding of data (provides insight about statements of observation). I also recall a myriad of context oriented hierarchical presentation forms: taxonomies and ontologies or conceptual schemas (nowadays expressed in an hierarchical tree form called XML and persisted for future reference in an XML aware database).

Knowledge isn't contextualized information, and it is certainly distinct from information (contrary to many dictionary definitions as highlighted in this post by Amy Gahran). I prefer to define knowledge as the basis of what you can, will, would, should, or might do with information. And all cases we express our levels knowledge by the way we act on the information (or lack there of) at our disposal. Think about brainstorming for a moment; you are trying to determine a path of action based on information at your disposal, a typical action would be to draw conceptual or topic relationship maps (graphing, with direction driven by the information processing action) on a whiteboard or piece of paper. Expressing, sharing, processing, and persisting these concepts and topics graphs are what the 'Graph Model' based semantic/knowledge database is all about.

Our industry has derived appropriate technology solution realms for Data, Information, and Knowledge Management (although we mix them up more often than not). Thus, there is room for Network, Hierarchical, SQL, XML (Semi-Structured Model), Object, Object-Relational, and Associative Model (graph based modeling of: source, verb, target; analogous to subject, predicate, object as per RDF).

We are spawning data, databases, infobases, knowledgebases, networks, and eventually agents, that will reflect the timeless relationships that exist across; data, information, and knowledge.

XML, the New Database Heresy

Thu, 22 Jun 2006 12:56:58 GMT

A great post by Dare, especially his bringing into context the essence of this matter refrred to by C.J. Date as "XML the New Database Heresy".

I have little to add to this matter as our understanding and vision is aptly expressed via the architecture and feature set of Virtuoso (this area was actually addressed circa 1999).

We are heading into a era of multi-model databases, these are single database engines that are capable of effectively serving the requirements of the Hierarchical, Network, Relational, and Object database models . As we get closer to the unravelling of universal storage, hopefully this will get clearer.

Back to Dare's commentary:

C.J. Date, one of the most influential names in the relational database world, had some harsh words about XML's encroachment into the world of relational databases in a recent article entitled Date defends relational model that appeared on SearchDatabases.com. Key parts of the article are excerpted below

Date reserved his harshest criticism for the competition, namely object-oriented and XML-based DBMSs. Calling them "the latest fashions in the computer world," Date said he rejects the argument that relational DBMSs are yesterday's news. Fans of object-oriented database systems "see flaws in the relational model because they don't fully understand it," he said.

Date also said that XML enthusiasts have gone overboard.

"XML was invented to solve the problem of data interchange, but having solved that, they now want to take over the world," he said. "With XML, it's like we forget what we are supposed to be doing, and focus instead on how to do it."

Craig S. Mullins, the director of technology planning at BMC Software and a SearchDatabase.com expert, shares Date's opinion of XML. It can be worthwhile, Mullins said, as long as XML is only used as a method of taking data and putting it into a DBMS. But Mullins cautioned that XML data that is stored in relational DBMSs as whole documents will be useless if the data needs to be queried, and he stressed Date's point that XML is not a real data model.

Craig Mullins points are more straightforward to answer since his comments don't jibe with the current state of the art in the XML world. He states that you can't query XML documents stored in databases but this is untrue. Almost three years ago, I was writing articles about querying XML documents stored in relational databases. Storing XML in a relational database doesn't mean it has to be stored in as an opaque binary BLOB or as a big, bunch of text which cannot effectively be queried. The next version of SQL Server will have extensive capabilities for querying XML data in relational database and doing joins across relational and XML data, a lot of this functionality is described in the article on XML Support in SQL Server 2005. As for XML not having a data model, I beg to differ. There is a data model for XML that many applications and people adhere to, often without realizing that they are doing so. This data model is the XPath 1.0 data model, which is being updated to handled typed data as the XQuery and XPath 2.0 data model.

Now to tackle the meat of C.J. Date's criticisms which is that XML solves the problem of data interchange but now is showing up in the database. The thing first point I'd like point out is that there are two broad usage patterns of XML, it is used to represent both rigidly structured tabular data (e.g., relational data or serialized objects) and semi-structured data (e.g., office documents). The latter type of data will only grow now that office productivity software like Microsoft Office have enabled users to save their documents as XML instead of proprietary binary formats. In many cases, these documents cannot simply shredded into relational tables. Sure you can shred an Excel spreadsheet written in spreadsheetML into relational tables but is the same really feasible for a Word document written in WordprocessingML? Many enterprises would rather have their important business data being stored and queried from a unified location instead of the current situation where some data is in document management systems, some hangs around as random files in people's folders while some sits in a database management system.

As for stating that critics of the relational model don't understand it, I disagree. One of the major benefits of using XML in relational databases is that it is a lot easier to deal with fluid schemas or data with sparse entries with XML. When the shape of the data tends to change or is not fixed the relational model is simply not designed to deal with this. Constantly changing your database schema is simply not feasible and there is no easy way to provide the extensibility of XML where one can say "after the X element, any element from any namespace can appear". How would one describe the capacity to store “any data” in a traditional relational database without resorting to an opaque blob?

I do tend to agree that some people are going overboard and trying to model their data hierarchically instead of relationally which experience has thought us is a bad idea. Recently on the XML-DEV mailing list entitled Designing XML to Support Information Evolution where Roger L. Costello described his travails trying to model his data which was being transferred as XML in a hierarchical manner. Micheal Champion accurately described the process Roger Costello went through as having "rediscovered the relational model". In a response to that thread I wrote "Hierarchical databases failed for a reason".

Using hierarchy as a primary way to model data is bad for at least the following reasons

Hierarchies tend to encourage redundancy. Imagine I have a element who has one or more elements as children as well as one or more elements as children as well. Each order was shipped to an address, so if modelled hierarchically each element also will have a element which leads to a lot of unnecessary duplication of data.

In the real world, there are often multiple groups to which a piece of data belongs which often cannot be modelled with a single hierarchy.

Data is too tightly coupled. If I delete a element, this means I've automatically deleted his entire order history since all the elements are children of . Similarly if I query for a , I end up getting all the information as well.

To put it simply, experience has taught the software world that the relational model is a better way to model data than the hierarchical model. Unfortunately, in the rush to embrace XML many a repreating the mistakes from decades ago in the new millenium.

[via Dare Obasanjo aka Carnage4Life]

Simple Virtuoso Installation & Utilization Guide for SPARQL Users (Update 5)

Wed, 19 Jan 2011 15:43:35 GMT

What is SPARQL?

A declarative query language from the W3C for querying structured propositional data (in the form of 3-tuple [triples] or 4-tuple [quads] records) stored in a deductive database (colloquially referred to as triple or quad stores in Semantic Web and Linked Data parlance).

SPARQL is inherently platform independent. Like SQL, the query language and the backend database engine are distinct. Database clients capture SPARQL queries which are then passed on to compliant backend databases.

Why is it important?

Like SQL for relational databases, it provides a powerful mechanism for accessing and joining data across one or more data partitions (named graphs identified by IRIs). The aforementioned capability also enables the construction of sophisticated Views, Reports (HTML or those produced in native form by desktop productivity tools), and data streams for other services.

Unlike SQL, SPARQL includes result serialization formats and an HTTP based wire protocol. Thus, the ubiquity and sophistication of HTTP is integral to SPARQL i.e., client side applications (user agents) only need to be able to perform an HTTP GET against a URL en route to exploiting the power of SPARQL.

How do I use it, generally?

Locate a SPARQL endpoint (DBpedia, LOD Cloud Cache, Data.Gov, URIBurner, others), or;
Install a SPARQL compliant database server (quad or triple store) on your desktop, workgroup server, data center, or cloud (e.g., Amazon EC2 AMI)
Start the database server
Execute SPARQL Queries via the SPARQL endpoint.

How do I use SPARQL with Virtuoso?

What follows is a very simple guide for using SPARQL against your own instance of Virtuoso:

Software Download and Installation
Data Loading from Data Sources exposed at Network Addresses (e.g. HTTP URLs) using very simple methods
Actual SPARQL query execution via SPARQL endpoint.

Installation Steps

Download Virtuoso Open Source or Virtuoso Commercial Editions
Run installer (if using Commercial edition of Windows Open Source Edition, otherwise follow build guide)
Follow post-installation guide and verify installation by typing in the command: virtuoso -? (if this fails check you've followed installation and setup steps, then verify environment variables have been set)
Start the Virtuoso server using the command: virtuoso-start.sh
Verify you have a connection to the Virtuoso Server via the command: isql localhost (assuming you're using default DB settings) or the command: isql localhost:1112 (assuming demo database) or goto your browser and type in: http://:[port]/conductor (e.g. http://localhost:8889/conductor for default DB or http://localhost:8890/conductor if using Demo DB)
Go to SPARQL endpoint which is typically -- http://:[port]/sparql
Run a quick sample query (since the database always has system data in place): select distinct * where {?s ?p ?o} limit 50 .

Troubleshooting

Ensure environment settings are set and functional -- if using Mac OS X or Windows, so you don't have to worry about this, just start and stop your Virtuoso server using native OS services applets
If using the Open Source Edition, follow the getting started guide -- it covers PATH and startup directory location re. starting and stopping Virtuoso servers.
Sponging (HTTP GETs against external Data Sources) within SPARQL queries is disabled by default. You can enable this feature by assigning "SPARQL_SPONGE" privileges to user "SPARQL". Note, more sophisticated security exists via WebID based ACLs.

Data Loading Steps

Identify an RDF based structured data source of interest -- a file that contains 3-tuple / triples available at an address on a public or private HTTP based network
Determine the Address (URL) of the RDF data source
Go to your Virtuoso SPARQL endpoint and type in the following SPARQL query: DEFINE GET:SOFT "replace" SELECT DISTINCT * FROM WHERE {?s ?p ?o}
All the triples in the RDF resource (data source accessed via URL) will be loaded into the Virtuoso Quad Store (using RDF Data Source URL as the internal quad store Named Graph IRI) as part of the SPARQL query processing pipeline.

Note: the data source URL doesn't even have to be RDF based -- which is where the Virtuoso Sponger Middleware comes into play (download and install the VAD installer package first) since it delivers the following features to Virtuoso's SPARQL engine:

Transformation of data from non RDF data sources (file content, hypermedia resources, web services output etc..) into RDF based 3-tuples (triples)
Cache Invalidation Scheme Construction -- thus, subsequent queries (without the define get:soft "replace" pragma will not be required bar when you forcefully want to override cache).
If you have very large data sources like DBpedia etc. from CKAN, simply use our bulk loader .

SPARQL Endpoint Discovery

Public SPARQL endpoints are emerging at an ever increasing rate. Thus, we've setup up a DNS lookup service that provides access to a large number of SPARQL endpoints. Of course, this doesn't cover all existing endpoints, so if our endpoint is missing please ping me.

Here are a collection of commands for using DNS-SD to discover SPARQL endpoints:

dns-sd -B _sparql._tcp sparql.openlinksw.com -- browse for services instances
dns-sd -Z _sparql._tcp sparql.openlinksw.com -- output results in Zone File format

Using HTTP from Ruby -- you can just make SPARQL Protocol URLs re. SPARQL
Using SPARQL Endpoints via Ruby -- Ruby example using DBpedia endpoint
Interactive SPARQL Query By Example (QBE) tool -- provides a graphical user interface (as is common in SQL realm re. query building against RDBMS engines) that works with any SPARQL endpoint
Other methods of loading RDF data into Virtuoso
Virtuoso Sponger -- architecture and how it turns a wide variety of non RDF data sources into SPARQL accessible data
Using OpenLink Data Explorer (ODE) to populate Virtuoso -- locate a resource of interest; click on a bookmarklet or use context menus (if using ODE extensions for Firefox, Safari, or Chrome); and you'll have SPARQL accessible data automatically inserted into your Virtuoso instance.
W3C's SPARQLing Data Access Ingenuity -- an older generic SPARQL introduction post
Collection of SPARQL Query Examples -- GoodRelations (Product Offers), FOAF (Profiles), SIOC (Data Spaces -- Blogs, Wikis, Bookmarks, Feed Collections, Photo Galleries, Briefcase/DropBox, AddressBook, Calendars, Discussion Forums)
Collection of Live SPARQL Queries against LOD Cloud Cache -- simple and advanced queries.

What is Linked Data, really?

Tue, 09 Nov 2010 18:53:01 GMT

Linked Data is simply hypermedia-based structured data.

Linked Data offers everyone a Web-scale, Enterprise-grade mechanism for platform-independent creation, curation, access, and integration of data.

The fundamental steps to creating Linked Data are as follows:

Choose a Name Reference Mechanism — i.e., URIs.
Choose a Data Model with which to Structure your Data — minimally, you need a model which clearly distinguishes
1. Subjects (also known as Entities)
2. Subject Attributes (also known as Entity Attributes), and
3. Attribute Values (also known as Subject Attribute Values or Entity Attribute Values).
Choose one or more Data Representation Syntaxes (also called Markup Languages or Data Formats) to use when creating Resources with Content based on your chosen Data Model. Some Syntaxes in common use today are HTML+RDFa, N3, Turtle, RDF/XML, TriX, XRDS, GData, OData, OpenGraph, and many others.
Choose a URI Scheme that facilitates binding Referenced Names to the Resources which will carry your Content -- your Structured Data.
Create Structured Data by using your chosen Name Reference Mechanism, your chosen Data Model, and your chosen Data Representation Syntax, as follows:
1. Identify Subject(s) using Resolvable URI(s).
2. Identify Subject Attribute(s) using Resolvable URI(s).
3. Assign Attribute Values to Subject Attributes. These Values may be either Literals (e.g., STRINGs, BLOBs) or Resolvable URIs.

You can create Linked Data (hypermedia-based data representations) Resources from or for many things. Examples include: personal profiles, calendars, address books, blogs, photo albums; there are many, many more.

Linked Data an Introduction -- simple introduction to Linked Data and its virtues
How Data Makes Corporations Dumb -- Jeff Jonas (IBM) interview
Hypermedia Types -- evolving information portal covering different aspects of Hypermedia resource types
URIBurner -- service that generates Linked Data from a plethora of heterogeneous data sources
Linked Data Meme -- TimbL design issues note about Linked Data
Data 3.0 Manifesto -- note about format agnostic Linked Data
DBpedia -- large Linked Data Hub
Linked Open Data Cloud -- collection of Linked Data Spaces
Linked Open Commerce Cloud -- commerce (clicks & mortar and/or clicks & clicks) oriented Linked Data Space
LOD Cloud Cache -- massive Linked Data Space hosting most of the LOD Cloud Datasets
LOD2 Initiative -- EU Co-Funded Project to develop global knowledge space from LOD

What is Linked Data, really?

Tue, 15 Feb 2011 22:28:06 GMT

Linked Data is simply hypermedia-based structured data.

Linked Data offers everyone a Web-scale, Enterprise-grade mechanism for platform-independent creation, curation, access, and integration of data.

The fundamental steps to creating Linked Data are as follows:

Choose a Name Reference Mechanism — i.e., URIs.
Choose a Data Model with which to Structure your Data — minimally, you need a model which clearly distinguishes
1. Subjects (also known as Entities)
2. Subject Attributes (also known as Entity Attributes), and
3. Attribute Values (also known as Subject Attribute Values or Entity Attribute Values).
Choose one or more Data Representation Syntaxes (also called Markup Languages or Data Formats) to use when creating Resources with Content based on your chosen Data Model. Some Syntaxes in common use today are HTML+RDFa, N3, Turtle, RDF/XML, TriX, XRDS, GData, and OData; there are many others.
Choose a URI Scheme that facilitates binding Referenced Names to the Resources which will carry your Content -- your Structured Data.
Create Structured Data by using your chosen Name Reference Mechanism, your chosen Data Model, and your chosen Data Representation Syntax, as follows:
1. Identify Subject(s) using Resolvable URI(s).
2. Identify Subject Attribute(s) using Resolvable URI(s).
3. Assign Attribute Values to Subject Attributes. These Values may be either Literals (e.g., STRINGs, BLOBs) or Resolvable URIs.

Hypermedia Types -- evolving information portal covering different aspects of Hypermedia resource types
URIBurner -- service that generates Linked Data from a plethora of heterogeneous data sources
Linked Data Meme -- TimbL design issues note about Linked Data
Data 3.0 Manifesto -- note about format agnostic Linked Data
DBpedia -- large Linked Data Hub
Linked Open Data Cloud -- collection of Linked Data Spaces
Linked Open Commerce Cloud -- commerce (clicks & mortar and/or clicks & clicks) oriented Linked Data Space
LOD Cloud Cache -- massive Linked Data Space hosting most of the LOD Cloud Datasets
LOD2 Initiative -- EU Co-Funded Project to develop global knowledge space from LOD

OpenLink Virtuoso - Product Value Proposition Overiew

Sat, 27 Feb 2010 17:46:36 GMT

Situation Analysis

Since the beginning of the modern IT era, each period of innovation has inadvertently introduced its fair share of Data Silos. The driving force behind this anomaly remains an overemphasis on the role of applications when selecting problem solutions. Unfortunately, most solution selecting decision makers remain oblivious to the fact that most applications are architecturally monolithic; i.e., they fail to separate the following five layers that are critical to all solutions:

Data Unit (Datum or Data Object) Identity,
Data Storage/Persistence,
Data Access,
Data Representation, and
Data Presentation/Visualization.

The rise of the Internet, and its exponentially-growing user-friendly enclave known as the World Wide Web, is bringing the intrinsic costs of the monolithic application architecture anomaly to bear -- in manners unanticipated by many. For example, the emergence of network-oriented solutions across the realms of Enterprise 2.0-based Collaboration and Web 2.0-based Software-as-a-Service (SaaS), combined with the overarching influence of Social Media, are producing more heterogeneously-structured and disparately-located data sources than people can effectively process.

As is often the case, a variety of problem and product monikers have emerged for the data access and integration challenges outlined above. Contemporary examples include Enterprise Information Integration, Master Data Management, and Data Virtualization. Labeling aside, the fundamental issues of the unresolved Data Integration challenge boil down to the following:

Data Model Heterogeneity
Data Quality (Cleanliness)
Semantic Variance across Contexts (e.g., weights and measures).

Effectively solving today's data integration challenges requires a move away from monolithic application architecture to loosely-coupled, network-centric application architectures. Basically, we need a ubiquitous network-centric application protocol that lends itself to loosely-coupled across-the-wire orchestration of data interactions. In short, this will be what revitalizes the art of application development and deployment.

The World Wide Web is built around a network application protocol called HTTP. This protocol intrinsically separates the five layers listed earlier, thereby enabling:

Use of Generic HTTP URIs as Data Object (Entity) Identifiers;
Identifier Co-reference, such that multiple Data Object Identifiers may reference the same Data Object;
Use of the Entity-Attribute-Value Model to describe Data Objects using real world modeling friendly conceptual graphs;
Use of HTTP URLs to Identify Locations of Resources that bear (host) Data Object Descriptions (Representations);
Data Access mechanism for retrieving Data Object Representations from persistent or transient storage locations.

What is Virtuoso?

A uniquely designed to address today's escalating Data Access and Integration challenges without compromising performance, security, or platform independence. At its core lies an unrivaled commitment to industry standards combined with unique technology innovation that transcends erstwhile distinct realms such as:

Data Management (Relational, RDF Graph, or Document),
Data Access Middleware,
Web Application & Services Deployment,
Linked Data Deployment, and
Messaging.

When Virtuoso is installed and running, HTTP-based Data Objects are automatically created as a by-product of its powerful data virtualization, transcending data sources and data representation formats. The benefits of such power extend across profiles such as:

Product Benefits Summary

Enterprise Agility — Virtuoso lets you mix-&-match best-of-class combinations of Operating Systems, Programming Environments, Database Engines and Data-Access Middleware when building or tweaking your IS infrastructure, without the typical impedance of vendor-lock-in.
Data Model Dexterity — By supporting multiple protocols and data models in a single product, Virtuoso protects you against costly vulnerabilities such as: perennial acquisition and accumulation of expensive data model specific DBMS products that still operate on the fundamental principle of: proprietary technology lock-in, at a time when heterogeneity continues to intrinsically define the information technology landscape.
Cost-effectiveness — By providing a single point of access (and single-sign-on, SSO) to a plethora of Web 2.0-style social networks, Web Services, and Content Management Systems, and by using Data Object Identifiers as units of Data Virtualization that become the focal points of all data access, Virtuoso lowers the cost to exploit emerging frontiers such as socially-enhanced enterprise collaboration.
Speed of Exploitation — Virtuoso provides the ability to rapidly assemble 360-degree conceptual views of data, across internal line-of-business application (CRM, ERP, ECM, HR, etc.) data and/or external data sources, whether these are unstructured, semi-structured, or fully structured.

Bottom line, Virtuoso delivers unrivaled flexibility and scalability, without compromising performance or security.

Re-introducing the Virtuoso Virtual Database Engine

Wed, 17 Feb 2010 21:46:53 GMT

In recent times a lot of the commentary and focus re. Virtuoso has centered on the RDF Quad Store and Linked Data. What sometimes gets overlooked is the sophisticated Virtual Database Engine that provides the foundation for all of Virtuoso's data integration capabilities.

In this post I provide a brief re-introduction to this essential aspect of Virtuoso.

What is it?

This component of Virtuoso is known as the Virtual Database Engine (VDBMS). It provides transparent high-performance and secure access to disparate data sources that are external to Virtuoso. It enables federated access and integration of data hosted by any ODBC- or JDBC-accessible RDBMS, RDF Store, XML database, or Document (Free Text)-oriented Content Management System. In addition, it facilitates integration with Web Services (SOAP-based SOA RPCs or REST-fully accessible Web Resources).

Why is it important?

In the most basic sense, you shouldn't need to upgrade your existing database engine version simply because your current DBMS and Data Access Driver combo isn't compatible with ODBC-compliant desktop tools such as Microsoft Access, Crystal Reports, BusinessObjects, Impromptu, or other of ODBC, JDBC, ADO.NET, or OLE DB-compliant applications. Simply place Virtuoso in front of your so-called "legacy database," and let it deliver the compliance levels sought by these tools

In addition, it's important to note that today's enterprise, through application evolution, company mergers, or acquisitions, is often faced with disparately-structured data residing in any number of line-of-business-oriented data silos. Compounding the problem is the exponential growth of user-generated data via new social media-oriented collaboration tools and platforms. For companies to cost-effectively harness the opportunities accorded by the increasing intersection between line-of-business applications and social media, virtualization of data silos must be achieved, and this virtualization must be delivered in a manner that doesn't prohibitively compromise performance or completely undermine security at either the enterprise or personal level. Again, this is what you get by simply installing Virtuoso.

How do I use it?

The VDBMS may be used in a variety of ways, depending on the data access and integration task at hand. Examples include:

Relational Database Federation

You can make a single ODBC, JDBC, ADO.NET, OLE DB, or XMLA connection to multiple ODBC- or JDBC-accessible RDBMS data sources, concurrently, with the ability to perform intelligent distributed joins against externally-hosted database tables. For instance, you can join internal human resources data against internal sales and external stock market data, even when the HR team uses Oracle, the Sales team uses Informix, and the Stock Market figures come from Ingres!

Conceptual Level Data Access using the RDF Model

You can construct RDF Model-based Conceptual Views atop Relational Data Sources. This is about generating HTTP-based Entity-Attribute-Value (E-A-V) graphs using data culled "on the fly" from native or external data sources (Relational Tables/Views, XML-based Web Services, or User Defined Types).

You can also derive RDF Model-based Conceptual Views from Web Resource transformations "on the fly" -- the Virtuoso Sponger (RDFizing middleware component) enables you to generate RDF Model Linked Data via a RESTful Web Service or within the process pipeline of the SPARQL query engine (i.e., you simply use the URL of a Web Resource in the FROM clause of a SPARQL query).

It's important to note that Views take the form of HTTP links that serve as both Data Source Names and Data Source Addresses. This enables you to query and explore relationships across entities (i.e., People, Places, and other Real World Things) via HTTP clients (e.g., Web Browsers) or directly via SPARQL Query Language constructs transmitted over HTTP.

Conceptual Level Data Access using ADO.NET Entity Frameworks

As an alternative to RDF, Virtuoso can expose ADO.NET Entity Frameworks-based Conceptual Views over Relational Data Sources. It achieves this by generating Entity Relationship graphs via its native ADO.NET Provider, exposing all externally attached ODBC- and JDBC-accessible data sources. In addition, the ADO.NET Provider supports direct access to Virtuoso's native RDF database engine, eliminating the need for resource intensive Entity Frameworks model transformations.

One Technology That Will Rock 2010 (Update 1)

Mon, 01 Feb 2010 14:02:41 GMT

Thanks to the TechCrunch post titled: Ten Technologies That Will Rock 2010, I've been able to quickly construct a derivative post that condenses the ten item list down to a Single Technology That Will Rock 2010 :-)

Sticking with the TechCrunch layout, here is why all roads simply lead to Linked Data come 2010 and beyond:

The Tablet: a new form factor addition re. Internet and Web application hosts which is just another way of saying: Linked Data will be accessible from Tablet applications.
Geo: GPS chips are now standard features of mobile phones, so geolocation is increasingly becoming a necessary feature for any killer app. Thus, GeoSpatial Linked Data and GeopSpatial Queries are going to be a critical success factor for any endeavor that seeks to engage mobile applications developers and ultimately their end-users. Basiacally, you want to be able to perform Esoteric Search from these devices of the form: Find Vendors of a Camcorder (e.g., with a Zoom Factor: Weight Ratio of X) within a 2km Radius of my current location. Or how many items from my WishList are available from a Vendor within a 2km radius of my current location. Conversely, provide Vendors with the ability to spot potential Customers within a 2km of a given "clicks & mortar" location (e.g. BestBuy store).
Realtime Search: Rich Structured Profiles that leverage standards such as FOAF and FOAF+SSL will enable Highly Personalized Realtime Search (HPRS) without compromisng privacy. Tecnically, this is about WebIDs securely bound to X.509 Certificates, providing access to verifiable and highly navigable Personal Profile Data Spaces that also double as personal search index entry points.
Chrome OS: Just another operating system for exploiting the burgeoning Web of Linked Data
HTML5: Courtesy of RDFa, just another mechanism for exposing Linked Data by making HTML+RDFa a bona fide markup for metadata (i.e., format for describing real world objects via their attribute-value graphs)
Mobile Video: Simplifies the production and sharing of Video annotations (comments, reviews etc.) en route to creating rich Linked Discourse Data Spaces.
Augmented Reality: Ditto
Mobile Transactions: As per points 1&2 above, Vendor Discovery and Transaction Conusmation will increasingly be driven by high SDQ applications. The "Funnel Effect" (more choices based on individual preferences) will be a critical success factor for any one operating in the Mobile Transaction realm. Note, without Linked Data you cannot deliver scalable solutions that handle the combined requirements of: SDQ, "Funnel Effect", and Mobile Device form factor, will simply maginify the importance of Web accessible Linked Data.
Android: An additional platform for items 1-8; basically, 2010 isn't going to be an iPhone only zone. Personally, this reminds me of a battle from the past i.e., Microsoft vs Apple, re. desktop computing dominance. Google has studied history very well :-)
Social CRM: this is simply about applying points 1-9 alongide the construction of Linked Data from eCRM Data Spaces.

As I've stated in the past (across a variety of mediums), you cannot build applications that have long term value without addressing the following issues:

Data Item or Object Identity
Data Structure -- Data Models
Data Representation -- Data Model Entity & Relationships Representation mechanism (as delivered by metadata oriented markup)
Data Storage -- Database Management Systems
Data Access -- Data Access Protocols
Data Presentation -- How you present Views and Reports from Structured Data Sources
Data Security -- Data Access Policies

The items above basically showcase the very essence of the HTTP URI abstraction that drives HTTP based Linked Data; which is also the basic payload unit that underlies REST.

Conclusion

I simply hope that the next decade marks a period of broad appreciation and comprehension of Data Access, Integration, and Management issues on the parts of: application developers, integrators, analysts, end-users, and decision makers. Remember, without structured Data we cannot produce or share Information, and without Information, we cannot produce of share Knowledge.

Getting The Linked Data Value Pyramid Layers Right (Update #2)

Sun, 31 Jan 2010 22:47:04 GMT

One of the real problems that pervades all routes to Linked Data value prop. incomprehension stems from the layering of its value pyramid; especially when communicating with -initially detached- end-users.

Note to Web Programmers: Linked Data is about Data (Wine) and not about Code (Fish). Thus, it isn't a "programmer only zone", far from it. More than anything else, its inherently inclusive and spreads its participation net widely across: Data Architects, Data Integrators, Power Users, Knowledge Workers, Information Workers, Data Analysts, etc.. Basically, everyone that can "click on a link" is invited to this particular party; remember, it is about "Linked Data" not "Linked Code", after all. :-)

Problematic Value Pyramid Layering

Here is an example of a Linked Data value pyramid that I am stumbling across --with some frequency-- these days (note: 1 being the pyramid apex):

SPARQL Queries
RDF Data Stores
RDF Data Sets
HTTP scheme URIs

Basically, Linked Data deployment (assigning de-referencable HTTP URIs to DBMS records, their attributes, and attribute values [optionally] ) is occurring last. Even worse, this happens in the context of Linked Open Data oriented endeavors, resulting in nothing but confusion or inadvertent perpetuation of the overarching pragmatically challenged "Semantic Web" stereotype.

As you can imagine, hitting SPARQL as your introduction to Linked Data is akin to hitting SQL as your introduction to Relational Database Technology, neither is an elevator-style value prop. relay mechanism.

In the relational realm, killer demos always started with desktop productivity tools (spreadsheets, report-writers, SQL QBE tools etc.) accessing, relational data sources en route to unveiling the "Productivity" and "Agility" value prop. that such binding delivered i.e., the desktop application (clients) and the databases (servers) are distinct, but operating in a mutually beneficial manner to all, courtesy of a data access standards such as ODBC (Open Database Connectivity).

In the Linked Data realm, learning to embrace and extend best practices from the relational dbms realm remains a challenge, a lot of this has to do with hangovers from a misguided perception that RDF databases will somehow completely replace RDBMS engines, rather than compliment them. Thus, you have a counter productive variant of NIH (Not Invented Here) in play, taking us to the dreaded realm of: Break the Pot and You Own It (exemplified by the 11+ year Semantic Web Project comprehension and appreciation odyssey).

From my vantage point, here is how I believe the Linked Data value pyramid should be layered, especially when communicating the essential value prop.:

HTTP URLs -- LINKs to documents (Reports) that users already appreciate, across the public Web and/or Intranets
HTTP URIs -- typically not visually distinguishable from the URLs, so use the Data exposed by de-referencing a URL to show how each Data Item (Entity or Object) is uniquely identified by a Generic HTTP URI, and how clicking on the said URIs leads to more structured metadata bearing documents available in a variety of data representation formats, thereby enabling flexible data presentation (e.g., smarter HTML pages)
SPARQL -- when a user appreciates the data representation and presentation dexterity of a Generic HTTP URI, they will be more inclined to drill down an additional layer to unravel how HTTP URIs mechanically deliver such flexibility
RDF Data Stores -- at this stage the user is now interested data sources behind the Generic HTTP URIs, courtesy of natural desire to tweak the data presented in the report; thus, you now have an engaged user ready to absorb the "How Generic HTTP URIs Pull This Off" message
RDF Data Sets -- while attempting to make or tweak HTTP URIs, users become curious about the actual data loaded into the RDF Data Store, which is where data sets used to create powerful Lookup Data Spaces (e.g., DBpedia) come into play such as those from the LOD constellation as exemplified by DBpedia (extractions from Wikipedia).

What is the DBpedia Project? (Updated)

Sun, 31 Jan 2010 22:46:10 GMT

The recent Wikipedia imbroglio centered around DBpedia is the fundamental driver for this particular blog post. At time of writing this blog post, the DBpedia project definition in Wikipedia remains unsatisfactory due to the following shortcomings:

inaccurate and incomplete definition of the Project's What, Why, Who, Where, When, and How
inaccurate reflection of project essence, by skewing focus towards data extraction and data set dump production, which is at best a quarter of the project.

Here are some insights on DBpedia, from the perspective of someone intimately involved with the other three-quarters of the project.

What is DBpedia?

A live Web accessible RDF model database (Quad Store) derived from Wikipedia content snapshots, taken periodically. The RDF database underlies a Linked Data Space comprised of: HTML (and most recently HTML+RDFa) based data browser pages and a SPARQL endpoint.

Note: DBpedia 3.4 now exists in snapshot (warehouse) and Live Editions (currently being hot-staged). This post is about the snapshot (warehouse) edition, I'll drop a different post about the DBpedia Live Edition where a new Delta-Engine covers both extraction and database record replacement, in realtime.

When was it Created?

As an idea under the moniker "DBpedia" it was conceptualized in late 2006 by researchers at University of Leipzig (lead by Soren Auer) and Freie University, Berlin (lead by Chris Bizer). The first public instance of DBpedia (as described above) was released in February 2007. The official DBpedia coming out party occurred at WWW2007, Banff, during the inaugural Linked Data gathering, where it showcased the virtues and immense potential of TimBL's Linked Data meme.

Who's Behind It?

OpenLink Software (developers of OpenLink Virtuoso and providers of Web Hosting infrastructure), University of Leipzig, and Freie Univerity, Berlin. In addition, there is a burgeoning community of collaborators and contributors responsible DBpedia based applications, cross-linked data sets, ontologies (OpenCyc, SUMO, UMBEL, and YAGO) and other utilities. Finally, DBpedia wouldn't be possible without the global content contribution and curation efforts of Wikipedians, a point typically overlooked (albeit inadvertently).

How is it Constructed?

The steps are as follows:

RDF data set dump preparation via Wikipedia content extraction and transformation to RDF model data, using the N3 data representation format - Java and PHP extraction code produced and maintained by the teams at Leipzig and Berlin
Deployment of Linked Data that enables Data browsing and exploration using any HTTP aware user agent (e.g. basic Web Browsers) - handled by OpenLink Virtuoso (handled by Berlin via the Pubby Linked Data Server during the early months of the DBpedia project)
SPARQL compliant Quad Store, enabling direct access to database records via SPARQL (Query language, REST or SOAP Web Service, plus a variety of query results serialization formats) - OpenLink Virtuoso since first public release of DBpedia

In a nutshell, there are four distinct and vital components to DBpedia. Thus, DBpedia doesn't exist if all the project offered was a collection of RDF data dumps. Likewise, it doesn't exist if you have a SPARQL compliant Quad Store without loaded data sets, and of course it doesn't exist if you have a fully loaded SPARQL compliant Quad Store is up to the cocktail of challenges presented by live Web accessibility.

Why is it Important?

It remains a live exemplar for any individual or organization seeking to publishing or exploit HTTP based Linked Data on the World Wide Web. Its existence continues to stimulate growth in both density and quality of the burgeoning Web of Linked Data.

How Do I Use it?

In the most basic sense, simply browse the HTML pages en route to discovery erstwhile relationships that exist across named entities and subject matter concepts / headings. Beyond that, simply look at DBpedia as a master lookup table in a Web hosted distributed database setup; enabling you to mesh your local domain specific details with DBpedia records via structured relations (triples or 3-tuples records) comprised of HTTP URIs from both realms e.g., owl:sameAs relations.

What Can I Use it For?

Expanding on the Master-Details point above, you can use its rich URI corpus to alleviate tedium associated with activities such as:

List maintenance - e.g., Countries, States, Companies, Units of Measurement, Subject Headings etc.
Tagging - as a compliment to existing practices
Analytical Research - you're only a LINK (URI) away from erstwhile difficult to attain research data spread across a broad range of topics
Closed Vocabulary Construction - rather than commence the futile quest of building your own closed vocabulary, simply leverage Wikipedia's human curated vocabulary as our common base.

Pre-loaded and Pre-configured instances of DBpedia 3.4 - via publicly shared Amazon Elastic Block Storage Snapshots
Virtuoso & DBpedia Tunning Guide
What's In a Name & The Linked Data Police.

Getting The Linked Data Value Pyramid Layers Right (Update #2)

Mon, 01 Feb 2010 14:02:14 GMT

Problematic Value Pyramid Layering

Here is an example of a Linked Data value pyramid that I am stumbling across --with some frequency-- these days (note: 1 being the pyramid apex):

SPARQL Queries
RDF Data Stores
RDF Data Sets
HTTP scheme URIs

From my vantage point, here is how I believe the Linked Data value pyramid should be layered, especially when communicating the essential value prop.:

HTTP URLs -- LINKs to documents (Reports) that users already appreciate, across the public Web and/or Intranets
HTTP URIs -- typically not visually distinguishable from the URLs, so use the Data exposed by de-referencing a URL to show how each Data Item (Entity or Object) is uniquely identified by a Generic HTTP URI, and how clicking on the said URIs leads to more structured metadata bearing documents available in a variety of data representation formats, thereby enabling flexible data presentation (e.g., smarter HTML pages)
SPARQL -- when a user appreciates the data representation and presentation dexterity of a Generic HTTP URI, they will be more inclined to drill down an additional layer to unravel how HTTP URIs mechanically deliver such flexibility
RDF Data Stores -- at this stage the user is now interested data sources behind the Generic HTTP URIs, courtesy of natural desire to tweak the data presented in the report; thus, you now have an engaged user ready to absorb the "How Generic HTTP URIs Pull This Off" message
RDF Data Sets -- while attempting to make or tweak HTTP URIs, users become curious about the actual data loaded into the RDF Data Store, which is where data sets used to create powerful Lookup Data Spaces (e.g., DBpedia) come into play such as those from the LOD constellation as exemplified by DBpedia (extractions from Wikipedia).

What is the DBpedia Project? (Updated)

Wed, 15 Sep 2010 22:10:51 GMT

inaccurate and incomplete definition of the Project's What, Why, Who, Where, When, and How
inaccurate reflection of project essence, by skewing focus towards data extraction and data set dump production, which is at best a quarter of the project.

Here are some insights on DBpedia, from the perspective of someone intimately involved with the other three-quarters of the project.

What is DBpedia?

When was it Created?

Who's Behind It?

How is it Constructed?

The steps are as follows:

RDF data set dump preparation via Wikipedia content extraction and transformation to RDF model data, using the N3 data representation format - Java and PHP extraction code produced and maintained by the teams at Leipzig and Berlin
Deployment of Linked Data that enables Data browsing and exploration using any HTTP aware user agent (e.g. basic Web Browsers) - handled by OpenLink Virtuoso (handled by Berlin via the Pubby Linked Data Server during the early months of the DBpedia project)
SPARQL compliant Quad Store, enabling direct access to database records via SPARQL (Query language, REST or SOAP Web Service, plus a variety of query results serialization formats) - OpenLink Virtuoso since first public release of DBpedia

In a nutshell, there are four distinct and vital components to DBpedia. Thus, DBpedia doesn't exist if all the project offered was a collection of RDF data dumps. Likewise, it doesn't exist without a fully populated SPARQL compliant Quad Store. Last but not least, it doesn't exist if you have a fully loaded SPARQL compliant Quad Store isn't up to the cocktail of challenges (query load and complexity) presented by live Web database accessibility.

Why is it Important?

How Do I Use it?

In the most basic sense, simply browse the HTML based resource decriptor pages en route to discovering erstwhile undiscovered relationships that exist across named entities and subject matter concepts / headings. Beyond that, simply look at DBpedia as a master lookup table in a Web hosted distributed database setup; enabling you to mesh your local domain specific details with DBpedia records via structured relations (triples or 3-tuples records), comprised of HTTP URIs from both realms e.g., via owl:sameAs relations.

What Can I Use it For?

Expanding on the Master-Details point above, you can use its rich URI corpus to alleviate tedium associated with activities such as:

List maintenance - e.g., Countries, States, Companies, Units of Measurement, Subject Headings etc.
Tagging - as a compliment to existing practices
Analytical Research - you're only a LINK (URI) away from erstwhile difficult to attain research data spread across a broad range of topics
Closed Vocabulary Construction - rather than commence the futile quest of building your own closed vocabulary, simply leverage Wikipedia's human curated vocabulary as our common base.

Pre-loaded and Pre-configured instances of DBpedia 3.4 - via publicly shared Amazon Elastic Block Storage Snapshots
Virtuoso & DBpedia Tunning Guide
What's In a Name & The Linked Data Police.

The URI, URL, and Linked Data Meme's Generic HTTP URI (Updated)

Sun, 28 Mar 2010 16:19:00 GMT

Situation Analysis

As the "Linked Data" meme has gained momentum you've more than likely been on the receiving end of dialog with Linked Open Data community members (myself included) that goes something like this:

"Do you have a URI", "Get yourself a URI", "Give me a de-referencable URI" etc..

And each time, you respond with a URL -- which to the best of your Web knowledge is a bona fide URI. But to your utter confusion you are told: Nah! You gave me a Document URI instead of the URI of a real-world thing or object etc..

What's up with that?

Well our everyday use of the Web is an unfortunate conflation of two distinct things, which have Identity: Real World Objects (RWOs) & Address/Location of Documents (Information bearing Resources).

The "Linked Data" meme is about enhancing the Web by unobtrusively reintroducing its core essence: the generic HTTP URI, a vital piece of Web Architecture DNA. Basically, its about so realizing the full capabilities of the Web as a platform for Open Data Identification, Definition, Access, Storage, Representation, Presentation, and Integration.

What is a Real World Object?

People, Places, Music, Books, Cars, Ideas, Emotions etc..

What is a URI?

A Uniform Resource Identifier. A global identifier mechanism for network addressable data items. Its sole function is Name oriented Identification.

URI Generic Syntax

The constituent parts of a URI (from URI Generic Syntax RFC) are depicted below:

What is a URL?

A location oriented HTTP scheme based URI. The HTTP scheme introduces a powerful and inherent duality that delivers:

Resource Address/Location Identifier
Data Access mechanism for an Information bearing Resource (Document, File etc..)

So far so good!

What is an HTTP based URI?

The kind of URI Linked Data aficionados mean when they use the term: URI.

An HTTP URI is an HTTP scheme based URI. Unlike a URL, this kind of HTTP scheme URI is devoid of any Web Location orientation or specificity. Thus, Its inherent duality provides a more powerful level of abstraction. Hence, you can use this form of URI to assign Names/Identifiers to Real World Objects (RWO). Even better, courtesy of the Identity/Address duality of the HTTP scheme, a single URI can deliver the following:

RWO Identfier/Name
RWO Metadata document Locator (courtesy of URL aspect)
Negotiable Representation of the Located Document (courtesy of HTTP's content negotiation feature).

What is Metadata?

Data about Data. Put differently, data that describes other data in a structured manner.

How Do we Model Metadata?

The predominant model for metadata is the Entity-Attribute-Value + Classes & Relationships model (EAV/CR). A model that's been with us since the inception of modern computing (long before the Web).

What about RDF?

The Resource Description Framework (RDF) is a framework for describing Web addressable resources. In a nutshell, its a framework for adding Metadata bearing Information Resources to the current Web. Its comprised of:

Entity-Attribute-Value (aka. Subject-Predictate-Object) plus Classes & Relationships (Data Dictionaries e.g., OWL) metadata model
A plethora of instance data representation formats that include: RDFa (when doing so within (X)HTML docs), Turtle, N3, TriX, RDF/XML etc.

What's the Problem Today?

The ubiquitous use of the Web is primarily focused on a Linked Mesh of Information bearing Documents. URLs rather than generic HTTP URIs are the prime mechanism for Web tapestry; basically, we use URLs to conduct Information -- which is inherently subjective -- instead of using HTTP URIs to conduct "Raw Data" -- which is inherently objective.

Note: Information is "data in context", it isn't the same thing as "Raw Data". Thus, if we can link to Information via the Web, why shouldn't we be able to do the same for "Raw Data"?

How Does the Link Data meme solve the problem?

The meme simply provides a set of guidelines (best practices) for producing Web architecture friendly metadata. Meaning: when producing EAV/CR model based metadata, endow Subjects, their Attributes, and Attribute Values (optionally) with HTTP URIs. By doing so, a new level of Link Abstraction on the Web is possible i.e., "Data Item to Data Item" level links (aka hyperdata links). Even better, when you de-reference a RWO hyperdata link you end up with a negotiated representations of its metadata.

Conclusion

Linked Data is ultimately about an HTTP URI for each item in the Data Organization Hierarchy :-)

History of how "Resource" became part of URI - historic account by TimBL
Linked Data Design Issues Document - TimBL's initial Linked Data Guide
Linked Data Rules Simplified - My attempt at simplifying the Linked Data Meme without SPARQL & RDF distraction
Linked Data & Identity - another related post
The Linked Data Meme's Value Proposition
So What Does "HREF" stand for anyway?
My Del.icio.us hosted Bookmark Data Space for Identity Schemes
TimBL's Ted Talk re. "Raw Linked Data"
Resource Oriented Architecture
More Famous Than Simon Cowell .

Exploring the Value Proposition of Linked Data

Fri, 24 Jul 2009 12:20:01 GMT

What is Linked Data?

The primary topic of a meme penned by TimBL in the form of a Design Issues Doc (note: this is how TimBL has shared his thoughts since the Beginning of the Web).

There are a number of dimensions to the meme, but its primary purpose is the reintroduction of the HTTP URI -- a vital component of the Web's core architecture.

What's Special about HTTP URIs?

They possess an intrinsic duality that combines persistent and unambiguous Data Identity with platform & representation format independent Data Access. Thus, you can use a string of characters that look like a contemporary Web URL to unambiguously achieve the following:

Identity or Name Anything of Interest
Describe Anything of Interest by associating the Description Subject's Identity with a constellation of Attribute and Value pairs (technically: an Entity-Attribute-Value or Subject-Predicate-Object graph)
Make the Description of Named Things of Interest discoverable on the Web by implicitly binding the aforementioned to Documents that hold their descriptions (technically: metadata documents or information resources)

What's the basic value proposition of the Linked Data meme?

Enabling more productive use of the Web by users and developers alike. All of which is achieved by tweaking the Web's Hyperlinking feature such that it now includes Hypertext and Hyperdata as link types.

Note: Hyperdata Linking is simply what an HTTP URI facilitates.

Examples problems solved by injecting Linked Data into the Web:

Federated Identity by enabling Individuals to unambiguously Identify themselves (Profiles++) courtesy of existing Internet and Web protocols (e.g., FOAF+SSL's WebIDs which combine Personal Identity with X.509 certificates and HTTPs based client side certification)
Security and Privacy challenge alleviation by delivering a mechanism for policy based data access that feeds off federated individual identity and social network (graph) traversal
Spam Busting via the above
Increasing the Serendipitous Discovery Quotient (SDQ) of Web accessible resources by embedding Rich Metadata into (X)HTML Documents e.g., structured descriptions of your "WishLists" and "OfferLists" via a common set of terms offered by vocabularies such as GoodRelations and SIOC
Coherent integration of disparate data across the Web and/or within the Enterprise via "Data Meshing" rather than "Data Mashing"
Moving beyond imprecise statistically driven "Keyword Search" (e.g. Page Rank) to "Precision Find" driven by typed link based Entity Rank plus Entity Type and Entity Property filters.

Conclusion

If all of the above still falls into the technical mumbo-jumbo realm, then simply consider Linked Data as delivering Open Data Access in granular form to Web accessible data -- that goes beyond data containers (documents or files).

The value proposition of Linked Data is inextricably linked to the value proposition of the World Wide Web. This is true, because the Linked Data meme is ultimately about an enhancement of the current Web; achieved by reintroducing its architectural essence -- in new context -- via a new level of link abstraction, courtesy of the Identity and Access duality of HTTP URIs.

As a result of Linked Data, you can now have Links on the Web for a Person, Document, Music, Consumer Electronics, Products & Services, Business Opening & Closing Hours, Personal "WishLists" and "OfferList", an Idea, etc.. in addition to links for Properties (Attributes & Values) of the aforementioned. Ultimately, all of these links will be indexed in a myriad of ways providing the substrate for the next major period of Internet & Web driven innovation, within our larger human-ingenuity driven innovation continuum.

Recipes for Describing Your Business and its Offerings using the GoodRelations Vocabulary / Schema
Solving Real Problems with RDF based Linked Data
Other Linked Data Posts from this Blog oriented Linked Data Space (goes back a few years!)
Various practical Linked Data demo links from my Del.icio.us Bookmark oriented Data Space
My personal WebID which is conduit to a Linked Data mesh covering vast variety of things I've opted to share with others via the Web (best viewed using a Linked Data aware User Agent like ODE).

Simple Explanation of RDF and Linked Data Dynamics

Fri, 24 Apr 2009 21:14:41 GMT

What is RDF?

The acronym stands for: Resource Description Framework. And that's just what it is.

RDF is comprised of a Data Model (EAV/CR Graph) and Data Representation Formats such as: N3, Turtle, RDF/XML etc.

RDF's essence is about: "Entities" and "Attributes" being URI based, while "Values" may be URI or Literals (typed or untyped) based.

URIs are Entity Identifiers.

What is Linked Data?

Short for "Web of Linked Data" or "Linked Data Web".

A term coined by TimBL that describes an HTTP based "data access by reference pattern" that uses a single pointer or handle for "referring to" and "obtaining actual data about" an entity.

Linked Data uses the deceptively simple messaging scheme of HTTP to deliver a granular entity reference and access mechanism that transcends traditional computing boundaries such as: operating system, application, database engines, and networks.

How are Linked Data & RDF Related?

Linked Data simply mandates the following re. RDF:

URIs should be HTTP based so that you can "refer to" (Reference) an Entity, its Attributes, or URI based Attribute values via the Web (infact any HTTP based network e.g., Intranets and Extranets)
URIs should also be HTTP based so that you can use them to de-reference resource descriptions via the Web (or Intranets and Extranets).

Note: by Entity I am also referring to: a resource (Web parlance), data item, data object, real-world object, or datum.

Linked Data is also about, using URIs and HTTP's content negotiation feature to separate: presentation, representation, access, and identity of data items. Even better, content negotiation can be driven by user agent and/or data server based quality of service algorithms (representation preference order schemes).

To conclude, Linked Data is ultimately about the realization that: Data is the new Electricity, and it's conductors are URIs :-)

Tip to governments of the world: we are in exponential times, the current downturn is but one side of the "exponential times ledger", the other side of the "exponential times ledger" is simply about unleashing "raw data" -- in structured form -- into the Web, so that "citizen analysts" can blossom and ultimately deliver the transparency desperately sought at every level of the economic value chain. Think: "raw data ready" whenever you ponder about "shovel ready" infrastructure projects!

How Linked Data will change Advertising

Wed, 25 Mar 2009 12:30:58 GMT

This post is a reply to Jason Kolb's post titled: Using Advertising to Take Over the World. Jason's post is a response to Robert Scoble's post titled: Why Facebook has never listened and why it definitely won’t start now.

Jason:

Scoble is sensing what comes next, but in my opinion, describes it using an old obtrusive advertising model anecdote.

I've penned a post or two about the "Magic of You" which is all about the new Web power broker (Entity: "You").

Personally, I've long envisaged a complete overhaul of advertising where obtrusive advertising simply withers away; ultimately replaced by an unobtrusive model that is driven by individualized relevance and high doses of serendipity. Basically, this is ultimately about "taking the Ad out of item placement in Web pages".

The fundamental ingredients of an unobtrusive advertising landscape would include the following Human facts:

We are social beings and need stuff from time to time
We know what we need and would like to "Find stuff" when we are in "I Need Stuff" mode.

Ideally, we would like to be able to simply state the following, via a Web accessible profile:

Here are my "Wants" or "Needs" (my Wish-List)
Here are the products and services that I "Offer" (my Offer-List).

Now put the above into the context of an evolving Web where data items are becoming more visible by the second, courtesy of the "Linked Data" meme. Thus, things that weren't discernable via the Web: "People", "Places", "Music", "Books", "Products", etc., become much easier to identify and describe.

Assuming the comments above hold true re. the Web's evolution into a collection of Linked Data Spaces, and the following occur:

Structured profile pages become the basic units of Web presence
Wish-Lists and Offer-Lists are exposed by profile pages

Wish-Lists and Offer-Lists will gradually start bonding with increasing degrees of serendipity courtesy of exponential growth in Linked Data Web density.

So based on what I've stated so far, Scoble would simply browse the Web or visit his profile page, and in either scenario enjoy a "minority report" style of experience albeit all under his control (since he is the one driving his Web user agent).

What I describe above simply comes down to "Wish-lists" and associated recommendations becoming the norm outside the confines of Amazon's data space on the Web. Serendipitous discovery, intelligent lookups, and linkages are going to be the fundamental essence of Linked Data Web oriented applications, services, agents.

Beyond Scoble, it's also important to note that access to data will be controlled by entity "You". Your data space on the Web will be something you will controll access to in a myriad of ways, and it will include the option to provide licensed access to commercial entities on your terms. Naturally, you will also determine the currency that facilitates the value exchange :-)

Simple Compare & Contrast of Web 1.0, 2.0, and 3.0 (Update 1)

Wed, 29 Apr 2009 17:21:25 GMT

Here is a tabulated "compare and contrast" of Web usage patterns 1.0, 2.0, and 3.0.

	Web 1.0	Web 2.0	Web 3.0
Simple Definition	Interactive / Visual Web	Programmable Web	Linked Data Web
Unit of Presence	Web Page	Web Service Endpoint	Data Space (named structured data enclave)
Unit of Value Exchange	Page URL	Endpoint URL for API	Resource / Entity / Object URI
Data Granularity	Low (HTML)	Medium (XML)	High (RDF)
Defining Services	Search	Community (Blogs to Social Networks)	Find
Participation Quotient	Low	Medium	High
Serendipitous Discovery Quotient	Low	Medium	High
Data Referencability Quotient	Low (Documents)	Medium (Documents)	High (Documents and their constituent Data)
Subjectivity Quotient	High	Medium (from A-list bloggers to select source and partner lists)	Low (everything is discovered via URIs)
Transclusence	Low	Medium (Code driven Mashups)	HIgh (Data driven Meshups)
What You See Is What You Prefer (WYSIWYP)	Low	Medium	High (negotiated representation of resource descriptions)
Open Data Access (Data Accessibility)	Low	Medium (Silos)	High (no Silos)
Identity Issues Handling	Low	Medium (OpenID)	High (FOAF+SSL)
Solution Deployment Model	Centralized	Centralized with sprinklings of Federation	Federated with function specific Centralization (e.g. Lookup hubs like LOD Cloud or DBpedia)
Data Model Orientation	Logical (Tree based DOM)	Logical (Tree based XML)	Conceptual (Graph based RDF)
User Interface Issues	Dynamically generated static interfaces	Dyanically generated interafaces with semi-dynamic interfaces (courtesy of XSLT or XQuery/XPath)	Dynamic Interfaces (pre- and post-generation) courtesy of self-describing nature of RDF
Data Querying	Full Text Search	Full Text Search	Full Text Search + Structured Graph Pattern Query Language (SPARQL)
What Each Delivers	Democratized Publishing	Democratized Journalism & Commentary (Citizen Journalists & Commentators)	Democratized Analysis (Citizen Data Analysts)
Star Wars Edition Analogy	Star Wars (original fight for decentralization via rebellion)	Empire Strikes Back (centralization and data silos make comeback)	Return of the JEDI (FORCE emerges and facilitates decentralization from "Identity" all the way to "Open Data Access" and "Negotiable Descriptive Data Representation")

Naturally, I am not expecting everyone to agree with me. I am simply making my contribution to what will remain facinating discourse for a long time to come :-)

Web 3.0 The Best Official Definition Imaginable -- Nova Spivack's

Time for RDBMS Primacy Downgrade is Nigh! (No Embedded Images Edition - Update 1)

Tue, 17 Mar 2009 15:50:58 GMT

As the world works it way through a "once in a generation" economic crisis, the long overdue downgrade of the RDBMS, from its pivotal position at the apex of the data access and data management pyramid is nigh.

What is the Data Access, and Data Management Value Pyramid?

As depicted below, a top-down view of the data access and data management value chain. The term: apex, simply indicates value primacy, which takes the form of a data access API based entry point into a DBMS realm -- aligned to an underlying data model. Examples of data access APIs include: Native Call Level Interfaces (CLIs), ODBC, JDBC, ADO.NET, OLE-DB, XMLA, and Web Services.

See: AVF Pyramid Diagram.

The degree to which ad-hoc views of data managed by a DBMS can be produced and dispatched to relevant data consumers (e.g. people), without compromising concurrency, data durability, and security, collectively determine the "Agility Value Factor" (AVF) of a given DBMS. Remember, agility as the cornerstone of environmental adaptation is as old as the concept of evolution, and intrinsic to all pursuits of primacy.

In simpler business oriented terms, look at AVF as the degree to which DBMS technology affects the ability to effectively implement "Market Leadership Discipline" along the following pathways: innovation, operation excellence, or customer intimacy.

Why has RDBMS Primacy has Endured?

Historically, at least since the late '80s, the RDBMS genre of DBMS has consistently offered the highest AVF relative to other DBMS genres en route to primacy within the value pyramid. The desire to improve on paper reports and spreadsheets is basically what DBMS technology has fundamentally addressed to date, even though conceptual level interaction with data has never been its forte.

See: RDBMS Primacy Diagram.

For more then 10 years -- at the very least -- limitations of the traditional RDBMS in the realm of conceptual level interaction with data across diverse data sources and schemas (enterprise, Web, and Internet) has been crystal clear to many RDBMS technology practitioners, as indicated by some of the quotes excerpted below:

"Future of Database Research is excellent, but what is the future of data?"
"..it is hard for me to disagree with the conclusions in this report. It captures exactly the right thoughts, and should be a must read for everyone involved in the area of databases and database research in particular."
-- Dr. Anant Jingran, CTO, IBM Information Management Systems, commenting on the 2007 RDBMS technology retreat attended by a number of key DBMS technology pioneers and researchers.

"One size fits all: A concept whose time has come and gone

They are direct descendants of System R and Ingres and were architected more than 25 years ago

They are advocating "one size fits all"; i.e. a single engine that solves all DBMS needs.

-- Prof. Michael Stonebreaker, one of the founding fathers of the RDBMS industry.

Until this point in time, the requisite confluence of "circumstantial pain" and "open standards" based technology required to enable an objective "compare and contrast" of RDBMS engine virtues and viable alternatives hasn't occurred. Thus, the RDBMS has endured it position of primacy albeit on a "one size fits all basis".

Circumstantial Pain

As mentioned earlier, we are in the midst of an economic crisis that is ultimately about a consistent inability to connect dots across a substrate of interlinked data sources that transcend traditional data access boundaries with high doses of schematic heterogeneity. Ironically, in a era of the dot-com, we haven't been able to make meaningful connections between relevant "real-world things" that extend beyond primitive data hosted database tables and content management style document containers; we've struggled to achieve this in the most basic sense, let alone evolve our ability to connect inline with the exponential rate at which the Internet & Web are spawning "universes of discourse" (data spaces) that emanate from user activity (within the enterprise and across the Internet & Web). In a nutshell, we haven't been able to upgrade our interaction with data such that "conceptual models" and resulting "context lenses" (or facets) become concrete; by this I mean: real-world entity interaction making its way into the computer realm as opposed to the impedance we all suffer today when we transition from conceptual model interaction (real-world) to logical model interaction (when dealing with RDBMS based data access and data management).

Here are some simple examples of what I can only best describe as: "critical dots unconnected", resulting from an inability to interact with data conceptually:

Government (Globally) -

Financial regulatory bodies couldn't effectively discern that a Credit Default Swap is an Insurance policy in all but literal name. And in not doing so the cost of an unregulated insurance policy laid the foundation for exacerbating the toxicity of fatally flawed mortgage backed securities. Put simply: a flawed insurance policy was the fallback on a toxic security that financiers found exotic based on superficial packaging.

Enterprises -

Banks still don't understand that capital really does exists in tangible and intangible forms; with the intangible being the variant that is inherently dynamic. For example, a tech companies intellectual capital far exceeds the value of fixture, fittings, and buildings, but you be amazed to find that in most cases this vital asset has not significant value when banks get down to the nitty gritty of debt collateral; instead, a buffer of flawed securitization has occurred atop a borderline static asset class covering the aforementioned buildings, fixtures, and fittings.

In the general enterprise arena, IT executives continued to "rip and replace" existing technology without ever effectively addressing the timeless inability to connect data across disparate data silos generated by internal enterprise applications, let alone the broader need to mesh data from the inside with external data sources. No correlations made between the growth of buzzwords and the compounding nature of data integration challenges. It's 2009 and only a miniscule number of executives dare fantasize about being anywhere within distance of the: relevant information at your fingertips vision.

Looking more holistically at data interaction in general, whether you interact with data in the enterprise space (i.e., at work) or on the Internet or Web, you ultimately are delving into a mishmash of disparate computer systems, applications, service (Web or SOA), and databases (of the RDBMS variety in a majority of cases) associated with a plethora of disparate schemas. Yes, but even today "rip and replace" is still the norm pushed by most vendors; pitting one mono culture against another as exemplified by irrelevances such as: FOSS/LAMP vs Commercial or Web vs. Enterprise, when none of this matters if the data access and integration issues are recognized let alone addressed (see: Applications are Like Fish and Data Like Wine).

Like the current credit-crunch, exponential growth of data originating from disparate application databases and associated schemas, within shrinking processing time frames, has triggered a rethinking of what defines data access and data management value today en route to an inevitable RDBMS downgrade within the value pyramid.

Technology

There have been many attempts to address real-world modeling requirements across the broader DBMS community from Object Databases to Object-Relational Databases, and more recently the emergence of simple Entity-Attribute-Value model DBMS engines. In all cases failure has come down to the existence of one or more of the following deficiencies, across each potential alternative:

Query language standardization - nothing close to SQL standardization
Data Access API standardization - nothing close to ODBC, JDBC, OLE-DB, or ADO.NET
Wire protocol standardization - nothing close to HTTP
Distributed Identity infrastructure - nothing close to the non-repudiatable digital Identity that foaf+ssl accords
Use of Identifiers as network based pointers to data sources - nothing close to RDF based Linked Data
Negotiable data representation - nothing close to Mime and HTTP based Content Negotiation
Scalability especially in the era of Internet & Web scale.

Entity-Attribute-Value with Classes & Relationships (EAV/CR) data models

A common characteristic shared by all post-relational DBMS management systems (from Object Relational to pure Object) is an orientation towards variations of EAV/CR based data models. Unfortunately, all efforts in the EAV/CR realm have typically suffered from at least one of the deficiencies listed above. In addition, the same "one DBMS model fits all" approach that lies at the heart of the RDBMS downgrade also exists in the EAV/CR realm.

What Comes Next?

The RDBMS is not going away (ever), but its era of primacy -- by virtue of its placement at the apex of the data access and data management value pyramid -- is over! I make this bold claim for the following reasons:

The Internet aided "Global Village" has brought "Open World" vs "Closed World" assumption issues to the fore e.g., the current global economic crisis remains centered on the inability to connect dots across "Open World" and "Closed World" data frontiers
Entity-Attribute-Value with Classes & Relationships (EAV/CR) based DBMS models are more effective when dealing with disparate data associated with disparate schemas, across disparate DBMS engines, host operating systems, and networks.

Based on the above, it is crystal clear that a different kind of DBMS -- one with higher AVF relative to the RDBMS -- needs to sit atop today's data access and data management value pyramid. The characteristics of this DBMS must include the following:

Every item of data (Datum/Entity/Object/Resource) has Identity
Identity is achieved via Identifiers that aren't locked at the DBMS, OS, Network, or Application levels
Object Identifiers and Object values are independent (extricably linked by association)
Object values should be de-referencable via Object Identifier
Representation of de-referenced value graph (entity, attributes, and values mesh) must be negotiable (i.e. content negotiation)
Structured query language must provide mechanism for Creation, Deletion, Updates, and Querying of data objects
Performance & Scalability across "Closed World" (enterprise) and "Open World" (Internet & Web) realms.

Quick recap, I am not saying that RDBMS engine technology is dead or obsolete. I am simply stating that the era of RDBMS primacy within the data access and data management value pyramid is over.

The problem domain (conceptual model views over heterogeneous data sources) at the apex of the aforementioned pyramid has simply evolved beyond the natural capabilities of the RDBMS which is rooted in "Closed World" assumptions re., data definition, access, and management. The need to maintain domain based conceptual interaction with data is now palpable at every echelon within our "Global Village" - Internet, Web, Enterprise, Government etc.

It is my personal view that an EAV/CR model based DBMS, with support for the seven items enumerated above, can trigger the long anticipated RDBMS downgrade. Such a DBMS would be inherently multi-model because you would need to the best of RDBMS and EAV/CR model engines in a single product, with in-built support for HTTP and other Internet protocols in order to effectively address data representation and serialization issues.

EAV/CR Oriented Data Access & Management Technology

Examples of contemporary EAV/CR frameworks that provide concrete conceptual layers for data access and data management currently include:

Resource Description Framework (RDF) - an EAV/CR based framework
RDF Linked Data - EAV/CR based framework that mandates de-referencable HTTP based Identifiers
ADO.NET Entity Frameworks - Microsoft .NET based EAV/CR framework
Core Data Services - Mac OS X based EAV/CR framework that evolved from NeXT's Enterprise Object Frameworks (EOF).

The frameworks above provide the basis for a revised AVF pyramid, as depicted below, that reflects today's data access and management realities i.e., an Internet & Web driven global village comprised of interlinked distributed data objects, compatible with "Open World" assumptions.

See: New EAV/CR Primacy Diagram.

The Time for RDBMS Primacy Downgrade is Nigh!

Wed, 03 Jun 2009 22:09:58 GMT

What is the Data Access, and Data Management Value Pyramid?

Why has RDBMS Primacy has Endured?

"Future of Database Research is excellent, but what is the future of data?"
"..it is hard for me to disagree with the conclusions in this report. It captures exactly the right thoughts, and should be a must read for everyone involved in the area of databases and database research in particular."
-- Dr. Anant Jingran, CTO, IBM Information Management Systems, commenting on the 2007 RDBMS technology retreat attended by a number of key DBMS technology pioneers and researchers.

"One size fits all: A concept whose time has come and gone

They are direct descendants of System R and Ingres and were architected more than 25 years ago

They are advocating "one size fits all"; i.e. a single engine that solves all DBMS needs.

-- Prof. Michael Stonebreaker, one of the founding fathers of the RDBMS industry.

Circumstantial Pain

Here are some simple examples of what I can only best describe as: "critical dots unconnected", resulting from an inability to interact with data conceptually:

Government (Globally) -

Enterprises -

Technology

Query language standardization - nothing close to SQL standardization
Data Access API standardization - nothing close to ODBC, JDBC, OLE-DB, or ADO.NET
Wire protocol standardization - nothing close to HTTP
Distributed Identity infrastructure - nothing close to the non-repudiatable digital Identity that foaf+ssl accords
Use of Identifiers as network based pointers to data sources - nothing close to RDF based Linked Data
Negotiable data representation - nothing close to Mime and HTTP based Content Negotiation
Scalability especially in the era of Internet & Web scale.

Entity-Attribute-Value with Classes & Relationships (EAV/CR) data models

What Comes Next?

The Internet aided "Global Village" has brought "Open World" vs "Closed World" assumption issues to the fore e.g., the current global economic crisis remains centered on the inability to connect dots across "Open World" and "Closed World" data frontiers
Entity-Attribute-Value with Classes & Relationships (EAV/CR) based DBMS models are more effective when dealing with disparate data associated with disparate schemas, across disparate DBMS engines, host operating systems, and networks.

Every item of data (Datum/Entity/Object/Resource) has Identity
Identity is achieved via Identifiers that aren't locked at the DBMS, OS, Network, or Application levels
Object Identifiers and Object values are independent (extricably linked by association)
Object values should be de-referencable via Object Identifier
Representation of de-referenced value graph (entity, attributes, and values mesh) must be negotiable (i.e. content negotiation)
Structured query language must provide mechanism for Creation, Deletion, Updates, and Querying of data objects
Performance & Scalability across "Closed World" (enterprise) and "Open World" (Internet & Web) realms.

Quick recap, I am not saying that RDBMS engine technology is dead or obsolete. I am simply stating that the era of RDBMS primacy within the data access and data management value pyramid is over.

EAV/CR Oriented Data Access & Management Technology

Examples of contemporary EAV/CR frameworks that provide concrete conceptual layers for data access and data management currently include:

Resource Description Framework (RDF) - an EAV/CR based framework
RDF Linked Data - EAV/CR based framework that mandates de-referencable HTTP based Identifiers
ADO.NET Entity Frameworks - Microsoft .NET based EAV/CR framework
Core Data Services - Mac OS X based EAV/CR framework that evolved from NeXT's Enterprise Object Frameworks (EOF).

The Semantic Way - Alan Cho's Summary of PwC 2009 tech forecast report on the Semantic Web
Is the RDBMS Doomed - ReadWriteWeb Article
Anti-RDBMS: a list of Distributed Key-Value Stores - by Richard Jones (CTO Last.FM)
How & Why Glue is Using Amazon SimpleDB
Object Database Manifesto (Identity excerpt)
Database Models Overview
Ted Nelson Explaining Irregularity and Idiosyncrasy of Data Structures - ZigZag Demo

In Response to: This is Not the Future (Update #3)

Thu, 22 Jan 2009 00:02:47 GMT

As I cannot post directly to Glenn's blog titled: This is Not the Near Future (Either), I have to basically respond to him here, in blog post form :-(

What is our "Search" and "Find" demonstration about? It is about how you use the "Description" of "Things" to unambiguously locate things in a database at Web Scale.

To our perpetual chagrin, we are trying to demonstrate an engine -- not UI prowess -- but the immediate response is to jump to the UI aesthetics.

Google, Yahoo etc.. offer a simple input form for full text search patterns, they have a processing window for completing full text searches across Web Content indexed on their servers. Once the search patterns are processed, you get a page ranked result set (collection of Web pages basically that claim/state: we found N pages out of a document corpus of about M indexed pages).

Note: the estimate aspect of traditional search results in like "advertising small print" the user lives with the illusion that all possible documents on the Web (or even Internet) have been searched whereas in reality: 25% of the possible total is a major stretch; since the Web and Internet are fractal networks and scale-free, inherently growing at exponential rates "ad infinitum" across boundless dimensions of human comprehension.

The power of Linked Data ultimately comes down to the fact that the user constructs the path to what they seek via the properties of the "Things" in question. The routes are not hardwired since URI de-referencing (follow your nose pattern) is available to Linked Data aware query engines and crawlers.

We are simply trying to demonstrate how you can combine the best of full text search with the best of structured querying while reusing familiar interaction patterns from Google/Yahoo. Thus, you start with full text search, find get all the entities associated with the pattern, then use the entity types or entity properties to find what you seek.

You state in your post:

"To state the obvious caveat, the claim OpenLink is making about this demo is not that it delivers better search-term relevance, therefore the ranking of searching results is not the main criteria on which it is intended to be assessed."

Correct.

"On the other hand, one of the things they are bragging about is that their server will automatically cut off long-running queries. So how do you like your first page of results?".

Not exactly correct. We are performing aggregates using a configurable interactive time factor. Example: tell me how many entities of type: Person, with interest: Semantic Web, exist in this database within 2 seconds. Also understand that you could retry the same query and get different numbers within the same interactive time factor. It isn't your basic "query cut-off".

"And on the other other hand, the big claim OpenLink is making about this demo is that the aggregate experience of using it is better than the aggregate experience of using "traditional" search. So go ahead, use it. If you can."

Yes, "Microsoft" was a poor example for sure, the example could have been pattern: "glenn mcdonald", which should demonstrate the fundamental utility of what we are trying to demonstrate i.e., entity disambiguation courtesy of entity properties and/or entity type filtering.

Compare Googles results for: Glenn McDonald with those from our demo (which dissambiguate "Glenn McDonald" via associated properties and/or types), assuming we both agree that your Web Site or Blog Home isn't the center of your entity graph or personal data space (i.e., data about you); so getting your home page at the top of the Google page rank offers limited value, in reality.

What are we bragging about? A little more than what you attempt to explain. Yes, we are showing that we can find stuff within a processing window, but understand the following:

Processing Time Window (or interactive time) is configurable
Data Corpus is a Billion+ Triples (from Billion Triples Challenge Data Set)
SPARQL doesn't have Aggregation capabilities by default (we have implemented SPARQL-BI to deliver aggregates for analytics against large data sets, we even handle the TPC-H industry standard benchmark with SPARQL-BI)
Paging isn't possible without aggregates, and doing aggregates on a Billion+ triples as part of a query processing cycle isn't trivial stuff (otherwise it would be everywhere due to inherent and obvious necessity).

I hope I've clarified what's going on with our demo? If not, pose your challenge via examples and I will respond with solutions or simply cry out loud: "no mas!".

As for your "Mac OX X Leopard" comments, I can only say this: I emphasized that this is a demo, the data is pretty old, and the input data has issues (i.e. some of the input data is bad as your example shows). The purpose of this demo is not about the text per se., it's about the size of the data corpus and faceted querying. We are going to have the entire LOD Cloud loaded into the real thing, and in addition to that our Sponger Middleware will be enabled, and then you can take issue with data quality as per your reference to "Cyndi Lauper" (btw - it takes one property filter to find information about her quickly using "dbpprop:name" after filtering for properties with text values).

Of all things, this demo had nothing to do with UI and Information presentation aesthetics. It was all about combining full text search and structured queries (sparql behind the scenes) against a huge data corpus en route to solving challenges associated with faceted browsing over large data sets. We have built a service that resides inside Virtuoso. The Service is naturally of the "Web Service" variety and can be used from any consumer / client environment that speaks HTTP (directly or indirectly).

To be continued ...

A Linked Data Web Approach To Semantic "Search" & "Find" (Updated)

Sat, 10 Jan 2009 18:55:56 GMT

The first salvo of what we've been hinting about re. server side faceted browsing over Unlimited Data within configurable Interactive Time-frames is now available for experimentation at: http://b3s.openlinksw.com/fct/facet.vsp.

Simple example / demo:

Enter search pattern: Microsoft

You will get the usual result from a full text pattern search i.e., hits and text excerpts with matching patterns in boldface. This first step is akin to throwing your net out to sea while fishing.

Now you have your catch, what next? Basically, this is where traditional text search value ends since regex or xpath/xquery offer little when the structure of literal text is the key to filtering or categorization based analysis of real-world entities. Naturally, this is where the value of structured querying of linked data starts, as you seek to use entity descriptions (combination of attribute and relationship properties) to "Find relevant things".

Continuing with the demo.

Click on "Properties" link within the Navigation section of the browser page which results in a distillation and aggregation of the properties of the entities associated with the search results. Then use the "Next" link to page through the properties until to find the properties that best match what you seek. Note, this particular step is akin to using the properties of the catch (using fishing analogy) for query filtering, with each subsequent property link click narrowing your selection further.

Using property based filtering is just one perspective on the data corpus associated with the text search pattern; thus, you can alter perspectives by clicking on the "Class" link so that you can filter you search results by entity type. Of course, in a number of scenarios you would use a combination of entity types and entity properties filters to locate the entities of interest to you.

A Few Notes about this demo instance of Virtuoso:

Lookup Data Size (Local Linked Data Corpus): 2 Billion+ Triples (entity-attribute-value tuples)
This is a *temporary* teaser / precursor to the LOD (Linking Open Data Cloud) variant of our Linked Data driven "Search" & "Find" service; we decided to implement this functionality prior to commissioning a larger and more up to date instance based on the entire LOD Cloud
The browser is simply using a Virtuoso PL function that also exists in Web Service form for loose binding by 3rd parties that have a UI orientation and focus (our UI is deliberately bare boned).
The properties and entity types (classes) links expose formal definitions and dictionary provenance information materialized in an HTML page (of course your browser or any other HTTP user agent can negotiation alternative representations of this descriptive information)
UMBEL based inference rules are enabled, giving you a live and simple demonstration of the virtues of Linked Data Dictionaries for example: click on the description link of any property or class from the foaf (friend-of-a-friend vocabulary), sioc (semantically-interlinked-online-communities ontology), mo (music ontology), bibo (bibliographic data ontology) namespaces to see how the data between these lower level vocabularies or ontologies are meshed with OpenCyc's upper level ontology.

Is Linked Data Always Relevant?

Wed, 31 Dec 2008 17:57:41 GMT

I pose the question above because I stumbled across an interesting claim about OpenLink Software and its representatives expressed in the ReadWriteWeb post titled: XBRL: Mashing Up Financial Statements, where the following claim is made:

"..There is evidence that they promote LINKED DATA at any expense without understanding the rationale behind other approaches...".

To answer the question above, Linked Data is always relevant as long as we are actually talking about "Data" which is simply the case all of the time, irrespective of interaction medium.

If XBRL can be disconnected in anyway from Linked Data, I desperately would like to be enlightened (as per my comments to the post). Why wouldn't anyone desire the ability to navigate the linked data inherent in any financial report? Every entity in an XBRL instance document is an entity, directly or indirectly related to other entities. Why "Mash" the data when you can harmonize XBRL data via a Generic Financial Dictionary (schema or ontology) such that descriptions of Balance Sheet, P&L, and other entities are navigable via their attributes and relationships? In short, why "Mash" (code based brute force joining across disparately shaped data) when you can "Mesh" (natural joining of structured data entities)?

"Linked Data" is about the ability to connect all our observations (data)? , perceptions (information), and inferences / conclusions (knowledge) across a spectrum of interaction media. And it just so happens that the RDF data model (Entity-Attribute-Vaue + Class Relationships + HTTP based Object Identifiers), a range of RDF data model serialization formats, and SPARQL (Query Language and Web Service combo) actually make this possible, in a manner consistent with the essence of the global space we know as the World Wide Web.

BBC's Britain from Above (core message: Data is Everything).

YODA & the Data FORCE

Tue, 20 Jul 2010 17:53:06 GMT

The original design document (by TimBL) that lead to the WWW (*an important read*) was very clear about the need to create an "information space" that connects heterogeneous data sources. Unfortunately, in trying to create a moniker to distinguish one aspect of the Web (the Linked Document Web) from the part that was overlooked (the Linked Data Web), we ended up with a project code name that's fundamentally a misnomer in the form of: "The Semantic Web".

If we could just take "The Semantic Web" moniker for what it was -- a code name for an aspect of the Web -- and move on, things will get much clearer, fast!

Basically, what is/was the "Semantic Web" should really have been code named: ("You" Oriented Data Access) as a play on: Yoda's appreciation of the FORCE (Fact ORiented Connected Entities) -- the power of inter galactic, interlinked, structured data, fashioned by the World Wide Web courtesy of the HTTP protocol.

As stated in a earlier post, the next phase of the Web is all about the magic of entity "You". The single most important item of reference to every Web user would be the Person Entity ID (URI). Just by remembering your Entity ID, you will have intelligent pathways across, and into, the FORCE that the Linked Data Web delivers. The quality of the pathways and increased density of the FORCE are the keys to high SDQ (tomorrows SEO). Thus, the SDQ of URIs will ultimately be the unit determinant of value to Web Users, along the following personal lines, hence the critical platform questions:

Does your platform give me Identity (a URI) with high SDQ?
Do the Data Source Names (URIs) in your Data Spaces deliver high SDQ?

While most industry commentators continue to ponder and pontificate about what "The Semantic Web" is (unfortunately), the real thing (the "FORCE") is already here, and self-enhancing rapidly.

Assuming we now accept the FORCE is simply an RDF based Linked Data moniker, and that RDF Linked Data is all about the Web as a structured database, we should start to move our attention over to practical exploitation of this burgeoning global database, and in doing so we should not discard knowledge from the past such as the many great examples available gratis from the Relational Database realm. For instance, we should start paying attention to the discovery, development, and deployment of high level tools such as query builders, report writers, and intelligence oriented analytic tools, none of which should -- at first point of interaction -- expose raw RDF or the SPARQL query language. Along similar lines of thinking, we also need development environments and frameworks that are counterparts to Visual Studio, ACCESS, File Maker, and the like.

Numerati & The Magic of You!

The Linked Data Market via a BCG Matrix (Updated)

Fri, 26 Sep 2008 16:36:56 GMT

The sweet spot of Web 3.0 (or any other Web.vNext moniker) is all about providing Web Users with a structured and interlinked data substrate that facilitates serendipitous discovery of relevant "Things" i.e., a Linked Data Web -- a Web of Linkable Entities that goes beyond documents and other information resource (data containers) types.

Understanding potential Linked Data Web business models, relative to other Web based market segments, is best pursued via a BCG Matrix diagram, such as the one I've constructed below:

Notes:

Link Density

Web 1.0's collection of "Web Sites" have relatively low link density relative to Web 2.0's user-activity driven generation of semi-structured linked data spaces (e.g., Blogs, Wikis, Shared Bookmarks, RSS/Atom Feeds, Photo Galleries, Discussion Forums etc..)
Semantic Technologies (i.e. "Semantics Inside style solutions") which are primarily about "Semantic Meaning" culled from Web 1.0 Pages also have limited linked density relative to Web 2.0
The Linked Data Web, courtesy of the open-ended linking capacity of URIs, matches and ultimately exceeds Web 2.0 link density.

Relevance

Web 1.0 and 2.0 are low relevance realms driven by hyperlinks to information resources ((X)HTML, RSS, Atom, OPML, XML, Images, Audio files etc.) associated with Literal Labels and Tagging schemes devoid of explicit property based resource description thereby making the pursuit of relevance mercurial at best
Semantic Technologies offer more relevance than Web 1.0 and 2.0 based on the increased context that semantic analysis of Web pages accords
The Linked Data Web, courtesy of URIs that expose self-describing data entities, match the relevance levels attained by Semantic Technologies.

Serendipity Quotient (SDQ)

Web 1.0 has next to no serendipity, the closest thing is Google's "I'm Feeling Lucky" button
Web 2.0 possess higher potential for serendipitous discovery than Web 1.0, but such potential is neutralized by inherent subjectivity due to its human-interaction-focused literal foundation (e.g., tags, voting schemes, wiki editors etc.)
Semantic Technologies produce islands-of-relevance with little scope for serendipitous discovery due to URI invisibility, since the prime focus is delivering more context to Web search relative to traditional Web 1.0 search engines.
The Linked Data Web's use of URIs as the naming and resolution mechanism for exposing structured and interlinked resources provides the highest potential for serendipitous discovery of relevant "Things"

To conclude, the Linked Data Web's market opportunities are all about the evolution of the Web into a powerful substrate that offers a unique intersection of "Link Density" and "Relevance", exploitable across horizontal and vertical market segments to solutions providers. Put differently, SDQ is how you take "The Ad" out of "Advertising" when matching Web users to relevant things :-)

Is the Semantic Web necessary (and feasible)?

Fri, 29 Aug 2008 15:08:12 GMT

Here is another "Linked Discourse" effort via a blog post that attempts to add perspective to a developing Web based conversation. In this case, the conversation originates from Juan Sequeda's recent interview with Jana Thompson titled: Is the Semantic Web necessary (and feasible)?

Jana: What are the benefits you see to the business community in adopting semantic technology?

Me: Exposure, exploitation, of untapped treasure trove of interlinked data, information, and knowledge across disparate IT infrastructure via conceptual entry points (Entity IDs / URIs / Data Source Names) that refer to as "Context Lenses".

Jana: Do you think these benefits are great enough for businesses to adopt the changes?

Me: Yes, infrastructural heterogeneity is a fact of corporate life (growth, mergers, acquisitions etc). Any technology that addresses these challenges is extremely important and valuable. Put differently, the opportunity costs associated with IT infrastructural heterogeneity remains high!

Jana: How large do you think this impact will actually be?

Me: Huge, enterprise have been aware of their data, information, and knowledge treasure troves etc. for eons. Tapping into these via a materialization of the "information at your fingertips" vision is something they've simply been waiting to pursue without any platform lock-in, for as long as I've been in this industry.

Jana: I’ve heard, from contacts in the Bay Area, that they are skeptical of how large this impact of semantic technology will actually be on the web itself, but that the best uses of the technology are for fields such as medical information, or as you mentioned, geo-spatial data.

Me: Unfortunately, those people aren't connecting the Semantic Web and open access to heterogeneous data sources, or the intrinsic value of holistic exploration location of entity based data networks (aka Linked Data).

Jana: Are semantic technologies going to be part of the web because of people championing the cause or because it is actually a necessary step?

Me: Linked Data technology on the Web is a vital extension of the current Web. Semantic Technology without the "Web" component, or what I refer to as "Semantics Inside only" solutions, simply offer little or no value as Web enhancements based on their incongruence with the essence of the Web i.e., "Open Linkage" and no Silos! A nice looking Silo is still a Silo.

Jana: In the early days of the web, there was an explosion of new websites, due to the ease of learning HTML, from a business to a person to some crackpot talking about aliens. Even today, CSS and XHTML are not so difficult to learn that a determined person can’t learn them from W3C or other tutorials easily. If OWL becomes the norm for websites, what do you think the effects will be on the web? Do you think it is easy enough to learn that it will be readily adopted as part of the standard toolkit for web developers for businesses?

Me: Correction, learning HTML had nothing to do with the Web's success. The value proposition of the Web simply reached critical mass and you simply couldn't afford to not be part of it. The easiest route to joining the Web juggernaut was a Web Page hosted on a Web Site. The question right now is: what's the equivalent driver for the Linked Data Web bearing in mind the initial Web bootstrap. My answer is simply this: Open Data Access i.e., getting beyond the data silos that have inadvertently emerged from Web 2.0.

Jana: Following the same theme, do you think this will lead to an internet full of corporate-controlled websites, with sites only written by developers rather than individuals?

Me: Not at all, we will have an Internet owned by it's participants i.e., You and the agents that work on your behalf.

Jana: So, you are imagining technologies such as Drupal or Wordpress, that allow users to manage sites without a great deal of knowledge of the nuts and bolts of current web technologies?

Me: Not at all! I envisage simple forms that provide conduits to powerful meshes of interlinked data spaces associated with Web users.

Jana: Given all of the buzz, and my own familiarity with ontology, I am just very curious if the semantic web is truly necessary?

Me:This question is no different than saying: I hear the Web is becoming a Database, and I wonder if a Data Dictionary is necessary, or even if access to structured data is necessary. It's also akin to saying: I accept "Search" as my only mechanism for Web interaction even though in reality, I really want to be able to "Find" and "Process" relevant things at a quicker rate than I do today, relative to the amount of information, and information processing time, at my disposal.

Jana: Will it be worth it to most people to go away from the web in its current form, with keyword searches on sites like Google, to a richer and more interconnected internet with potentially better search technology?

Me: As stated above, we need to add "Find" to the portfolio of functions we seek to perform against the Web. "Finding" and "Searching" are mutually inclusive pursuits at different ends of an activity spectrum.

Jana: For our more technical readers, I have a few additional questions: If no standardization comes about for mapping relational databases to domain ontologies, how do you see that as influencing the decisions about adoption of semantic technology by businesses? After all, the success of technology often lives or dies on its ease of adoption.

Me: Standardization of RDBMS to RDF Mapping is not the critical success factor here (of course it would be nice). As stated earlier, the issue of data integration that arises from IT infrastructural heterogeneity has been with decision makers in the enterprise for ever. The problem is now seeping into the broader consumer realm via Web ubiquity. The mistakes made in the enterprise realm are now playing out in the consumer Web realm. In both realms the critical success factors are:

Scalable productivity relative to exponential growth of data generated across Intranets, Extranets, and the Internet
Concept based Context Lenses that transcend logical and physical data heterogeneity by putting dereferencable URIs in front of the Line of Business Application Data and/or Web Data Spaces such as Blogs, Wikis, Discussion Forums etc.).

The Essence of the Matter re. Information Overload

Thu, 28 Aug 2008 19:56:20 GMT

The title of this post is an expression of my gut reaction to the quotes below, which originate from Leo Sauermann's post about the Nepomuk Semantic Desktop for KDE:

Ansgar Bernardi, deputy head of the Knowledge Management Department at Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI, or the German Research Center for Artificial Intelligence) and Nepomuk's coordinator, explains, "The basic problem that we all face nowadays is how to handle vast amounts of information at a sensible rate." According to Bernardi, Nepomuk takes a traditional approach by creating a meta-data layer with well-defined elements that services can be built upon to create and manipulate the information.

The comment above echoes my sentiments about the imminence of "information overload" due to the vast amounts of user generated content on the Internet as a whole. We are going to need to process more an more data within a fixed 24 hour timeframe, while attempting to balance our professional and personal lives. Be rest assured, this is a very serious issue, and you cannot event begin to address it without a Web of Linked Data.

"The first idea of building the semantic desktop arose from the fact that one of our colleagues could not remember the girlfriends of his friends," Bernard says, more than half-seriously. "Because they kept changing -- you know how it is. The point is, you have a vast amount of information on your desktop, hidden in files, hidden in emails, hidden in the names and structures of your folders. Nepomuk gives a standard way to handle such information."

If you get a personal URI for Entity "You", via a Linked Data aware platform (e.g. OpenLink Data Spaces) that virtualizes data across your existing Web data spaces (blogs, feed subscriptions, wikis, shared bookmarks, photo galleries, calendars, etc.), you then only have to remember your URI whenever you need to "Find" something, imagine that!

To conclude, "information overload" is the imminent challenge of our time, and the keys to challenge alleviation lie in our ability to construct and maintain (via solutions) few context lenses (URIs) that provide coherent conduits into the dense mesh of structured Linked Data on the Web.

Crunchbase & Semantic Web Interview (Remix - Update 1)

Thu, 28 Aug 2008 00:35:15 GMT

After reading Bengee's interview with CrunchBase, I decided to knock up a quick interview remix as part of my usual attempt to add to the developing discourse.

CrunchBase: When we released the CrunchBase API, you were one of the first developers to step up and quickly released a CrunchBase Sponger Cartridge. Can you explain what a CrunchBase Sponger Cartridge is?

Me: A Sponger Cartridge is a data access driver for Web Resources that plugs into our Virtuoso Universal Server (DBMS and Linked Data Web Server combo amongst other things). It uses the internal structure of a resource and/or a web service associated with a resource, to materialize an RDF based Linked Data graph that essentially describes the resource via its properties (Attributes & Relationships).

CrunchBase: And what inspired you to create it?

Me: Bengee built a new space with your data, and we've built a space on the fly from your data which still resides in your domain. Either solution extols the virtues of Linked Data i.e. the ability to explore relationships across data items with high degrees of serendipity (also colloquially known as: following-your-nose pattern in Semantic Web circles).

Bengee posted a notice to the Linking Open Data Community's public mailing list announcing his effort. Bearing in mind the fact that we've been using middleware to mesh the realms of Web 2.0 and the Linked Data Web for a while, it was a no-brainer to knock something up based on the conceptual similarities between Wikicompany and CrunchBase. In a sense, a quadrant of orthogonality is what immediately came to mind re. Wikicompany, CrunchBase, Bengee's RDFization efforts, and ours.

Bengee created an RDF based Linked Data warehouse based on the data exposed by your API, which is exposed via the Semantic CrunchBase data space. In our case we've taken the "RDFization on the fly" approach which produces a transient Linked Data View of the CrunchBase data exposed by your APIs. Our approach is in line with our world view: all resources on the Web are data sources, and the Linked Data Web is about incorporating HTTP into the naming scheme of these data sources so that the conventional URL based hyperlinking mechanism can be used to access a structured description of a resource, which is then transmitted using a range negotiable representation formats. In addition, based on the fact that we house and publish a lot of Linked Data on the Web (e.g. DBpedia, PingTheSemanticWeb, and others), we've also automatically meshed Crunchbase data with related data in DBpedia and Wikicompany data.

CrunchBase: Do you know of any apps that are using CrunchBase Cartridge to enhance their functionality?

Me: Yes, the OpenLink Data Explorer which provides CrunchBase site visitors with the option to explore the Linked Data in the CrunchBase data space. It also allows them to "Mesh" (rather than "Mash") CrunchBase data with other Linked Data sources on the Web without writing a single line of code.

CrunchBase: You have been immersed in the Semantic Web movement for a while now. How did you first get interested in the Semantic Web?

Me: We saw the Semantic Web as a vehicle for standardizing conceptual views of heterogeneous data sources via context lenses (URIs). In 1998 as part of our strategy to expand our business beyond the development and deployment of ODBC, JDBC, and OLE-DB data providers, we decided to build a Virtual Database Engine (see: Virtuoso History), and in doing so we sought a standards based mechanism for the conceptual output of the data virtualization effort. As of the time of the seminal unveiling of the Semantic Web in 1998 we were clear about two things, in relation to the effects of the Web and Internet data management infrastructure inflections: 1) Existing DBMS technology had reached it limits 2) Web Servers would ultimately hit their functional limits. These fundamental realities compelled us to develop Virtuoso with an eye to leveraging the Semantic Web as a vehicle from completing its technical roadmap.

CrunchBase: Can you put into layman’s terms exactly what RDF and SPARQL are and why they are important? Do they only matter for developers or will they extend past developers at some point and be used by website visitors as well?

Me: RDF (Resource Description Framework) is a Graph based Data Model that facilitates resource description using the Subject, Predicate, and Object principle. Associated with the core data model, as part of the overall framework, are a number of markup languages for expressing your descriptions (just as you express presentation markup semantics in HTML or document structure semantics in XML) that include: RDFa (simple extension of HTML markup for embedding descriptions of things in a page), N3 (a human friendly markup for describing resources), RDF/XML (a machine friendly markup for describing resources).

SPARQL is the query language associated with the RDF Data Model, just as SQL is a query language associated with the Relational Database Model. Thus, when you have RDF based structured and linked data on the Web, you can query against Web using SPARQL just as you would against an Oracle/SQL Server/DB2/Informix/Ingres/MySQL/etc.. DBMS using SQL. That's it in a nutshell.

CrunchBase: On your website you wrote that “RDF and SPARQL as productivity boosters in everyday web development”. Can you elaborate on why you believe that to be true?

Me: I think the ability to discern a formal description of anything via its discrete properties is of immense value re. productivity, especially when the capability in question results in a graph of Linked Data that isn't confined to a specific host operating system, database engine, application or service, programming language, or development framework. RDF Linked Data is about infrastructure for the true materialization of the "Information at Your Fingertips" vision of yore. Even though it's taken the emergence of RDF Linked Data to make the aforementioned vision tractable, the comprehension of the vision's intrinsic value have been clear for a very long time. Most organizations and/or individuals are quite familiar with the adage: Knowledge is Power, well there isn't any knowledge without accessible Information, and there isn't any accessible Information without accessible Data. The Web has always be grounded in accessibility to data (albeit via compound container documents called Web Pages).

Bottom line, RDF based Linked Data is about Open Data access by reference using URIs (HTTP based Entity IDs / Data Object IDs / Data Source Names), and as I said earlier, the intrinsic value is pretty obvious bearing in mind the costs associated with integrating disparate and heterogeneous data sources -- across intranets, extranets, and the Internet.

CrunchBase: In his definition of Web 3.0, Nova Spivack proposes that the Semantic Web, or Semantic Web technologies, will be force behind much of the innovation that will occur during Web 3.0. Do you agree with Nova Spivack? What role, if any, do you feel the Semantic Web will play in Web 3.0?

Me: I agree with Nova. But I see Web 3.0 as a phase within the Semantic Web innovation continuum. Web 3.0 exists because Web 2.0 exists. Both of these Web versions express usage and technology focus patterns. Web 2.0 is about the use of Open Source technologies to fashion Web Services that are ultimately used to drive proprietary Software as Service (SaaS) style solutions. Web 3.0 is about the use of "Smart Data Access" to fashion a new generation of Linked Data aware Web Services and solutions that exploit the federated nature of the Web to maximum effect; proprietary branding will simply be conveyed via quality of data (cleanliness, context fidelity, and comprehension of privacy) exposed by URIs.

Here are some examples of the CrunchBase Linked Data Space, as projected via our CruncBase Sponger Cartridge:

The Future of the Desktop

Thu, 21 Aug 2008 19:59:25 GMT

Jason Kolb (who initially nudged me to chime in), and then ReadWriteWeb, and of course Nova's Twine about the topic, have collectively started an interesting discussion about Web.vNext (3.0 and beyond) under the heading: The Future of the Desktop.

My contribution to the developing discourse takes the form of a Q&A session. I've taken the questions posed and provided answers that express my particular points of view:

Q: Is the desktop of the future going to just be a web-hosted version of the same old-fashioned desktop metaphors we have today?

A: No, it's going to be a more Web Architecture aware and compliant variant exposed by appropriate metaphors.

Q: The desktop of the future is going to be a hosted web service

A: A vessel for exploiting the virtues of the Linked Data Web.

Q: The Browser is Going to Swallow Up the Desktop

A: Literally, of course not! Metaphorically, of course! And then the Browser metaphor will decomposes into function specific bits of Web interaction amenable to orchestration by its users.

Q: The focus of the desktop will shift from information to attention

A: No! Knowledge, Information, and Data sharing courtesy of Hyperdata & Hypertext Linking.

Q: Users are going to shift from acting as librarians to acting as daytraders

A: They were Librarians at Web 1.0, Journalist at Web 2.0, and Analysts in Web 3.0 (i.e, analyze structured and interlinked data), and CEOs in Web 4.0 (i.e. get Agents to do stuff intelligently en route to making decisions).

Q: The Webtop will be more social and will leverage and integrate collective intelligence

A: The Linked Data Web vessel will only require you to fill in your profile (once) and then serendipitous discovery and meshing of relevant data will simply happen (the serendipity quotient will grow in line with Linked Data Web density).

Q: The desktop of the future is going to have powerful semantic search and social search capabilities built-in

A: It is going to be able to "Find" rather than "Search" for stuff courtesy of the Linked Data Web.

Q: Interactive shared spaces will replace folders

A: Data Spaces and their URIs (Data Source Names) replace everything. You simply choose the exploration metaphor that best suits you space interaction needs.

Q: The Portable Desktop

A: Ubiquitous Desktop i.e. do the same thing (all answers above) on any device connected to the Web.

Q: The Smart Desktop

A: Vessels with access to Smart Data (Linked Data + Action driven Context sprinklings).

Q: Federated, open policies and permissions

A: More federation for sure, XMPP will become a lot more important, and OAuth will enable resurgence of the federated aspects of the Web and Internet.

Q: The personal cloud

A: Personal Data Spaces plugged into Clouds (Intranet, Extranet, Internet).

Q: The WebOS

A: An operating system endowed with traditional Database and Host Operating system functionality such as: RDF Data Model, SPARQL Query Language, URI based Pointer mechanism, and HTTP based message Bus.

Q: Who is most likely to own the future desktop?

A: You! And all you need is a URI (an ID or Data Source Name for "Entity You") and a Profile Page (a place where "Entity You" is Describe by You).

One Last Thing

You can get a feel for the future desktop by downloading and then installing the OpenLink Data Explorer plugin for Firefox, which allows you to switch viewing modes between Web Page and Linked Data behind the page. :-)

OpenLink Data Spaces
Get Yourself a URI in 5 Minutes or Less
Linked Data Spaces & Data Portability
Linked Data Conference Keynote (RDFa based remix edition that includes vital bits from TimBL's Linked Data Planet presentation).

Response to: Whole Data Post (Update 3)

Fri, 15 Aug 2008 22:31:48 GMT

This post is in response to Glenn McDonald's post titled: Whole Data, where he highlights a number of issues relating to "Semantic Web" marketing communications and overall messaging, from his perspective.

By coincidence, Glenn and I presented at this month's Cambridge Semantic Web Gathering.

I've provided a dump of Glenn's issues and my responses below:

Issue - RDF

Ingenious data decomposition idea, but:
too low-level; the assembly language of data, where we need Java or Ruby
"resource" is not the issue; there's no such thing as "metadata", it's all data; "meta" is a perspective
lists need to be effortless, not painful and obscure
nodes need to be represented, not just implied; they need types and literals in a more pervasive, integrated way.

Response:

RDF is a Graph based Data Model it stands for Resource Description Framework. The Metadata data angle comes from it's Meta Content Framework (MCF) origins. You can express and serialize data based on the RDF Data Model using: Turtle, N3, TriX, N-Triples, and RDF/XML.

Issue - SPARQL (and Freebase's MQL)

These are just appeasement:
- old query paradigm: fishing in dark water with superstitiously tied lures; only works well in carefully stocked lakes
- we don't ask questions by defining answer shapes and then hoping they're dredged up whole.

Response:

SPARQL, MQL, and Entity-SQL are Graph Model oriented Query Languages. Query Languages always accompany Database Engines. SQL is the Relational Model equivalent.

Issue - Linked Data

Noble attempt to ground the abstract, but:
- URI dereferencing/namespace/open-world issues focus too much technical attention on cross-source cases where the human issues dwarf the technical ones anyway
- FOAF query over the people in this room? forget it.
- link asymmetry doesn't scale
- identity doesn't scale
- generating RDF from non-graph sources: more appeasement, right where the win from actually converting could be biggest!

Response:

Innovative use of HTTP to deliver "Data Access by Reference" to the Linked Data Web.

When you have a Data Model, Database Engine, and Query Language, the next thing you need is a Data Access mechanism that provides "Data Access by Reference". ODBC and JDBC (amongst others) provide "Data Access by Reference" via Data Source Names. Linked Data is about the same thing (URIs are Data Source Names) with the following differences:

Naming is scoped to the entity level rather than container level
HTTP's use within the data source naming scheme expands the referencability of the Named Entity Descriptions beyond traditional confines such as applications, operating systems, and database engines.

Issue - Giant Global Graph

Hugely motivating and powerful idea, worthy of a superhero (Graphius!), but:
- giant and global parts are too hard, and starting global makes every problem harder
- local projects become unmanageable in global context (Cyc, Freebase data-modeling lists...). And my thus my plea, again. Forget "semantic" and "web", let's fix the database tech first:
- node/arc data-model, path-based exploratory query-model
- data-graph applications built easily on top of this common model; building them has to be easy, because if it's hard, they'll be bad
- given good database tech, good web data-publishing tech will be trivial!
- given good tools for graphs, the problems of uniting them will be only as hard as they have to be.

Response:

Giant Global Graph is just another moniker for a "Web of Linked Data" or "Linked Data Web".

Multi-Model Database technology that meshes the best of the Graph & Relational Models exist. In a nutshell, this is what Virtuoso is all about and it's existed for a very long time :-)

Virtuoso is also a Virtual DBMS engine (so you can see Heterogeneous Relational Data via Graph Model Context Lenses). Naturally, it is also a Linked Data Deployment platform (or Linked Data Sever).

The issue isn't the "Semantic Web" moniker per se., it's about how Linked Data (foundation layer of Semantic Web) gets introduced to users. As I said during the MIT Gathering: "The Web is experienced via Web Browsers primarily, so any enhancement to the Web must be exposed via traditional Web Browsers", which is why we've opted to simply add "View Linked Data Sources" to the existing set of common Browser options that includes:

View page in rendered form (default)
View page source (i.e., how you see the markup behind the page)

By exposing the Linked Data Web option as described above, you enable the Web user to knowingly transition from the traditional Rendered (X)HTML page view to the Linked Data View (i.e., structured data behind the page). This simple "User Interaction" tweak makes the notion of exploiting a Structured Web becomes somewhat clearer.

The Linked Data Web isn't a panacea. It's just an addition to the existing Web that enrichens the things you can do with the Web. It's predominance, like any application feature, will be subject to the degrees to which it delivers tangible value or matrializes internal and external opportunity costs.

Note: The Web isn't ubiquitous today becuase all it's users groked HTML Markup. It's ubquitity is a function of opportunity costs: there simply came a point in the Web boostrap when nobody could afford the opportunity costs associated with being off the Web. The same thing will play out with Linked Data and the broader Semantic Web vision.

Links:

Linked Data Journey part of my Linked Data Planet Presentation Remix(from slides 15 to 22 - which include bits from TimBL's presentation)
OpenLink Data Explorer
OpenLink Data Explorer Screenshots and examples.

CrunchBase gets hooked up with the Linked Data Web!

Wed, 30 Jul 2008 01:43:27 GMT

It's getting really hot in Linked Data land! Two days ago Benjamin Nowack pinged the LOD community about his RDFization of Crunchbase (sample (X)HTML view: http://cb.semsol.org/company/opera-software) courtesy of Crounchbase releasing an API. As you know, I've always equated Web Service API to Database CLIs (ODBC, JDBC, ADO.NET etc.) as both offer code level hooks into Data Spaces.

Naturally, we've decided to join the Crunchbase RDFization party, and have just completed a Virtuoso Sponger Cartridge (an RDFizer) for Crouncbase. What we add in our particular cartridge is additional meshing with DBpedia and Wikicompany Linked Data Spaces, plus RDFizaton of the Crunchbase (X)HTML pages :-)

As I've postulated for a while, Linked Data is about data "Meshing" and "Meshups". This isn't a buzzword play. I am pointing out an important distinction between "Mashups" and "Meshpus". Which goes as follows: "Mashups" are about code level joining devoid of structured modelling, hence the revelation of code as opposed to data when you look behind a "Mashup". "Meshups" on the other hand, are about joining disparate structured data sources across the Web. And when you look behind a "Meshup" you see structured data (preferably Linked Data) that enables further "Meshing".

I truly believe that we are now inches away from critical mass re. Linked Data, and because we are dealing with data, the network-effect will be sky-high! I shudder to think about the state of the Linked Data Web in 12 months time. Yes, I am giving the explosion 12 months (or less). These are very exciting times.

Demo Links:

For best experience I encourage you to look at the OpenLink Data Explorer extension for Firefox (2.x - 3.x). This enables you to go to Crunchbase (X)HTML pages (and other sites on the Web of course), and then simply use the "View | Linked Data Sources" main or context menu sequence to unveil the Linked Data Sources associated with any Web Page.

Of course there is much more to come!

Linked Data in Action: Library of Congress

Wed, 11 Jun 2008 17:16:31 GMT

As I start my countdown to the upcoming Linked Data Planet conference, here is the first of a series of posts geared towards showcasing practical use of the burgeoning Linked Data Web.

First up, the Library of Congress, take a look at the following pages which are "Human" and machine based "User Agent" friendly:

Key point: The pages above are served up in line with Linked Data deployment and publishing tenets espoused by the Linking Open Data Community (LOD) which include (in my preferred terminology):

Giving "Names" to things you observe (aka Data Source Names or "DSNs" for short)
Use HTTP URLs in your data source naming scheme so that "access by reference" to your data sources exploits the expanse of the HTTP driven Web i.e make your DSNs "Linked Data Source Names" (LDNS)
Remember that Documents / Pages are compound in nature, and they aren't the only data sources we would want to name; a document's LDSN must be distinct from the LDSNs used for the subject matter concepts and/or named entities associated with a document
Use the RDF Data Model to express structure within your data source(s)
Use LDSNs when constructing statements/claims/assertions/records (triples) inside your structured data sources
When publishing Web Pages related to your data sources; use at least one of the following to methods to guide user agents to data sources associated with your published page; the HTML LINK tag, RDFa, GRDDL, or Content Negotiation.

The items above are features that users and decision makers should start to hone into when seeking, and evaluating, platforms that facilitate cost-effective exploitation of the Linked Data Web.

Context, Tagging, Semantic Web, and Linked Data (Updated)

Tue, 27 May 2008 22:36:37 GMT

Courtesy of Nova Spivack's post titled: Tagging and the Semantic Web: Tags as Objects, I stumbled across a related post by John Clarke titled: Tagging and the Semantic Web. Both of these posts use the common practice of tagging to shed light on the increasing realization that "The Pursuit of Context" is the fusion point between the current Web and its evolution into a structured Web of Linked Data.

How Semantic Tagging Works (from a 1000 feet)

When tagging a document, the semantic tagging service passes the content of a target document through a processing pipeline (a distillation process of sorts) that results in automagic extraction of the following:

Named Entities

Subject matter Entities

Subject matter Concepts

Once the extraction phase is completed, a user is presented with a list of "suggested tags" using a variety of user interaction techniques. The literal values of elected Tags are then associated with one or more Tag and Tag Meaning Data Objects, with each Object type endowed with a unique Identifier.

Issues to Note

Broad acceptance that: "Context is king", is gradually taking shape. That said, "Context" landlocked within Literal values offers little over what we have right now (e.g. at Del.icio.us or Technorati), long term. By this I mean: if the end product of semantically enhanced tagging leaves us with: Literal Tag values only, Tags associated with Tag Data Objects endowed with platform specific Identifiers, or Tag Data Objects with any other Identity scheme that excludes HTTP, the ability of Web users to discern or derive multiple perspectives from the base Context (exposed by semantically enhanced Tags) will be lost, or severely impeded at best.

The shape, form, and quality of the lookup substrate that underlies semantic tagging services, ultimately affects "context fidelity" matters such as Entity Disambiguation. The importance of quality lookup infrastructure on the burgeoning Linked Data Web is the reason why OpenLink Software is intimately involved with the DBpedia and UMBEL projects.

Conclusions

I am immensely happy to see that the Web 2.0 and Semantic Web communities are beginning to coalesce around the issue of "Context". This was the case at the WWW2008 Linked Data Workshop, I am feeling a similar vibe emerging from the Semantic Web Technologies conference currently nearing completion in San Jose. Of course, I will be talking about, and demonstrating practical utility of all of this, at the upcoming Linked Data Planet conference.

My Data Space Tag Cloud

ODBC & WODBC Comparison

Tue, 20 May 2008 19:46:11 GMT

ODBC delivers open data access (by reference) to a broad range of enterprise databases via a 'C' based API. Thanks to the iODBC and unixODBC projects, ODBC is available across broad range of platforms beyond Windows.

ODBC identifies data sources using Data Source Names (DSNs).

WODBC (Web Open Database Connectivity) delivers open data access to Web Databases / Data Spaces. The Data Source Naming scheme: URI or IRI, is HTTP based thereby enabling data access by reference via the Web.

ODBC DSNs bind ODBC client applications to Tables, Views, Stored Procedures.

WODBC DSNs bind you to a Data Space (e.g. my FOAF based Profile Page where you can use the "Explore Data Tab" to look around if you are a human visitor) or a specific Entity within a Data Space (i.e Person Entity Me).

ODBC Drivers are built using APIs (DBMS Call Level Interfaces) provided by DBMS vendors. Thus, a DBMS vendor can chose not to release an API, or do so selectivity, for competitive advantage or market disruption purposes (it's happened!).

WODBC Drivers are also built using APIs (Web Services associated with a Web Data Space). These drivers are also referred to as RDF Middleware or RDFizers. The "Web" component of WODBC ensures openness, you publish Data with URIs from your Linked Data Server and that's it; your data space or specific data entities are live and accessible (by reference) over the Web!

So we have come full circle (or cycle), the Web is becoming more of a structured database everyday! What's new is old, and what's old is new!

Data Access is everything, without "Data" there is no information or knowledge. Without "Data" there's not notion of vitality, purpose, or value.

URIs make or break everything in the Linked Data Web just as ODBC DSNs do within the enterprise.

I've deliberately left JDBC, ADO.NET, and OLE-DB out of this piece due to their respective programming languages and frameworks specificity. None of these mechanisms match the platform availability breadth of ODBC.

The Web as a true M-V-C pattern is now crystalizing. The "M" (Model) component of M-V-C is finally rising to the realm of broad attention courtesy of the "Linked Data" meme and "Semantic Web" vision.

By the way, M-V-C lines up nicely with Web 1.0 (Web Forms / Pages), Web 2.0 (Web Services based APIs), and Web 3.0 (Data Web, Web of Data, or Linked Data Web) :-)

Commercializing the Semantic Web

Sun, 18 May 2008 14:58:26 GMT

Unfortunately, I could only spend 4 days at the recent WWW2008 event in Beijing (I departed the morning following the Linked Data Workshop), so I couldn't take my slot on the "Commercializing the Semantic Web panel" etc.. Anyway, thanks to the Web I can still inject my points of view in the broad Web based discourse. Well so I hoped, when I attempted to post a comment to Paul Miller's ZDNet domain hosted blog thread titled: Commercialising the Semantic Web.

Unfortunately, the cost of completing ZDNet's unwieldy signup process simply exceeded the benefits of dropping my comments in their particular space :-( Thus, I'll settle for a trackback ping instead.

What follows is the cut and paste of my intended comment contributions to Paul's post.

Paul,

As discussed earlier this week during our podcast session, commercialization of Semantic Web technology shouldn't be a mercurial matter at this stage in the game :-) It's all about looking at how it provides value :-)

From the Linked Data angle, the ability to produce, dispatch, and exploit "Context" across an array of "Perspectives" from a plethora of disparate data sources on the Web and/or behind corporate firewalls, offers immense commercial value.

Yahoo's Searchmonkey effort will certainly bring clarity to some of the points I made during the podcast re. the role of URIs as "value consumption tickets" (Data Services are exposed via URIs). There has to be a trigger (in user space) that compels Web users to seek broader, or simply varied, perspectives as a response to data encountered on the Web. Yahoo! is about to put this light on in a big way (imho).

The "self annotating" nature of the Web is what ultimately drives the manifestation of the long awaited Semantic Web. I believe I postulated about "Self Annotation & the Semantic Web" in a number of prior posts which, by the way, should be DataRSS compatible right now due to Yahoo's support of OpenSearch Data Providers (which this Blog Space has been for eons).

Today, have many communities adding strucuture to the Web (via their respective tools of preference) without explicitly realizing what they are contributing. Every RSS/Atom feed, Tag, Weblog, Shared Bookmark, Wikiword, Microformat, Microformat++ (eRDF or RDFa), GRDDL stylesheet, and RDFizer etc.. is a piece of structured data.

Finally, the different communities are all finding ways to work together (thank heavens!) and the results are going to be cataclysmic when it all plays out :-)

Data, Structure, and Extraction are the keys to the Semantic Life! First you get the Data in a container (information resource), and then you add Structure to the information resource (RSS, Atom, microformats, RDFa, eRDF, SIOC, FOAF, etc.), once you have Structure RDFization (i.e. transformation to Linked Data) is a synch thanks to RDF Middleware (as per earlier RDF middleware posts).

Commercializing the Semantic Web

Fri, 16 May 2008 20:15:29 GMT

What follows is the cut and paste of my intended comment contributions to Paul's post.

Paul,

Finally, the different communities are all finding ways to work together (thank heavens!) and the results are going to be cataclysmic when it all plays out :-)

Semantic Web Advocate of Tribe Linked Data! (Updated)

Thu, 20 Mar 2008 20:29:47 GMT

These days I increasingly qualify myself and my Semantic Web advocacy as falling under the realm Linked Data. Thus, I tend to use the following introduction: I am Kingsley Idehen, of the Tribe Linked Data.

The aforementioned qualification is increasingly necessary for the following reasons:

The Semantic Web vision is broad and comprised of many layers
A new era of confusion is taking shape just as we thought we had quelled the prior AI dominated realm of confusion
None of the Semantic Web vision layers are comprehensible in practical ways without a basic foundation
Open Data Access is the foundation of the Semantic Web (in prior post I used the term: Semantic Web Layer 1)
URIs units of Open Data Access in Semantic Web parlance i.e.. each datum on the Web must have an ID (minted by the host Data Space).

The terms GGG, Linked Data, Data Web, Web of Data, and Web 3.0 (when I use this term) all imply URI driven Open Data Access for the Web Database (maybe call this ODBC for the Web) -- ability to point to records across data spaces without any adverse effect to the remote data spaces. It's really important to note that none of the aforementioned terms have nothing to do with the "Linguistic Meaning of blurb". Building a smarter document exposed via a URL without exposing descriptive data links doesn't provide open access to information data sources.

As human beings we are all endowed with reasoning capability. But we can't reason without access to data. Dearth of openly accessible structured data is the source of many ills in cyberspace and across society in general. Today we still have Subjectivity reigning over Objectivity due to the prohibitive costs of open data access.

We can't cost-effectively pursue objectivity without cost-effective infrastructure for creating alternative views of the data behind information sources (e.g. Web Pages). More Objectivity and less Subjectivity is what the next Web Frontier is about. At OpenLink we simply use the moniker: Analysis for All! Everyone becomes a data analyst in some form, and even better, the analysis are easily accessible to anyone connected to the Web. Of course, you will be able to share special analysis with your private network of friends and family, or if you so choose, not at all :-)

Recap, it's important to note that Linked Data is the foundation layer of the Semantic Web vision. It's not only facilitates open data access, it also enables data integration (Meshing as opposed to Mashing) across disparate data schemas

As demonstrated by DBpedia and the Linked Data Solar system emerging around it, if you URI everything, then everything is Cool.

Linked Data and Information Silos are mutually exclusive concepts. Thus, you cannot produce a web accessible Information Silo and then refer to it as "Semantic Web" technology. Of course, it might be very Semantic, but it's fundamentally devoid of critical "Semantic Web" essence (DNA).

My acid test for any Semantic Web solution is simply this (using a Web User Agent or Client):

go to the profile page of the service
ask for an RDF representation of my profile (by this I mean "get me the raw data in structured form")
attempt to traverse the structured data graph (RDF) that the service provides via live de-referncable URIs.

Here is the Acid test against my Data Space:

My Profile Page (HTML representation dispatched via an instance of OpenLink Data Spaces)
Click on the "Linked Data Tab" (HTML representation endowed with Data Links the link to information resources containing other structured descriptions of things).

My 5 Favorite Things about Linked Data on the Web

Sun, 09 Mar 2008 15:48:35 GMT

End to Buzzword Blur - how buzzwords are used to obscure comprehension of core concepts. Let SKOS, MOAT, SCOT reign!
End of Data Silos - you don't own me, my data, my data's mobility (import/export), or accessibility (by reference) just because I signed up for Yet Another Software as Service (ySaaS)
End of Misinformation - Sins of omission will no longer go unpunished the era of self induced amnesia due to competitive concerns is over, Co-opetition shall reign (Ray Noorda always envisoned this reality)
Serendipitous information and data discovery gets cheaper by the second - you're only a link away for a universe of relevant and accessible data
Rise of Quality - Contrary to historic president (due to all of the above) well engineered solutions will no longer be sure indicators of commercial failure

BTW - Benjamin Nowack penned an interesting post titled: Semantic Web Aliases, that covers a variety of labels used to describe the Semantic Web. The great thing about this post is that it provides yet another demonstration-in-the-making for the virtues of Linked Data :-)

Labels are harmless when their sole purpose is the creation of routes of comprehension for concepts. Unfortunately, Labels aren't always constructed with concept comprehension in mind, most of the time they are artificial inflectors and deflectors servicing marketing communications goals.

Anyway, irrespective of actual intent, I've endowed all of the labels from Bengee's post with URIs as my contribution important disambiguation effort re. the Semantic Web:

Semantic Web (timbl)
Web of Data (timbl)
lowercase semantic [wW]eb (tantek)
Semantic Web 2.0 (by stefandecker, IIRC)
Web 3.0 (by nova and others)
Semantic Graph (by nova and others)
Hyperdata (by danja) Linked Data (by timbl, and implemented by the Chris Bizer and Richard Cyganiak inspired, Linking Open Data Community and it's poster project DBpedia)
Linked Data Web (by kidehen)
Structured Web (by mkbergman)
Semantic Data Web (by kidehen)
SemWeb (by the developer community)
GGG - The Giant Global Graph (by timbl) Web 3G (by iand)

As per usual this post is best appreciated when processed via an Linked Data aware user agent.

Driving Lanes on the Web based Information Super Highway

Tue, 04 Mar 2008 23:17:56 GMT

Post absorption of Web 3G commentary emanating from the Talis blog space. Ian Davis appears to be expending energy on the definition of, and timeframes for, the next Web Frontier (which is actually here btw) :-)

Daniel Lewis also penned an interesting post in response to Ian's, that actually triggered this post.

I think definition time has long expired re. the Web's many interaction dimensions, evolutionary stages, and versions.

On my watch it's simply demo / dog-food time. Or as Dan Brickley states: Just Show It.

Below, I've created a tabulated view of the various lanes on the Web's Information Super Highway. Of course, this is a Linked Data demo should you be interested in the universe of data exposed via the links embedded in this post :-)

The Web's Information Super Highway Lanes

	1.0	2.0	3.0
Desire	Information Creation & Retrieval	Information Creation, Retrieval, and Extraction	Distillation of Data from Information
Meme	Information Linkage (Hypertext)	Information Mashing (Mash-ups)	Linked Data Meshing (Hyperdata)
Enabling Protocol	HTTP	HTTP	HTTP
Markup	HTML	(X)HTML& various XML based formats (RSS, ATOM, others)	Turtle, N3, RDF/XML, others
Basic Data Unit	Resource (Data Object) of type "Document"	Resource (Data Object) of type "Document"	Resource (Data Object) that may be one of a variety of Types: Person, Place, Event, Music etc.
Basic Data Unit Identity	Resource URL (Web Data Object Address)	Resource URL (Web Data Object Address)	Unique Identifier (URI) that is indepenent of actual Resource (Web Data Object) Address. Note: An Identifier by itself has no utility beyond Identifying a place around which actual data may be clustered.
Query or Search	Full Text Search patterns	Full Text Search patterns	Structured Querying via SPARQL
Deployment	Web Server (Document Server)	Web Server + Web Services Deployment modules	Web Server + Linked Data Deployment modules (Data Server)
Auto-discovery			, basic and/or transparent content negotiation
Target User	Humans	Humans & Text extraction and manipulation oriented agents (Scrappers)	Agents with varying degrees of data processing intelligence and capacity
Serendipitous Discovery Quotient (SDQ)	Low	Low	High
Pain	Information Opacity	Information Silos	Data Graph Navigability (Quality)

Contd: Why we need Linked Data

Tue, 26 Feb 2008 13:16:43 GMT

Increasingly, I am encountering commentary from the ReadWriteWeb data space that highlights critical problems solved by a Linked Data Web. Unfortunately, most of the time, there is a disconnect between the problem and the solution. By this I mean: technology in the Semantic Web realm isn't seen as the solution.

A while back, I wrote a post titled:Why we need Linked Data. The aim of the post was to bring attention to the implications of exponential growth of User Generated Content (typically, semi-structured and unstructured data) on the Web. The growth in question is occurring within a fixed data & information processing timeframe (i.e. there will always be 24hrs in a day), which sets the stage for Information Overload as expressed in a recent post from ReadWriteWeb titled: Visualizing Social Media Fatigue.

The emerging "Web of Linked Data" augments the current "Web of Linked Documents", by providing a structured data corpus partitioned by containers I prefer to call: Data Spaces. These spaces enable Linked Data aware solutions to deliver immense value such as, complex data graph traversal, starting from document beachheads, that expose relevant data within a faction of the time it would take to achieve the same thing using traditional document web methods such as full text search patterns, scraping, and mashing etc.

Remember, our DNA based data & information system far exceeds that of any inorganic system when it comes to reasoning, but it remains immensely incapable of accurately and efficiently processing huge volumes of data & information -- irrespective of data model.

The Idea behind the Semantic Web has always been about an evolution of the Web into a structured data collective comprised of interlinked Data items and Data Containers (Data Spaces). Of course we can argue forever about the Semantics of the solution (ironically), but we can't shirk away from the impending challenges that "Information Overload" is about to unleash on our limited processing time and capabilities.

For those looking for a so called "killer application" for the Semantic Web, I would urge you to align this quest with the "Killer Problem" of our times, because when you do so you will that all routes lead to: Linked Data that leverages existing Web Architecture.

Once you understand the problem, you will hopefully understand that we all need some kind of "Data Junction Box" that provides a "Data Access Focal Point" for all of the data we splatter across the net as we sign up for the next greatest and latest Web X.X hosted service, or as we work on a daily basis with a variety of tools within enterprise Intranets.

BTW - these "Data Junction Boxes" will also need to be unobtrusively bound to our individual Identities.

Additional OpenLink Data Spaces Features

Mon, 11 Feb 2008 16:38:03 GMT

Daniel Lewis has published another post about OpenLink Data Spaces (ODS) functionality titled:A few new features in OpenLink Data Spaces, that exposes additional features (some hot out the oven).

OpenLink Data Spaces (ODS) now officially supports:

Attention Profiling Markup Language (APML).

Meaning of a Tag (MOAT) in conjunction with Simple Knowledge Organisation System (SKOS) and Social-Semantic Cloud of Tags (SCOT).

OAuth - an Open Authentication Protocol

Which means that OpenLink Data Spaces support all of the main standards being discussed in the DataPortability Interest Group!

APML Example:

All users of ODS automatically get a dynamically created APML file, for example: APML profile for Kingsley Idehen

The URI for an APML profile is: http://myopenlink.net/dataspace//apml.xml

Meaning of a Tag Example:

All users of ODS automatically have tag cloud information embedded inside their SIOC file, for example: SIOC for Kingsley Idehen on the Myopenlink.net installation of ODS.

But even better, MOAT has been implemented in the ODS Tagging System. This has been demonstrated in a recent test blog post by my colleague Mitko Iliev, the blog post comes up on the tag search: http://myopenlink.net/dataspace/imitko/weblog/Mitko%27s%20Weblog/tag/paris

Which can be put through the OpenLink Data Browser:

OpenLink Data Browser with Mitko Iliev’s Paris Blog Tag

OAuth Example:

OAuth Tokens and Secrets can be created for any ODS application. To do this:

you can log in to MyOpenlink.net beta service, the Live Demo ODS installation, an EC2 instance, or your local installation

then go to ‘Settings’

and then you will see ‘OAuth Keys’

you will then be able to choose the applications that you have instantiated and generate the token and secret for that app.

Related Document (Human) Links

OpenLink Data Spaces Official Page

OpenLink Software Page

OpenLink Data Spaces Wikipedia Page

Attention Profiling Markup Language Project Website

Meaning of a Tag Project Website

Simple Knowledge Organisation Systems Project Website

Social-Semantic Cloud of Tags Project Website

OAuth Protocol Website

DataPortability.org Website

Semantically Interlinked Online Communities Project Website

Remember (as per my most recent post about ODS), ODS is about unobtrusive fusion of Web 1.0, 2.0, and 3.0+ usage and interaction patterns. Thanks to a lot of recent standardization in the Semantic Web realm (e.g SPARQL), we are now employ the MOAT, SKOS, and SCOT ontologies as vehicles for Structured Tagging.

Structured Tagging?

This is how we take a key Web 2.0 feature (think 2D in a sense), bend it over, to create a Linked Data Web (Web 3.0) experience unobtrusively (see earlier posts re. Dimensions of Web). Thus, nobody has to change how they tag or where they tag, just expose ODS to the URLs of your Web 2.0 tagged content and it will produce URIs (Structured Data Object Identifiers) and a lnked data graph for your Tags Data Space (nee. Tag Cloud). ODS will construct a graph which exposes tag subject association, tag concept alignment / intended meaning, and tag frequencies, that ultimately deliver "relative disambiguation" of intended Tag Meaning (i.e. you can easily discern the taggers meaning via the Tags actual Data Space which is associated with the tagger). In a nutshell, the dynamics of relevance matching, ranking, and the like, change immensely without futile timeless debates about matters such as:

What's the Linked Data value proposition?

What's the Linked Data business model?

What's the Semantic Web Killer application?

We can just get on with demonstrating Linked Data value using what exists on the Web today. This is the approach we are deliberately taking with ODS.

Related Items

Stefano Mazzocch

response to Clay Shirky's 2005 talk

Ontology is Overrated: Links, Tags and Post-hoc Metadata

Tom Gruber

Ontology of Folksonomy: A Mash-up of Apples and Oranges

Tip: This post is best viewed via an RDF aware User Agent (e.g. a Browser or Data Viewer). I say this because the permalink of this post is a URI in a Linked Data Space (My Blog) comprised of more data than meets the eye (i.e. what you see when you read this post via a Document Web Browser) :-)

10 Reasons to use OpenLink Data Spaces (ODS)

Fri, 08 Feb 2008 22:08:43 GMT

Via post by Daniel Lewis, titled:10 Reasons to use OpenLink Data Spaces

There are quite a few reasons to use OpenLink Data Spaces (ODS). Here are 10 of the reasons why I use ODS:

Its native support of DataPortability Recommendations such as RSS, Atom, APML, Yadis, OPML, Microformats, FOAF, SIOC, OpenID and OAuth.

Its native support of Semantic Web Technologies such as: RDF and SPARQL/SPARUL for querying.

Everything in ODS is an Object with its own URI, this is due to the underlying Object-Relational Architecture provided by Virtuoso.

It has all the social media components that you could need, including: blogs, wikis, social networks, feed readers, CRM and a calendar.

It is expandable by installing pre-configured components (called VADs), or by re-configuring a LAMP application to use Virtuoso. Some examples of current VADs include: MediaWiki, Wordpress and Drupal.

It works with external webservices such as: Facebook, del.icio.us and Flickr.

Everything within OpenLink Data Spaces is Linked Data, which provides more meaningful information than just plain structural information. This meaningful information could be used for complex inferencing systems, as ODS can be seen as a Knowledge Base.

ODS builds bridges between the existing static-document based web (aka ‘Web 1.0‘), the more dynamic, services-oriented, social and/or user-orientated webs (aka ‘Web 2.0‘) and the web which we are just going into, which is more data-orientated (aka ‘Web 3.0’ or ‘Linked Data Web’).

It is fully supportive of Cloud Computing, and can be installed on Amazon EC2.

Its released free under the GNU General Public License (GPL). [note]However, it is technically dual licensed as it lays on top of the Virtuoso Universal Server which has both Commercial and GPL licensing[/note]

The features above collectively provide users with a Linked Data Junction Box that may reside with corporate intranets or "out in the clouds" (Internet). You can consume, share, and publish data in a myriad of formats using a plethora of protocols, without any programming. ODS is simply about exposing the data from your Web 1.0, 2.0, 3.0 application interactions in structured from, with Linking, Sharing, and ultimately Meshing (not Mashing) in mind.

Note: Although ODS is equipped with a broad array of Web 2.0 style Applications, you do not need to use native ODS apps in order to exploit it's power. It binds to anything that supports the relevant protocols and data formats.

2008, Facebook Data Portability, and the Giant Global Graph of Linked Data

Mon, 07 Jan 2008 16:44:42 GMT

As 2007 came to a close I repeatedly mulled over the idea of putting together a usual "year in review" and a set of predictions for the coming year etc. Anyway, the more I pondered, the smaller the list became. While pondering (as 2008 rolled around), the Blogosphere was set ablaze with the Robert Scoble's announcement of his account suspension by Facebook. Of course, many chimed in expressing views either side of the ensuing debate: Who is right -- Scoble or Facebook. The more I assimilated the views expressed about this event, the more ironic I found the general discourse, for the following reasons:

Web 2.0 is fundamentally about Web Services as the prime vehicle for interactions across "points of Web presence"
Facebook is a Web 2.0 hosted service for social networking that provides Web Services APIs for accessing data in the Facebook data space. You have to do so "on the fly" within clearly defined constraints i.e you can interact with data across your social network via Facebook APIs, but you cannot cache the data (perform an export style dump of the data)
Facebook is a main driver of the term: "social graph", but their underlying data model is relational and the Web Services response (data you get back) doesn't return a data graph, instead it returns an tree (i.e XML)
Scoble's had a number of close encounters with Linked Data Web | Semantic Data Web | Web 3.0 aficionados in various forms throughout 2007, but still doesn't quite make the connection between Web Services APIs as part of a processing pipeline that includes structured data extraction from XML data en route to producing Data Graphs comprised of Data Objects (Entities) endowed with: Unique Identifiers, Classification or Categorization schemes, Attributes, and Relationships prescribed by one or more shared Data Dictionaries/Schemas/Ontologies
A global information bus that exposes a Linked Data mesh comprised of Data Objects, Object Attributes, and Object Relationships across "points of Web presence" is what TimBL described in 1998 (Semantic Web Roadmap) and more recently in 2007 (Giant Global Graph)
The Linked Data mesh (i.e Linked Data Web or GGG) is anchored by the use of HTTP to mint Location, Structure, and Value independent Object Identifiers called URIs or IRIs. In addition, the Linked Data Web is also equipped with a query language, protocol, and results serialization format for XML and JSON called: SPARQL.

So, unlike Scoble, I am able to make my Facebook Data portable without violating Facebook rules (no data caching outside Facebook realm) by doing the following:

Use an RDFizer for Facebook to convert XML response data from Facebook Web Services into RDF "on the fly" Ensure that my RDF is comprised of Object Identifiers that are HTTP based and thereby dereferencable (i.e. I can use SPARQL to unravel the Linked Data Graph in my Facebook data space)
The act of data dereferencing enables me to expose my Facebook Data as Linked Data associated with my Personal URI
This interaction only occurs via my data space and in all cases the interactions with data work via my RDFizer middleware (e.g the Virtuoso Sponger) that talks directly to Facebook Web Services.

In a nutshell, my Linked Data Space enables you to reference data in my data space via Object Identifiers (URIs), and some cases the Object IDs and Graphs are constructed on the fly via RDFization middleware.

Here are my URIs that provide different paths to my Facebook Data Space:

Personal URI

My Facebook Data Space

Linked Data Browser/Viewer

My Facebook Photo Gallery -- WWW2007 Photo Collection

Linked Data Browser/Viewer

To conclude, 2008 is clearly the inflection year during which we will final unshackle Data and Identity from the confines of "Web Data Silos" by leveraging the HTTP, SPARQL, and RDF induced virtues of Linked Data.

2008 and the Rise of Linked Data
Scoble Right, Wrong, and Beyond
Scoble interviewing TimBL (note to Scoble: re-watch your interview since he made some specific points about Linked Data and URIs that you need to grasp)
Prior Blog posts my this Blog Data Space that include the literal patterns: Scoble Semantic Web

Discussion: OpenLink Data Spaces

Sat, 01 Dec 2007 20:26:12 GMT

I've been a little busier than usual, of late. So busy, that even minimal blog based discourse participation has been a challenge. Anyway, during this quiet period, a number of interesting data streams have come my way that relate to OpenLink Data Spaces (ODS). Thus, in typical fashion, I'll use this post (via URIs) to contribute a few nodes to the Giant Global Graph that is the Web of Structured Linked Data, also known as the Data Web, Semantic Data Web, or Web of Data (also see prior Data Web posts).

Here goes:

Alan Wilensky recalls his early encounters with OpenLink Data Spaces (circa. 2004)
Daniel Lewis shares his "state of the Semantic Data Web" findings
Daniel Lewis experiences OpenLink Data Space first hand en route to creating Data Spaces in the Clouds (the Fourth Platform).

In addition, in one week, courtesy of the Web, UK Semnantic Web Gatherings in Bristol and Oxford, I discover, interview, and employ Daniel :-) Imagine how long this would have taken to pull off via the Document Web, assuming I would even discover Daniel.

As with all things these days, the Web and Internet change everything, which includes talent discovery and recruitment.

A Global Social graph that is a mesh of Linked Data enables the process of recruitment, marketing, and other elements of busines management to be condensed down to a sending powerful beams across the aforementioned Graph :-) The only variable pieces are the traversal paths exposed to your beam via the beam's entry point URI. In my case, I have a single URI that exposes a Graph of critical paths for the Blogosphere (i.e data spaces of RSS Atom Feeds). Thus, I can discover if your profile matches the requirements associated with an opening at OpenLink Software (most of the time) before you do :-)

BTW - I just noticed that John Breslin described ODS as social-graph++ in his recent post, titled: Tales from the SIOC-o-sphere, part 6. In a funny way, this reminds of a post from the early blogosphere days about platforms and Weblog APIs (circa. 2003) about ODS (then exposed via the Blog Platform realm of Virtuoso).

Reminder: Why We Need Linked Data!

Fri, 02 Nov 2007 22:52:34 GMT

"The phrase Open Social implies portability of personal and social data. That would be exciting but there are entirely different protocols underway to deal with those ideas. As some people have told me tonight, it may have been more accurate to call this "OpenWidget" - though the press wouldn't have been as good. We've been waiting for data and identity portability - is this all we get?"
[Source: Read/Write Web's Commentary & Analysis of Google's OpenSocial API]

..Perhaps the world will read the terms of use of the API, and realize this is not an open API; this is a free API, owned and controlled by one company only: Google. Hopefully, the world will remember another time when Google offered a free API and then pulled it. Maybe the world will also take a deeper look and realize that the functionality is dependent on Google hosted technology, which has its own terms of service (including adding ads at the discretion of Google), and that building an OpenSocial application ties Google into your application, and Google into every social networking site that buys into the Dream. Hopefully the world will remember. Unlikely, though, as such memories are typically filtered in the Great Noise....
[Source: Poignant commentary excerpt from Shelly Power's Blog (as always)]

The "Semantic Data Web" vision has always been about "Data & Identity" portability across the Web. Its been that and more from day one.

In a nutshell, we continue to exhibit varying degrees of Cognitive Dissonance re the following realities:

The Network is the Computer (Internet/Intranet/Extranet depending on your TCP/IP usage scenarios)
The Web is the OS (ditto) and it provides a communications subsystem (Information BUS) comprised of

HTTP

URI

HTTP based Interprocess (i.e Web Apps are processes when you discard the HTML UI and interact with the application logic containers called "Web Services" behind the pages) ultimately hit data
Web Data is best Modeled as a Graph (RDF, Containers/Items/Item Types, Property & Value Pairs associated with something, and other labels)
Network are Graphs and vice versa
Social Networks are graphs where nodes are connected via social connectors ( [x]--knows-->[y] )
The Web is a Graph that exposes a People and Data Network (to the degree we allude to humans not being data containers i.e. just nodes in a network, otherwise we are talking about a Data Network)
Data access and manipulation depends inherently on canonical Data Access mechanisms such as Data Source Identifiers / Names (time-tested practice in various DBMS realms)
Data is forever, it is the basis of Information, and it is increasing exponentially due to proliferation of Web Services induced user activities (User Generated Content)
Survival, Vitality, Longevity, Efficiency, Productivity etc.. are all depend on our ability to process data effectively in a shrinking time continuum where Data and/or Information overload is the alternative.

The Data Web is about Presence over Eyeballs due to the following realities:

Eyeballs are input devices for a DNA based processing system (Humans). The aforementioned processing system can reason very well, but simply cannot effectively process masses of data or information
Widgets offer little value long term re. the imminent data and information overload dilemma, ditto Web pages (however pretty), and any other Eyeballs-only centric Web Apps
Computers (machines) are equipped with inorganic (non DNA) based processing power, they are equipped to process huge volumes of data and/or information, but they cannot reason
To be effective in the emerging frontier comprised of a Network Computer and a Web OS, we need an effective mechanism that makes best use of the capabilities possessed by humans and machines, by shifting the focus to creation and interaction with points of "Data Web Presence" that openly expose "Structured Linked Data".

This is why we need to inject a mesh of Linked Data into the existing Web. This is what the often misunderstood vision of the "Semantic Data Web" or "Web of Data" or "Web or Structured Data" is all about.

As stated earlier (point 10 above), "Data is forever" and there is only more of it to come! Sociality and associated Social Networking oriented solutions are at best a spec in the Web's ocean of data once you comprehend this reality.

Note: I am writing this post as an early implementor of GData and an implementor of RDF Linked Data technology and a "Web Purist".

OpenSocial implementation and support across our relevant product families: Virtuoso (i.e the Sponger Middleware for RDF component), OpenLink Data Spaces (Data Space Controller / Services), and the OpenLink Ajaxt Toolkit (i.e OAT Widgets and Libraries), is a triviality now that the OpenSocial APIs are public.

The concern I have, and the problem that remains mangled in the vast realms of Web Architecture incomprehension, is the fact that GData and GData based APIs cannot deliver Structured Linked Data in line with the essence of the Web without introducing "lock-in" that ultimately compromises the "Open Purity" of the Web. Facebook and Google's OpenSocial response to the Facebook juggernaut (i.e. open variant of the Facebook Activity Dashboard and Social Network functionality realms, primarily), are at best icebergs in the ocean we know as the "World Wide Web". The nice and predictable thing about icebergs is that they ultimately melt into the larger ocean :-)

On a related note, I had the pleasure of attending the W3C's RDF and DBMS Integration Workshop, last week. The event was well attended by organizations with knowledge, experience, and a vested interested in addressing the issues associated with exposing none RDF data (e.g. SQL) as RDF, and the imminence of data and/or information overload covered in different ways via the following presentations:

RDF Views of SQL Data

Orri Erling

Computer Science 2.0

Experiences re. solving SPARQL Access to Distributed Data Sources

Virtuoso 5.0.2 Released!

Mon, 08 Oct 2007 14:27:27 GMT

A new release of Virtuoso is now available in both Open Source and Commercial variants. The main features and Enhancements associated with this release include:

* 64-bit Integer Support

* RDF Sink Folders for WebDAV - enabling RDF Quad Store population by simply dropping RDF files into WebDAV or via HTTP (meaning you can use CURL as an RDF in put mechanism for instance)

* Additional Sponger Cartridges from Audio binary files (i.e ID3 tag extraction and Music Ontology mapping which exposes the fine details of music as RDF based Structured Data; one for the DJs & Remixers out there!)

* New Sponger Cartridges for Facebook, Freebase, Wikipedia, GRDDL, RDFa, eRDF and more

* Support for PHP 5.2 runtime hosting (Virtuoso is a bona fide deployment platform for: Wordpress, MediaWiki, phpBB, Drupal etc.)

RDF Linked Data

SQL-RDF Views

* Tutorial Application includes Linked Data style SQL-RDF Views for the Northwind SQL DBMS schema (which is the same as the standard Virtuoso demo atabase schema)

* SQL-RDF Views implementation of the TPC-D benchmark (Yes, we can run this grueling SQL benchmark via RDF views of SQL Data!)

OpenLink Data Spaces

OpenLink Ajax Toolkit

Download Lnks:

Open Source Edition

Commercial Edition

RDF Browser View of My Hyperdata & Linked Data Post

Thu, 20 Sep 2007 01:26:02 GMT

Bearing in mind we are all time challenged, here are links to OpenLink and Zitgist RDF Browser views of my earlier blog post re. Hyperdata & Linked Data.

OpenLink RDF Browser view of Hyperdata & Linked Data post

Zitgist Browser view of Hyperdata & Linked Data post

Both browsers should lead you to the posts from Danny, Nova, and Tim. In both cases the URI < xmlns="http" www.openlinksw.com="www.openlinksw.com" dataspace="dataspace" kidehen="kidehen" openlinksw.com="openlinksw.com" weblog="weblog" s="s" blog="blog" b127="b127" d="d"> is a pointer to structured data (in my Blog Data Space) if your user agent (browser or other Web Client) requests an RDF representation of this post via its HTTP request payload (what the Browser are doing via the "Accept:" headers).

As you can see the Data Web is actually here! Without RDF generation upheaval (or Tax).

Social-Networking & Semantic Web (update)

Wed, 15 Aug 2007 22:14:36 GMT

An update for RSS crawlers that don't have synchronization capability etc..

"Reasonable Humans" continue to struggle with the congruency of social-networking and the Semantic Web. Funnily enough, our less capable counterparts (in the reasoning department) don't have such struggles. Here is my little interaction with an IRC Bot called "Phenny" (possibly mother of the Agent Web or Web of Agents or Web 4.0):

kidehen: .w web

phenny: web n. 1: An intricate network suggesting something that was formed by weaving or interweaving
phenny: web n. 2: An intricate trap that entangles or ensnares its victim
phenny: web v. 1: Construct or form a web, as if by weaving.

kidehen: .w network

phenny: network n. 1: An interconnected system of things or people
phenny: network n. 2: (broadcasting) a communication system consisting of a group of broadcasting stations that all transmit the same programs
phenny: network v. 1: Communicate with and within a group.

kidehen: .w social

phenny: social n. 1: A party of people assembled to promote sociability and communal activity
phenny: social a. 1: Relating to human society and its members
phenny: social a. 2: Living together or enjoying life in communities or organized groups.

kidehen: .w semantic

phenny: semantic a. 1: Of or relating to meaning or the study of meaning.

kidehen: .w graph

phenny: graph n. 1: A visual representation of the relations between certain quantities plotted with reference to a set of axes
phenny: graph v. 1: Represent by means of a graph
phenny: graph v. 2: Plot upon a graph.

Note: Wikipedia's Graph definition is the one the applies to RDF :-)

To conclude, the whole Semantic Data Web vision and Open social-networking go hand-in-hand. You can't attain "Open social-networking" without appreciating the underlying core concepts (Structured Data, Graphs, and Triples). There is no way around the reality that all networks are graphs. Of course, this doesn't imply that solution users (end-users) have to master graph theory en route to network participation and exploitation. It simply implies that solution developers need some graph data model appreciation, or a graph data model aware platform, when developing social-networking solutions that seek to leverage the Web.

Funnily enough, my session with the Phenny Bot was triggered by a prior session between Chimezie Ogbuji (who also trains a Bot called "Emeka" that does SPARQL) and said Bot:

chimezie: .w tautology

phenny: tautology n. 1: (logic) a statement that is necessarily true
phenny: tautology n. 2: Useless repetition.

chimezie: .ety tautology

phenny: "1579, from L.L. tautologia 'representation of the same thing' (c.350), from Gk. tautologia, from tautologos 'repeating what has been said,' from tauto 'the same' + -logos 'saying,' related to legein 'to say' (see lecture)." - http://etymonline.com/?term=tautology

That lead me to the following sequence (preceding the initial IRC session dump in this post):

kidehen: .w conflagration

phenny: conflagration n. 1: A very intense and uncontrolled fire.

kidehen: .w imbroglio

phenny: imbroglio n. 1: An intricate and confusing interpersonal or political situation

phenny: imbroglio n. 2: A very embarrassing misunderstanding.

kidehen: .w buzzword

phenny: buzzword n. 1: Stock phrases that have become nonsense through endless repetition.

In sense, proposing the Semantic Data Web as a solution to open social-networiing challenges, more often than not results in your "No Semantic Web here" imbroglio. In a sense, the shortest path to a buzzword fueled conflagration :-)

Injecting Facebook Data into the Semantic Data Web

Wed, 11 Feb 2009 12:40:11 GMT

I now have the first cut of a Facebook application called: Dynamic Linked Data Pages.

What is a Dynamic Linked Data Page (DLD)?

A dynamically generated Web Page comprised of Semantic Data Web style data links (formally typed links) and traditional Document Web links (generic links lacking type specificity).

Linked Data Pages will ultimately enable Facebook users to inject their public data into the Semantic Data Web as RDF based Linked Data. For instance, my Facebook Profile & Photo albums data is now available as RDF, without paying a cent of RDF handcrafting tax, thanks to the Virtuoso Sponger (middleware for producing RDF from non RDF data sources) which is now equipped with a new RDFizer Cartridger for the Facebook Query Language (FQL) and RESTful Web Service.

Demo Notes:

When you click on a link in DLD pages, you will be presented with a lookup that exposes the different interaction options associated with a given URI. Examples include:

Explore - find attributes and relationships that apply to the clicked URI
Dereference (get the attributes of the clicked URI)
Bookmark - store the URI for subsequent use e.g meshing with other URIs from across the Web
(X)HTML Page Open - traditional Document Web link (i.e. just opens another Web document as per usual)

Remember, the facebook URLs (links to web pages) are being converted, on the fly, into RDF based Structured Data ( graph model database) i.e Entity Sets that possess formally defined characteristics (attributes) and associations (relationships).

Dynamic Linked Data Pages

Saved RDF Browser Sessions

Saved SPARQL Query Definitions

Terminology & Specificity

Tue, 05 Feb 2008 01:47:01 GMT

Terminology is a pain to construct, and an even bigger pain to diffuse effectively, when dealing with large collections of superficially heterogeneous, and factually homogeneous, interlinked individuals.

In my "Linked Data & Web Information BUS" post (plus a few LOD mailing list posts), I had the delight and displeasure (on the brain primarily) of attempting to get terminology right with regards to Information- and Non-Information Web Resources. I eventually settled for Data Sources instead of the simpler and more obvious term: Data Resources :-)

Thus, I redefine the URIs from earlier past as follows:

http://demo.openlinksw.com/Northwind/Customer/ALFKI (Information Resource)

http://demo.openlinksw.com/Northwind/Customer/ALFKI#this (Data Resource)

Thanks to today's internet connectivity, it took a simple Skype ping from Mike Bergman, and a 30 minute (or so) session that followed for us to arrive at "Data Resource" as a clearer term for Non Information Resources.

Mike has promised to write a detailed post covering our Linked Data and the Structured Web terminology meshing odyssey.

Kingsley Idehen's Blog Data Space

A Structured Web of Data Picture....

Data Web and Major League Baseball

Structured Data vs. Unstructured Data

7 Things Brought to You by HTTP-based Hypermedia

Data 3.0 (a Manifesto for Platform Agnostic Structured Data) Update 5

Some Related Work

Data 3.0 manifesto

Related

Web 1.0, 2.0, and 3.0 (Yet Again)

Related

XBRL Ontology Project

Amazon.com RSS Feeds

Linked Data Web Collaborators: Introducing Structured Dynamics

Related

Web of Linked Data & Hyperdata

Another Paper Discussing RDF Data Publishing

Linked Data & The Web Information BUS

What is an Information BUS?

What are Web Information Payloads?

What about Structured Data?

Where is this all headed?

Conclusions

Related Items

URIBurner: Painless Generation & Exploitation of Linked Data (Update 1 - Demo Links Added)

What is URIBurner?

Why is it Important?

How Do I Use It?

Content Publisher

Examples

Content Consumer

Examples

JSON:

Notation 3 (N3):

RDF/XML:

Conclusion

Related:

Revisiting HTTP based Linked Data (Update 1 - Demo Video Links Added)

What is Linked Data?

Why is it Important?

How is Linked Data Delivered?

How are Linked Data Object Representations Structured?

How Do I Create Linked Data oriented Hypermedia Resources?

Practical Examples of Linked Data Objects Enable

How Do OpenLink Software Products Enable Linked Data Exploitation?

Related

Response to: What is Web 3.0 and Why Should I Care?

What's the critical infrastructure supporting Web 3.0?

Related

Zero-based Cognition (Difference between Humans & Machines)

W3C's SPARQLing Data Access Ingenuity

What is SPARQL?

Why is it Important?

Example:

How Do I use It?

Where are it's implementations?

Is this really a big deal?

Related items

Structured writing, structured search

Semantic Web Value Proposition

Situation Analysis

The Semantic Data Web's value to Individuals

The Semantic Data Web's value to Organizations

Conclusion

The Power of Structured Data Exposure via RDFa

Search Engine Challenges Posed by the Semantic Web

Breaking the Web Wide Open!

Breaking the Web Wide Open! (complete story)

The Difference Between Information and Knowledge

XML, the New Database Heresy

Simple Virtuoso Installation & Utilization Guide for SPARQL Users (Update 5)

What is SPARQL?

Why is it important?

How do I use it, generally?

How do I use SPARQL with Virtuoso?

Installation Steps

Troubleshooting

Data Loading Steps

SPARQL Endpoint Discovery

Related