Data Spaces and Web of Databases

Note: An updated version of a previously unpublished blog post:

Continuing from our recent Podcast conversation, Jon Udell sheds further insight into the essence of our conversation via a “Strategic Developer” column article titled: Accessing the web of databases.

Below, I present an initial dump of a DataSpace FAQ below that hopefully sheds light on the DataSpace vision espoused during my podcast conversation with Jon.

What is a DataSpace?

A moniker for Web-accessible atomic containers that manage and expose Data, Information, Services, Processes, and Knowledge.

What would you typically find in a Data Space? Examples include:

Raw Data - SQL, HTML, XML (raw), XHTML, RDF etc.
Information (Data In Context) - XHTML (various microformats), Blog Posts (in RSS, Atom, RSS-RDF formats), Subscription Lists (OPML, OCS, etc), Social Networks (FOAF, XFN etc.), and many other forms of applied XML.

Web Services (Application/Service Logic) - REST or SOAP based invocation of application logic for context sensitive and controlled data access and manipulation.

Persisted Knowledge - Information in actionable context that is also available in transient or persistent forms expressed using a Graph Data Model. A modern knowledgebase would more than likely have RDF as its Data Language, RDFS as its Schema Language, and OWL as its Domain Definition (Ontology) Language. Actual Domain, Schema, and Instance Data would be serialized using formats such as RDF-XML, N3, Turtle etc).

How do Data Spaces and Databases differ?
Data Spaces are fundamentally problem-domain-specific database applications. They offer functionality that you would instinctively expect of a database (e.g. AICD data management) with the additonal benefit of being data model and query language agnostic. Data Spaces are for the most part DBMS Engine and Data Access Middleware hybrids in the sense that ownership and control of data is inherently loosely-coupled.

How do Data Spaces and Content Management Systems differ?
Data Spaces are inherently more flexible, they support multiple data models and data representation formats. Content management systems do not possess the same degree of data model and data representation dexterity.

How do Data Spaces and Knowledgebases differ?
A Data Space cannot dictate the perception of its content. For instance, what I may consider as knowledge relative to my Data Space may not be the case to a remote client that interacts with it from a distance, Thus, defining my Data Space as Knowledgebase, purely, introduces constraints that reduce its broader effectiveness to third party clients (applications, services, users etc..). A Knowledgebase is based on a Graph Data Model resulting in significant impedance for clients that are built around alternative models. To reiterate, Data Spaces support multiple data models.

What Architectural Components make up a Data Space?

ORDBMS Engine - for Data Modeling agility (via complex purpose specific data types and data access methods), Data Atomicity, Data Concurrency, Transaction Isolation, and Durability (aka ACID).
Virtual Database Engine - for creating a single view of, and access point to, heterogeneous SQL, XML, Free Text, and other data. This is all about Virtualization at the Data Access Level.

Web Services Platform - enabling controlled access and manipulation (via application, service, or protocol logic) of Virtualized or Disparate Data. This layer handles the decoupling of functionality from monolithic wholes for function specific invocation via Web Services using either the SOAP or REST approach.

Where do Data Spaces fit into the Web's rapid evolution?
They are an essential part of the burgeoning Data Web / Semantic Web. In short, they will take us from data “Mash-ups” (combining web accessible data that exists without integration and repurposing in mind) to “Mesh-ups” (combining web accessible data that exists with integration and repurposing in mind).

Where can I see a DataSpace along the lines described, in action?

Just look at my blog, and take the journey as follows:

Front Door (Web 1.0)
Lounge (Web 2.0) via GData or OpenSearch
Floor Plan via FOAF or SIOC RDF Data Sets (Graphs)
Rest of the house (beyond Web 2.0) sending SPARQL Queries to a SPARQL Endpoint.

What about other Data Spaces?

There are several and I will attempt to categorize along the lines of query method available:
Type 1 (Free Text Search over HTTP):
Google, MSN, Yahoo!, Amazon, eBay, and most Web 2.0 plays .

Type 2 (Free Text Search and XQuery/XPath over HTTP)
A few blogs and Wikis (Jon Udell's and a few others)

Type 3 (RDF Data Sets and SPARQL Queryable):

SIOC enabled sites (aka points of semantic web presence)
PingTheSemantic

Type 4 (Generic Free Text Search, OpenSearch, GData, XQuery/XPath, and SPARQL):
Points of Semantic Web presence such as the Data Spaces at:

My Blog Data Space (as stated earlier in this post)
My General Data Space - (ditto; note that this is currently experimental)

What About Data Space aware tools?

OpenLink Ajax Toolkit - provides Javascript Control level binding to Query Services such as XMLA for SQL, GData for Free Text, OpenSearch for Free Text, SPARQL for RDF, in addition to service specific Web Services (Web 2.0 hosted solutions that expose service specific APIs)
Semantic Radar - a Firefox Extension
PingTheSemantic - the Semantic Webs equivalent of Web 2.0's weblogs.com
PiggyBank - a Firefox Extension

Kingsley Idehen's Blog Data Space

Details

Subscribe

Tag Cloud

Post Categories

Subscribe

Recent Articles

Comments

Post Comment

Kingsley Idehen's Blog Data Space

Details

Subscribe

Tag Cloud

Post Categories

Subscribe

Recent Articles

Related

Comments

Post Comment