There is increasing coalescence around the idea that HTTP-based Linked Data adds a tangible dimension to the World Wide Web (Web). This Data Dimension grants end-users, power-users, integrators, and developers the ability to experience the Web not solely as an Information Space or Document Space, but also as a Data Space.

Here is a simple What and Why guide covering the essence of Data Spaces.

What is a Data Space?

A Data Space is a point of presence on a network, where every Data Object (item or entity) is given a Name (e.g., a URI) by which it may be Referenced or Identified.

In a Data Space, every Representation of those Data Objects (i.e., every Object Representation) has an Address (e.g., a URL) from which it may be Retrieved (or "gotten").
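To make the Name versus Address distinction concrete, here is a minimal sketch in Python. It assumes one common Linked Data convention, the "hash URI", where the Name carries a fragment and the Address is the same URI without it. The `example.org` URI is a placeholder, and this is only one of several ways a Name can be tied to an Address.

```python
from urllib.parse import urldefrag

# Hypothetical Name (URI) for a Data Object; example.org is a placeholder.
object_name = "http://example.org/people/alice#this"

# Under the hash-URI convention (an assumption here), dropping the fragment
# yields the Address (URL) from which a Representation can be retrieved.
address, fragment = urldefrag(object_name)

print("Name:   ", object_name)
print("Address:", address)
```

The Name identifies the thing itself; the Address locates a document that describes it. Keeping the two apart is what lets a Representation change format or content without the Name changing.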

In a Data Space, every Object Representation is a time-variant (that is, it changes over time), streamable, and format-agnostic Resource.

An Object Representation is simply a Description of that Object. It takes the form of a graph, pictorially constructed from sets of three elements which are themselves named Subject, Predicate, and Object (or SPO); or Entity, Attribute, and Value (or EAV). Each Entity+Attribute+Value or Subject+Predicate+Object set (or triple) is one datum, one piece of data, one persisted observation about a given Subject or Entity.
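As a rough illustration, a triple can be modelled as a plain 3-tuple and a graph as a list of such tuples. The URIs and the `describe` helper below are hypothetical, chosen only to show the shape of the data; they are not taken from any particular vocabulary or library.

```python
from collections import defaultdict

ALICE = "http://example.org/people/alice#this"  # hypothetical Names
BOB = "http://example.org/people/bob#this"
NAME = "http://example.org/schema/name"         # hypothetical attributes
KNOWS = "http://example.org/schema/knows"

# Each tuple is one (Subject, Predicate, Object) triple: one observation.
triples = [
    (ALICE, NAME, "Alice"),
    (ALICE, KNOWS, BOB),
    (BOB, NAME, "Bob"),
]


def describe(subject, graph):
    """Collect every attribute/value pair recorded about one subject."""
    description = defaultdict(list)
    for s, p, o in graph:
        if s == subject:
            description[p].append(o)
    return dict(description)


alice_description = describe(ALICE, triples)
print(alice_description)
```

Here `alice_description` is the Object Representation of Alice: the subset of the graph whose Subject is her Name. Because the value of `KNOWS` is itself a Name, the description links onward to another Object.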

The underlying Schema that defines and constrains the construction of Object Representations is based on Logic, specifically First-Order Logic. Each Object Representation is a collection of persisted observations (Data) about a given Subject, which aid observers in materializing their perception (Information), and ultimately comprehension (Knowledge), of that Subject.

Why are Data Spaces important?

In the real world -- which is networked by nature -- data is heterogeneously (or "differently") shaped, and disparately located.

The volume of data has been increasing at an alarming rate since the advent of computing; the interWeb simply provides context that makes this reality more palpable and more exploitable, and in the process virtuously ups the ante through ever-faster growth.

We can't stop data heterogeneity; it is endemic to the nature of its producers -- humans and/or human-directed machines. What we can do, though, is create a powerful Conceptual-level "bus" or "interface" for data integration, based on Data Description oriented Logic rather than Data Representation oriented Formats. Basically, it's possible for us to use a Common Logic as the basis for expressing and blending SPO- or EAV-based Object Representations in a variety of Formats (or "dialects").
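A small sketch may help show what "blending" across Formats looks like in practice. The two dialects below (a nested JSON layout and a tab-separated line layout) are invented for illustration and are not standard serializations; the `ex:` names stand in for full URIs. The point is only that once both are read into the same triple model, they merge with ordinary set union.

```python
import json


def from_nested_json(text):
    """Read triples from a {subject: {predicate: [objects]}} JSON document."""
    doc = json.loads(text)
    return {
        (s, p, o)
        for s, predicates in doc.items()
        for p, objects in predicates.items()
        for o in objects
    }


def from_lines(text):
    """Read triples from lines of tab-separated subject, predicate, object."""
    return {tuple(line.split("\t")) for line in text.splitlines() if line.strip()}


# The same kind of data arriving in two different (hypothetical) dialects.
json_doc = '{"ex:alice": {"ex:name": ["Alice"], "ex:knows": ["ex:bob"]}}'
line_doc = "ex:alice\tex:name\tAlice\nex:bob\tex:name\tBob\n"

# Blending is a set union over triples; the shared observation collapses.
merged = from_nested_json(json_doc) | from_lines(line_doc)
for triple in sorted(merged):
    print(triple)
```

The observation about Alice's name appears in both sources but only once in `merged`, because identity is decided by the triple itself and not by the Format it arrived in.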

The roadmap boils down to:

  1. Assigning unambiguous Object Names to:

    • Every record (or, in table terms, every row);

    • Every record attribute (or, in table terms, every field or column);

    • Every record relationship (that is, every relationship between one record and another);

    • Every record container (e.g., every table or view in a relational database, every named graph, every spreadsheet, every text file, etc.);

  2. Making each Object Name resolve to an Address through which Create, Read, Update, and Delete ("CRUD") operations can be performed against the associated Object Representation graph.
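The two steps above can be sketched for a single relational-style row. Everything here is an assumption made for illustration: the `http://example.org/data` root, the naming scheme, the `memberOf` attribute, and the in-memory dictionary that stands in for a server reachable at each Address. A real deployment would perform the CRUD operations over HTTP.

```python
BASE = "http://example.org/data"  # hypothetical Data Space root (assumption)


def container_name(table):
    """Name for a record container, e.g. a table."""
    return f"{BASE}/{table}"


def attribute_name(table, column):
    """Name for a record attribute, e.g. a column."""
    return f"{BASE}/schema/{table}#{column}"


def record_name(table, key):
    """Name for a single record, e.g. a row."""
    return f"{BASE}/{table}/{key}#this"


def address_of(name):
    """Address of the Representation: the Name without its fragment."""
    return name.split("#", 1)[0]


def row_to_triples(table, key_column, row, foreign_keys=None):
    """Describe one row as triples; foreign keys become record-to-record links."""
    foreign_keys = foreign_keys or {}
    subject = record_name(table, row[key_column])
    triples = [(subject, f"{BASE}/schema/memberOf", container_name(table))]
    for column, value in row.items():
        if column in foreign_keys:
            value = record_name(foreign_keys[column], value)
        triples.append((subject, attribute_name(table, column), value))
    return triples


# In-memory stand-in for a server: Address -> Object Representation graph.
store = {}

row = {"id": 17, "customer": 3, "total": 42.5}
name = record_name("orders", row["id"])
address = address_of(name)

# Create
store[address] = row_to_triples("orders", "id", row, {"customer": "customers"})
# Read
graph = store[address]
# Update: replace the value of the "total" attribute
total = attribute_name("orders", "total")
store[address] = [(s, p, 45.0 if p == total else o) for s, p, o in graph]
# Delete
deleted = store.pop(address)
```

Note how the `customer` column stops being an opaque integer: its value becomes the Name of another record, which is what turns a foreign key into a followable relationship.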