There is a growing consensus that HTTP-based Linked Data adds a
tangible dimension to the World Wide Web (Web). This Data Dimension
grants end-users, power-users, integrators, and developers the
ability to experience the Web not solely as an Information Space or
Document Space, but now also as a Data Space.
Here is a simple What and Why guide covering the essence of Data
Spaces.
What is a Data Space?
A Data Space is a point of presence on a network, where every
Data Object (item or entity) is given a Name (e.g., a
URI) by which it may be Referenced or
Identified.
In a Data Space, every Representation of those Data
Objects (i.e., every Object Representation) has an
Address (e.g., a URL) from which it may be Retrieved (or
"gotten").
In a Data Space, every Object Representation is a time-variant
(that is, it changes over time), streamable, and format-agnostic
Resource.
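The distinction between a Name and an Address can be sketched in a few lines of Python. This is a minimal illustration, assuming the common "hash URI" convention in which an Object's Name is the Address of its Representation plus a fragment identifier. The `example.com` URI and the `address_of` helper are hypothetical and appear only for illustration; other conventions (such as HTTP redirects) exist and are not shown.

```python
from urllib.parse import urldefrag

# Hypothetical Object Name; example.com is a placeholder domain.
OBJECT_NAME = "http://example.com/people/alice#this"


def address_of(object_name: str) -> str:
    """Return the Address (URL) from which the Object Representation
    would be retrieved.

    Assumes the hash-URI convention: Name = Address + "#" + fragment.
    """
    return urldefrag(object_name).url


address = address_of(OBJECT_NAME)
print(address)  # http://example.com/people/alice
```

The Name identifies the Object itself, while the Address locates a document that describes it; under this assumed convention the two differ only by the fragment.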
An Object Representation is simply a Description of that Object.
It takes the form of a graph, pictorially constructed from sets of
three elements, named Subject, Predicate, and Object (or SPO), or
equivalently Entity, Attribute, and Value (or EAV).
Each Entity+Attribute+Value or
Subject+Predicate+Object set (or triple), is one datum, one
piece of data, one persisted observation about a given Subject or
Entity.
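As a rough sketch of the idea, a graph of triples can be modeled in Python as a list of three-element tuples. All names and values below are hypothetical placeholders, and the `describe` helper is an illustrative assumption, not part of any standard library or toolkit.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # Entity
    predicate: str  # Attribute
    obj: str        # Value


# Hypothetical names, used only for illustration.
ALICE = "http://example.com/people/alice#this"
BOB = "http://example.com/people/bob#this"
NAME = "http://example.com/schema#name"
KNOWS = "http://example.com/schema#knows"

graph = [
    Triple(ALICE, NAME, "Alice"),
    Triple(ALICE, KNOWS, BOB),
    Triple(BOB, NAME, "Bob"),
]


def describe(subject: str, triples: list) -> dict:
    """Collect every Attribute/Value pair observed about one Subject."""
    description = defaultdict(list)
    for t in triples:
        if t.subject == subject:
            description[t.predicate].append(t.obj)
    return dict(description)


alice_description = describe(ALICE, graph)
```

Here `alice_description` holds the two observations recorded about Alice: her name, and the fact that she knows Bob. Because the Value of the second triple is itself a Name, it links one description to another, which is what turns a set of triples into a graph.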
The underlying Schema that defines and constrains the
construction of Object Representations is based on Logic,
specifically First-Order Logic. Each Object Representation
is a collection of persisted observations (Data) about a
given Subject, which aid observers in materializing their
perception (Information), and ultimately comprehension
(Knowledge), of that Subject.
Why are Data Spaces important?
In the real world -- which is networked by nature -- data is
heterogeneously (or "differently") shaped, and disparately
located.
Data has been growing at an alarming rate since the advent of
computing; the interWeb simply provides context that makes this
reality more palpable and more exploitable, and in the process ups
the ante by accelerating that growth even further.
We can't stop data heterogeneity; it is endemic to the nature of
its producers -- humans and/or human-directed machines. What we can
do, though, is create a powerful Conceptual-level "bus" or
"interface" for data integration, based on Data Description
oriented Logic rather than Data Representation oriented
Formats. Basically, it's possible for us to use a Common Logic as the basis for
expressing and blending SPO- or EAV-based Object Representations in
a variety of Formats (or "dialects").
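To make the "blending" idea concrete, here is a minimal Python sketch in which two differently shaped sources (a JSON record and an EAV-style CSV file) are reduced to the same triple structure and then merged. The source data, field names such as `id`, `entity`, `attribute`, and `value`, and both helper functions are assumptions made up for this illustration.

```python
import csv
import io
import json

# Two hypothetical sources describing the same Subject in different Formats.
JSON_SOURCE = '{"id": "http://example.com/people/alice#this", "name": "Alice"}'
CSV_SOURCE = (
    "entity,attribute,value\n"
    "http://example.com/people/alice#this,email,alice@example.com\n"
)


def triples_from_json(text: str) -> set:
    """Treat the "id" field as the Subject and every other field as a triple."""
    record = json.loads(text)
    subject = record["id"]
    return {(subject, key, value) for key, value in record.items() if key != "id"}


def triples_from_csv(text: str) -> set:
    """Read one Entity/Attribute/Value triple per CSV row."""
    reader = csv.DictReader(io.StringIO(text))
    return {(row["entity"], row["attribute"], row["value"]) for row in reader}


# Once both sources share the same logical shape, blending is a set union.
blended = triples_from_json(JSON_SOURCE) | triples_from_csv(CSV_SOURCE)
```

The point of the sketch is that the integration happens at the level of the shared triple structure, not at the level of either Format; adding a third Format would only require another small reader function.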
The roadmap boils down to:
- Assigning unambiguous Object Names to:
  - Every record (or, in table terms, every row);
  - Every record attribute (or, in table terms, every field or
    column);
  - Every record relationship (that is, every relationship between
    one record and another);
  - Every record container (e.g., every table or view in a
    relational database, every named graph, every spreadsheet, every
    text file, etc.);
- Making each Object Name resolve to an Address through which
  Create, Read, Update, and Delete ("CRUD") operations can be
  performed against (can access) the associated Object
  Representation graph.
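The naming step of this roadmap can be sketched for a single relational table. The following Python is a minimal illustration under stated assumptions: the base URI, the URI patterns, the `memberOf` predicate, and the `employees` table are all hypothetical, and the second step (resolving Names to Addresses that support CRUD operations) would require an HTTP server and is not shown.

```python
# Hypothetical base URI for the Data Space; example.com is a placeholder.
BASE = "http://example.com/db"
MEMBER_OF = f"{BASE}/schema#memberOf"  # assumed predicate linking record to container


def container_name(table: str) -> str:
    """Name for a record container (here, a table)."""
    return f"{BASE}/{table}"


def attribute_name(table: str, column: str) -> str:
    """Name for a record attribute (here, a column)."""
    return f"{BASE}/{table}#{column}"


def record_name(table: str, key) -> str:
    """Name for a record (here, a row identified by its primary key)."""
    return f"{BASE}/{table}/{key}#this"


def row_to_triples(table, key_column, row, foreign_keys=None):
    """Turn one row into triples, naming the record, its attributes,
    its container, and its relationships to other records.

    foreign_keys maps a column name to the table it references.
    """
    foreign_keys = foreign_keys or {}
    subject = record_name(table, row[key_column])
    triples = [(subject, MEMBER_OF, container_name(table))]
    for column, value in row.items():
        if column in foreign_keys:
            # A relationship: the Value is the Name of another record.
            value = record_name(foreign_keys[column], value)
        triples.append((subject, attribute_name(table, column), value))
    return triples


row = {"id": 7, "name": "Alice", "manager_id": 3}
triples = row_to_triples("employees", "id", row, {"manager_id": "employees"})
```

Each row yields one triple per column plus one linking it to its container, so the example row produces four triples. The foreign key `manager_id` becomes a link to another named record instead of a bare number, which is what makes the relationship itself addressable.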