Here is a simple What and Why guide covering the essence of Data Spaces.
A Data Space is a point of presence on a network, where every Data Object (item or entity) is given a Name (e.g., a URI) by which it may be Referenced or Identified.
In a Data Space, every Representation of those Data Objects (i.e., every Object Representation) has an Address (e.g., a URL) from which it may be Retrieved (or "gotten").
In a Data Space, every Object Representation is a time-variant (that is, changing over time), streamable, and format-agnostic Resource.
An Object Representation is simply a Description of that Object. It takes the form of a graph, pictorially constructed from sets of 3 elements, themselves named Subject, Predicate, and Object (or SPO); or Entity, Attribute, and Value (or EAV). Each Entity+Attribute+Value or Subject+Predicate+Object set (or triple) is one datum: one piece of data, one persisted observation about a given Subject or Entity.
The underlying Schema that defines and constrains the construction of Object Representations is based on Logic, specifically First-Order Logic. Each Object Representation is a collection of persisted observations (Data) about a given Subject, which aid observers in materializing their perception (Information), and ultimately comprehension (Knowledge), of that Subject.
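To make the triple structure concrete, here is a minimal Python sketch; the subject and predicate IRIs are real identifiers that also appear in the query examples later in this guide, while the helper function name is purely illustrative:

```python
# Each 3-tuple is one persisted observation: (Subject, Predicate, Object).
triples = [
    ("http://dbpedia.org/resource/DBpedia",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://www.w3.org/2002/07/owl#Thing"),
    ("http://dbpedia.org/resource/DBpedia",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://dbpedia.org/ontology/Work"),
]

# An Object Representation is the sub-graph of all observations
# sharing a given Subject.
def describe(subject, graph):
    return [(p, o) for (s, p, o) in graph if s == subject]

description = describe("http://dbpedia.org/resource/DBpedia", triples)
```

Collecting every Predicate+Object pair for one Subject is exactly what a "description" of that Subject amounts to in this model.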
In the real world -- which is networked by nature -- data is heterogeneously (or "differently") shaped, and disparately located.
Data has been increasing at an alarming rate since the advent of computing; the interWeb simply provides context that makes this reality more palpable and more exploitable, and in the process virtuously ups the ante through increasingly exponential growth rates.
We can't stop data heterogeneity; it is endemic to the nature of its producers -- humans and/or human-directed machines. What we can do, though, is create a powerful Conceptual-level "bus" or "interface" for data integration, based on Data Description oriented Logic rather than Data Representation oriented Formats. Basically, it's possible for us to use a Common Logic as the basis for expressing and blending SPO- or EAV-based Object Representations in a variety of Formats (or "dialects").
The roadmap boils down to:
Assigning unambiguous Object Names to:
Every record (or, in table terms, every row);
Every record attribute (or, in table terms, every field or column);
Every record relationship (that is, every relationship between one record and another);
Every record container (e.g., every table or view in a relational database, every named graph, every spreadsheet, every text file, etc.);
Making each Object Name resolve to an Address through which Create, Read, Update, and Delete ("CRUD") operations can be performed against (can access) the associated Object Representation graph.
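As a rough illustration of that roadmap, the following Python sketch mints Object Names for records, attributes, and containers, and derives a resolvable Address for each. The example.com host and URL conventions are assumptions for illustration, not a prescribed scheme:

```python
from urllib.parse import quote

# Assumed base for minted Object Names (URIs); any stable HTTP
# namespace you control would do.
BASE = "http://example.com/id"

def name_for_container(table):
    # e.g., a table or view in a relational database
    return f"{BASE}/{table}"

def name_for_record(table, pk):
    # e.g., a row, named by its primary key
    return f"{BASE}/{table}/{pk}"

def name_for_attribute(table, column):
    # e.g., a field or column
    return f"{BASE}/{table}#{column}"

def address_for(name):
    # The Name resolves to an Address against which CRUD operations
    # are performed on the associated Object Representation graph.
    return "http://example.com/describe/?uri=" + quote(name, safe="")

record_name = name_for_record("customers", 42)
record_address = address_for(record_name)
```

Dereferencing `record_address` (with Read as an HTTP GET, and Create/Update/Delete as the corresponding HTTP methods) is what turns a Name into a point of access.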
Introducing a new preloaded and preconfigured Virtuoso (Cluster Edition) AMI for the Amazon EC2 Cloud that hosts combined Linked Datasets from:
Predictably instantiate a powerful database with high-quality data and cross-links within minutes, for personal or service-specific use.
Simply follow the instructions in our Amazon EC2 guide for the BBC + DBpedia 3.6 Linked Dataset.
Your installation steps are as follows:
The DBpedia + BBC Combo Linked Dataset is a preconfigured Virtuoso Cluster (4 Virtuoso Cluster Nodes, each comprising one Virtuoso Instance; initial deployment is to a single Cluster Host, but the license may be converted for physically distributed deployment), available via the Amazon EC2 Cloud, preloaded with the following datasets:
The BBC has been publishing Linked Data from its Web Data Space for a number of years. In line with best practices for injecting Linked Data into the World Wide Web (Web), the BBC datasets are interlinked with other datasets such as DBpedia and MusicBrainz.
Typical follow-your-nose exploration using a Web Browser (or even via sophisticated SPARQL query crawls) isn't always practical once you get past the initial euphoria that comes from comprehending the Linked Data concept. As your queries get more complex, the overhead of remote sub-queries increases its impact, until query results take so long to return that you simply give up.
Thus, maximizing the effects of the BBC's efforts requires Linked Data that shares locality in a Web-accessible Data Space -- i.e., where all Linked Data sets have been loaded into the same data store or warehouse. This holds true even when leveraging SPARQL-FED style virtualization -- there's always a need to localize data as part of any marginally-decent locality-aware cost-optimization algorithm.
This DBpedia + BBC dataset, exposed via a preloaded and preconfigured Virtuoso Cluster, delivers a practical point of presence on the Web for immediate and cost-effective exploitation of Linked Data at the individual and/or service specific levels.
Download Virtuoso installer archive(s). You must deploy the Personal or Enterprise Edition; the Open Source Edition does not support Shared-Nothing Cluster Deployment.
Set key environment variables and start the OpenLink License Manager, using the following command (this may vary depending on your shell and install directory):
. /opt/virtuoso/virtuoso-enterprise.sh
Optional: To keep the default single-server configuration file and demo database intact, set the VIRTUOSO_HOME environment variable to a different directory, e.g.:
export VIRTUOSO_HOME=/opt/virtuoso/cluster-home/
Note: You will have to adjust this setting every time you shift between this cluster setup and your single-server setup. Either may be made your environment's default through the virtuoso-enterprise.sh and related scripts.
Set up your cluster by running the mkcluster.sh script. Note that initial deployment of the DBpedia + BBC Combo requires a 4-node cluster, which is the default for this script.
Start the Virtuoso Cluster with this command:
virtuoso-start.sh
Stop the Virtuoso Cluster with this command:
virtuoso-stop.sh
Navigate to your installation directory.
Download the combo dataset installer script -- bbc-dbpedia-install.sh.
For best results, set the downloaded script to fully executable using this command:
chmod 755 bbc-dbpedia-install.sh
Shut down any Virtuoso instances that may be currently running.
Optional: As above, if you have decided to keep the default single-server configuration file and demo database intact, set the VIRTUOSO_HOME environment variable appropriately, e.g.:
export VIRTUOSO_HOME=/opt/virtuoso/cluster-home/
Run the combo dataset installer script with this command:
sh bbc-dbpedia-install.sh
The combo dataset typically deploys to EC2 virtual machines in under 90 minutes; your time will vary depending on your network connection speed, machine speed, and other variables.
Once the script completes, perform the following steps:
Verify that the Virtuoso Conductor (HTTP-based Admin UI) is in place via:
http://localhost:[port]/conductor
Verify that the Virtuoso SPARQL endpoint is in place via:
http://localhost:[port]/sparql
Verify that the Precision Search & Find UI is in place via:
http://localhost:[port]/fct
Verify that the Virtuoso hosted PivotViewer is in place via:
http://localhost:[port]/PivotViewer
A simple guide usable by any Perl developer seeking to exploit SPARQL without hassles.
SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.
SPARQL queries are typically conveyed as HTTP payloads. Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL compliant data server and receive a payload for local processing.
#
# Demonstrating use of a single query to populate a
# Virtuoso Quad Store via Perl.
#
# HTTP URL is constructed accordingly, with CSV query results format
# as the default via mime type.
#

use CGI qw/:standard/;
use LWP::UserAgent;
use Data::Dumper;
use Text::CSV_XS;

sub sparqlQuery {
  my $query=shift;
  my $baseURL=shift;
  my $format=shift;

  my %params=(
    "default-graph" => "",
    "should-sponge" => "soft",
    "query" => $query,
    "debug" => "on",
    "timeout" => "",
    "format" => $format,
    "save" => "display",
    "fname" => ""
  );

  my @fragments=();
  foreach my $k (keys %params) {
    my $fragment="$k=".CGI::escape($params{$k});
    push(@fragments,$fragment);
  }
  $query=join("&", @fragments);

  my $sparqlURL="${baseURL}?$query";

  my $ua = LWP::UserAgent->new;
  $ua->agent("MyApp/0.1 ");
  my $req = HTTP::Request->new(GET => $sparqlURL);
  my $res = $ua->request($req);
  my $str=$res->content;

  my @rows=();
  my $csv = Text::CSV_XS->new();
  foreach my $line ( split(/^/, $str) ) {
    $csv->parse($line);
    my @bits=$csv->fields();
    push(@rows, [ @bits ]);
  }
  return \@rows;
}

# Setting Data Source Name (DSN)
$dsn="http://dbpedia.org/resource/DBpedia";

# Virtuoso pragma for instructing the SPARQL engine to perform an HTTP GET
# using the IRI in the FROM clause as Data Source URL en route to DBMS
# record inserts. Note: without the pragma, this generic (non Virtuoso
# specific) SPARQL query will not add records to the DBMS.
$query="DEFINE get:soft \"replace\"\nSELECT DISTINCT * FROM <$dsn> WHERE {?s ?p ?o}";

$data=sparqlQuery($query, "http://localhost:8890/sparql/", "text/csv");
print "Retrieved data:\n";
print Dumper($data);
Retrieved data:
$VAR1 = [
          [ 's', 'p', 'o' ],
          [ 'http://dbpedia.org/resource/DBpedia',
            'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
            'http://www.w3.org/2002/07/owl#Thing' ],
          [ 'http://dbpedia.org/resource/DBpedia',
            'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
            'http://dbpedia.org/ontology/Work' ],
          [ 'http://dbpedia.org/resource/DBpedia',
            'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
            'http://dbpedia.org/class/yago/Software106566077' ],
          ...
CSV was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a Perl developer who already knows how to use Perl for HTTP based data access. SPARQL just provides an added bonus of URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.
DBpedia is a community effort to provide a contemporary deductive database derived from Wikipedia content. Project contributions can be partitioned as follows:
Comprising the nucleus of the Linked Open Data effort, DBpedia also serves as a fulcrum for the burgeoning Web of Linked Data by delivering a dense and highly-interlinked lookup database. In its most basic form, DBpedia is a great source of strong and resolvable identifiers for People, Places, Organizations, Subject Matter, and many other data items of interest. Naturally, it provides a fantastic starting point for comprehending the fundamental concepts underlying TimBL's initial Linked Data meme.
Depending on your particular requirements, whether personal or service-specific, DBpedia offers the following:
OpenLink Software has preloaded the DBpedia 3.6 datasets into a preconfigured Virtuoso Cluster Edition database, and made the package available for easy installation.
The DBpedia+Virtuoso package provides a cost-effective option for personal or service-specific incarnations of DBpedia.
For instance, you may have a service that isn't best-served by competing with the rest of the world for ad-hoc query time and resources on the live instance, which itself operates under various restrictions which enable this ad-hoc query service to be provided at Web Scale.
Now you can easily commission your own instance and quickly exploit DBpedia and Virtuoso's database feature set to the max, powered by your own hardware and network infrastructure.
Prerequisites are simply:
To install the Virtuoso Cluster Edition simply perform the following steps:
Set key environment variables and start the OpenLink License Manager, using the following command (this may vary depending on your shell):
. /opt/virtuoso/virtuoso-enterprise.sh
Run the mkcluster.sh script, which defaults to a 4-node cluster.
Optional: Set the VIRTUOSO_HOME environment variable if you want to keep cluster databases distinct from single-server databases, via a distinct root directory for database files (one that isn't adjacent to single-server database directories).
Start the Virtuoso Cluster with this command: virtuoso-start.sh
Stop the Virtuoso Cluster with this command: virtuoso-stop.sh
To install your personal or service specific edition of DBpedia simply perform the following steps:
Download the DBpedia dataset installer script (dbpedia-install.sh).
For best results, set the downloaded script to fully executable using this command: chmod 755 dbpedia-install.sh
Optional: As above, set the VIRTUOSO_HOME environment variable, e.g., to the current directory, via command (this may vary depending on your shell): export VIRTUOSO_HOME=`pwd`
Run the DBpedia dataset installer script with this command: sh dbpedia-install.sh
Once the installation completes (approximately 1 hour and 30 minutes from start time), perform the following steps:
Verify that the Virtuoso Conductor (HTTP-based Admin UI) is in place via: http://localhost:[port]/conductor
Verify that the Precision Search & Find UI is in place via: http://localhost:[port]/fct
Verify that the DBpedia resource description page is in place via: http://localhost:[port]/resource/DBpedia
A simple guide usable by any JavaScript developer seeking to exploit SPARQL without hassles.
SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.
SPARQL queries are typically conveyed as HTTP payloads. Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL compliant data server and receive a payload for local processing.
/*
 * Demonstrating use of a single query to populate a
 * Virtuoso Quad Store via JavaScript.
 *
 * HTTP URL is constructed accordingly, with JSON query results
 * format as the default via mime type.
 */
function sparqlQuery(query, baseURL, format) {
  if(!format) format="application/json";
  var params={
    "default-graph": "",
    "should-sponge": "soft",
    "query": query,
    "debug": "on",
    "timeout": "",
    "format": format,
    "save": "display",
    "fname": ""
  };
  var querypart="";
  for(var k in params) {
    querypart+=k+"="+encodeURIComponent(params[k])+"&";
  }
  var queryURL=baseURL + '?' + querypart;
  var xmlhttp;
  if (window.XMLHttpRequest) {
    xmlhttp=new XMLHttpRequest();
  } else {
    xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
  }
  xmlhttp.open("GET",queryURL,false);
  xmlhttp.send();
  return JSON.parse(xmlhttp.responseText);
}

/* Setting Data Source Name (DSN) */
var dsn="http://dbpedia.org/resource/DBpedia";

/* The Virtuoso pragma DEFINE get:soft "replace" instructs the Virtuoso
   SPARQL engine to perform an HTTP GET using the IRI in the FROM clause
   as Data Source URL with regards to DBMS record inserts */
var query="DEFINE get:soft \"replace\"\nSELECT DISTINCT * FROM <"+dsn+"> WHERE {?s ?p ?o}";

var data=sparqlQuery(query, "/sparql/");
Place the snippet above into the <script/> section of an HTML document to see the query result.
JSON was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a JavaScript developer who already knows how to use JavaScript for HTTP based data access within HTML. SPARQL just provides an added bonus of URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.
A simple guide usable by any PHP developer seeking to exploit SPARQL without hassles.
SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.
SPARQL queries are typically conveyed as HTTP payloads. Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL compliant data server and receive a payload for local processing -- e.g., local object binding in PHP.
#!/usr/bin/env php
<?php
#
# Demonstrating use of a single query to populate a
# Virtuoso Quad Store via PHP.
#
# HTTP URL is constructed accordingly, with JSON query results format in mind.

function sparqlQuery($query, $baseURL, $format="application/json") {
  $params=array(
    "default-graph" => "",
    "should-sponge" => "soft",
    "query" => $query,
    "debug" => "on",
    "timeout" => "",
    "format" => $format,
    "save" => "display",
    "fname" => ""
  );
  $querypart="?";
  foreach($params as $name => $value) {
    $querypart=$querypart . $name . '=' . urlencode($value) . "&";
  }
  $sparqlURL=$baseURL . $querypart;
  return json_decode(file_get_contents($sparqlURL));
};

# Setting Data Source Name (DSN)
$dsn="http://dbpedia.org/resource/DBpedia";

# Virtuoso pragma for instructing the SPARQL engine to perform an HTTP GET
# using the IRI in the FROM clause as Data Source URL
$query="DEFINE get:soft \"replace\" SELECT DISTINCT * FROM <$dsn> WHERE {?s ?p ?o}";

$data=sparqlQuery($query, "http://localhost:8890/sparql/");
print "Retrieved data:\n" . json_encode($data);
?>
Retrieved data: {"head": {"link":[],"vars":["s","p","o"]}, "results": {"distinct":false,"ordered":true, "bindings":[ {"s": {"type":"uri","value":"http:\/\/dbpedia.org\/resource\/DBpedia"},"p": {"type":"uri","value":"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type"},"o": {"type":"uri","value":"http:\/\/www.w3.org\/2002\/07\/owl#Thing"}}, {"s": {"type":"uri","value":"http:\/\/dbpedia.org\/resource\/DBpedia"},"p": {"type":"uri","value":"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type"},"o": {"type":"uri","value":"http:\/\/dbpedia.org\/ontology\/Work"}}, {"s": {"type":"uri","value":"http:\/\/dbpedia.org\/resource\/DBpedia"},"p": {"type":"uri","value":"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type"},"o": {"type":"uri","value":"http:\/\/dbpedia.org\/class\/yago\/Software106566077"}}, ...
JSON was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a PHP developer who already knows how to use PHP for HTTP based data access. SPARQL just provides an added bonus of URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.
A simple guide usable by any Python developer seeking to exploit SPARQL without hassles.
SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.
SPARQL queries are typically conveyed as HTTP payloads. Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL compliant data server and receive a payload for local processing -- e.g., local object binding in Python.
#!/usr/bin/env python
#
# Demonstrating use of a single query to populate a
# Virtuoso Quad Store via Python (Python 2).
#
import urllib, json

# HTTP URL is constructed accordingly, with JSON query results format in mind.
def sparqlQuery(query, baseURL, format="application/json"):
    params={
        "default-graph": "",
        "should-sponge": "soft",
        "query": query,
        "debug": "on",
        "timeout": "",
        "format": format,
        "save": "display",
        "fname": ""
    }
    querypart=urllib.urlencode(params)
    response = urllib.urlopen(baseURL, querypart).read()
    return json.loads(response)

# Setting Data Source Name (DSN)
dsn="http://dbpedia.org/resource/DBpedia"

# Virtuoso pragmas for instructing SPARQL engine to perform an HTTP GET
# using the IRI in FROM clause as Data Source URL
query="""DEFINE get:soft "replace"
SELECT DISTINCT * FROM <%s> WHERE {?s ?p ?o}""" % dsn

data=sparqlQuery(query, "http://localhost:8890/sparql/")
print "Retrieved data:\n" + json.dumps(data, sort_keys=True, indent=4)
#
# End
Retrieved data:
{
    "head": {
        "link": [],
        "vars": [ "s", "p", "o" ]
    },
    "results": {
        "bindings": [
            {
                "o": { "type": "uri", "value": "http://www.w3.org/2002/07/owl#Thing" },
                "p": { "type": "uri", "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" },
                "s": { "type": "uri", "value": "http://dbpedia.org/resource/DBpedia" }
            },
            ...
JSON was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a Python developer who already knows how to use Python for HTTP based data access. SPARQL just provides an added bonus of URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.
A simple guide usable by any Ruby developer seeking to exploit SPARQL without hassles.
SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.
SPARQL queries are typically conveyed as HTTP payloads. Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL compliant data server and receive a payload for local processing -- e.g., local object binding in Ruby.
#!/usr/bin/env ruby
#
# Demonstrating use of a single query to populate a
# Virtuoso Quad Store.
#
require 'net/http'
require 'cgi'
require 'csv'

# We opt for CSV based output since handling this format is
# straightforward in Ruby, by default.
# HTTP URL is constructed accordingly, with CSV as query results format in mind.
def sparqlQuery(query, baseURL, format="text/csv")
  params={
    "default-graph" => "",
    "should-sponge" => "soft",
    "query" => query,
    "debug" => "on",
    "timeout" => "",
    "format" => format,
    "save" => "display",
    "fname" => ""
  }
  querypart=""
  params.each { |k,v| querypart+="#{k}=#{CGI.escape(v)}&" }
  sparqlURL=baseURL+"?#{querypart}"
  response = Net::HTTP.get_response(URI.parse(sparqlURL))
  return CSV::parse(response.body)
end

# Setting Data Source Name (DSN)
dsn="http://dbpedia.org/resource/DBpedia"

# Virtuoso pragmas for instructing SPARQL engine to perform an HTTP GET
# using the IRI in FROM clause as Data Source URL
query="DEFINE get:soft \"replace\" SELECT DISTINCT * FROM <#{dsn}> WHERE {?s ?p ?o}"

# Assume use of a local installation of Virtuoso;
# otherwise, you can change the URL to that of a public endpoint,
# for example DBpedia: http://dbpedia.org/sparql
data=sparqlQuery(query, "http://localhost:8890/sparql/")
puts "Got data:"
p data
#
# End
Got data: [["s", "p", "o"], ["http://dbpedia.org/resource/DBpedia", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://www.w3.org/2002/07/owl#Thing"], ["http://dbpedia.org/resource/DBpedia", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://dbpedia.org/ontology/Work"], ["http://dbpedia.org/resource/DBpedia", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://dbpedia.org/class/yago/Software106566077"], ...
CSV was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a Ruby developer who already knows how to use Ruby for HTTP based data access. SPARQL just provides an added bonus of URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.
A declarative query language from the W3C for querying structured propositional data (in the form of 3-tuple [triples] or 4-tuple [quads] records) stored in a deductive database (colloquially referred to as triple or quad stores in Semantic Web and Linked Data parlance).
SPARQL is inherently platform independent. Like SQL, the query language and the backend database engine are distinct. Database clients capture SPARQL queries which are then passed on to compliant backend databases.
Like SQL for relational databases, it provides a powerful mechanism for accessing and joining data across one or more data partitions (named graphs identified by IRIs). The aforementioned capability also enables the construction of sophisticated Views, Reports (HTML or those produced in native form by desktop productivity tools), and data streams for other services.
Unlike SQL, SPARQL includes result serialization formats and an HTTP based wire protocol. Thus, the ubiquity and sophistication of HTTP is integral to SPARQL i.e., client side applications (user agents) only need to be able to perform an HTTP GET against a URL en route to exploiting the power of SPARQL.
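As a minimal illustration of that point, the following Python sketch constructs such a GET-able URL; the endpoint and the format parameter mirror the examples elsewhere in this document, but any SPARQL-compliant endpoint would do:

```python
import urllib.parse
import urllib.request

def sparql_get_url(endpoint, query, fmt="text/csv"):
    # A SPARQL protocol read is just an HTTP GET with the query
    # (and desired result serialization) in the URL's query string.
    qs = urllib.parse.urlencode({"query": query, "format": fmt})
    return f"{endpoint}?{qs}"

url = sparql_get_url("http://dbpedia.org/sparql",
                     "SELECT DISTINCT ?s WHERE {?s ?p ?o} LIMIT 5")

# Dereferencing the URL returns the serialized result set, e.g.:
# body = urllib.request.urlopen(url).read().decode()
```

Any user agent capable of an HTTP GET against that URL -- a browser included -- can retrieve SPARQL results.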
What follows is a very simple guide for using SPARQL against your own instance of Virtuoso:
Note: the data source URL doesn't even have to be RDF based -- which is where the Virtuoso Sponger Middleware comes into play (download and install the VAD installer package first) since it delivers the following features to Virtuoso's SPARQL engine:
Public SPARQL endpoints are emerging at an ever increasing rate. Thus, we've set up a DNS lookup service that provides access to a large number of SPARQL endpoints. Of course, this doesn't cover all existing endpoints, so if your endpoint is missing, please ping me.
Here is a collection of commands for using DNS-SD to discover SPARQL endpoints:
Looking retrospectively at any technology failure -- in enterprises or industry at large -- you will eventually discover, at the core, a messy conflation of at least one of the following:
The Internet & World Wide Web (InterWeb) are massive successes because their respective architectural cores embody the critical separation outlined above.
The Web of Linked Data is going to become a global reality, and massive success, because it leverages inherently sound architecture -- bar conflationary distractions of RDF. :-)
The problems typically take the following form:
To start addressing these problems, here is a simple guide for generating and publishing Linked Data using Virtuoso.
Existing RDF data can be added to the Virtuoso RDF Quad Store via a variety of built-in data loader utilities.
Many options allow you to easily and quickly generate RDF data from other data sources:
Install the Faceted Browser VAD package (fct_dav.vad), which delivers the following:
Three simple steps allow you, your enterprise, and your customers to consume and exploit your newly deployed Linked Data --
http://<cname>[:<port>]/describe/?uri=<entity-uri>
<cname>[:<port>] gets replaced by the host and port of your Virtuoso instance;
<entity-uri> gets replaced by the URI you want to see described -- for instance, the URI of one of the resources you let the Sponger handle.
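A small Python sketch of that URL pattern; the host and port values are placeholders matching the defaults used elsewhere in this guide:

```python
from urllib.parse import quote

def describe_url(entity_uri, cname="localhost", port=8890):
    # Builds the /describe/ URL pattern shown above; the entity URI
    # is percent-encoded so it survives as a single query parameter.
    return (f"http://{cname}:{port}/describe/"
            f"?uri={quote(entity_uri, safe='')}")

url = describe_url("http://dbpedia.org/resource/DBpedia")
```

Opening the resulting URL in a browser renders the entity's description page.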
Linked Data offers everyone a Web-scale, Enterprise-grade mechanism for platform-independent creation, curation, access, and integration of data.
The fundamental steps to creating Linked Data are as follows:
Choose a Name Reference Mechanism -- i.e., URIs.
Choose a Data Model with which to Structure your Data -- minimally, you need a model which clearly distinguishes
Choose one or more Data Representation Syntaxes (also called Markup Languages or Data Formats) to use when creating Resources with Content based on your chosen Data Model. Some Syntaxes in common use today are HTML+RDFa, N3, Turtle, RDF/XML, TriX, XRDS, GData, OData, OpenGraph, and many others.
Choose a URI Scheme that facilitates binding Referenced Names to the Resources which will carry your Content -- your Structured Data.
Create Structured Data by using your chosen Name Reference Mechanism, your chosen Data Model, and your chosen Data Representation Syntax, as follows:
You can create Linked Data (hypermedia-based data representations) Resources from or for many things. Examples include: personal profiles, calendars, address books, blogs, photo albums; there are many, many more.
How Does Linked Data Address This Problem? It provides critical infrastructure for the WebID Protocol that enables an innovative tweak of SSL/TLS.
What about OpenID? The WebID Protocol embraces and extends OpenID (in an open and positive way) via the WebID + OpenID Hybrid variant of the protocol. The basic effect is that OpenID calls are re-routed to the WebID aspect, which simply removes Username and Password Authentication from the authentication challenge interaction pattern.
I ended up with what I can best describe as the Data 3.0 Manifesto: a manifesto for standards-compliant access to structured data object (or entity) descriptors.
Alex James (Program Manager, Entity Framework, at Microsoft) put together something quite similar to this via his Base4 blog (around the Web 2.0 bootstrap time); sadly -- quoting Alex -- that post has gone where discontinued blogs and their host platforms go (deep, deep irony here).
It's also important to note that this manifesto is a variant of TimBL's Linked Data Design Issues meme, but totally decoupled from RDF (the data representation formats aspect) and SPARQL, which -- in my world view -- remain implementation details.
A service from OpenLink Software, available at http://uriburner.com, that enables anyone to generate structured descriptions, on the fly, for resources that are already published to HTTP based networks. These descriptions exist as hypermedia resource representations where links are used to identify:
The hypermedia resource representation outlined above is what is commonly known as an Entity-Attribute-Value (EAV) Graph. The use of generic HTTP scheme based Identifiers is what distinguishes this type of hypermedia resource from others.
The virtues (dual-pronged serendipitous discovery) of publishing HTTP based Linked Data across public (World Wide Web) or private (Intranet and/or Extranet) networks are rapidly becoming clearer to everyone. That said, the nuance-laced nature of Linked Data publishing presents significant challenges to most. Thus, for Linked Data to really blossom, the process of publishing needs to be simplified -- i.e., "just click and go" (for human interaction) or REST-ful orchestration of HTTP CRUD (Create, Read, Update, Delete) operations between Client Applications and Linked Data Servers.
In a similar vein to the role played by FeedBurner with regards to Atom and RSS feed generation during the early stages of the Blogosphere, it enables anyone to publish Linked Data bearing hypermedia resources on an HTTP network. Thus, its usage covers two profiles: Content Publisher and Content Consumer.
The steps that follow cover all you need to do:
That's it! The discoverability (SDQ) of your content has just multiplied significantly; its structured description is now part of the Linked Data Cloud, with a reference back to your site (which is now a bona fide HTTP based Linked Data Space).
HTML+RDFa based representation of a structured resource description:
<link rel="describedby" title="Resource Description (HTML)" type="text/html" href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>
JSON based representation of a structured resource description:
<link rel="describedby" title="Resource Description (JSON)" type="application/json" href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>
N3 based representation of a structured resource description:
<link rel="describedby" title="Resource Description (N3)" type="text/n3" href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>
RDF/XML based representations of a structured resource description:
<link rel="describedby" title="Resource Description (RDF/XML)" type="application/rdf+xml" href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>
As an end-user, obtaining a structured description of any resource published to an HTTP network boils down to the following steps:
If you are a developer, you can simply perform an HTTP operation request (from your development environment of choice) using any of the URL patterns presented below:
URIBurner is a "deceptively simple" solution for cost-effective exploitation of HTTP based Linked Data meshes. It doesn't require any programming or customization en route to immediately realizing its virtues.
If you like what URIBurner offers, but prefer to leverage its capabilities within your own domain -- such that resource description URLs reside in your domain -- all you have to do is perform the following steps:
When you install your own URIBurner instances, you also have the ability to perform customizations that increase resource description fidelity in line with your specific needs. All you need to do is develop a custom extractor cartridge and/or meta cartridge.
SPARQL Endpoint: Linked Open Data Cache (8.5 Billion+ Quad Store, which includes data from Geonames and the Linked GeoData Project Data Sets).
At the end of the marathon session, it was clear to me that a blog post was required for future reference, at the very least :-)
"Data Access by Reference" mechanism for Data Objects (or Entities) on HTTP networks. It enables you to Identify a Data Object and Access its structured Data Representation via a single Generic HTTP scheme based Identifier (HTTP URI). Data Object representation formats may vary; but in all cases, they are hypermedia oriented, fully structured, and negotiable within the context of a client-server message exchange.
Information makes the world tick!
Information doesn't exist without data to contextualize it.
Information is inaccessible without a projection (presentation) medium.
All information (without exception, when produced by humans) is subjective. Thus, to truly maximize the innate heterogeneity of collective human intelligence, loose coupling of our information and associated data sources is imperative.
Linked Data is exposed to HTTP networks (e.g. World Wide Web) via hypermedia resources bearing structured representations of data object descriptions. Remember, you have a single Identifier abstraction (generic HTTP URI) that embodies: Data Object Name and Data Representation Location (aka URL).
A structured representation of data exists when an Entity (Datum), its Attributes, and its Attribute Values are clearly discernible. In the case of a Linked Data Object, structured descriptions take the form of a hypermedia based Entity-Attribute-Value (EAV) graph pictorial -- where each Entity, its Attributes, and its Attribute Values (optionally) are identified using Generic HTTP URIs.
Examples of structured data representation formats (content types) associated with Linked Data Objects include:
You mark up resources by expressing distinct entity-attribute-value statements (basically, these are 3-tuple records) using a variety of notations:
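The notion of a 3-tuple record is simple enough to sketch directly; all URIs below are hypothetical, with attribute URIs borrowed from the well-known FOAF vocabulary for illustration only.

```python
# Minimal EAV/SPO sketch: each statement is one (subject, predicate, object)
# 3-tuple, i.e., one persisted observation about a given entity.
triples = [
    ("http://example.org/people#alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/people#alice", "http://xmlns.com/foaf/0.1/knows",
     "http://example.org/people#bob"),
]

def values(entity, attribute):
    """Every value of a given attribute for a given entity."""
    return [o for s, p, o in triples if s == entity and p == attribute]

print(values("http://example.org/people#alice", "http://xmlns.com/foaf/0.1/name"))
# → ['Alice']
```

The various notations (RDFa, N3, RDF/XML, etc.) are just different serializations of exactly this 3-tuple structure.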
You can achieve this task using any of the following approaches:
Our data access middleware heritage (which spans 16+ years) has enabled us to assemble a rich portfolio of coherently integrated products that enable cost-effective evaluation and utilization of Linked Data, without writing a single line of code, and without exposing you to hidden but extensive admin and configuration costs. Post installation, the benefits of Linked Data simply materialize (along the lines described above).
Our main Linked Data oriented products include:
Anyway, Socialtext and Mike 2.0 (they aren't identical, and this juxtaposition isn't seeking to imply that they are) provide nice demonstrations of what socially enhanced collaboration for individuals and/or enterprises is all about:
As is typically the case in this emerging realm, the critical issue of discrete "identifiers" (record keys, in a sense) for data items, data containers, and data creators (individuals and groups) is overlooked, albeit unintentionally.
Rather than using platform constrained identifiers such as:
It enables you to leverage the platform independence of HTTP scheme Identifiers (Generic URIs) such that Identifiers for:
simply become conduits into a mesh of HTTP -- referenceable and accessible -- Linked Data Objects endowed with a High SDQ (Serendipitous Discovery Quotient). For example, my personal WebID is all anyone needs to know if they want to explore:
Even when you reach a point of equilibrium where your daily activities trigger orchestration of CRUD (Create, Read, Update, Delete) operations against Linked Data Objects within your socially enhanced collaboration network, you still have to deal with the thorny issues of security, which include the following:
FOAF+SSL, an application of HTTP based Linked Data, enables you to enhance your Personal HTTP scheme based Identifier (or WebID) via the following steps (performed by a FOAF+SSL compliant platform):
Contrary to conventional experiences with all things PKI (Public Key Infrastructure) related, FOAF+SSL compliant platforms typically handle the PKI issues as part of the protocol implementation; thereby protecting you from any administrative tedium without compromising security.
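The core of the FOAF+SSL check described above can be pictured with a loose sketch. Everything here is hypothetical (the WebID, the key values, the in-memory "profile"): a real platform dereferences the WebID over the network and parses the FOAF profile, rather than consulting a local table.

```python
# Hedged sketch of the FOAF+SSL (WebID) verification idea: the server compares
# the public key presented in the client's certificate against the public key
# published in the FOAF profile that the WebID dereferences to.
profile_keys = {  # hypothetical WebID -> published (modulus, exponent)
    "http://example.org/people#me": ("00:af:3c...", 65537),
}

def webid_matches(webid: str, cert_modulus: str, cert_exponent: int) -> bool:
    """True when the certificate's key equals the key the profile publishes."""
    return profile_keys.get(webid) == (cert_modulus, cert_exponent)

print(webid_matches("http://example.org/people#me", "00:af:3c...", 65537))  # → True
```

Note how this sidesteps a certificate authority: trust derives from control of the WebID's profile document, which is why the PKI tedium mentioned above stays hidden from the user.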
Understanding how new technology innovations address long standing problems, or understanding how new solutions inadvertently fail to address old problems, provides time tested mechanisms for product selection and value proposition comprehension that ultimately save scarce resources such as time and money.
If you want to understand real world problem solution #1 with regards to HTTP based Linked Data look no further than the issues of secure, socially aware, and platform independent identifiers for data objects, that build bridges across erstwhile data silos.
If you want to cost-effectively experience what I've outlined in this post, take a look at OpenLink Data Spaces (ODS), which is a distributed collaboration engine (enterprise or individual) built around the Virtuoso database engine. It simply enhances existing collaboration tools via the following capabilities:
Addition of Social Dimensions via HTTP based Data Object Identifiers for all Data Items (if missing)
Since the beginning of the modern IT era, each period of innovation has inadvertently introduced its fair share of Data Silos. The driving force behind this anomaly remains an overemphasis on the role of applications when selecting problem solutions. Unfortunately, most solution selecting decision makers remain oblivious to the fact that most applications are architecturally monolithic; i.e., they fail to separate the following five layers that are critical to all solutions:
The rise of the Internet, and its exponentially-growing user-friendly enclave known as the World Wide Web, is bringing the intrinsic costs of the monolithic application architecture anomaly to bear -- in manners unanticipated by many. For example, the emergence of network-oriented solutions across the realms of Enterprise 2.0-based Collaboration and Web 2.0-based Software-as-a-Service (SaaS), combined with the overarching influence of Social Media, are producing more heterogeneously-structured and disparately-located data sources than people can effectively process.
As is often the case, a variety of problem and product monikers have emerged for the data access and integration challenges outlined above. Contemporary examples include Enterprise Information Integration, Master Data Management, and Data Virtualization. Labeling aside, the fundamental issues of the unresolved Data Integration challenge boil down to the following:
Effectively solving today's data integration challenges requires a move away from monolithic application architecture to loosely-coupled, network-centric application architectures. Basically, we need a ubiquitous network-centric application protocol that lends itself to loosely-coupled across-the-wire orchestration of data interactions. In short, this will be what revitalizes the art of application development and deployment.
The World Wide Web is built around a network application protocol called HTTP. This protocol intrinsically separates the five layers listed earlier, thereby enabling:
Virtuoso is uniquely designed to address today's escalating Data Access and Integration challenges without compromising performance, security, or platform independence. At its core lies an unrivaled commitment to industry standards combined with unique technology innovation that transcends erstwhile distinct realms such as:
When Virtuoso is installed and running, HTTP-based Data Objects are automatically created as a by-product of its powerful data virtualization, transcending data sources and data representation formats. The benefits of such power extend across profiles such as:
Bottom line, Virtuoso delivers unrivaled flexibility and scalability, without compromising performance or security.
In this post I provide a brief re-introduction to this essential aspect of Virtuoso.
This component of Virtuoso is known as the Virtual Database Engine (VDBMS). It provides transparent high-performance and secure access to disparate data sources that are external to Virtuoso. It enables federated access and integration of data hosted by any ODBC- or JDBC-accessible RDBMS, RDF Store, XML database, or Document (Free Text)-oriented Content Management System. In addition, it facilitates integration with Web Services (SOAP-based SOA RPCs or REST-fully accessible Web Resources).
In the most basic sense, you shouldn't need to upgrade your existing database engine version simply because your current DBMS and Data Access Driver combo isn't compatible with ODBC-compliant desktop tools such as Microsoft Access, Crystal Reports, BusinessObjects, Impromptu, or other ODBC-, JDBC-, ADO.NET-, or OLE DB-compliant applications. Simply place Virtuoso in front of your so-called "legacy database," and let it deliver the compliance levels sought by these tools.
In addition, it's important to note that today's enterprise, through application evolution, company mergers, or acquisitions, is often faced with disparately-structured data residing in any number of line-of-business-oriented data silos. Compounding the problem is the exponential growth of user-generated data via new social media-oriented collaboration tools and platforms. For companies to cost-effectively harness the opportunities accorded by the increasing intersection between line-of-business applications and social media, virtualization of data silos must be achieved, and this virtualization must be delivered in a manner that doesn't prohibitively compromise performance or completely undermine security at either the enterprise or personal level. Again, this is what you get by simply installing Virtuoso.
The VDBMS may be used in a variety of ways, depending on the data access and integration task at hand. Examples include:
You can make a single ODBC, JDBC, ADO.NET, OLE DB, or XMLA connection to multiple ODBC- or JDBC-accessible RDBMS data sources, concurrently, with the ability to perform intelligent distributed joins against externally-hosted database tables. For instance, you can join internal human resources data against internal sales and external stock market data, even when the HR team uses Oracle, the Sales team uses Informix, and the Stock Market figures come from Ingres!
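The distributed join described above can be pictured with an in-memory sketch. The table and column names are hypothetical stand-ins for the externally hosted sources (e.g., the Oracle-hosted HR schema and the Informix-hosted Sales schema); the point is only the shape of the operation the VDBMS performs across connections.

```python
# In-memory sketch of a federated join across two notional external sources.
hr_rows = [  # e.g., rows fetched from an Oracle-hosted HR table
    {"emp_id": 1, "name": "Ada", "dept": "Sales"},
    {"emp_id": 2, "name": "Lin", "dept": "Engineering"},
]
sales_rows = [  # e.g., rows fetched from an Informix-hosted Sales table
    {"emp_id": 1, "total": 25000},
]

def join(left, right, key):
    """Inner join two row sets on a shared key column."""
    index = {row[key]: row for row in right}  # build lookup on the right side
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

print(join(hr_rows, sales_rows, "emp_id"))
# → [{'emp_id': 1, 'name': 'Ada', 'dept': 'Sales', 'total': 25000}]
```

In the real product, of course, the client issues one SQL statement over a single connection and the join planning (including pushing work down to the remote engines) happens inside Virtuoso.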
You can construct RDF Model-based Conceptual Views atop Relational Data Sources. This is about generating HTTP-based Entity-Attribute-Value (E-A-V) graphs using data culled "on the fly" from native or external data sources (Relational Tables/Views, XML-based Web Services, or User Defined Types).
You can also derive RDF Model-based Conceptual Views from Web Resource transformations "on the fly" -- the Virtuoso Sponger (RDFizing middleware component) enables you to generate RDF Model Linked Data via a RESTful Web Service or within the process pipeline of the SPARQL query engine (i.e., you simply use the URL of a Web Resource in the FROM clause of a SPARQL query).
It's important to note that Views take the form of HTTP links that serve as both Data Source Names and Data Source Addresses. This enables you to query and explore relationships across entities (i.e., People, Places, and other Real World Things) via HTTP clients (e.g., Web Browsers) or directly via SPARQL Query Language constructs transmitted over HTTP.
As an alternative to RDF, Virtuoso can expose ADO.NET Entity Frameworks-based Conceptual Views over Relational Data Sources. It achieves this by generating Entity Relationship graphs via its native ADO.NET Provider, exposing all externally attached ODBC- and JDBC-accessible data sources. In addition, the ADO.NET Provider supports direct access to Virtuoso's native RDF database engine, eliminating the need for resource intensive Entity Frameworks model transformations.
Here are some insights on DBpedia, from the perspective of someone intimately involved with the other three-quarters of the project.
A live Web accessible RDF model database (Quad Store) derived from Wikipedia content snapshots, taken periodically. The RDF database underlies a Linked Data Space comprised of: HTML (and most recently HTML+RDFa) based data browser pages and a SPARQL endpoint.
Note: DBpedia 3.4 now exists in snapshot (warehouse) and Live Editions (currently being hot-staged). This post is about the snapshot (warehouse) edition; I'll drop a different post about the DBpedia Live Edition, where a new Delta-Engine covers both extraction and database record replacement in real time.
As an idea under the moniker "DBpedia", it was conceptualized in late 2006 by researchers at the University of Leipzig (led by Soren Auer) and Freie University, Berlin (led by Chris Bizer). The first public instance of DBpedia (as described above) was released in February 2007. The official DBpedia coming out party occurred at WWW2007, Banff, during the inaugural Linked Data gathering, where it showcased the virtues and immense potential of TimBL's Linked Data meme.
OpenLink Software (developers of OpenLink Virtuoso and providers of Web Hosting infrastructure), the University of Leipzig, and Freie University, Berlin. In addition, there is a burgeoning community of collaborators and contributors responsible for DBpedia based applications, cross-linked data sets, ontologies (OpenCyc, SUMO, UMBEL, and YAGO), and other utilities. Finally, DBpedia wouldn't be possible without the global content contribution and curation efforts of Wikipedians, a point typically overlooked (albeit inadvertently).
The steps are as follows:
In a nutshell, there are four distinct and vital components to DBpedia. Thus, DBpedia doesn't exist if all the project offered was a collection of RDF data dumps. Likewise, it doesn't exist without a fully populated SPARQL compliant Quad Store. And of course, it doesn't exist if that fully loaded SPARQL compliant Quad Store isn't up to the cocktail of challenges (query load and complexity) presented by live Web accessibility.
It remains a live exemplar for any individual or organization seeking to publish or exploit HTTP based Linked Data on the World Wide Web. Its existence continues to stimulate growth in both the density and quality of the burgeoning Web of Linked Data.
In the most basic sense, simply browse the HTML based resource descriptor pages en route to discovering hitherto unknown relationships that exist across named entities and subject matter concepts / headings. Beyond that, simply look at DBpedia as a master lookup table in a Web hosted distributed database setup, enabling you to mesh your local domain specific details with DBpedia records via structured relations (triples, or 3-tuple records) comprised of HTTP URIs from both realms, e.g., owl:sameAs relations.
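The master lookup idea can be sketched concretely. Apart from the standard `owl:sameAs` predicate URI, every URI below is hypothetical: a made-up local product record linked to its DBpedia counterpart, so that the two descriptions can be merged.

```python
# Sketch: using an owl:sameAs link to merge a local record's attributes with
# those of its DBpedia counterpart (illustrative URIs throughout).
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

local = [
    ("http://shop.example/products#lincoln-bio", SAME_AS,
     "http://dbpedia.org/resource/Abraham_Lincoln"),
    ("http://shop.example/products#lincoln-bio",
     "http://shop.example/schema#price", "18.99"),
]
dbpedia = [
    ("http://dbpedia.org/resource/Abraham_Lincoln",
     "http://dbpedia.org/ontology/birthPlace",
     "http://dbpedia.org/resource/Kentucky"),
]

def merged_description(entity, graphs):
    """Collect attribute-value pairs for an entity and all its sameAs aliases."""
    triples = [t for g in graphs for t in g]
    aliases = {entity} | {o for s, p, o in triples if s == entity and p == SAME_AS}
    return [(p, o) for s, p, o in triples if s in aliases and p != SAME_AS]

for p, o in merged_description("http://shop.example/products#lincoln-bio",
                               [local, dbpedia]):
    print(p, o)
```

One lookup against the local identifier now yields both the local detail (price) and the DBpedia detail (birth place) -- the "mesh" in action.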
Expanding on the Master-Details point above, you can use its rich URI corpus to alleviate tedium associated with activities such as:
Note to Web Programmers: Linked Data is about Data (Wine) and not about Code (Fish). Thus, it isn't a "programmer only zone"; far from it. More than anything else, it's inherently inclusive and spreads its participation net widely across: Data Architects, Data Integrators, Power Users, Knowledge Workers, Information Workers, Data Analysts, etc. Basically, everyone that can "click on a link" is invited to this particular party; remember, it is about "Linked Data", not "Linked Code", after all. :-)
Here is an example of a Linked Data value pyramid that I am stumbling across --with some frequency-- these days (note: 1 being the pyramid apex):
Basically, Linked Data deployment (assigning de-referencable HTTP URIs to DBMS records, their attributes, and attribute values [optionally] ) is occurring last. Even worse, this happens in the context of Linked Open Data oriented endeavors, resulting in nothing but confusion or inadvertent perpetuation of the overarching pragmatically challenged "Semantic Web" stereotype.
As you can imagine, hitting SPARQL as your introduction to Linked Data is akin to hitting SQL as your introduction to Relational Database Technology, neither is an elevator-style value prop. relay mechanism.
In the relational realm, killer demos always started with desktop productivity tools (spreadsheets, report-writers, SQL QBE tools, etc.) accessing relational data sources en route to unveiling the "Productivity" and "Agility" value prop. that such binding delivered; i.e., the desktop applications (clients) and the databases (servers) are distinct, but operate in a mutually beneficial manner, courtesy of data access standards such as ODBC (Open Database Connectivity).
In the Linked Data realm, learning to embrace and extend best practices from the relational DBMS realm remains a challenge. A lot of this has to do with hangovers from a misguided perception that RDF databases will somehow completely replace RDBMS engines, rather than complement them. Thus, you have a counterproductive variant of NIH (Not Invented Here) in play, taking us to the dreaded realm of: Break the Pot and You Own It (exemplified by the 11+ year Semantic Web Project comprehension and appreciation odyssey).
From my vantage point, here is how I believe the Linked Data value pyramid should be layered, especially when communicating the essential value prop.:
When trying to understand HTTP based Linked Data, especially if you're well versed in DBMS technology use (User, Power User, Architect, Analyst, DBA, or Programmer) think:
Remember, the need for Data Access & Integration technology is the by-product of the following realities:
In a nutshell, the AWS Cloud infrastructure simplifies the process of generating Federated presence on the Internet and/or World Wide Web. Remember, centralized networking models always end up creating data silos, in some context, ultimately! :-)
2009 is over. Yeah, sure, trueg, we know that, it has been over for a while now! Ok, ok, I am a bit late, but still I would like to get this one out - if only for my archive. So here goes.
Let's start with the major topic of 2009 (and also the beginning of 2010): the new Nepomuk database backend, Virtuoso. Everybody who used Nepomuk had the same problems: you either used the sesame2 backend, which depends on Java and steals all of your memory, or you were stuck with Redland, which had the worst performance and missed some SPARQL features, making important parts of Nepomuk, like queries, unusable. So more than a year ago I had the idea to use the one GPL'ed database server out there that supported RDF in a professional manner: OpenLink's Virtuoso. It has all the features we need, has very good performance, and scales up to dimensions we will probably never reach on the desktop (yeah, right, and 64k main memory will be enough forever!). So very early I started coding the necessary Soprano plugin which would talk to a locally running Virtuoso server through ODBC. But since I ran into tons of small problems (as always) and got sidetracked by other tasks, I did not finish it right away. OpenLink, however, was very interested in the idea of their server being part of every KDE installation (why wouldn't they ;)). So they not only introduced a lite-mode, which makes Virtuoso suitable for the desktop, but also helped in debugging all the problems that I had left. Many test runs, patches, and a Virtuoso 5.0.12 release later, I could finally announce the Virtuoso integration as usable.
Then, at the end of last year, I dropped support for sesame2 and Redland. Virtuoso is now the only supported database backend. The reason is simple: Virtuoso is way more powerful than the rest - not only in terms of performance - and it is fully implemented in C(++) without any traces of Java. Maybe even more important is the integration of the full-text index, which makes the previously used CLucene index unnecessary. Thus, we can finally combine full-text and graph queries in one SPARQL query. This results in a cleaner API and way faster return of search results, since there is no need to combine the results from several queries anymore. A direct result of that is the new Nepomuk Query API, which I will discuss later.
So now the only thing I am waiting for is the first bugfix release of Virtuoso 6, i.e. 6.0.1 which will fix the bugs that make 6.0.0 fail with Nepomuk. Should be out any day now. :)
Querying data in Nepomuk pre-KDE-4.4 could be done in one of two ways: 1. Use the very limited capabilities of the ResourceManager to list resources with certain properties or of a certain type; or 2. Write your own SPARQL query using ugly QString::arg replacements.
With the introduction of Virtuoso and its awesome power we can now do pretty much everything in one query. This allowed me to finally create a query API for KDE: Nepomuk::Query::Query and friends. I won't go into much detail here since I did that before.
All in all you should remember one thing: whenever you think about writing your own SPARQL query in a KDE application - have a look at libnepomukquery. It is very likely that you can avoid the hassle of debugging a query by using the query API.
The first nice effect of the new API (apart from me using it all over the place obviously) is the new query interface in Dolphin. Internally it simply combines a bunch of Nepomuk::Query::Term objects into a Nepomuk::Query::AndTerm. All very readable and no ugly query strings.
An important part of the Nepomuk research project was the creation of a set of ontologies for describing desktop resources and their metadata. After the Xesam project under the umbrella of freedesktop.org had been convinced to use RDF for describing file metadata they developed their own ontology. Thanks to Evgeny (phreedom) Egorochkin and Antonie Mylka both the Xesam ontology and the Nepomuk Information Elements Ontology were already very close in design. Thus, it was relatively easy to merge the two and be left with only one ontology to support. Since then not only KDE but also Strigi and Tracker are using the Nepomuk ontologies.
At the Gran Canaria Desktop Summit I met some of the guys from Tracker and we tried to come up with a plan to create a joint project to maintain the ontologies. This got off to a rough start as nobody really felt responsible. So I simply took the initiative and released shared-desktop-ontologies version 0.1 in November 2009. The result was a s***-load of hate-mails and bug reports due to me breaking the KDE build. But in the end it was worth it. Now the package is established, and other projects can start to pick it up to create data compatible with the Nepomuk system and Tracker.
Today the ontologies (and the shared-desktop-ontologies package) are maintained in the Oscaf project at Sourceforge. The situation is far from perfect but it is a good start. If you need specific properties in the ontologies or are thinking about creating one for your own application - come and join us in the bug tracker...
It was at the Akonadi meeting that Will Stephenson and I got to talking about mimicking some Zeitgeist functionality through Nepomuk. Basically it meant gathering some data when opening and when saving files. We quickly came up with a hacky patch for KIO and KFileDialog which covered most cases and allowed us to track when a file was modified and by which application. This little experiment did not leave that state though (it will, however, this year), but another one did: Zeitgeist also provides a FUSE filesystem which allows browsing files by modification date. Well, whatever FUSE can do, KIO can do as well. Introducing the timeline:/ KIO slave, which gives a calendar view onto your files.
Well, I thought I would mention the Tips And Tricks section I wrote for the techbase. It might not be a big deal but I think it contains some valuable information in case you are using Nepomuk as a developer.
This time around I had the privilege to mentor two students in the Google Summer of Code. Alessandro Sivieri and Adam Kidder did outstanding work on Improved Virtual Folders and the Smart File Dialog.
Adam's work led me to some heavy improvements in the Nepomuk KIO slaves myself, which I only finished this week (more details on that coming up). Alessandro continued his work on faceted file browsing in KDE and created:
Alessandro is following up on his work to make faceted file browsing a reality in 2010 (and KDE SC 4.5). Since it was too late to get faceted browsing into KDE SC 4.4 he is working on Sembrowser, a stand-alone faceted file browser which will be the grounds for experiments until the code is merged into Dolphin.
In 2009 I organized the first Nepomuk workshop in Freiburg, Germany. And also the second one. While I reported properly on the first one I still owe a summary for the second one. I will get around to that - sooner or later. ;)
Soprano gives us a nice command line tool to create a C++ namespace from an ontology file: onto2vocabularyclass. It produces nice convenience namespaces like Soprano::Vocabulary::NAO. Nepomuk adds another tool named nepomuk-rcgen. Both were a bit clumsy to use before. Now we have nice cmake macros which make it very simple to use both.
See the techbase article on how to use the new macros.
Without my knowledge (imagine that!) Andrew Lake created an amazing new media player named Bangarang - a Jamaican word for noise, chaos, or disorder. This player is Nepomuk-enabled in the sense that it has a media library which lets you browse your media files based on the Nepomuk data. It remembers the number of times a song or a video has been played and when it was played last. It allows you to add details such as the TV series name, season, episode number, or actors that are in the video - all through Nepomuk (I hope we will soon get tvdb integration).
I am especially excited about this since finally applications not written or mentored by me start contributing Nepomuk data.
2009 was also the year of the first Gnome-KDE joint conference. Let me add a bullet point for completeness and refer to my previous blog post reporting on my experiences on the island.
Well, that was by far not all I did in 2009, but I think I covered most of the important topics. And after all it is "just a blog entry" - there is no need for completeness. Thanks for reading.
Sticking with the TechCrunch layout, here is why all roads simply lead to Linked Data come 2010 and beyond:
As I've stated in the past (across a variety of mediums), you cannot build applications that have long term value without addressing the following issues:
The items above basically showcase the very essence of the HTTP URI abstraction that drives HTTP based Linked Data; which is also the basic payload unit that underlies REST.
I simply hope that the next decade marks a period of broad appreciation and comprehension of Data Access, Integration, and Management issues on the parts of: application developers, integrators, analysts, end-users, and decision makers. Remember, without structured Data we cannot produce or share Information, and without Information, we cannot produce or share Knowledge.
A few months ago, Aldo Bucchi posted a message to the LOD mailing list seeking a discussion space for more business and marketing oriented topics in relation to Linked Data. At the time, my assumption was that the existing LOD mailing list served that purpose absolutely fine, but in due course I came to realize that Aldo's request had a much larger foundation than I initially suspected.
Linked Data, like its umbrella Semantic Web Project, has suffered from an inadvertent oversight on the parts of many of its enthusiasts (myself included): 100% of the discussion spaces are created by, geared towards, or dominated by researchers (primarily from Academia) and/or developers. Thus, at the very least, we've been operating in an echo chamber that only feeds the existing void between the core community and those who are more interested in discussing business and marketing related topics.
The new discussion space seeks to cover the following:
How Do I Join The Conversation? Simply sign up on the Google hosted BOLD mailing list, introduce yourself (ideally), and then start conversing! :-)
Enjoy!
The journey towards this watershed moment started with the Semantic Web Project, gained focus and pragmatism via the Linked Data meme, attained substance & credibility via efforts such as DBpedia and the resulting cloud of Open Linked Data Spaces, and finally arrived at the most important destination of all: broad comprehension and coherence, via RDFa.
Over the years, I've chronicled the journey above via entries in this particular data space (my blog) and most recently, via my rapid-fire comments and debates on Twitter (basically, hashtag #linkeddata, account: kidehen).
On a parallel front re. my chronicles, I've periodically had conversations with Jon Udell, who has always provided a coherent sounding board and reconciliation framework for my world views and open data access vision; naturally, this has a lot to do with his holistic grasp of the big picture issues, associated technical details, and special communication prowess :-)
Against this backdrop, I refer you to my most recent podcast conversation with Jon, which is about how the tandem of HTML+RDFa and the GoodRelations vocabulary deliver the critical missing links re. broad comprehension of the Semantic Web vision en route to mass exploitation.
As the "Linked Data" meme has gained momentum you've more than likely been on the receiving end of dialog with Linked Open Data community members (myself included) that goes something like this:
"Do you have a URI", "Get yourself a URI", "Give me a de-referencable URI" etc..
And each time, you respond with a URL -- which to the best of your Web knowledge is a bona fide URI. But to your utter confusion you are told: Nah! You gave me a Document URI instead of the URI of a real-world thing or object etc..
Well, our everyday use of the Web is an unfortunate conflation of two distinct things, which both have Identity: Real World Objects (RWOs) & the Addresses/Locations of Documents (Information bearing Resources).
The "Linked Data" meme is about enhancing the Web by unobtrusively reintroducing its core essence: the generic HTTP URI, a vital piece of Web Architecture DNA. Basically, its about so realizing the full capabilities of the Web as a platform for Open Data Identification, Definition, Access, Storage, Representation, Presentation, and Integration.
People, Places, Music, Books, Cars, Ideas, Emotions etc..
A Uniform Resource Identifier. A global identifier mechanism for network addressable data items. Its sole function is Name oriented Identification.
The constituent parts of a URI (from URI Generic Syntax RFC) are depicted below:
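For the hands-on reader, the same decomposition can be seen with Python's standard library (the example URI is illustrative, not from the RFC itself):

```python
from urllib.parse import urlsplit

# Decompose an HTTP URI into the generic-syntax components defined by
# RFC 3986: scheme, authority, path, query, and fragment.
parts = urlsplit("http://dbpedia.org/resource/Linked_Data?lang=en#this")

print(parts.scheme)    # -> http       (the scheme component)
print(parts.netloc)    # -> dbpedia.org (the authority component)
print(parts.path)      # -> /resource/Linked_Data
print(parts.query)     # -> lang=en
print(parts.fragment)  # -> this
```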
A location oriented HTTP scheme based URI. The HTTP scheme introduces a powerful and inherent duality that delivers:
So far so good!
The kind of URI Linked Data aficionados mean when they use the term: URI.
An HTTP URI is an HTTP scheme based URI. Unlike a URL, this kind of HTTP scheme URI is devoid of any Web Location orientation or specificity. Thus, its inherent duality provides a more powerful level of abstraction. Hence, you can use this form of URI to assign Names/Identifiers to Real World Objects (RWOs). Even better, courtesy of the Identity/Address duality of the HTTP scheme, a single URI can deliver the following:
Data about Data. Put differently, data that describes other data in a structured manner.
The predominant model for metadata is the Entity-Attribute-Value + Classes & Relationships model (EAV/CR). A model that's been with us since the inception of modern computing (long before the Web).
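As a rough sketch of the EAV idea (the entity identifiers and attribute names below are invented for illustration): each persisted observation is one triple, and an entity's description is simply the set of triples that share its identifier.

```python
# Minimal sketch of the Entity-Attribute-Value model: a store is a list of
# (entity, attribute, value) triples; all names here are illustrative.
triples = [
    ("urn:ex:dr-dre", "name",   "Dr. Dre"),
    ("urn:ex:dr-dre", "type",   "MusicArtist"),
    ("urn:ex:dr-dre", "member", "urn:ex:nwa"),
]

def describe(entity, store):
    """Collect every attribute/value pair observed about one entity."""
    return {(a, v) for e, a, v in store if e == entity}

print(describe("urn:ex:dr-dre", triples))
```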
The Resource Description Framework (RDF) is a framework for describing Web addressable resources. In a nutshell, it's a framework for adding Metadata bearing Information Resources to the current Web. It's comprised of:
The ubiquitous use of the Web is primarily focused on a Linked Mesh of Information bearing Documents. URLs rather than generic HTTP URIs are the prime mechanism for Web tapestry; basically, we use URLs to conduct Information -- which is inherently subjective -- instead of using HTTP URIs to conduct "Raw Data" -- which is inherently objective.
Note: Information is "data in context", it isn't the same thing as "Raw Data". Thus, if we can link to Information via the Web, why shouldn't we be able to do the same for "Raw Data"?
The meme simply provides a set of guidelines (best practices) for producing Web architecture friendly metadata. Meaning: when producing EAV/CR model based metadata, endow Subjects, their Attributes, and (optionally) Attribute Values with HTTP URIs. By doing so, a new level of Link Abstraction on the Web is possible i.e., "Data Item to Data Item" level links (aka hyperdata links). Even better, when you de-reference a RWO hyperdata link you end up with a negotiated representation of its metadata.
Linked Data is ultimately about an HTTP URI for each item in the Data Organization Hierarchy :-)
Today, we put stuff on the Web because we want it to be discovered as part of a "sharing act". Likewise, we make regular use of Search Engine Services because we want to "Find" stuff in a productive manner.
Putting the above in context, you don't need to be Einstein to figure out that to date the Web hasn't enabled vendors to describe their products and services clearly. Likewise, it hasn't enabled us to describe what we want, when we want it, how much we are willing to pay, etc. Basically, the SDQ (Serendipitous Discovery Quotient) of Web Content is excruciatingly low!
The Linked Data meme is about using the essence of the Web -- HTTP URIs -- as the mechanism for conducting data across the Web that unambiguously unveils basic things like:
A Web of Linked Data enables a complete redefinition of eCommerce, and that's just for starters :-)
The primary topic of a meme penned by TimBL in the form of a Design Issues Doc (note: this is how TimBL has shared his thoughts since the Beginning of the Web).
There are a number of dimensions to the meme, but its primary purpose is the reintroduction of the HTTP URI -- a vital component of the Web's core architecture.
They possess an intrinsic duality that combines persistent and unambiguous Data Identity with platform & representation format independent Data Access. Thus, you can use a string of characters that look like a contemporary Web URL to unambiguously achieve the following:
Enabling more productive use of the Web by users and developers alike. All of which is achieved by tweaking the Web's Hyperlinking feature such that it now includes Hypertext and Hyperdata as link types.
Note: Hyperdata Linking is simply what an HTTP URI facilitates.
Example problems solved by injecting Linked Data into the Web:
If all of the above still falls into the technical mumbo-jumbo realm, then simply consider Linked Data as delivering Open Data Access in granular form to Web accessible data -- that goes beyond data containers (documents or files).
The value proposition of Linked Data is inextricably linked to the value proposition of the World Wide Web. This is true, because the Linked Data meme is ultimately about an enhancement of the current Web; achieved by reintroducing its architectural essence -- in new context -- via a new level of link abstraction, courtesy of the Identity and Access duality of HTTP URIs.
As a result of Linked Data, you can now have Links on the Web for a Person, Document, Music, Consumer Electronics, Products & Services, Business Opening & Closing Hours, Personal "WishLists" and "OfferList", an Idea, etc.. in addition to links for Properties (Attributes & Values) of the aforementioned. Ultimately, all of these links will be indexed in a myriad of ways providing the substrate for the next major period of Internet & Web driven innovation, within our larger human-ingenuity driven innovation continuum.
The HTTP URI is the secret sauce of the Web that is powerfully and unobtrusively reintroduced via the Linked Data meme (classic back to the future act). This powerful sauce possesses a unique power courtesy of its inherent duality i.e., how it uniquely combines Data Item Identity (think keys in traditional DBMS parlance) with Data Access (e.g. access to negotiable representations of associated metadata).
As you can see, I've made no mention of RDF or SPARQL, and I can still articulate the inherent value of the "Linked Data" dimension that the "Linked Data" meme adds to the World Wide Web.
As per usual this post is a live demonstration of Linked Data (dog-food style) :-)
If you perform the steps above, on any HTTP network (e.g. World Wide Web), you implicitly bind the Names/Identifiers of things to negotiable representations of their metadata (description) bearing documents.
Also note, you can create and deploy the resulting RDF metadata using any of the following approaches:
Dr. Dre is one of the artists in the Linked Data Space we host for the BBC. He is also referenced in music oriented data spaces such as DBpedia, MusicBrainz and Last.FM (to name a few).
How do I obtain a holistic view of the entity "Dr. Dre" across the BBC, MusicBrainz, and Last.FM data spaces? We know the BBC publishes Linked Data, but what about Last.FM and MusicBrainz? Both of these data spaces only expose XML or JSON data via REST APIs.
The following took place:
The new enhanced URI for Dr. Dre now provides a rich holistic view of the aforementioned "Artist" entity. This URI is usable anywhere on the Web for Linked Data Conduction :-)
As a leading media organization, the BBC's use of Linked Data provides a clear beacon to other media players re. the imminence of a serious Linked Data induced sector inflection. In a nutshell, every Web Site has to evolve into a Linked Data Space: a location on the Web that provides granular access to discrete data items in line with the core principles of the Linked Data meme.
Remember, the essence of the Linked Data meme is simply this: you reference data items and access their metadata, in a variety of formats, via a single HTTP based URI. This approach to Web data publishing is compatible with any HTTP aware user agent (e.g., your Web Browser or tools & applications that provide abstracted access to HTTP).
There are a number of very powerful things available to end-users and developers alike.
The most powerful feature of our variant of the BBC's Linked Data Space is the exposure of Faceted Find (think Search++ and beyond). Thus, you can go to the home page of the service and commence data discovery and exploration via any of the following interfaces:
Once you are comfortable with at least one of the items above, you can exploit the system further by performing any of the following:
In line with the time-tested "embrace and extend" pattern, we provide Full Text search capability, but unlike Google, Yahoo!, Bing, and other search engines, we don't use a "Page Rank" algorithm to sort results; instead, we use an "Entity Rank" algorithm, since we are dealing with an RDF based Graph model DBMS where links exist between entities across instance data and data dictionary (vocabularies, schemas, ontologies) boundaries. In addition, when you get results (by clicking "show values" or "show values with distinct counts") that list entities associated with a full text search pattern, we take a quantum leap beyond search engines by allowing you to use "Entity Type" and/or "Entity Properties" (all of these have HTTP URIs too) to set your own context for what you seek.
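As a minimal sketch of the "show values with distinct counts" step described above (the sample matches are invented for illustration): full text matches are grouped by Entity Type so the user can narrow the result context.

```python
from collections import Counter

# Toy sketch of faceted counting: given full text matches (sample data
# invented for illustration), count how many entities carry each Entity
# Type so the user can narrow the result set by facet.
matches = [
    {"label": "Dr. Dre",     "type": "MusicArtist"},
    {"label": "The Chronic", "type": "Album"},
    {"label": "2001",        "type": "Album"},
    {"label": "Dre Parma",   "type": "Person"},
]

facet = Counter(m["type"] for m in matches)
for entity_type, distinct_count in facet.most_common():
    print(entity_type, distinct_count)
```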
Much more to come in the form of BBC specific demo queries and tutorials :-)
The observations above triggered a discussion thread on Twitter that involved: @edsu, @iand, and moi. Naturally, it morphed into a live demonstration of: human vs machine, interpretation of claims expressed in the RDF graph.
It showcases (in Man vs Machine style) the issue of unambiguously discerning the meaning of the owl:sameAs claim expressed in the LCSH Linked Data Space.
From the Linked Data perspective, it may spook a few people to see owl:sameAs values such as: "info:lc/authorities/sh95000541", that cannot be de-referenced using HTTP.
It may confuse a few people or user agents that do not see URI de-referencing as necessarily HTTP specific, and thereby attempt to de-reference the URI.URN on the assumption that it's associated with a "handle system", for instance.
It may even confuse RDFizer / RDFization middleware that use owl:sameAs as a data provider attribution mechanism via hint/nudge URI values derived from original content / data URI.URLs that de-reference to nothing e.g., an original resource URI.URL plus "#this" which produces URI.URN-URL -- think of this pattern as "owl:shameAs" in a sense :-)
Simply bring OWL reasoning (inference rules and reasoners) into the mix, thereby negating human dialogue about interpretation which ultimately unveils a mesh of orthogonal view points. Remember, OWL is all about infrastructure that ultimately enables you to express yourself clearly i.e., say what you mean, and mean what you say.
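To illustrate the reasoning step (a toy sketch, not a real OWL reasoner): owl:sameAs is symmetric and transitive, so co-referent identifiers collapse into one equivalence class, which is exactly how the two LCSH identifiers get reconciled.

```python
from collections import defaultdict

def same_as_classes(pairs):
    """Group identifiers into equivalence classes via the owl:sameAs
    closure: symmetry (edges added both ways) plus transitivity
    (graph traversal)."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)  # symmetry
    seen, classes = set(), []
    for node in graph:
        if node in seen:
            continue
        stack, group = [node], set()
        while stack:  # transitive closure by traversal
            n = stack.pop()
            if n not in group:
                group.add(n)
                stack.extend(graph[n])
        seen |= group
        classes.append(group)
    return classes

# The two LCSH identifiers from the discussion above.
pairs = [
    ("http://id.loc.gov/authorities/sh95000541#concept",
     "info:lc/authorities/sh95000541"),
]
print(same_as_classes(pairs))
```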
The SPARQL queries against the Graph generated and automatically populated by the Sponger reveal -- without human intervention -- that: "info:lc/authorities/sh95000541", is just an alternative name for <http://id.loc.gov/authorities/sh95000541#concept>, and that the graph produced by LCSH is self-describing enough for an OWL reasoner to figure this all out courtesy of the owl:sameAs property :-).
Hopefully, this post also provides a simple example of how OWL facilitates "Reasonable Linked Data".
section from IETF's Domain Keys spec. (paraphrased by me)
The Linked Data meme is based on the use of HTTP based URIs as reference / identifier labels associated with the "identity abstraction" referred to above. Thus, when you de-reference (request information about) an HTTP based URI you ultimately end up with a resource URL that exposes the "constellation of characteristics" mentioned above, in a representation negotiated at request time -- between an HTTP client and server e.g., (X)HTML, JSON, XML, RDF/XML, N3, Turtle, Trix, others :-)
Note: In proper Web parlance, a data object is referred to as a resource.
In the Linked Data realm, If you want to make a reference to the Linked Data meme in a blog post, you are better off using the resource URI: http://dbpedia.org/resource/Linked_Data, instead of the Web page URL: http://dbpedia.org/page/Linked_Data, which is the address of a physical document (an information conveying artifact) that at best visually presents the negotiated representation of a resource description.
In the simplest sense, you only have one focal point for referencing (referring to) and de-referencing (retrieving data about) a given Web resource. It protects you from the impact of Web document location changes (amongst many other things).
Remember, a single URI is a conduit into a realm where the identity, access, representation, presentation, and storage of a resource (data object) are completely distinct. It's the mechanism for conducting data across network, machine, operating system, dbms engine, application, and service (API) boundaries. Thus, without "linked data meme" prescribed URI referencing and de-referencing, we are simply back to "business as usual" re. the industry at large, where networks, operating systems, dbms engines, applications, and services (APIs) become the basis for "data lock-in" and silo construction.
Take a second to think about the profound virtues of the ubiquitous Web of Linked Document URLs that we have today, and then apply that thinking to the burgeoning Web of Linked Data URIs, which has just turned the corner and is heading in everyone's direction at full blast.
Note to "Social Media" players: Who you know isn't the canonical object of sociality. What you are i.e., your description and the data objects it exposes, are real objects of your sociality :-)
The acronym stands for: Resource Description Framework. And that's just what it is.
RDF is comprised of a Data Model (EAV/CR Graph) and Data Representation Formats such as: N3, Turtle, RDF/XML etc.
RDF's essence is this: "Entities" and "Attributes" are URI based, while "Values" may be URIs or Literals (typed or untyped).
URIs are Entity Identifiers.
Short for "Web of Linked Data" or "Linked Data Web".
A term coined by TimBL that describes an HTTP based "data access by reference pattern" that uses a single pointer or handle for "referring to" and "obtaining actual data about" an entity.
Linked Data uses the deceptively simple messaging scheme of HTTP to deliver a granular entity reference and access mechanism that transcends traditional computing boundaries such as: operating system, application, database engines, and networks.
Linked Data simply mandates the following re. RDF:
Note: by Entity I am also referring to: a resource (Web parlance), data item, data object, real-world object, or datum.
Linked Data is also about using URIs and HTTP's content negotiation feature to separate the presentation, representation, access, and identity of data items. Even better, content negotiation can be driven by user agent and/or data server based quality of service algorithms (representation preference order schemes).
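A hedged sketch of what server-driven negotiation involves: parse the client's Accept header by q-value preference, then serve the most preferred format the server can produce. This is deliberately simplified (real servers also handle wildcards and media-type parameters, which this omits).

```python
def negotiate(accept_header, available):
    """Pick the representation the client prefers most (by q-value)
    from the media types the server can actually produce."""
    prefs = []
    for part in accept_header.split(","):
        media, _, params = part.strip().partition(";")
        q = 1.0  # HTTP default quality when no q parameter is given
        for p in params.split(";"):
            if p.strip().startswith("q="):
                q = float(p.strip()[2:])
        prefs.append((q, media.strip()))
    for _, media in sorted(prefs, reverse=True):
        if media in available:
            return media
    return None  # no acceptable representation (HTTP 406 territory)

print(negotiate("application/rdf+xml;q=0.9, text/turtle",
                ["text/turtle", "application/rdf+xml"]))  # -> text/turtle
```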
To conclude, Linked Data is ultimately about the realization that: Data is the new Electricity, and its conductors are URIs :-)
Tip to governments of the world: we are in exponential times, the current downturn is but one side of the "exponential times ledger"; the other side is simply about unleashing "raw data" -- in structured form -- into the Web, so that "citizen analysts" can blossom and ultimately deliver the transparency desperately sought at every level of the economic value chain. Think: "raw data ready" whenever you ponder "shovel ready" infrastructure projects!
Your Life, Profession, Web, and Internet do not need to become mutually exclusive due to "information overload".
A platform or service that delivers a point of online presence that embodies the fundamental separation of: Identity, Data Access, Data Representation, Data Presentation, by adhering to Web and Internet protocols.
Typical post installation (Local or Cloud) task sequence:
I've just outlined a snippet of the capabilities of the OpenLink Data Spaces platform. A platform built using OpenLink Virtuoso, architected to deliver: open, platform independent, multi-model, data access and data management across heterogeneous data sources.
All you need to remember is your URI when seeking to interact with your data space.
At the current time we have loaded 100% of all the very large data sets from the LOD Cloud. As a result, we can start the process of exposing Linked Data virtues in a manner that's palatable to users, developers, and database professionals across the Web 1.0, 2.0, and 3.0 spectrums.
You can use the "Search & Find" or "URI Lookup" or SPARQL endpoint associated with the LOD cloud hosting instance to perform the following tasks:
If you don't want to use the SPARQL based Web Service, or other Linked Data Web oriented APIs for interacting with the LOD cloud programmatically, you can simply use the powerful REST style Web Service that provides URL parameters for performing full text oriented "Search", entity oriented "Find" queries, and faceted navigation over the huge data corpus with results data returned in JSON and XML formats.
Amazon have agreed to add all the LOD Cloud data sets to their existing public data sets collective. Thus, the data sets we are loading will be available in "raw data" (RDF) format on the public data sets page via Named Elastic Block Storage (EBS) Snapshots; meaning, you can make an EC2 AMI (e.g. Linux, Windows, Solaris), install an RDF quad or triple store of choice into your AMI, and then simply load data from the LOD cloud based on your needs.
In addition to the above, we are also going to offer a Virtuoso 6.0 Cluster Edition based LOD Cloud AMI (as we've already done with DBpedia, MusicBrainz, NeuroCommons, and Bio2Rdf) that will enable you to simply instantiate a personal and service specific edition of Virtuoso with all the LOD data in place and fully tuned for performance and scalability; basically, you will simply press "Instantiate AMI" and a LOD cloud data space, in true Linked Data form, will be at your disposal within minutes (i.e. the time it takes the DB to start).
Work on the migration of the LOD data to EC2 starts this week. Thus, if you are interested in contributing an RDF based data set to the LOD cloud, now is the time to get your archive links in place (see the ESW Wiki page for LOD Data Sets).
Jason:
Scoble is sensing what comes next, but in my opinion, describes it using an old obtrusive advertising model anecdote.
I've penned a post or two about the "Magic of You" which is all about the new Web power broker (Entity: "You").
Personally, I've long envisaged a complete overhaul of advertising where obtrusive advertising simply withers away; ultimately replaced by an unobtrusive model that is driven by individualized relevance and high doses of serendipity. Basically, this is ultimately about "taking the Ad out of item placement in Web pages".
The fundamental ingredients of an unobtrusive advertising landscape would include the following Human facts:
Ideally, we would like to be able to simply state the following, via a Web accessible profile:
Now put the above into the context of an evolving Web where data items are becoming more visible by the second, courtesy of the "Linked Data" meme. Thus, things that weren't discernible via the Web: "People", "Places", "Music", "Books", "Products", etc., become much easier to identify and describe.
Assuming the comments above hold true re. the Web's evolution into a collection of Linked Data Spaces, and the following occur:
Wish-Lists and Offer-Lists will gradually start bonding with increasing degrees of serendipity courtesy of exponential growth in Linked Data Web density.
So based on what I've stated so far, Scoble would simply browse the Web or visit his profile page, and in either scenario enjoy a "minority report" style of experience albeit all under his control (since he is the one driving his Web user agent).
What I describe above simply comes down to "Wish-lists" and associated recommendations becoming the norm outside the confines of Amazon's data space on the Web. Serendipitous discovery, intelligent lookups, and linkages are going to be the fundamental essence of Linked Data Web oriented applications, services, agents.
Beyond Scoble, it's also important to note that access to data will be controlled by entity "You". Your data space on the Web will be something you control access to in a myriad of ways, and it will include the option to provide licensed access to commercial entities on your terms. Naturally, you will also determine the currency that facilitates the value exchange :-)
|  | Web 1.0 | Web 2.0 | Web 3.0 |
| --- | --- | --- | --- |
| Simple Definition | Interactive / Visual Web | Programmable Web | Linked Data Web |
| Unit of Presence | Web Page | Web Service Endpoint | Data Space (named structured data enclave) |
| Unit of Value Exchange | Page URL | Endpoint URL for API | Resource / Entity / Object URI |
| Data Granularity | Low (HTML) | Medium (XML) | High (RDF) |
| Defining Services | Search | Community (Blogs to Social Networks) | Find |
| Participation Quotient | Low | Medium | High |
| Serendipitous Discovery Quotient | Low | Medium | High |
| Data Referencability Quotient | Low (Documents) | Medium (Documents) | High (Documents and their constituent Data) |
| Subjectivity Quotient | High | Medium (from A-list bloggers to select source and partner lists) | Low (everything is discovered via URIs) |
| Transclusence | Low | Medium (Code driven Mashups) | High (Data driven Meshups) |
| What You See Is What You Prefer (WYSIWYP) | Low | Medium | High (negotiated representation of resource descriptions) |
| Open Data Access (Data Accessibility) | Low | Medium (Silos) | High (no Silos) |
| Identity Issues Handling | Low | Medium (OpenID) | High (FOAF+SSL) |
| Solution Deployment Model | Centralized | Centralized with sprinklings of Federation | Federated with function specific Centralization (e.g. Lookup hubs like LOD Cloud or DBpedia) |
| Data Model Orientation | Logical (Tree based DOM) | Logical (Tree based XML) | Conceptual (Graph based RDF) |
| User Interface Issues | Dynamically generated static interfaces | Dynamically generated interfaces with semi-dynamic interfaces (courtesy of XSLT or XQuery/XPath) | Dynamic Interfaces (pre- and post-generation) courtesy of self-describing nature of RDF |
| Data Querying | Full Text Search | Full Text Search | Full Text Search + Structured Graph Pattern Query Language (SPARQL) |
| What Each Delivers | Democratized Publishing | Democratized Journalism & Commentary (Citizen Journalists & Commentators) | Democratized Analysis (Citizen Data Analysts) |
| Star Wars Edition Analogy | Star Wars (original fight for decentralization via rebellion) | Empire Strikes Back (centralization and data silos make comeback) | Return of the JEDI (FORCE emerges and facilitates decentralization from "Identity" all the way to "Open Data Access" and "Negotiable Descriptive Data Representation") |
Naturally, I am not expecting everyone to agree with me. I am simply making my contribution to what will remain fascinating discourse for a long time to come :-)
Robin:
Web 3.0 is fundamentally about the World Wide Web becoming a structured database equipped with a formal data model (RDF, which is a moniker for the Entity-Attribute-Value with Classes & Relationships based Graph Model), a query language, and a protocol for handling diverse data representation requirements via negotiation.
Web 3.0 is about a Web that facilitates serendipitous discovery of relevant things; thereby making serendipitous discovery quotient (SDQ), rather than search engine optimization (SEO), the critical success factor that drives how resources get published on the Web.
Personally, I believe we are on the cusp of a major industry inflection re. how we interact with data hosted in computing spaces. In a nutshell, the conceptual model interaction based on real-world entities such as people, places, and other things (including abstract subject matter) will usurp traditional logical model interaction based on rows and columns of typed and/or untyped literal values exemplified by relational data access and management systems.
Labels such as "Web 3.0", "Linked Data", and "Semantic Web" are simply about the aforementioned model transition playing out on the World Wide Web and across private Linked Data Webs such as Intranets & Extranets, as exemplified by the emergence of the "Master Data Management" label/buzzword.
As was the case with Web Services re. Web 2.0, there is a critical piece of infrastructure driving the evolution in question, and in this case it comes down to the evolution of Hyperlinking.
We now have a new and complementary variant of Hyperlinking, commonly referred to as "Hyperdata", that sits alongside "Hypertext". Hyperdata, when used in conjunction with HTTP based URIs as Data Source Names (or Identifiers), delivers a potent and granular data access mechanism scoped down to the datum (object or record) level; which is much different from the document (record or entity container) level linkage that Hypertext accords.
In addition, the incorporation of HTTP into this new and enhanced granular Data Source Naming mechanism also addresses past challenges relating to the separation of data, data representation, and data transmission protocols -- remember XDR woes familiar to all sockets level programmers -- courtesy of in-built content negotiation. Hence, via a simple HTTP GET -- against a Data Source Name exposed by a Hyperdata link -- I can negotiate (from client or server side) the exact representation of the description (entity-attribute-value graph) of an Entity / Data Object / Resource, dispatched by a data server.
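In practice, the HTTP GET described above is just an ordinary request with an Accept header driving which representation comes back. A sketch using Python's standard library (the DBpedia URI is illustrative, and the request is only constructed, not sent, here):

```python
import urllib.request

# Sketch of a hyperdata lookup: one GET against an entity's HTTP URI, with
# the Accept header requesting a Turtle representation of its description.
req = urllib.request.Request(
    "http://dbpedia.org/resource/Linked_Data",
    headers={"Accept": "text/turtle"},
)

# urllib.request.urlopen(req) would follow the server's redirect and
# negotiation, returning the entity description in the requested format.
print(req.get_header("Accept"))  # -> text/turtle
```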
For example, this is how a description of entity "Me" ends up being available in (X)HTML or RDF document representations (as you will observe when you click on that link to my Personal URI).
The foundation of what I describe above comes from:
Some live examples from DBpedia:
Today, I revisited the same article -- and to my shock and horror -- my comments do not exist (note: the site did accept my comments yesterday!). Even more frustrating for me, I now have to expend time I don't have re-writing my comments due to the depth and danger of the inaccuracies in this post re. RDF in general.
Please look into what happened to my comments. It's too early for me to conclude that subjective censorship is at play on the Web -- which isn't a hard copy journalistic format style of platform where editors get away with such shenanigans. The Web is a sticky database, and outer joining is well and truly functional (meaning: exclusion and omission ultimately come back to bite via full outer join query results against the Web DB).
By the way, if you publish the comments I made to the post (yesterday), I will add a note to this post, accordingly.
Yes! David just confirmed to me via Twitter that this is yet another comment system related issue and absolutely no intent to censor etc. His words Twervatim :-)
For sake of clarity, I've itemized the inaccuracies and applied my correction comments (inline) accordingly:
Inaccuracy #1:
Resource Description Framework (RDF), a part of the XML story, provides interoperability between applications that exchange information.
Correction #1:
RDF and XML are not inextricably linked in any way. RDF is a Data Model (EAV/CR style Graph) with associated markup and data serialization formats that include: N3, Turtle, TriX, RDF/XML, etc.
Inaccuracy #2:
RDF uses XML to define a foundation for processing metadata and to provide a standard metadata infrastructure for both the Web and the enterprise.
Correction #2:
RDF/XML is an XML based markup and data serialization format. As a markup language it can be used for creating RDF model records/statements (using Subject, Predicate, Object or Entity, Attribute, Value). As a serialization format, it provides a mechanism for marshaling RDF data across data managers and data consumers.
Inaccuracy #3:
The difference between the two is that XML is used to transport data using a common format, while RDF is layered on top of XML defining a broad category of data.
Correction #3:
See earlier corrections above.
Inaccuracy #4:
When the XML data is declared to be of the RDF format, applications are then able to understand the data without understanding who sent it.
Correction #4:
You do not declare data to be of RDF format. RDF isn't a format it is a data model (as stated above). You can "up lift" or map data from XML to RDF (hierarchical to graph model mapping). Likewise you can "down shift" or map data from RDF to XML (example: SPARQL SELECT query patterns "down shift" to SPARQL Results XML, which isn't RDF/XML, while keeping access to graphs via URIs or Entity Identifiers that reside within the serialization).
Inaccuracy #5:
RDF extends the XML model and syntax to be specified for describing either resources or a collection of information. (XML points to a resource in order to scope and uniquely identify a set of properties known as the schema.).
Correction #5:
See earlier comments.
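As a toy illustration of the "down shift" mentioned in Correction #4 (the names are invented; a real mapping would be driven by SPARQL or a schema), a graph of triples can be flattened into a tree-shaped structure of the kind XML serializes naturally:

```python
# Sketch of "down shifting" graph model data to a tree: group
# (subject, predicate, object) triples into a nested dict, keeping the
# entity identifiers (URIs) intact so graph access is preserved.
def down_shift(triples):
    tree = {}
    for s, p, o in triples:
        tree.setdefault(s, {}).setdefault(p, []).append(o)
    return tree

triples = [
    ("urn:ex:book1", "title",  "Weaving the Web"),
    ("urn:ex:book1", "author", "urn:ex:timbl"),
    ("urn:ex:timbl", "name",   "Tim Berners-Lee"),
]
print(down_shift(triples))
```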
The single accurate paragraph in this ebiz article lies right at the end and it states the following:
"I've always thought RDF has been underutilized for data integration, and it's really an old standard. Now that we're focused on both understanding and integrating data, perhaps RDF should make a comeback."
Depicted below is a top-down view of the data access and data management value chain. The term "apex" simply indicates value primacy, which takes the form of a data access API based entry point into a DBMS realm -- aligned to an underlying data model. Examples of data access APIs include: Native Call Level Interfaces (CLIs), ODBC, JDBC, ADO.NET, OLE-DB, XMLA, and Web Services.
See: AVF Pyramid Diagram.

The degree to which ad-hoc views of data managed by a DBMS can be produced and dispatched to relevant data consumers (e.g. people), without compromising concurrency, data durability, and security, determines the "Agility Value Factor" (AVF) of a given DBMS. Remember, agility as the cornerstone of environmental adaptation is as old as the concept of evolution, and intrinsic to all pursuits of primacy.
In simpler, business-oriented terms, look at AVF as the degree to which DBMS technology affects the ability to effectively implement "Market Leadership Discipline" along the following pathways: innovation, operational excellence, or customer intimacy.
Historically, at least since the late '80s, the RDBMS genre of DBMS has consistently offered the highest AVF relative to other DBMS genres en route to primacy within the value pyramid. The desire to improve on paper reports and spreadsheets is basically what DBMS technology has fundamentally addressed to date, even though conceptual level interaction with data has never been its forte.
See: RDBMS Primacy Diagram.

For more than 10 years -- at the very least -- the limitations of the traditional RDBMS in the realm of conceptual level interaction with data across diverse data sources and schemas (enterprise, Web, and Internet) have been crystal clear to many RDBMS technology practitioners, as indicated by some of the quotes excerpted below:
"Future of Database Research is excellent, but what is the future of data?"
"..it is hard for me to disagree with the conclusions in this report. It captures exactly the right thoughts, and should be a must read for everyone involved in the area of databases and database research in particular."-- Dr. Anant Jingran, CTO, IBM Information Management Systems, commenting on the 2007 RDBMS technology retreat attended by a number of key DBMS technology pioneers and researchers.
"One size fits all: A concept whose time has come and gone
- They are direct descendants of System R and Ingres and were architected more than 25 years ago
- They are advocating "one size fits all"; i.e. a single engine that solves all DBMS needs.
-- Prof. Michael Stonebreaker, one of the founding fathers of the RDBMS industry.
Until this point in time, the requisite confluence of "circumstantial pain" and "open standards" based technology required to enable an objective "compare and contrast" of RDBMS engine virtues and viable alternatives hasn't occurred. Thus, the RDBMS has endured in its position of primacy, albeit on a "one size fits all" basis.
As mentioned earlier, we are in the midst of an economic crisis that is ultimately about a consistent inability to connect dots across a substrate of interlinked data sources that transcend traditional data access boundaries with high doses of schematic heterogeneity. Ironically, in the dot-com era, we haven't been able to make meaningful connections between relevant "real-world things" that extend beyond primitive data hosted in database tables and content-management-style document containers; we've struggled to achieve this in the most basic sense, let alone evolve our ability to connect in line with the exponential rate at which the Internet & Web are spawning "universes of discourse" (data spaces) that emanate from user activity (within the enterprise and across the Internet & Web). In a nutshell, we haven't been able to upgrade our interaction with data such that "conceptual models" and resulting "context lenses" (or facets) become concrete; by this I mean: real-world entity interaction making its way into the computer realm, as opposed to the impedance we all suffer today when we transition from conceptual model interaction (real-world) to logical model interaction (when dealing with RDBMS based data access and data management).
Here are some simple examples of what I can only best describe as: "critical dots unconnected", resulting from an inability to interact with data conceptually:
Government (Globally) - Financial regulatory bodies couldn't effectively discern that a Credit Default Swap is an Insurance policy in all but literal name. In failing to do so, the cost of an unregulated insurance policy laid the foundation for exacerbating the toxicity of fatally flawed mortgage-backed securities. Put simply: a flawed insurance policy was the fallback on a toxic security that financiers found exotic based on superficial packaging.
Enterprises - Banks still don't understand that capital really does exist in tangible and intangible forms, with the intangible being the variant that is inherently dynamic. For example, a tech company's intellectual capital far exceeds the value of its fixtures, fittings, and buildings, but you would be amazed to find that in most cases this vital asset carries no significant value when banks get down to the nitty-gritty of debt collateral; instead, a buffer of flawed securitization has occurred atop a borderline static asset class covering the aforementioned buildings, fixtures, and fittings.
In the general enterprise arena, IT executives continued to "rip and replace" existing technology without ever effectively addressing the timeless inability to connect data across the disparate data silos generated by internal enterprise applications, let alone the broader need to mesh data from the inside with external data sources. No correlations were made between the growth of buzzwords and the compounding nature of data integration challenges. It's 2009, and only a minuscule number of executives dare fantasize about being anywhere within reach of the "relevant information at your fingertips" vision.
Looking more holistically at data interaction in general, whether you interact with data in the enterprise space (i.e., at work) or on the Internet or Web, you are ultimately delving into a mishmash of disparate computer systems, applications, services (Web or SOA), and databases (of the RDBMS variety in a majority of cases) associated with a plethora of disparate schemas. And yet, even today, "rip and replace" is still the norm pushed by most vendors, pitting one monoculture against another as exemplified by irrelevances such as FOSS/LAMP vs. Commercial or Web vs. Enterprise, when none of this matters unless the data access and integration issues are recognized, let alone addressed (see: Applications are Like Fish and Data Like Wine).
Like the current credit-crunch, exponential growth of data originating from disparate application databases and associated schemas, within shrinking processing time frames, has triggered a rethinking of what defines data access and data management value today en route to an inevitable RDBMS downgrade within the value pyramid.
There have been many attempts to address real-world modeling requirements across the broader DBMS community from Object Databases to Object-Relational Databases, and more recently the emergence of simple Entity-Attribute-Value model DBMS engines. In all cases failure has come down to the existence of one or more of the following deficiencies, across each potential alternative:
A common characteristic shared by all post-relational database management systems (from Object-Relational to pure Object) is an orientation towards variations of EAV/CR based data models. Unfortunately, all efforts in the EAV/CR realm have typically suffered from at least one of the deficiencies listed above. In addition, the same "one DBMS model fits all" approach that lies at the heart of the RDBMS downgrade also exists in the EAV/CR realm.
The RDBMS is not going away (ever), but its era of primacy -- by virtue of its placement at the apex of the data access and data management value pyramid -- is over! I make this bold claim for the following reasons:
Based on the above, it is crystal clear that a different kind of DBMS -- one with higher AVF relative to the RDBMS -- needs to sit atop today's data access and data management value pyramid. The characteristics of this DBMS must include the following:
Quick recap, I am not saying that RDBMS engine technology is dead or obsolete. I am simply stating that the era of RDBMS primacy within the data access and data management value pyramid is over.
The problem domain (conceptual model views over heterogeneous data sources) at the apex of the aforementioned pyramid has simply evolved beyond the natural capabilities of the RDBMS, which is rooted in "Closed World" assumptions regarding data definition, access, and management. The need to maintain domain-based conceptual interaction with data is now palpable at every echelon within our "Global Village" -- Internet, Web, Enterprise, Government, etc.
It is my personal view that an EAV/CR model based DBMS, with support for the seven items enumerated above, can trigger the long anticipated RDBMS downgrade. Such a DBMS would be inherently multi-model, because you would need the best of RDBMS and EAV/CR model engines in a single product, with in-built support for HTTP and other Internet protocols in order to effectively address data representation and serialization issues.
Examples of contemporary EAV/CR frameworks that provide concrete conceptual layers for data access and data management currently include:
The frameworks above provide the basis for a revised AVF pyramid, as depicted below, that reflects today's data access and management realities i.e., an Internet & Web driven global village comprised of interlinked distributed data objects, compatible with "Open World" assumptions.
See: New EAV/CR Primacy Diagram.
What is our "Search" and "Find" demonstration about? It is about how you use the "Description" of "Things" to unambiguously locate things in a database at Web Scale.
To our perpetual chagrin, we are trying to demonstrate an engine -- not UI prowess -- but the immediate response is to jump to the UI aesthetics.
Google, Yahoo, etc., offer a simple input form for full text search patterns; they have a processing window for completing full text searches across Web content indexed on their servers. Once the search patterns are processed, you get a page-ranked result set (basically, a collection of Web pages that claim/state: we found N pages out of a document corpus of about M indexed pages).
Note: the estimate aspect of traditional search results is like "advertising small print": the user lives with the illusion that all possible documents on the Web (or even the Internet) have been searched, whereas in reality 25% of the possible total is a major stretch, since the Web and Internet are fractal, scale-free networks, inherently growing at exponential rates "ad infinitum" across boundless dimensions of human comprehension.
The power of Linked Data ultimately comes down to the fact that the user constructs the path to what they seek via the properties of the "Things" in question. The routes are not hardwired since URI de-referencing (follow your nose pattern) is available to Linked Data aware query engines and crawlers.
We are simply trying to demonstrate how you can combine the best of full text search with the best of structured querying, while reusing familiar interaction patterns from Google/Yahoo. Thus, you start with full text search, get all the entities associated with the pattern, then use the entity types or entity properties to find what you seek.
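The two-step pattern just described can be sketched in a few lines of plain Python (the entity records below are made up for illustration, not data from the actual demo): full-text search casts a wide net, then entity types and properties narrow the catch.

```python
ENTITIES = [
    {"type": "Person",  "name": "Glenn McDonald", "interest": "Semantic Web"},
    {"type": "Person",  "name": "Glenn McDonald", "interest": "Baseball"},
    {"type": "Company", "name": "McDonald Corp",  "interest": None},
]

def text_search(pattern):
    """Step 1: full-text match over every literal value of every entity."""
    return [e for e in ENTITIES
            if any(pattern.lower() in str(v).lower() for v in e.values())]

def facet(hits, **filters):
    """Step 2: narrow the hits by entity type / property values."""
    return [e for e in hits
            if all(e.get(k) == v for k, v in filters.items())]

hits = text_search("mcdonald")                                # broad: 3 hits
people = facet(hits, type="Person", interest="Semantic Web")  # narrowed: 1 hit
print(people)
```

The point is that the structured step operates on entity descriptions, not on the literal text that matched.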
You state in your post:
"To state the obvious caveat, the claim OpenLink is making about this demo is not that it delivers better search-term relevance, therefore the ranking of searching results is not the main criteria on which it is intended to be assessed."
Correct.
"On the other hand, one of the things they are bragging about is that their server will automatically cut off long-running queries. So how do you like your first page of results?".
Not exactly correct. We are performing aggregates using a configurable interactive time factor. Example: tell me how many entities of type: Person, with interest: Semantic Web, exist in this database within 2 seconds. Also understand that you could retry the same query and get different numbers within the same interactive time factor. It isn't your basic "query cut-off".
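To illustrate the difference from a plain query cut-off, here is a rough sketch (my own toy code, not Virtuoso's implementation) of the "interactive time factor" idea: counting proceeds until a wall-clock budget expires, and the caller gets a partial count plus a completeness flag rather than an error.

```python
import time

def timed_count(rows, predicate, budget_seconds):
    """Count rows satisfying predicate within a wall-clock time budget."""
    deadline = time.monotonic() + budget_seconds
    count, complete = 0, True
    for row in rows:
        if time.monotonic() > deadline:
            complete = False        # budget exhausted: report a partial result
            break
        if predicate(row):
            count += 1
    return count, complete

# e.g. "how many entities of type Person exist, within a 2 second budget?"
rows = [{"type": "Person", "interest": "Semantic Web"}] * 10_000
n, done = timed_count(rows, lambda r: r["type"] == "Person", 2.0)
```

Re-running the same query under the same budget can legitimately return different numbers, which is the behavior described above.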
"And on the other other hand, the big claim OpenLink is making about this demo is that the aggregate experience of using it is better than the aggregate experience of using "traditional" search. So go ahead, use it. If you can."
Yes, "Microsoft" was a poor example for sure, the example could have been pattern: "glenn mcdonald", which should demonstrate the fundamental utility of what we are trying to demonstrate i.e., entity disambiguation courtesy of entity properties and/or entity type filtering.
Compare Google's results for "Glenn McDonald" with those from our demo (which disambiguates "Glenn McDonald" via associated properties and/or types), assuming we both agree that your Web site or blog home isn't the center of your entity graph or personal data space (i.e., data about you); so getting your home page to the top of the Google page rank offers limited value, in reality.
What are we bragging about? A little more than what you attempt to explain. Yes, we are showing that we can find stuff within a processing window, but understand the following:
I hope I've clarified what's going on with our demo. If not, pose your challenge via examples and I will respond with solutions or simply cry out loud: "no mas!".
As for your "Mac OX X Leopard" comments, I can only say this: I emphasized that this is a demo, the data is pretty old, and the input data has issues (i.e. some of the input data is bad as your example shows). The purpose of this demo is not about the text per se., it's about the size of the data corpus and faceted querying. We are going to have the entire LOD Cloud loaded into the real thing, and in addition to that our Sponger Middleware will be enabled, and then you can take issue with data quality as per your reference to "Cyndi Lauper" (btw - it takes one property filter to find information about her quickly using "dbpprop:name" after filtering for properties with text values).
Of all things, this demo had nothing to do with UI and information presentation aesthetics. It was all about combining full text search and structured queries (SPARQL behind the scenes) against a huge data corpus, en route to solving challenges associated with faceted browsing over large data sets. We have built a service that resides inside Virtuoso. The service is naturally of the "Web Service" variety and can be used from any consumer / client environment that speaks HTTP (directly or indirectly).
To be continued ...
Enter search pattern: Microsoft
You will get the usual result from a full text pattern search i.e., hits and text excerpts with matching patterns in boldface. This first step is akin to throwing your net out to sea while fishing.
Now you have your catch, what next? Basically, this is where traditional text search value ends since regex or xpath/xquery offer little when the structure of literal text is the key to filtering or categorization based analysis of real-world entities. Naturally, this is where the value of structured querying of linked data starts, as you seek to use entity descriptions (combination of attribute and relationship properties) to "Find relevant things".
Continuing with the demo.
Click on "Properties" link within the Navigation section of the browser page which results in a distillation and aggregation of the properties of the entities associated with the search results. Then use the "Next" link to page through the properties until to find the properties that best match what you seek. Note, this particular step is akin to using the properties of the catch (using fishing analogy) for query filtering, with each subsequent property link click narrowing your selection further.
Using property based filtering is just one perspective on the data corpus associated with the text search pattern; thus, you can alter perspectives by clicking on the "Class" link so that you can filter your search results by entity type. Of course, in a number of scenarios you would use a combination of entity type and entity property filters to locate the entities of interest to you.
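The property-distillation step behind the "Properties" link can be sketched as follows (a toy model with made-up entity records, not the demo's actual internals): tally which attributes occur across the result set, and how often, so the user can pick a sensible filter.

```python
from collections import Counter

# Hypothetical entities behind a set of text-search hits.
entities = [
    {"type": "Person",   "name": "Alice", "interest": "Semantic Web"},
    {"type": "Person",   "name": "Bob"},
    {"type": "Document", "title": "RDF Primer"},
]

def distill_properties(entities):
    """Aggregate property usage counts across a result set."""
    return Counter(prop for e in entities for prop in e)

print(distill_properties(entities))
```

Clicking a property in the UI then corresponds to filtering the result set down to entities carrying that property, exactly like narrowing the catch in the fishing analogy.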
A data access driver/provider that provides conceptual entity oriented access to RDBMS data managed by Virtuoso. Naturally, it also uses Virtuoso's in-built virtual / federated database layer to provide access to ODBC and JDBC accessible RDBMS engines such as: Oracle (7.x to latest), SQL Server (4.2 to latest), Sybase, IBM Informix (5.x to latest), IBM DB2, Ingres (6.x to latest), Progress (7.x to OpenEdge), MySQL, PostgreSQL, Firebird, and others using our ODBC or JDBC bridge drivers.
It delivers an Entity-Attribute-Value + Classes & Relationships model over disparate data sources that are materialized as .NET Entity Framework Objects, which are then consumable via ADO.NET Data Object Services, LINQ for Entities, and other ADO.NET data consumers.
The provider is fully integrated into Visual Studio 2008 and delivers the same "ease of use" offered by Microsoft's own SQL Server provider, but across Virtuoso, Oracle, Sybase, DB2, Informix, Ingres, Progress (OpenEdge), MySQL, PostgreSQL, Firebird, and others. The same benefits also apply uniformly to Entity Frameworks compatibility.
Bearing in mind that Virtuoso is a multi-model (hybrid) data manager, this also implies that you can use .NET Entity Frameworks against all data managed by Virtuoso. Remember, Virtuoso's SQL channel is a conduit to Virtuoso's core; thus, RDF (courtesy of SPASQL as already implemented re. Jena/Sesame/Redland providers), XML, and other data forms stored in Virtuoso also become accessible via .NET's Entity Frameworks.
You can choose which entity oriented data access model works best for you: RDF Linked Data & SPARQL or .NET Entity Frameworks & Entity SQL. Either way, Virtuoso delivers a commercial grade, high-performance, secure, and scalable solution.
Note: When working with external or 3rd party databases, simply use the Virtuoso Conductor to link the external data source into Virtuoso. Once linked, the remote tables will simply be treated as though they are native Virtuoso tables leaving the virtual database engine to handle the rest. This is similar to the role the Microsoft JET engine played in the early days of ODBC, so if you've ever linked an ODBC data source into Microsoft Access, you are ready to do the same using Virtuoso.
With great joy and pride, I wish Structured Dynamics all the success they deserve. Naturally, the collaborations and close relationship between OpenLink Software and its latest technology partner will continue -- especially as we collectively work towards a more comprehensible and pragmatic Web of Linked Data for developers (across Web 1.0, 2.0, 3.0, and beyond), end-users (information- and knowledge-workers), and entrepreneurs (driven by quality and tangible value contribution).
In 2009 I hope the following happens re. "Linked Data":
2009 is about a reboot on a monumental scale. We need new thinking, new technology, new approaches, and new solutions. No matter what route we take, we can't negate the importance of "Data". When dealing with organic or inorganic computers systems -- Data is simply everything!
The ability of individuals and enterprises to access, mesh, and disseminate data to relevant nodes across public and private networks will ultimately determine the winners and losers in the new frontier, ushered in by 2009.
Do not take data access and data management technology for granted. User interfaces come and go, application logic comes and goes, but your data stays with you forever. If you are mystified by data access technology, then make 2009 the year of data access technology demystification :-)
"..There is evidence that they promote LINKED DATA at any expense without understanding the rationale behind other approaches...".
To answer the question above, Linked Data is always relevant as long as we are actually talking about "Data" which is simply the case all of the time, irrespective of interaction medium.
If XBRL can be disconnected in any way from Linked Data, I desperately would like to be enlightened (as per my comments to the post). Why wouldn't anyone desire the ability to navigate the linked data inherent in any financial report? Every entity in an XBRL instance document is, directly or indirectly, related to other entities. Why "Mash" the data when you can harmonize XBRL data via a Generic Financial Dictionary (schema or ontology), such that descriptions of Balance Sheet, P&L, and other entities are navigable via their attributes and relationships? In short, why "Mash" (code based brute force joining across disparately shaped data) when you can "Mesh" (natural joining of structured data entities)?
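The "Mesh via a Generic Financial Dictionary" idea can be sketched minimally as follows (the dictionary entries and filer records are entirely hypothetical, not real XBRL concepts): filer-specific field names map onto common dictionary terms, after which records from different filers join naturally on shared keys instead of requiring bespoke "mashing" code.

```python
# Hypothetical generic dictionary: source-specific field -> common term.
GENERIC_DICTIONARY = {
    "TotalRevenues":   "revenue",
    "SalesRevenueNet": "revenue",
    "NetIncomeLoss":   "net_income",
    "ProfitLoss":      "net_income",
}

def harmonize(record):
    """Rewrite a filer-specific record onto the common dictionary terms."""
    return {GENERIC_DICTIONARY.get(k, k): v for k, v in record.items()}

# Two filers reporting the same concepts under different field names.
filer_a = {"TotalRevenues": 100, "NetIncomeLoss": 10}
filer_b = {"SalesRevenueNet": 200, "ProfitLoss": 25}

merged = [harmonize(filer_a), harmonize(filer_b)]
total_revenue = sum(r["revenue"] for r in merged)  # joins naturally
print(total_revenue)
```

No per-pair glue code is written; the shared vocabulary does the joining, which is the "Mesh" versus "Mash" distinction in miniature.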
"Linked Data" is about the ability to connect all our observations (data)? , perceptions (information), and inferences / conclusions (knowledge) across a spectrum of interaction media. And it just so happens that the RDF data model (Entity-Attribute-Vaue + Class Relationships + HTTP based Object Identifiers), a range of RDF data model serialization formats, and SPARQL (Query Language and Web Service combo) actually make this possible, in a manner consistent with the essence of the global space we know as the World Wide Web.
A community developed knowledgebase comprised of Bio Informatics data from across 30 or so public data sources. The standard deployment of Bio2Rdf includes a federation of SPARQL endpoints provided by project members and collaborators.
An Amazon EC2 hosted variant of the Bio2Rdf knowledgebase. In addition to providing a SPARQL endpoint, the data exposed by the Amazon AMI is published in compliance with Linked Data publishing best practices espoused by the Linking Open Data community (LOD).
The ability to instantiate a personal or service-specific variant of this powerful knowledgebase via the Amazon EC2 Cloud. Instead of a 22+ hour error-prone odyssey, you simply get down to the task of data analysis and integration within 1.5 hrs (when setting up your AMI for the first time).
"Only one improves with age. With apologies to the originator of the phrase - “Hardware is like fish, operating systems are like wine.”
Yes! Applications are like Fish and Data is like Wine, which is basically what Linked Data is fundamentally about, especially when you inject memes such as "Cool URIs" into the mix. Remember, the essence of Linked Data is a Web of Linked Data Objects endowed with Identifiers that don't change; they occupy one place in public (e.g., World Wide Web) or private (your corporate Intranet or Extranet) networks, keeping the data they expose relevant (as in fresh), accessible, and usable in many forms, courtesy of the data access and representation dexterity that HTTP facilitates when incorporated into object identifiers.
Here is another excerpt from his post that rings true (amongst many others):
What am I talking about? Processes change, and need to change. Baking data into the application is a bad idea because the data can’t then be extended in useful, and “unexpected ways”. But not expecting corporate data to be used in new ways is kind of like not expecting the Spanish Inquisition. But… “NOBODY expects the Spanish Inquisition! Amongst our weaponry are such diverse elements as: fear, surprise, ruthless efficiency, an almost fanatical devotion to the Pope.” (sounds like Enterprise Architecture ...).
Excerpted from the project home page:
The NeuroCommons project seeks to make all scientific research materials - research articles, annotations, data, physical materials - as available and as useable as they can be. We do this by fostering practices that render information in a form that promotes uniform access by computational agents - sometimes called "interoperability". We want knowledge sources to combine meaningfully, enabling semantically precise queries that span multiple information sources.
In a nutshell, a great project that makes practical use of Linked Data Web technology in the areas of computational biology and neuroscience.
A pre-installed and fully tuned edition of Virtuoso that includes a fully configured Neurocommons Knowledgebase (in RDF Linked Data form) on Amazon's EC2 Cloud platform.
Generally, it provides a no-hassles mechanism for instantiating personal-, organization-, or service-specific instances of a very powerful research knowledgebase within approximately 1.15 hours, compared to a lengthy rebuild from RDF source data that takes 14 hours or more, depending on machine hardware configuration and host operating system resources.
A pre-installed and fully tuned edition of Virtuoso that includes a fully configured DBpedia instance on Amazon's EC2 Cloud platform.
Generally, it provides a no-hassles mechanism for instantiating personal-, organization-, or service-specific instances of DBpedia within approximately 1.5 hours, as opposed to a lengthy rebuild from RDF source data that takes between 8 and 22 hours, depending on machine hardware configuration and host operating system resources.
From a Web Entrepreneur perspective it offers all of the generic benefits of a Virtuoso EC2 AMI plus the following:
Here are a few live examples of DBpedia resource URIs deployed and de-referencable via one of my EC2 based personal data spaces:
A pre-installed edition of Virtuoso for Amazon's EC2 Cloud platform.
From the DBMS engine perspective it provides you with one or more pre-configured instances of Virtuoso that enable immediate exploitation of the following services:
From a Middleware perspective it provides:
From the Web Server Platform perspective it provides an alternative to LAMP stack components such as MySQL and Apache by offering:
From the general System Administrator's perspective it provides:
Higher level user oriented offerings include:
For Web 2.0 / 3.0 users, developers, and entrepreneurs, it offers Distributed Collaboration Tools & Social Media realm functionality courtesy of ODS, which includes:
Basically this is how it works.
DBpedia replica implies:
Tomorrow is the official go-live day (due to last-minute price changes), but you can instantiate a paid Virtuoso AMI starting now :-)
To be continued...
Here is how I see Linked Data providing tangible value to MDM tools vendors and users:
Of course Virtuoso was designed and developed to deliver the above from day one (circa 1998 re. the core, and 2005 re. the use of RDF for the final mile), as depicted below:
If we could just take "The Semantic Web" moniker for what it was -- a code name for an aspect of the Web -- and move on, things will get much clearer, fast!
Basically, what is/was the "Semantic Web" should really have been code-named "You" Oriented Data Access, as a play on Yoda's appreciation of the FORCE (Fact ORiented Connected Entities) -- the power of intergalactic, interlinked, structured data, fashioned by the World Wide Web courtesy of the HTTP protocol.
As stated in an earlier post, the next phase of the Web is all about the magic of entity "You". The single most important item of reference to every Web user would be the Person Entity ID (URI). Just by remembering your Entity ID, you will have intelligent pathways across, and into, the FORCE that the Linked Data Web delivers. The quality of the pathways and increased density of the FORCE are the keys to high SDQ (tomorrow's SEO). Thus, the SDQ of URIs will ultimately be the unit determinant of value to Web Users, along the following personal lines, hence the critical platform questions:
While most industry commentators continue to ponder and pontificate about what "The Semantic Web" is (unfortunately), the real thing (the "FORCE") is already here, and self-enhancing rapidly.
Assuming we now accept that the FORCE is simply an RDF-based Linked Data moniker, and that RDF Linked Data is all about the Web as a structured database, we should start to move our attention over to practical exploitation of this burgeoning global database. In doing so, we should not discard knowledge from the past, such as the many great examples available gratis from the Relational Database realm. For instance, we should start paying attention to the discovery, development, and deployment of high-level tools such as query builders, report writers, and intelligence-oriented analytic tools, none of which should -- at first point of interaction -- expose raw RDF or the SPARQL query language. Along similar lines of thinking, we also need development environments and frameworks that are counterparts to Visual Studio, ACCESS, FileMaker, and the like.
In recent times I've stumbled across Master Data Management (MDM), which is all about entities that provide holistic views of enterprise data (or what I call: Context Lenses). I've also stumbled across emerging tensions in the .NET realm between Linq to Entities and Linq to SQL, where in either case the fundamental issue comes down to the optimal path to "Conceptual Level" access over the "Logical Level" when dealing with data access in the .NET realm.
The emerging realms of RDF Linked Data, MDM, and .NET's Entity Frameworks remain strangely disconnected.
Another oddity is the obvious, but barely acknowledged, blurring of the lines between the "traditional enterprise employee" and the "individual Web netizen". The fusion between these entities is one of the most defining characteristics of how the Web is reshaping the data landscape.
At the current time, I tend to crystalize my data access world view under the moniker: YODA ("You" Oriented Data Access), based on the following:
Virtuoso is an extremely compact product that is very easy to install. The ease of installation carries over to the PHP runtime when bound to Virtuoso.
Here are some examples of how we distill Entities (People, Places, Music, and other things) from Freebase (X)HTML pages (meaning: we don't have to start from RDF information resources as data sources for the eventual RDF Linked Data we generate):
Tip: Install our OpenLink Data Explorer extension for Firefox. Once installed, simply browse through Freebase, and whenever you encounter a page about something of interest, simply use the following sequences to distill (via the Page Description feature) the entities from the page you are reading:
Here is a look at our offerings by product family:
As you explore the Linked Data graph exposed via our product portfolio, I expect you to experience, or at least spot, the virtuous potential of high SDQ (Serendipitous Discovery Quotient) courtesy of Linked Data, which is Web 3.0's answer to SEO. For instance, how Database, Operating System, and Processor family paths in the product portfolio graph (data network) unveil a lot more about OpenLink Software than meets the proverbial "eye" :-)
Like Apache, Virtuoso is a bona-fide Web Application Server for PHP based applications. Unlike Apache, Virtuoso is also the following:
As result of the above, when you deploy a PHP application using Virtuoso, you inherit the following benefits:
As indicated in prior posts, producing RDF Linked Data from the existing Web, where a lot of content is deployed by PHP-based content managers, should simply come down to RDF Views over the SQL Schemas, plus deployment / publishing of those RDF Views in RDF Linked Data form. In a nutshell, this is what Virtuoso delivers via its PHP runtime hosting and pre-packaged VADs (Virtuoso Application Distribution packages) for popular PHP-based applications such as phpBB3, Drupal, WordPress, and MediaWiki.
In addition to the RDF Linked Data deployment, we've also taken the traditional LAMP installation tedium out of the typical PHP application deployment process. For instance, you don't have to rebuild PHP 3.5 (32 or 64 Bit) on Windows, Mac OS X, or Linux to get going; simply install Virtuoso, and then select a VAD package for the relevant application and you're set. If the application of choice isn't pre-packaged by us, simply install as you would when using Apache, which comes down to situating the PHP files in your Web structure under the Web Application's root directory.
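The RDF Views idea described above can be illustrated with a toy sketch (this is not Virtuoso's actual mapping language; the table, base URI, and column-to-predicate mapping are invented for illustration):

```python
# Relational rows exposed as triples on demand: nothing is materialized
# into a warehouse, so the RDF always mirrors the live SQL data.

POSTS = [
    {"id": 1, "title": "Hello Linked Data", "author": "kidehen"},
]

MAPPING = {  # column -> predicate URI (Dublin Core terms, mapping is illustrative)
    "title": "http://purl.org/dc/elements/1.1/title",
    "author": "http://purl.org/dc/elements/1.1/creator",
}

def rdf_view(rows, base="http://blog.example.com/post/"):
    """Yield (subject, predicate, object) triples lazily, one entity per row."""
    for row in rows:
        subject = base + str(row["id"])
        for column, predicate in MAPPING.items():
            yield (subject, predicate, row[column])

print(list(rdf_view(POSTS)))
```

Because the triples are generated at access time, updating the underlying posts table immediately changes what the "view" exposes, which is the whole point of mapping rather than exporting.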
At the current time, I've only provided links to ZIP files containing the Virtuoso installation "silent movies". This approach is a short-term solution to some of my current movie publishing challenges re. YouTube and Vimeo -- where the compressed output hasn't been of acceptable visual quality. Once resolved, I will publish much more "Multimedia Web" friendly movies :-)
Typically, Orri's posts are targeted at the hard-core RDF and SQL DBMS audiences, but in this particular post he shoots straight at the business community, revealing "Opportunity Cost" containment as the invisible driver behind the business aspects of any market inflection.
Remember, the Web isn't ubiquitous because its users mastered the mechanics and virtues of HTML and/or HTTP. Web ubiquity is a function of the opportunity cost of not being on the Web, courtesy of the network effects of hyperlinked documents -- i.e., the instant gratification of traversing documents on the Web via a single click action. In similar fashion, the Linked Data Web's ubiquity will simply come down to the opportunity cost of not being "inside the Web", courtesy of the network effects of hyperlinked entities (documents, people, music, books, and other "Things").
Here are some excerpts from Orri's post:
Every time there is a major shift in technology, this shift needs to be motivated by addressing a new class of problem. This means doing something that could not be done before. The last time this happened was when the relational database became the dominant IT technology. At that time, the questions involved putting the enterprise in the database and building a cluster of line of business applications around the database. The argument for the RDBMS was that you did not have to constrain the set of queries that might later be made, when designing the database. In other words, it was making things more ad hoc. This was opposed then on grounds of being less efficient than the hierarchical and network databases which the relational eventually replaced. Today, the point of the Data Web is that you do not have to constrain what your data can join or integrate with, when you design your database. The counter-argument is that this is slow and geeky and not scalable. See the similarity? A difference is that we are not specifically aiming at replacing the RDBMS. In fact, if you know exactly what you will query and have a well defined workload, a relational representation optimized for the workload will give you about 10x the performance of the equivalent RDF warehouse. OLTP remains a relational-only domain. However, when we are talking about doing queries and analytics against the Web, or even against more than a handful of relational systems, the things which make RDBMS good become problematic.
If we think about Web 1.0 as a period where the distinguishing noun was "Author", and Web 2.0 the noun "Journalist", we should be able to see that what comes next is the noun "Analyst". This new-generation analyst would be equipped with a de-referencable Web Identity courtesy of their Person Entity URI. The analyst's URI would also be the critical component of a Web-based, low-cost attribution ecosystem; one that ultimately turns the URI into the analyst's brand emblem / imprint.
You will control your data in the Web 3.0 realm. If somehow this remains somewhat incomprehensible and nebulous (as is typical in this emerging realm) then simply think about this as: The Magic of You!
Remember, "You" was the Times person of the year as an acknowledgement of the Web 2.0 phenomenon, and maybe this time next year it would simply be the "Magic of Being You" that's the person of the year :-)
Web 3.0 brings databasing to the Web (as a feature). The single most important action item at this stage is the act of creating a record for yourself, in this new distributed database held together by an HTTP based Network (e.g., the World Wide Web).
Now, if re-labeling can confuse me when applied to a realm I've been intimately involved with for eons (internet time), I don't want to imagine what it does for others who aren't as intimately involved with the important data access and data integration realms.
On the more refreshing side, the article does shed some light on the potency of RDF and OWL when applied to the construction of conceptual views of heterogeneous data sources.
"How do you know that data coming from one place calculates net revenue the same way that data coming from another place does? Youâve got people using the same term for different things and different terms for the same things. How do you reconcile all of that? Thatâs really what semantic integration is about."
BTW - I discovered this article via another titled: Understanding Integration And How It Can Help with SOA, that covers SOA and Integration matters. Again, in this piece I feel the gradual realization of the virtues that RDF, OWL, and RDF Linked Data bring to bear in the vital realm of data integration across heterogeneous data silos.
A number of events, at the micro and macro economic levels, are forcing attention back to the issue of productive use of existing IT resources. The trouble with this quest is that it ultimately unveils the global IT affliction known as heterogeneous data silos, and the challenge of pain alleviation, which has been ignored forever or approached inadequately, as clearly shown by the rapid build-up of SOA horror stories in the data integration realm.
Data Integration via conceptualization of heterogeneous data sources, resulting in concrete conceptual-layer data access and management, remains the greatest and most potent application of technologies associated with the "Semantic Web" and/or "Linked Data" monikers.
As more Linked Data is injected into the Web from the Linking Open Data community and other initiatives, it's important to note that "Linked Data" is available in a variety of forms such as:
Note: The common glue across the different types of Linked Data remains the commitment to data object (entity) identification and access via de-referencable URIs (aka record- / entity-level data source names).
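As a sketch of what de-referencing such a URI involves in practice, the snippet below builds (but does not send) HTTP requests against a DBpedia-style entity URI, using the Accept header to negotiate between a human-readable and a machine-readable representation:

```python
# The same identifier yields different representations depending on the
# Accept header; requests are only constructed here, never sent.
from urllib.request import Request

entity_uri = "http://dbpedia.org/resource/Paris"  # DBpedia-style Linked Data URI

def dereference_request(uri, representation="text/html"):
    """Build an HTTP GET whose Accept header negotiates the representation:
    text/html for people, application/rdf+xml or text/turtle for agents."""
    return Request(uri, headers={"Accept": representation})

for mime in ("text/html", "application/rdf+xml"):
    req = dereference_request(entity_uri, mime)
    print(req.full_url, "->", req.get_header("Accept"))
```

In a real deployment, the server behind the URI would respond to the RDF-oriented Accept headers with a structured entity description, and to the HTML one with a page about the entity.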
As stated in my recent post titled "Semantic Web: Travails to Harmony Illustrated", harmonious intersections of instance data and data dictionaries (schemas, ontologies, rules, etc.) provide a powerful substrate (smart data) for the development and deployment of "People"- and/or "Machine"-oriented solutions. Of course, others have commented on these matters and expressed similar views (see related section below).
The clickable Venn diagram below provides a simple exploration path that exposes the linkage that already exists, across the different Linked Data types, within the burgeoning Linked Data Web.
Our diagram depicts the myriad of data sources from which RDF Linked Data is generated "on the fly" via our data-source-specific RDF-ization cartridges/drivers. It also unveils how the Sponger leverages the Linked Data constellations of UMBEL, DBpedia, Bio2Rdf, and others for lookups.
If the RDF generated results in an entity-to-entity level network (graph), in which each entity is endowed with a de-referencable HTTP-based ID (a URI), we end up with an enhancement to the Web that adds Hyperdata linking across extracted entities to the existing Hypertext-based Web of linked documents (pages, images, and other information resource types). Thus, I can use the same URL linking mechanism to reference a broader range of "Things", i.e., documents, things that documents are about, or things loosely associated with documents.
The Virtuoso Sponger is an example of an RDF Middleware solution from OpenLink Software. It's an in-built component of the Virtuoso Universal Server, and deployable in many forms e.g., Software as Service (SaaS) or traditional software installation. It delivers RDF-ization services via a collection of Web information resource specific Cartridges/Providers/Drivers covering Wikipedia, Freebase, CrunchBase, WikiCompany, OpenLibrary, Digg, eBay, Amazon, RSS/Atom/OPML feed sources, XBRL, and many more.
RDF-ization alone doesn't ensure valuable RDF based Linked Data on the Web. The process of producing RDF Linked Data is ultimately about the art of effectively describing resources with an eye for context.
The animation that follows illustrates the process (5,000 feet view), from grabbing resources via HTTP GET, to injecting RDF Linked Data back into the Web cloud:
Note: the Shredder is a Generic Cartridge, so you would have one of these per data source type (information resource type).
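A rough sketch of the cartridge-per-source-type dispatch described above (function names, media types handled, and emitted triples are all illustrative, not the actual Sponger API):

```python
# RDF-ization middleware picks an extractor ("cartridge") based on the
# fetched resource's media type, then emits triples for injection back
# into the Web of data.

def rss_cartridge(body):
    return [("ex:feed", "rdf:type", "ex:RSSFeed")]

def html_cartridge(body):
    return [("ex:page", "rdf:type", "ex:Document")]

CARTRIDGES = {  # media type -> RDF-izer (one generic cartridge per source type)
    "application/rss+xml": rss_cartridge,
    "text/html": html_cartridge,
}

def spongify(content_type, body):
    """Route a fetched resource to the matching cartridge; fall back to HTML."""
    cartridge = CARTRIDGES.get(content_type, html_cartridge)
    return cartridge(body)

print(spongify("application/rss+xml", "<rss/>"))
```

The table-driven dispatch is the key design point: supporting a new source type means registering one more cartridge, with no change to the pipeline itself.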
From the RWW Top-Down category, which I interpret as technologies that produce RDF from non-RDF data sources. Our product portfolio is comprised of the following: Virtuoso Universal Server, OpenLink Data Spaces, OpenLink Ajax Toolkit, and OpenLink Data Explorer (which includes Ubiquity commands).
Of course you could have simply looked up OpenLink Software's FOAF based Profile page (*note the Linked Data Explorer tab*), or simply passed the FOAF profile page URL to a Linked Data aware client application such as: OpenLink Data Explorer, Zitgist Data Viewer, Marbles, and Tabulator, and obtained information. Remember, OpenLink Software is an Entity of Type: foaf:Organization, on the burgeoning Linked Data Web :-)
Over-emphasis on Description Logics matters (RDFS, OWL, Inference & Reasoning, etc.) without any actual real-world instance data (e.g., lots of reasoning over RDF in zip files or on local drives).
Over-emphasis on Instance Data without Data Dictionary appreciation and utilization (e.g., Linked Data instance-level linkage via "owl:sameAs").
Here we are dealing with numerous applications and frameworks that inextricably bind Instance Data Management and Data Dictionaries. Basically, an all or nothing proposition, if you want to delve into the RDF Linked Data solutions realm.
Often overlooked is the fact that the Linked Data Web - as an aspect of the Semantic Web innovation continuum - is fundamentally about designing and constructing an "Open World" compatible DBMS for the Internet. Thus, erstwhile "Closed World" DBMS components such as Data Dictionaries (handlers of Data Definition, Referential Integrity, etc.) and actual Instance Data are now distributed and loosely coupled. Your data could be in one Data Space while the data dictionary resides in another. In actual fact, you could have several loosely bound data dictionaries that serve the specific Inference and Reasoning needs of a variety of applications, services, or agents.
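A toy sketch of this loose coupling, assuming invented URIs: instance data lives in one data space, the class hierarchy (data dictionary) in another, and a trivial reasoner joins them at query time:

```python
# Instance data and the data dictionary are held apart, as in an
# "Open World" DBMS; a minimal reasoner derives entailed types on demand.

instance_space = [("ex:kidehen", "rdf:type", "ex:Founder")]
dictionary_space = [("ex:Founder", "rdfs:subClassOf", "foaf:Person")]

def infer_types(instances, dictionary):
    """Materialize rdf:type triples entailed by rdfs:subClassOf assertions."""
    subclass = {s: o for s, p, o in dictionary if p == "rdfs:subClassOf"}
    inferred = []
    for s, p, o in instances:
        if p == "rdf:type" and o in subclass:
            inferred.append((s, "rdf:type", subclass[o]))
    return inferred

print(infer_types(instance_space, dictionary_space))
```

Swapping in a different dictionary (a stricter or looser ontology) changes what is inferred without touching the instance data, which is exactly the flexibility the loose coupling buys.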
Understanding potential Linked Data Web business models, relative to other Web based market segments, is best pursued via a BCG Matrix diagram, such as the one I've constructed below:
To conclude, the Linked Data Web's market opportunities are all about the evolution of the Web into a powerful substrate that offers a unique intersection of "Link Density" and "Relevance", exploitable across horizontal and vertical market segments to solutions providers. Put differently, SDQ is how you take "The Ad" out of "Advertising" when matching Web users to relevant things :-)
In typical style, Henry walks you through his point of view using simple but powerful illustrations. Here is a key statement in his post that really struck me:
"In order to be able to have a mental theory one needs to be able to understand that other people may have a different view of the world. On a narrow three dimensional understanding of 'view', this reveals itself in that people at different locations in a room will see different things. One person may be able to see a cat behind a tree that will be hidden to another. In some sense though these two views can easily be merged into a coherent description."
Opaque Web pages (e.g., generated by "Semantic Technology inside" offerings that will not expose or share data entity URIs), irrespective of how smart the underlying page generation and visualization technology may be, are fundamentally autistic and counter-intuitive as we move toward a Web of Linked Data.
Preoccupation with the "V" aspect of the M-V-C trinity is inadvertently compounding the problem of digital autism on the Web. Unbeknownst to the purveyors of data silos and proprietary service lock-in, digital autism on the Web ultimately implies Web business model autism.
]]>"Artificial intelligence is supposed to let machines do things for people. The risk is that we may rely too much on them. Two months ago, for instance, writer Nicolas Carr asked whether Google is making us stupid. In my recent blog series "The Age of Google," I extended Carr’s discussion. Due to the success of Google, we are relying more on objective search than on active thinking to answer questions. In consequence, the more Google has advanced its service, the farther Google users have drifted from active thinking."
"But at least one form of human thinking cannot be replaced by machines. I am not talking about inference/discovery (which machines may be capable of doing) but about creation/generation-from-nothing (which I don’t believe machines may ever do)."
I tend to describe our ability to create/generate-from-nothing as "Zero-based Cognition", which is initially about "thought" and eventually about "speed of thought dissemination" and "global thought meshing".
In a peculiar sense, Zero-based cognition is analogous to Zero-based budgeting from the accounting realm :-)
If your Web presence goes beyond (X)HTML pages, via the addition of REST- or SOAP-based Web Services, then you are participating in Web usage dimension 2.0.
If your Web presence includes all of the above, with the addition of structured data interlinked with structured data across other points of presence on the Web, then you are participating in Web usage dimension 3.0, i.e., the "Linked Data Web" or "Web of Data" or "Data Web".
BTW - If you've already done all of the above, and you have started building intelligent agents that exploit the aforementioned structured interlinked data substrate, then you are already in Web usage dimension 4.0.
A while back I watched Kevin Kelly's 5,000 days presentation at TED. During the presentation, I kept on scratching my head, wondering why phrases like "Linked Data", "Semantic Web", "Web of Data", and "Data Web" were so unnaturally disconnected from his session narrative.
Yesterday I watched IMINDI's TechCrunch 50 presentation, and once again I saw the aforementioned pattern repeat itself. This time around, the poor founders of this "Linked Data Web" oriented company (which is what they are in reality) took a totally undeserved pasting from a bunch of panelists incapable of seeing beyond today (Web 2.0) and yesterday (the initial Web bootstrap).
Anyway, thanks to the Web, this post will make a small contribution towards re-connecting the missing phrases to these "Linked Data Web" presentations.
Courtesy of Linked Data, we are now able to extend the "document to document" linking mechanism of the Web (Hypertext Linking) to more granular "entity to entity" level linking. And in doing so, we have a layer of abstraction that in one swoop alleviates all of the infrastructure-oriented data access impediments of yore. I know this sounds simplistic, but rest assured, imbibing Linked Data's value proposition really is just that simple, once you engage solutions (e.g., Virtuoso) that enable you to deploy Linked Data across your enterprise.
Microsoft ACCESS, SQL Server, and Virtuoso all use the Northwind SQL DB Schema as the basis of the demonstration database shipped with each DBMS product. This schema is comprised of common IS/MIS entities that include: Customers, Contacts, Orders, Products, Employees etc.
What we all really want to do, as data, information, and knowledge consumers and/or dispatchers, is be no more than a single "mouse click" away from relevant data/information/knowledge access and/or exploration. Even better (but not always so obvious), we also want anyone in our network (company, division, department, cube-cluster) to inherit these data access efficiencies.
In this example, the Web Page about the Customer "ALFKI" provides me with a myriad of exploration and data access paths, e.g., when I click on the foaf:primaryTopic property value link.
This simple example, via a single Web Page, should put to rest any doubts about the utility of Linked Data. Of course this is an old demo, but this time around the UI is minimalist as my prior attempts skipped a few steps i.e., starting from within a Linked Data explorer/browser.
Important note: I haven't exported SQL into an RDF data warehouse; I am converting the SQL into RDF Linked Data on the fly, which has two fundamental benefits:
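A rough sketch of this on-the-fly conversion, using a simplified Northwind-style Customers table (the base URI and predicate names are illustrative, not Virtuoso's actual mapping):

```python
# Triples are generated at query time from live SQL -- no warehouse export
# step, so the RDF view can never drift out of sync with the database.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Customers (CustomerID TEXT, CompanyName TEXT, City TEXT)")
db.execute("INSERT INTO Customers VALUES ('ALFKI', 'Alfreds Futterkiste', 'Berlin')")

def customer_triples(conn, base="http://demo.example.com/Customers/"):
    """Each row becomes an entity; each column becomes a (predicate, value)
    pair, produced as the SQL cursor is consumed."""
    cur = conn.execute("SELECT CustomerID, CompanyName, City FROM Customers")
    for cid, name, city in cur:
        subject = base + cid
        yield (subject, "northwind:companyName", name)
        yield (subject, "northwind:city", city)

for triple in customer_triples(db):
    print(triple)
```

An INSERT or UPDATE against Customers is immediately reflected the next time the triples are generated, which is the first of the two benefits mentioned above: freshness without a synchronization step.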
Enjoy!
Ubiquity from Mozilla Labs provides an alternative entry point for experiencing the "Controller" aspect of the Web's natural compatibility with the MVC development pattern. As I've noted (in various posts), Web Services, as practiced by the REST-oriented Web 2.0 community or the SOAP-oriented SOA community within the enterprise, are fundamentally about the "Controller" aspect of MVC.
Ubiquity provides a command-line interface for direct invocation of Web Services. For instance, in our case, we can expose Virtuoso's in-built RDF Middleware ("Sponger") and Linked Data deployment services via a single command of the form: describe-resource <url>
To experience this neat addition to Firefox you need to do the following:
Enjoy!
As per usual, this is part post and part Linked Data demo. This time around, I am showcasing Proxy/Wrapper-based de-referencable URIs and a new "Page Description" feature that demonstrates the capabilities of Virtuoso's in-built RDF-ization Middleware. Also note, the resource descriptions (RDF) are presented using an HTML page.
What strikes me the most is how sharing his findings acts as a serendipitous connector to related insights and points of view, ultimately creating deeper shared knowledge about the core subject matter, courtesy of the Web-hosted Blogosphere.
Note: You can substitute my examples using any Web resource URL. The underlying RDF-ization and Linked Data deployment functionality of the Virtuoso demo instance takes care of everything else. Also note that the HTML-based resource description page capability is now deployed as part of the Virtuoso Sponger component of every Virtuoso installation, starting with version 5.0.8.
Jana: What are the benefits you see to the business community in adopting semantic technology?
Me: Exposure and exploitation of an untapped treasure trove of interlinked data, information, and knowledge across disparate IT infrastructure, via conceptual entry points (Entity IDs / URIs / Data Source Names) that I refer to as "Context Lenses".
Jana: Do you think these benefits are great enough for businesses to adopt the changes?
Me: Yes, infrastructural heterogeneity is a fact of corporate life (growth, mergers, acquisitions etc). Any technology that addresses these challenges is extremely important and valuable. Put differently, the opportunity costs associated with IT infrastructural heterogeneity remains high!
Jana: How large do you think this impact will actually be?
Me: Huge. Enterprises have been aware of their data, information, and knowledge treasure troves for eons. Tapping into these via a materialization of the "information at your fingertips" vision, without any platform lock-in, is something they've simply been waiting to pursue for as long as I've been in this industry.
Jana: I've heard, from contacts in the Bay Area, that they are skeptical of how large this impact of semantic technology will actually be on the web itself, but that the best uses of the technology are for fields such as medical information, or as you mentioned, geo-spatial data.
Me: Unfortunately, those people aren't connecting the Semantic Web with open access to heterogeneous data sources, or the intrinsic value of holistic exploration of entity-based data networks (aka Linked Data).
Jana: Are semantic technologies going to be part of the web because of people championing the cause or because it is actually a necessary step?
Me: Linked Data technology on the Web is a vital extension of the current Web. Semantic Technology without the "Web" component, or what I refer to as "Semantics Inside only" solutions, simply offer little or no value as Web enhancements based on their incongruence with the essence of the Web i.e., "Open Linkage" and no Silos! A nice looking Silo is still a Silo.
Jana: In the early days of the web, there was an explosion of new websites, due to the ease of learning HTML, from a business to a person to some crackpot talking about aliens. Even today, CSS and XHTML are not so difficult to learn that a determined person can't learn them from W3C or other tutorials easily. If OWL becomes the norm for websites, what do you think the effects will be on the web? Do you think it is easy enough to learn that it will be readily adopted as part of the standard toolkit for web developers for businesses?
Me: Correction: learning HTML had nothing to do with the Web's success. The value proposition of the Web simply reached critical mass, and you simply couldn't afford not to be part of it. The easiest route to joining the Web juggernaut was a Web Page hosted on a Web Site. The question right now is: what's the equivalent driver for the Linked Data Web, bearing in mind the initial Web bootstrap? My answer is simply this: Open Data Access, i.e., getting beyond the data silos that have inadvertently emerged from Web 2.0.
Jana: Following the same theme, do you think this will lead to an internet full of corporate-controlled websites, with sites only written by developers rather than individuals?
Me: Not at all; we will have an Internet owned by its participants, i.e., you and the agents that work on your behalf.
Jana: So, you are imagining technologies such as Drupal or Wordpress, that allow users to manage sites without a great deal of knowledge of the nuts and bolts of current web technologies?
Me: Not at all! I envisage simple forms that provide conduits to powerful meshes of interlinked data spaces associated with Web users.
Jana: Given all of the buzz, and my own familiarity with ontology, I am just very curious if the semantic web is truly necessary?
Me: This question is no different than saying: I hear the Web is becoming a Database, and I wonder if a Data Dictionary is necessary, or even if access to structured data is necessary. It's also akin to saying: I accept "Search" as my only mechanism for Web interaction even though in reality, I really want to be able to "Find" and "Process" relevant things at a quicker rate than I do today, relative to the amount of information, and information processing time, at my disposal.
Jana: Will it be worth it to most people to go away from the web in its current form, with keyword searches on sites like Google, to a richer and more interconnected internet with potentially better search technology?
Me: As stated above, we need to add "Find" to the portfolio of functions we seek to perform against the Web. "Finding" and "Searching" are mutually inclusive pursuits at different ends of an activity spectrum.
Jana: For our more technical readers, I have a few additional questions: If no standardization comes about for mapping relational databases to domain ontologies, how do you see that as influencing the decisions about adoption of semantic technology by businesses? After all, the success of technology often lives or dies on its ease of adoption.
Me: Standardization of RDBMS to RDF Mapping is not the critical success factor here (of course it would be nice). As stated earlier, the issue of data integration that arises from IT infrastructural heterogeneity has been with decision makers in the enterprise forever. The problem is now seeping into the broader consumer realm via Web ubiquity. The mistakes made in the enterprise realm are now playing out in the consumer Web realm. In both realms the critical success factors are:
Ansgar Bernardi, deputy head of the Knowledge Management Department at Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI, or the German Research Center for Artificial Intelligence) and Nepomuk's coordinator, explains, "The basic problem that we all face nowadays is how to handle vast amounts of information at a sensible rate." According to Bernardi, Nepomuk takes a traditional approach by creating a meta-data layer with well-defined elements that services can be built upon to create and manipulate the information.
The comment above echoes my sentiments about the imminence of "information overload" due to the vast amounts of user generated content on the Internet as a whole. We are going to need to process more and more data within a fixed 24 hour timeframe, while attempting to balance our professional and personal lives. Rest assured, this is a very serious issue, and you cannot even begin to address it without a Web of Linked Data.
"The first idea of building the semantic desktop arose from the fact that one of our colleagues could not remember the girlfriends of his friends," Bernardi says, more than half-seriously. "Because they kept changing -- you know how it is. The point is, you have a vast amount of information on your desktop, hidden in files, hidden in emails, hidden in the names and structures of your folders. Nepomuk gives a standard way to handle such information."
If you get a personal URI for Entity "You", via a Linked Data aware platform (e.g. OpenLink Data Spaces) that virtualizes data across your existing Web data spaces (blogs, feed subscriptions, wikis, shared bookmarks, photo galleries, calendars, etc.), you then only have to remember your URI whenever you need to "Find" something, imagine that!
To conclude, "information overload" is the imminent challenge of our time, and the keys to its alleviation lie in our ability to construct and maintain (via solutions) a few context lenses (URIs) that provide coherent conduits into the dense mesh of structured Linked Data on the Web.
CrunchBase: When we released the CrunchBase API, you were one of the first developers to step up and quickly released a CrunchBase Sponger Cartridge. Can you explain what a CrunchBase Sponger Cartridge is?
Me: A Sponger Cartridge is a data access driver for Web Resources that plugs into our Virtuoso Universal Server (DBMS and Linked Data Web Server combo amongst other things). It uses the internal structure of a resource and/or a web service associated with a resource, to materialize an RDF based Linked Data graph that essentially describes the resource via its properties (Attributes & Relationships).
CrunchBase: And what inspired you to create it?
Me: Bengee built a new space with your data, and we've built a space on the fly from your data which still resides in your domain. Either solution extols the virtues of Linked Data i.e. the ability to explore relationships across data items with high degrees of serendipity (also colloquially known as: following-your-nose pattern in Semantic Web circles).
Bengee posted a notice to the Linking Open Data Community's public mailing list announcing his effort. Bearing in mind the fact that we've been using middleware to mesh the realms of Web 2.0 and the Linked Data Web for a while, it was a no-brainer to knock something up based on the conceptual similarities between Wikicompany and CrunchBase. In a sense, a quadrant of orthogonality is what immediately came to mind re. Wikicompany, CrunchBase, Bengee's RDFization efforts, and ours.
Bengee created an RDF based Linked Data warehouse based on the data exposed by your API, which is exposed via the Semantic CrunchBase data space. In our case we've taken the "RDFization on the fly" approach, which produces a transient Linked Data View of the CrunchBase data exposed by your APIs. Our approach is in line with our world view: all resources on the Web are data sources, and the Linked Data Web is about incorporating HTTP into the naming scheme of these data sources, so that the conventional URL based hyperlinking mechanism can be used to access a structured description of a resource, which is then transmitted using a range of negotiable representation formats. In addition, because we house and publish a lot of Linked Data on the Web (e.g. DBpedia, PingTheSemanticWeb, and others), we've also automatically meshed CrunchBase data with related data in DBpedia and Wikicompany.
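The "range of negotiable representation formats" point above is ordinary HTTP content negotiation. As a minimal sketch (plain Python, not Virtuoso's actual implementation; the media-type table is illustrative), a Linked Data server might pick a representation for one resource description like this:

```python
# Hypothetical media types a Linked Data server could offer for one
# resource description; the set is an assumption for illustration.
SUPPORTED = {
    "application/rdf+xml": "RDF/XML",
    "text/n3": "N3",
    "text/html": "XHTML page",
}

def negotiate(accept_header, default="application/rdf+xml"):
    """Return the first supported media type listed in the Accept header.
    Real content negotiation also weighs q= preference values; this
    sketch simply drops them."""
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip()
        if media_type in SUPPORTED:
            return media_type
    return default

print(negotiate("text/n3,application/rdf+xml;q=0.8"))  # -> text/n3
print(negotiate("image/png"))  # -> application/rdf+xml (fallback)
```

The key idea is that one URI names the resource, while the Accept header decides which serialization of its description travels over the wire.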
CrunchBase: Do you know of any apps that are using CrunchBase Cartridge to enhance their functionality?
Me: Yes, the OpenLink Data Explorer which provides CrunchBase site visitors with the option to explore the Linked Data in the CrunchBase data space. It also allows them to "Mesh" (rather than "Mash") CrunchBase data with other Linked Data sources on the Web without writing a single line of code.
CrunchBase: You have been immersed in the Semantic Web movement for a while now. How did you first get interested in the Semantic Web?
Me: We saw the Semantic Web as a vehicle for standardizing conceptual views of heterogeneous data sources via context lenses (URIs). In 1998, as part of our strategy to expand our business beyond the development and deployment of ODBC, JDBC, and OLE-DB data providers, we decided to build a Virtual Database Engine (see: Virtuoso History), and in doing so we sought a standards based mechanism for the conceptual output of the data virtualization effort. At the time of the seminal unveiling of the Semantic Web in 1998, we were clear about two things in relation to the Web and Internet data management infrastructure inflections: 1) existing DBMS technology had reached its limits; 2) Web Servers would ultimately hit their functional limits. These fundamental realities compelled us to develop Virtuoso with an eye to leveraging the Semantic Web as a vehicle for completing its technical roadmap.
CrunchBase: Can you put into layman's terms exactly what RDF and SPARQL are and why they are important? Do they only matter for developers or will they extend past developers at some point and be used by website visitors as well?
Me: RDF (Resource Description Framework) is a Graph based Data Model that facilitates resource description using the Subject, Predicate, and Object principle. Associated with the core data model, as part of the overall framework, are a number of markup languages for expressing your descriptions (just as you express presentation markup semantics in HTML or document structure semantics in XML) that include: RDFa (simple extension of HTML markup for embedding descriptions of things in a page), N3 (a human friendly markup for describing resources), RDF/XML (a machine friendly markup for describing resources).
SPARQL is the query language associated with the RDF Data Model, just as SQL is a query language associated with the Relational Database Model. Thus, when you have RDF based structured and linked data on the Web, you can query against the Web using SPARQL just as you would against an Oracle/SQL Server/DB2/Informix/Ingres/MySQL/etc. DBMS using SQL. That's it in a nutshell.
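To make the Subject-Predicate-Object principle and the SPARQL analogy concrete, here is a toy illustration in plain Python (no RDF library assumed; the entity names are illustrative stand-ins, not real URIs). Each tuple is one triple, and the query is a simplified basic graph pattern, where None plays the role of a SPARQL variable:

```python
# A tiny triple set: each tuple is one (Subject, Predicate, Object) datum,
# i.e., one persisted observation about a Subject.
triples = [
    ("KingsleyIdehen", "worksFor", "OpenLink"),
    ("KingsleyIdehen", "interest", "LinkedData"),
    ("OpenLink", "makes", "Virtuoso"),
]

def match(pattern, data):
    """Return every triple matching the pattern; None acts like a
    SPARQL variable (it matches anything in that slot)."""
    return [t for t in data
            if all(q is None or q == v for q, v in zip(pattern, t))]

# Roughly analogous to: SELECT ?p ?o WHERE { :KingsleyIdehen ?p ?o . }
for s, p, o in match(("KingsleyIdehen", None, None), triples):
    print(p, o)
```

A real SPARQL engine adds joins across multiple patterns, URIs as names, and distributed data sources, but the pattern-matching core is the same idea.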
CrunchBase: On your website you wrote that "RDF and SPARQL as productivity boosters in everyday web development". Can you elaborate on why you believe that to be true?
Me: I think the ability to discern a formal description of anything via its discrete properties is of immense value re. productivity, especially when the capability in question results in a graph of Linked Data that isn't confined to a specific host operating system, database engine, application or service, programming language, or development framework. RDF Linked Data is about infrastructure for the true materialization of the "Information at Your Fingertips" vision of yore. Even though it's taken the emergence of RDF Linked Data to make the aforementioned vision tractable, the comprehension of the vision's intrinsic value has been clear for a very long time. Most organizations and/or individuals are quite familiar with the adage: Knowledge is Power. Well, there isn't any knowledge without accessible Information, and there isn't any accessible Information without accessible Data. The Web has always been grounded in accessibility to data (albeit via compound container documents called Web Pages).
Bottom line, RDF based Linked Data is about Open Data access by reference using URIs (HTTP based Entity IDs / Data Object IDs / Data Source Names), and as I said earlier, the intrinsic value is pretty obvious bearing in mind the costs associated with integrating disparate and heterogeneous data sources -- across intranets, extranets, and the Internet.
CrunchBase: In his definition of Web 3.0, Nova Spivack proposes that the Semantic Web, or Semantic Web technologies, will be the force behind much of the innovation that will occur during Web 3.0. Do you agree with Nova Spivack? What role, if any, do you feel the Semantic Web will play in Web 3.0?
Me: I agree with Nova. But I see Web 3.0 as a phase within the Semantic Web innovation continuum. Web 3.0 exists because Web 2.0 exists. Both of these Web versions express usage and technology focus patterns. Web 2.0 is about the use of Open Source technologies to fashion Web Services that are ultimately used to drive proprietary Software as Service (SaaS) style solutions. Web 3.0 is about the use of "Smart Data Access" to fashion a new generation of Linked Data aware Web Services and solutions that exploit the federated nature of the Web to maximum effect; proprietary branding will simply be conveyed via quality of data (cleanliness, context fidelity, and comprehension of privacy) exposed by URIs.
Here are some examples of the CrunchBase Linked Data Space, as projected via our CrunchBase Sponger Cartridge:
The statement above resonates with a lot of my fundamental views about the essence of the Web. It also drives right at the core of what we are trying to address with the OpenLink Data Explorer (ODE), which isn't simply about Linked Data visualization, but the combination of visualization, user interaction, and unobtrusive exposure and exploitation of Linked Data Entities culled from the existing Web of Linked Documents. ODE consumes and processes URIs or URLs. Thus, as long as the (X)HTML container / host document keeps URIs or URLs in "agent view", ODE will give you the option to interact with the-data-behind Web information resources (e.g., Web Pages, Images, Audio, etc.).
Do remember, "mission-critical" is no longer a corporate / enterprise theme. The lines of demarcation between the individual and enterprise are blurring at warp speed.
The big deal about LINQ has been the singular focus on addressing point 1, in particular.
I've already written about the Linq2Rdf effort that meshes the best of .NET with the virtues of the "Linked Data Web".
Here is an architecture diagram that seeks to illustrate the powerful data access and manipulation options that the combination of Linq2RDF and Linked Data deliver:
What may not have been obvious to most in the past, is the fact that Mapping from Object Models to Relational Models wasn't really the solution to the problem at hand. Instead, the mapping should have been the other way around i.e., Relational to Object Model mapping. The emergence of RDF and RDBMS to RDF mapping technology is what makes this age-old headache addressable in very novel ways.
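The "Relational to Object Model mapping" direction described above can be sketched in a few lines. This is a hypothetical illustration of the core RDBMS-to-RDF idea (the namespace URI and table are made up; real mapping technology also handles types, foreign keys, and ontologies), reusing the Northwind "ALFKI" customer mentioned later in this post:

```python
def row_to_triples(table, pk_column, row):
    """Mint a URI-style subject from the primary key; every remaining
    column becomes a predicate and every cell becomes a value.
    The http://example.com/ namespace is a hypothetical stand-in."""
    subject = f"http://example.com/{table}/{row[pk_column]}"
    return [(subject, column, value)
            for column, value in row.items()
            if column != pk_column]

row = {"CustomerID": "ALFKI",
       "CompanyName": "Alfreds Futterkiste",
       "City": "Berlin"}

for triple in row_to_triples("Customers", "CustomerID", row):
    print(triple)
```

Note how the relational row stays authoritative; the graph view is derived from it on demand, which is the "Relational to Object Model" direction rather than the reverse.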
Key points:
SPASQL (SPARQL extension for SQL) enables the intelligent resource representation request handling and URI dereferencing, that underlies "Linked Data" (i.e., Hyperdata Linking) to occur in-process.
My contribution to the developing discourse takes the form of a Q&A session. I've taken the questions posed and provided answers that express my particular points of view:
Q: Is the desktop of the future going to just be a web-hosted version of the same old-fashioned desktop metaphors we have today?
A: No, it's going to be a more Web Architecture aware and compliant variant exposed by appropriate metaphors.
Q: The desktop of the future is going to be a hosted web service
A: A vessel for exploiting the virtues of the Linked Data Web.
Q: The Browser is Going to Swallow Up the Desktop
A: Literally, of course not! Metaphorically, of course! And then the Browser metaphor will decompose into function specific bits of Web interaction amenable to orchestration by its users.
Q: The focus of the desktop will shift from information to attention
A: No! Knowledge, Information, and Data sharing courtesy of Hyperdata & Hypertext Linking.
Q: Users are going to shift from acting as librarians to acting as daytraders
A: They were Librarians at Web 1.0, Journalists at Web 2.0, Analysts in Web 3.0 (i.e., analyze structured and interlinked data), and CEOs in Web 4.0 (i.e., get Agents to do stuff intelligently en route to making decisions).
Q: The Webtop will be more social and will leverage and integrate collective intelligence
A: The Linked Data Web vessel will only require you to fill in your profile (once) and then serendipitous discovery and meshing of relevant data will simply happen (the serendipity quotient will grow in line with Linked Data Web density).
Q: The desktop of the future is going to have powerful semantic search and social search capabilities built-in
A: It is going to be able to "Find" rather than "Search" for stuff courtesy of the Linked Data Web.
Q: Interactive shared spaces will replace folders
A: Data Spaces and their URIs (Data Source Names) replace everything. You simply choose the exploration metaphor that best suits your space interaction needs.
Q: The Portable Desktop
A: Ubiquitous Desktop i.e. do the same thing (all answers above) on any device connected to the Web.
Q: The Smart Desktop
A: Vessels with access to Smart Data (Linked Data + Action driven Context sprinklings).
Q: Federated, open policies and permissions
A: More federation for sure; XMPP will become a lot more important, and OAuth will enable a resurgence of the federated aspects of the Web and Internet.
Q: The personal cloud
A: Personal Data Spaces plugged into Clouds (Intranet, Extranet, Internet).
Q: The WebOS
A: An operating system endowed with traditional Database and Host Operating system functionality such as: RDF Data Model, SPARQL Query Language, URI based Pointer mechanism, and HTTP based message Bus.
Q: Who is most likely to own the future desktop?
A: You! And all you need is a URI (an ID or Data Source Name for "Entity You") and a Profile Page (a place where "Entity You" is Described by You).
You can get a feel for the future desktop by downloading and then installing the OpenLink Data Explorer plugin for Firefox, which allows you to switch viewing modes between Web Page and Linked Data behind the page. :-)
By coincidence, Glenn and I presented at this month's Cambridge Semantic Web Gathering.
I've provided a dump of Glenn's issues and my responses below:
RDF is a Graph based Data Model; it stands for Resource Description Framework. The metadata angle comes from its Meta Content Framework (MCF) origins. You can express and serialize data based on the RDF Data Model using: Turtle, N3, TriX, N-Triples, and RDF/XML.
These are just appeasement:
- old query paradigm: fishing in dark water with superstitiously tied lures; only works well in carefully stocked lakes
- we don't ask questions by defining answer shapes and then hoping they're dredged up whole.
SPARQL, MQL, and Entity-SQL are Graph Model oriented Query Languages. Query Languages always accompany Database Engines. SQL is the Relational Model equivalent.
Noble attempt to ground the abstract, but:
- URI dereferencing/namespace/open-world issues focus too much technical attention on cross-source cases where the human issues dwarf the technical ones anyway
- FOAF query over the people in this room? forget it.
- link asymmetry doesn't scale
- identity doesn't scale
- generating RDF from non-graph sources: more appeasement, right where the win from actually converting could be biggest!
Innovative use of HTTP to deliver "Data Access by Reference" to the Linked Data Web.
When you have a Data Model, Database Engine, and Query Language, the next thing you need is a Data Access mechanism that provides "Data Access by Reference". ODBC and JDBC (amongst others) provide "Data Access by Reference" via Data Source Names. Linked Data is about the same thing (URIs are Data Source Names) with the following differences:
Hugely motivating and powerful idea, worthy of a superhero (Graphius!), but:
- giant and global parts are too hard, and starting global makes every problem harder
- local projects become unmanageable in global context (Cyc, Freebase data-modeling lists...).
And thus my plea, again. Forget "semantic" and "web", let's fix the database tech first:
- node/arc data-model, path-based exploratory query-model
- data-graph applications built easily on top of this common model; building them has to be easy, because if it's hard, they'll be bad
- given good database tech, good web data-publishing tech will be trivial!
- given good tools for graphs, the problems of uniting them will be only as hard as they have to be.
Giant Global Graph is just another moniker for a "Web of Linked Data" or "Linked Data Web".
Multi-Model Database technology that meshes the best of the Graph & Relational Models exists. In a nutshell, this is what Virtuoso is all about, and it's existed for a very long time :-)
Virtuoso is also a Virtual DBMS engine (so you can see Heterogeneous Relational Data via Graph Model Context Lenses). Naturally, it is also a Linked Data Deployment platform (or Linked Data Server).
The issue isn't the "Semantic Web" moniker per se; it's about how Linked Data (the foundation layer of the Semantic Web) gets introduced to users. As I said during the MIT Gathering: "The Web is experienced via Web Browsers primarily, so any enhancement to the Web must be exposed via traditional Web Browsers", which is why we've opted to simply add "View Linked Data Sources" to the existing set of common Browser options that include:
By exposing the Linked Data Web option as described above, you enable the Web user to knowingly transition from the traditional Rendered (X)HTML page view to the Linked Data View (i.e., structured data behind the page). This simple "User Interaction" tweak makes the notion of exploiting a Structured Web somewhat clearer.
The Linked Data Web isn't a panacea. It's just an addition to the existing Web that enriches the things you can do with the Web. Its predominance, like that of any application feature, will be subject to the degree to which it delivers tangible value or materializes internal and external opportunity costs.
Note: The Web isn't ubiquitous today because all its users grokked HTML Markup. Its ubiquity is a function of opportunity costs: there simply came a point in the Web bootstrap when nobody could afford the opportunity costs associated with being off the Web. The same thing will play out with Linked Data and the broader Semantic Web vision.
The LinqToRdf project is about binding LINQ to RDF. It sits atop Joshua Tauberer's C# based Semantic Web/RDF library, which has been out there for a while and works across Microsoft .NET and its open source variant, "Mono".
Historically, the Semantic Web realm has been dominated by RDF frameworks such as Sesame, Jena, and Redland, which, by their Open Source orientation, predominantly favor non-Windows platforms (Java and Linux). Conversely, Microsoft's .NET frameworks have sought to offer Conceptualization technology for heterogeneous Logical Data Sources via .NET's Entity Framework and ADO.NET, but without any actual bindings to RDF.
Interestingly, .NET already has a data query language that shares a number of similarities with SPARQL, called Entity-SQL, and a very innovative programming language feature called LINQ, which offers a blend of constructs for natural data access and manipulation across relational (SQL), hierarchical (XML), and graph (Object) models, without the traditional object-language-to-database impedance tensions of the past.
With regards to all of the above, we've just released a mini white paper that covers the exploitation of RDF-based Linked Data using .NET via LINQ. The paper offers an overview of LinqToRdf, plus enhancements we've contributed to the project (available in LinqToRdf v0.8). The paper includes real-world examples that tap into a MusicBrainz powered Linked Data Space, the Music Ontology, the Virtuoso RDF Quad Store, Virtuoso Sponger Middleware, and our RDFization Cartridges for MusicBrainz.
Enjoy!
The Web Universal Plug and Play (WUPnP) Cheatsheet:
Essentially, if you build an application and use the technologies suggested in the 'glue section', then your web application/service (whether it's front-end or back-end) will fit into many, many other web applications/services… and will therefore also be more manageable for the future! This is WUPnP.
Key technologies for making your services/applications as sticky as possible:
Web-based plug and play fun!
(Via Daniel Lewis.)
Naturally, we've decided to join the CrunchBase RDFization party, and have just completed a Virtuoso Sponger Cartridge (an RDFizer) for CrunchBase. What we add in our particular cartridge is additional meshing with DBpedia and Wikicompany Linked Data Spaces, plus RDFization of the CrunchBase (X)HTML pages :-)
As I've postulated for a while, Linked Data is about data "Meshing" and "Meshups". This isn't a buzzword play. I am pointing out an important distinction between "Mashups" and "Meshups", which goes as follows: "Mashups" are about code-level joining devoid of structured modelling, hence the revelation of code, as opposed to data, when you look behind a "Mashup". "Meshups", on the other hand, are about joining disparate structured data sources across the Web. When you look behind a "Meshup", you see structured data (preferably Linked Data) that enables further "Meshing".
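The "Meshup" distinction above can be shown in miniature: two structured sources joined through a shared identifier yield more structured data, which can be meshed again. This is a hedged toy sketch (the URIs and property names are hypothetical stand-ins, not real CrunchBase or DBpedia data):

```python
# Two tiny, illustrative triple sets standing in for two Web data spaces.
crunchbase = [("http://example.com/company/openlink", "funding", "undisclosed")]
dbpedia    = [("http://example.com/company/openlink", "product", "Virtuoso")]

def mesh(*sources):
    """Union the triple sets; shared subject URIs are what link
    the sources together -- no bespoke joining code required."""
    merged = []
    for source in sources:
        merged.extend(source)
    return merged

meshed = mesh(crunchbase, dbpedia)
subjects = {s for s, p, o in meshed}
print(len(meshed), len(subjects))  # 2 triples describing 1 shared entity
```

Contrast this with a "Mashup", where the join logic lives in application code and what you find behind the result is that code, not reusable data.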
I truly believe that we are now inches away from critical mass re. Linked Data, and because we are dealing with data, the network-effect will be sky-high! I shudder to think about the state of the Linked Data Web in 12 months time. Yes, I am giving the explosion 12 months (or less). These are very exciting times.
Demo Links:
For the best experience, I encourage you to look at the OpenLink Data Explorer extension for Firefox (2.x - 3.x). This enables you to go to CrunchBase (X)HTML pages (and other sites on the Web, of course), and then simply use the "View | Linked Data Sources" main or context menu sequence to unveil the Linked Data Sources associated with any Web Page.
Of course there is much more to come!
My use of "old media" implies: a place that still seeks subscriber data (no OpenID etc.), for the umpteenth time, as the toll fee for discourse development and participation on the Web.
Anyway, here is what I attempted to post as a comment to Dan Grigorovici's post titled: Where is the Semantic Web Killer App?
Dan,
An intriguing post to say the least :-)
"Linked Data" and "Semantic Web" aren't synonymous; they are simply connected, infrastructure-DNA-wise. You can have "Semantic Web" style graphs (i.e., RDF Data) and not have "Linked Data" as per Linked Data deployment tenets and best practices -- a very important point.
I've stated repeatedly, the "Linked Data" emphasis has more to do with focusing on a point of crystallization within the larger "Semantic Web" vision, so here is a quick recap:
A term coined by TimBL that describes an application of HTTP to the time-tested process of "Data Access by Reference". "Linked Data" adds vital items to the "Data Access by Reference" pattern that have been erstwhile unattainable:
So we have HTTP based URIs as the Data Source Names for a "Linked Data Web", i.e., a Web of inter-connected Data Source Names that de-emphasizes the importance of their host containers (Compound Documents / Information Resources).
The business case or value proposition of "Linked Data" is synonymous with the value proposition of data access technologies such as ODBC, JDBC, ADO.NET, OLE-DB, XMLA, and others (enterprise or consumer) in relation to the Individual and Enterprise pursuit of agility, in a realm where data is growing exponentially and the maximum processing time in a single day remains 24 hrs. Data Access & Data Integration are timeless challenges due to the following constants:
Note: The line between the Enterprise & Individuals continues to blur by the second. This is something I covered during my Linked Data Planet keynote, which, like most things I put on the Web (via this blog data space), is a live and practical demonstration of the virtues of Linked Data, courtesy of RDFa, the Bibliographic Ontology, and dereferencable URIs (i.e., HTTP based Data Source Names for Documents and the Entities they host).
I would tweak the law modification expressed in Mike Bergman's post, which states:
the value of a Linked Data network is proportional to the square of the number of links between the data objects.

By simply injecting "Context" -- which is what a high-fidelity linked data mesh facilitates, i.e., a mesh of weighted links endowed with specifically typed links (as opposed to a single, ambiguous, type-unspecific link) -- you end up with even more insight into the power of a Linked Data Web.
How about Einstein's famous equation, E=mc2? I am talking Energy (vitality) and Mass equivalence, where "E" is for Energy, "m" for Network Mesh based Mass (where each entity network node contains sub-particles that are themselves dense network meshes, all endowed with typed links and weightings), and "c" is for computer processing speed (which is growing exponentially!). When you beam queries down a context-rich mesh (a giant global graph comprised of named and dereferencable data sources), especially a mesh to which we are all connected, what do you get? Infrastructure for generating an unbelievable amount of intellectual energy (the result of exploding the sub-data-graphs within graph nodes) that is much better equipped to handle current and future challenges. Even better, we end up making constructive use of Einstein's findings (remember, we built a bomb the first time around!). TimBL articulates this fundamental value of the Web in slightly different language, but at the core, this is the essence of the Web as I believe he envisioned it: the ability to connect us all in such a way that we exploit our collective manpower and knowledge constructively and unobtrusively, en route to making the world a much better place :-)
Note: None of this is incongruent with being compensated (i.e., making money) for contributing tangible value into, or around, the Mesh we know as the Web :-)
Enjoy!
URIs are simple to use, i.e., you simply click on them via a user agent's UI. However, URLs, when incorporated into Data Source Naming en route to constructing HTTP based Identifiers that deliver HTTP based pointers to the location / address of a Resource Description, are another matter.
I touched on this issue in my Linked Data Planet keynote last week, and I must say, it did set off a light.
I believe, we can only get the broader Web community to comprehend the utility of URIs (Web Data Source Names) by exposing said utility via the Web's Universal Client (Web Browser). For instance, how do URN based Identity / Naming schemes help in a world dominated by Web Browsers that only grok "http://"? From my vantage point, the practical solution is for data providers who already have "doi", "lsid" and other Handle based Identifiers in place, to embark upon http-to-native-naming-scheme-proxying.
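The http-to-native-naming-scheme proxying described above amounts to a simple rewrite: wrap a non-HTTP identifier such as a "doi" or "lsid" name in an HTTP URI that any plain Web browser can dereference. A minimal sketch, assuming a hypothetical proxy host (the host name and URL layout here are illustrative, not a real service):

```python
# Hypothetical proxy host; a real deployment would be run by the
# data provider or a Linked Data middleware platform.
PROXY = "http://proxy.example.com"

def proxify(identifier):
    """Rewrite a non-HTTP identifier (e.g., "doi:...") as an HTTP URI
    rooted at the proxy; pass HTTP URIs through untouched."""
    if identifier.startswith("http://"):
        return identifier
    scheme, _, rest = identifier.partition(":")
    return f"{PROXY}/{scheme}/{rest}"

print(proxify("doi:10.1000/182"))
# -> http://proxy.example.com/doi/10.1000/182
```

The proxy endpoint would then resolve the native identifier through its own handle system and serve back the resource description, so browsers that only grok "http://" still get at the data.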
In my usual "dog-fooding" and "practice what you preach" fashion, this is exactly what we do in the new Linked Data Web extension that we've decided to reveal to the public (albeit late beta). Thus, when you use an existing browser to view pages with "lsid" or "doi" URNs, you still enjoy the utility of getting at the "Raw Linked Data Sources" that these names expose.
The keynote, Creating, Deploying, and Exploiting Linked Data, sought to achieve one fundamental goal: demystifying the concept of "Linked Data" using anecdotal material that resonates with enterprise decision makers.
To my pleasure, 90% of the audience members confirmed familiarity with the "Data Source Name" concept of Open Database Connectivity (ODBC). Thus, all I had to do was map "Linked Data" to ODBC, and then unveil the fundamental add-ons that "Linked Data" delivers:
I believe a majority of attendees came to realize that the combination above injects a new Web interaction dynamic: access to "Subject matter Concepts" and Named Entities contained within a page via HTTP based Data Source Names (URIs).
BTW - My presentation is a Linked Data Space in its own right, courtesy of the Bibliographic Ontology (which provides slide show modeling) and RDFa, which allows me to embed annotations into my Slidy based presentation :-)
Anyway, thanks to the Blogosphere, I can attempt to fix this problem myself -- via this post :-)
Q. If you wanted to provide a bewildered but still curious novice a public example of Linked Data at work in their everyday life, what would it be?
Kingsley Idehen: Any one of the following:
- My Linking Open Data community Profile Page - the Linked Data integration is exposed via the "Explore Data" Tab
- My Linked Data Space - viewed via OpenLink's AJAR (Asynchronous JavaScript and RDF) based Linked Data Browser
- My Events Calendar Tag Cloud - a Linked Data view of my Calendar Space using an RDF-aware browser
In all cases, you have the ability to explore my data spaces by simply clicking on the links, which on the surface appear to be standard hypertext links, although in reality you are dealing with hyperdata links (i.e., links to entities that result in the generation of entity description pages that expose entity properties via hyperdata links). Thus, you have a single page that describes me in a very rich way, since it encompasses all data associated with me, covering: personal profile, blog posts, bookmarks, tag clouds, social networks, etc.
Q. What would you show the CEO or CTO of a company outside the tech industry?
Kingsley Idehen: A link to the Entity ALFKI, from the popular Northwind Database associated with Microsoft Access and SQL Server database installations. This particular link exposes a typical enterprise data space (orders, customers, employees, suppliers ...) in a single page. The hyperdata links represent intricate data relationships common to most business systems that will ultimately seek to repurpose existing legacy data sources and SOA services as Linked Data. Alternatively, I would show the same links via the Zitgist Data Viewer (another Linked Data-aware browser). In both cases, I am exploiting direct access to entities via HTTP, due to the protocol's incorporation into the Data Source Naming scheme.
First up, the Library of Congress: take a look at the following pages, which are friendly to both humans and machine-based user agents:
Key point: The pages above are served up in line with Linked Data deployment and publishing tenets espoused by the Linking Open Data Community (LOD) which include (in my preferred terminology):
The items above are features that users and decision makers should start to home in on when seeking and evaluating platforms that facilitate cost-effective exploitation of the Linked Data Web.
As you can see from my recent post about how we've started the process of inoculating DBpedia against the potential dangers of "contextual incoherence", we are entering a newer era in the Semantic Web's evolution. My post and the one from Clark & Parsia each touch on different aspects of the "Data Dictionary for the Semantic Web" issue.
Note: in my universe of discourse, a Data Dictionary manifests when the constraints and class hierarchies defined in an ontology (e.g., a web-accessible shared ontology) are functionally bound to a data manager. Interestingly, the binding can take the following forms:
The classification terminology I use above is very much off-the-cuff; its sole purpose is architectural distinction.
Anyway, it's really nice to see that we are entering an era re. the Semantic Web vision, where the virtues of reasoning are getting simpler to demonstrate and articulate.
In a nutshell, the point-to-point data integration era is coming to an end! The era of intelligent, ontology-based enterprise data integration is nigh!
Of course, there is much more to come on the practical utility front, so stay tuned as we work our way through the DBpedia inoculation program.
When the DBpedia & Yago integration took place last year (around WWW2007, Banff), there was a little but costly omission: nobody sought to load the Yago Class Hierarchy into Virtuoso's Inference Engine :-(
Anyway, the Class Hierarchy has now been loaded into Virtuoso's inference engine (as Virtuoso Inference Rules), and the following queries are now feasible using the live Virtuoso-based DBpedia instance hosted by OpenLink Software:
-- Find all Fiction Books associated with the property "dbpedia:name" having the literal value "The Lord of the Rings".
DEFINE input:inference "http://dbpedia.org/resource/inference/rules/yago#"
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/property/>
PREFIX yago: <http://dbpedia.org/class/yago/>
SELECT DISTINCT ?s
FROM <http://dbpedia.org>
WHERE {
?s a yago:Fiction106367107 .
?s dbpedia:name "The Lord of the Rings" .
}
-- Variant of the query using Virtuoso's Full-Text Index extension, via the bif:contains function/magic predicate
DEFINE input:inference "http://dbpedia.org/resource/inference/rules/yago#"
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/property/>
PREFIX yago: <http://dbpedia.org/class/yago/>
SELECT DISTINCT ?s ?n
FROM <http://dbpedia.org>
WHERE {
?s a yago:Fiction106367107 .
?s dbpedia:name ?n .
?n bif:contains 'Lord and Rings'
}
-- Retrieve all individual instances of the Fiction Class, which should include all Books.
DEFINE input:inference "http://dbpedia.org/resource/inference/rules/yago#"
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX yago: <http://dbpedia.org/class/yago/>
SELECT DISTINCT ?s
FROM <http://dbpedia.org>
WHERE {
?s a yago:Fiction106367107 .
}
Note: you can also move the inference pragmas to the Virtuoso server side, i.e., place the inference rules in a server instance config file, thereby negating the need to place "define input:inference 'http://dbpedia.org/resource/inference/rules/yago#'" pragmas directly in your SPARQL queries.
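For readers who want to try such queries from code rather than a SPARQL UI: below is a minimal sketch of issuing a query via the HTTP-based SPARQL Protocol, using only Python's standard library. The endpoint URL and result format are assumptions (any SPARQL Protocol endpoint accepts the same query-string parameters); only the URL is built here, the actual fetch is left as a comment.

```python
from urllib.parse import urlencode

# The public DBpedia SPARQL endpoint (assumed; substitute your own).
ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """
DEFINE input:inference "http://dbpedia.org/resource/inference/rules/yago#"
PREFIX yago: <http://dbpedia.org/class/yago/>
SELECT DISTINCT ?s WHERE { ?s a yago:Fiction106367107 }
"""

def sparql_request_url(endpoint, query, fmt="application/sparql-results+json"):
    """Build a SPARQL Protocol GET URL carrying the query and result format."""
    return endpoint + "?" + urlencode({"query": query, "format": fmt})

url = sparql_request_url(ENDPOINT, QUERY)
# urllib.request.urlopen(url) would then return the JSON result set.
```

The point of the sketch is simply that the "Wire Protocol" here is plain HTTP: the query travels as an ordinary URL query-string parameter.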
1995 (and the early '90s) must have been a time of dreaming for visionaries… most of their dreams are happening today.
Watch Steve Jobs (then of NeXT) discuss what he thinks will be popular in 1996 and beyond at OpenStep Days 1995:
Here's a spoiler:
The thing that OpenStep proposes is:
What Steve was suggesting was one of the beginnings of the Data Web! Yep, Portable Distributed Objects and the Enterprise Objects Framework were among the influences on the Semantic Web / Linked Data Web… not surprising, as Tim Berners-Lee designed the initial web stack on a NeXT computer!
I’m going to spend a little time this evening figuring out how much ‘distributed objects’ stuff has been taken from the OpenStep stuff into the Objective-C + Cocoa environment. (<- I guess I must be quite geeky ;-))
"(Via Daniel Lewis.)
Anyway, enjoy!
BTW - The only reason why the Semantic Web is perceived as complex relative to the original Document Web is simply this: the Semantic Web was designed in public view by the W3C and many collaborators, whereas the Document Web simply came into public view and consciousness as a somewhat finished solution.
Ivan's presentation, titled State of the Semantic Web, is a must-view for those who need a quick update on where things stand re. the Semantic Web in general.
I also liked the fact that in proper "Lead by example" manner, his presentation isn't PDF or PPT based, it's a Web Document :-)
Hint: as per usual, this post contains a Linked Data demo nugget. This time around, it's in the form of a shared calendar covering a large number of Semantic Web Technology events. All I had to do was subscribe to a number of WebDAV accessible iCal files from my Calendar Data Space, and the platform did the rest, i.e., produced Linked Data Objects for events associated with a plethora of conferences.
If you assimilate Ivan's presentation properly, you will note that I've just generated, and shared, a large number of URIs covering a range of conference events. Thus, you can extend my contributions (thereby enriching the GGG) by simply associating additional data from your Linked Data Space with mine. All you have to do is use my calendar data object URIs in your statements.
When tagging a document, the semantic tagging service passes the content of a target document through a processing pipeline (a distillation process of sorts) that results in automagic extraction of the following:
Once the extraction phase is completed, a user is presented with a list of "suggested tags" using a variety of user interaction techniques. The literal values of elected Tags are then associated with one or more Tag and Tag Meaning Data Objects, with each Object type endowed with a unique Identifier.
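Purely to illustrate the shape of that pipeline (this is not the actual tagging service; the frequency ranking, stopword list, and hash-minted identifier scheme are all illustrative stand-ins):

```python
import re
from collections import Counter
from hashlib import md5

STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "that", "it", "uses"}

def suggest_tags(text, n=3):
    """Extraction phase, naively: rank non-stopword terms by frequency."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(n)]

def tag_object(literal):
    """Associate an elected Tag literal with a Tag Data Object endowed with
    a unique Identifier (here a hypothetical URI minted from a hash)."""
    return {"literal": literal,
            "id": "http://example.com/tag/" + md5(literal.encode()).hexdigest()[:8]}

doc = "linked data joins data across the web; linked data gives data URIs"
tags = [tag_object(t) for t in suggest_tags(doc)]
```

Note how the literal value and the identified Tag Data Object are distinct things; the paragraph that follows explains why the identifier scheme (HTTP or not) matters so much.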
Broad acceptance that "Context is king" is gradually taking shape. That said, "Context" landlocked within Literal values offers little, long term, over what we have right now (e.g., at Del.icio.us or Technorati). By this I mean: if the end product of semantically enhanced tagging leaves us with Literal Tag values only, Tags associated with Tag Data Objects endowed with platform-specific Identifiers, or Tag Data Objects with any other Identity scheme that excludes HTTP, then the ability of Web users to discern or derive multiple perspectives from the base Context (exposed by semantically enhanced Tags) will be lost, or severely impeded at best.
The shape, form, and quality of the lookup substrate that underlies semantic tagging services ultimately affects "context fidelity" matters such as Entity Disambiguation. The importance of quality lookup infrastructure on the burgeoning Linked Data Web is the reason why OpenLink Software is intimately involved with the DBpedia and UMBEL projects.
I am immensely happy to see that the Web 2.0 and Semantic Web communities are beginning to coalesce around the issue of "Context". This was the case at the WWW2008 Linked Data Workshop, and I am feeling a similar vibe emerging from the Semantic Web Technologies conference currently nearing completion in San Jose. Of course, I will be talking about, and demonstrating the practical utility of, all of this at the upcoming Linked Data Planet conference.
ODBC identifies data sources using Data Source Names (DSNs).
WODBC (Web Open Database Connectivity) delivers open data access to Web Databases / Data Spaces. The Data Source Naming scheme (URI or IRI) is HTTP based, thereby enabling data access by reference via the Web.
ODBC DSNs bind ODBC client applications to Tables, Views, and Stored Procedures.
WODBC DSNs bind you to a Data Space (e.g., my FOAF based Profile Page, where you can use the "Explore Data" Tab to look around if you are a human visitor) or a specific Entity within a Data Space (i.e., the Person Entity "Me").
ODBC Drivers are built using APIs (DBMS Call Level Interfaces) provided by DBMS vendors. Thus, a DBMS vendor can choose not to release an API, or do so selectively, for competitive advantage or market-disruption purposes (it's happened!).
WODBC Drivers are also built using APIs (Web Services associated with a Web Data Space). These drivers are also referred to as RDF Middleware or RDFizers. The "Web" component of WODBC ensures openness: you publish Data with URIs from your Linked Data Server and that's it; your data space or specific data entities are live and accessible (by reference) over the Web!
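The "access by reference" idea above boils down to plain HTTP content negotiation: the same identifier yields an HTML page for humans or an RDF description for machine user agents, depending on the Accept header. A hedged sketch using Python's standard library and a purely illustrative URI (no real endpoint is assumed, so the request is prepared but not sent):

```python
import urllib.request

# Illustrative Linked Data URI; any dereferencable entity URI works the same way.
uri = "http://example.com/dataspace/person/me"

def dereference_request(uri, accept="application/rdf+xml"):
    """Prepare a GET request asking the server for an RDF representation
    of the entity identified by this URI (content negotiation)."""
    return urllib.request.Request(uri, headers={"Accept": accept})

req = dereference_request(uri)
# urllib.request.urlopen(req) would fetch the RDF description;
# here we only inspect the prepared headers.
```

Swap the Accept value for "text/html" and the very same Data Source Name serves the human-facing view; that dual nature is the WODBC advantage over classic ODBC DSNs.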
So we have come full circle (or cycle), the Web is becoming more of a structured database everyday! What's new is old, and what's old is new!
Data Access is everything; without "Data" there is no information or knowledge. Without "Data" there's no notion of vitality, purpose, or value.
URIs make or break everything in the Linked Data Web just as ODBC DSNs do within the enterprise.
I've deliberately left JDBC, ADO.NET, and OLE-DB out of this piece due to their specificity to particular programming languages and frameworks. None of these mechanisms matches the platform-availability breadth of ODBC.
The Web as a true M-V-C pattern is now crystallizing. The "M" (Model) component of M-V-C is finally rising to the realm of broad attention, courtesy of the "Linked Data" meme and "Semantic Web" vision.
By the way, M-V-C lines up nicely with Web 1.0 (Web Forms / Pages), Web 2.0 (Web Services based APIs), and Web 3.0 (Data Web, Web of Data, or Linked Data Web) :-)
Unfortunately, the cost of completing ZDNet's unwieldy signup process simply exceeded the benefits of dropping my comments in their particular space :-( Thus, I'll settle for a trackback ping instead.
What follows is the cut and paste of my intended comment contributions to Paul's post.
Paul,
As discussed earlier this week during our podcast session, commercialization of Semantic Web technology shouldn't be a mercurial matter at this stage in the game :-) It's all about looking at how it provides value :-)
From the Linked Data angle, the ability to produce, dispatch, and exploit "Context" across an array of "Perspectives" from a plethora of disparate data sources on the Web and/or behind corporate firewalls, offers immense commercial value.
Yahoo's Searchmonkey effort will certainly bring clarity to some of the points I made during the podcast re. the role of URIs as "value consumption tickets" (Data Services are exposed via URIs). There has to be a trigger (in user space) that compels Web users to seek broader, or simply varied, perspectives as a response to data encountered on the Web. Yahoo! is about to put this light on in a big way (imho).
The "self annotating" nature of the Web is what ultimately drives the manifestation of the long awaited Semantic Web. I believe I postulated about "Self Annotation & the Semantic Web" in a number of prior posts which, by the way, should be DataRSS compatible right now due to Yahoo's support of OpenSearch Data Providers (which this Blog Space has been for eons).
Today, we have many communities adding structure to the Web (via their respective tools of preference) without explicitly realizing what they are contributing. Every RSS/Atom feed, Tag, Weblog, Shared Bookmark, Wikiword, Microformat, Microformat++ (eRDF or RDFa), GRDDL stylesheet, RDFizer, etc., is a piece of structured data.
Finally, the different communities are all finding ways to work together (thank heavens!) and the results are going to be cataclysmic when it all plays out :-)
Data, Structure, and Extraction are the keys to the Semantic Life! First you get the Data into a container (information resource), then you add Structure to the information resource (RSS, Atom, microformats, RDFa, eRDF, SIOC, FOAF, etc.); once you have Structure, RDFization (i.e., transformation to Linked Data) is a cinch thanks to RDF Middleware (as per my earlier RDF middleware posts).
During this particular podcast interview, I deliberately wanted to have a conversation about the practical value of Linked Data, rather than the technical innards. The fundamental utility of Linked Data remains somewhat mercurial, and I am certainly hoping to do my bit at the upcoming Linked Data Planet conference re. demonstrating and articulating linked data value across the blurring realms of "the individual" and "the enterprise".
Note to my old schoolmates on Facebook: when you listen to this podcast you will at least reconcile "Uyi Idehen" with "Kingsley Idehen". Unfortunately, Facebook refuses to let me Identify myself in the manner I choose. Ideally, I would like to have the name: "Kingsley (Uyi) Idehen" associated with my Facebook ID since this is the Identifier known to my personal network of friends, family, and old schoolmates. This Identity predicament is a long running Identity case study in the making.
The Linked Data Web (aka Linked Data) describes RDF data injected into the Web, where the Data Object Identifiers in an RDF graph (a collection of RDF triples) are HTTP-based URIs. The net effect of this approach to Data Object Identity is that it facilitates "Open Data Access by Reference" on the Web (aka data dereferencing).
If you recall the era before Web ubiquity, in the enterprise realm for instance, Open Database Connectivity (ODBC) emerged as a mechanism for separating Data Access from Data Management in the database-oriented Client-Server model. Although ODBC gave you access to data, the data access entry point took the form of a data-access-specific naming mechanism called a "Data Source Name" (DSN). ODBC DSNs typically exposed Tables or Views. The same applies to JDBC, where a non-HTTP-based URN scheme applies.
Zip forward to where we are today on the Web; the Web is evolving from a Document centric Database to a Distributed Object Database, and you should see that in Linked Data we are now truly looking at the best of all worlds: Web Open Database Connectivity (WODBC) with the following advantages:
To conclude, we now have "Semantics Inside" (RDF or non RDF), "Semantic Web" (RDF graphs with Object Identifiers that may or may not be HTTP based), and "Linked Data Web" (RDF graphs with Object Identifiers that must be HTTP based and dereferencable) oriented applications, in the emerging landscape associated with the "Semantics" moniker.
As per usual, this post is a record in my Blog oriented Data Space on the Web. The permalink of this post is a URI constructed with Giant Global Graph enrichment in mind :-)
Of course, I also believe that Linked Data serves Web Data Integration across the Internet very well too, and that it will benefit businesses in a big way. No individual or organization is an island; I think the Internet and Web have done a good job of demonstrating that thus far :-) We're all data nodes in a Giant Global Graph.
Daniel Lewis did shed light on the read-write aspects of the Linked Data Web, which is actually very close to the callout for a Wikipedia for Data. TimBL has been working on this via Tabulator (see the Tabulator Editing screencast), Benjamin Nowack has added similar functionality to ARC, and of course we support the same SPARQL UPDATE into an RDF information resource via the RDF Sink feature of our WebDAV and ODS-Briefcase implementations.
In today's primarily Document centric Web, the pursuit of Context is akin to pursuing a mirage in a desert of user-generated content. The quest is labor intensive, and you ultimately end up without water at the end of the pursuit :-)
Listening to Christine Connors' podcast interview with Talis simply reinforces my strong belief that "Context, Context, Context" is the Semantic Web's equivalent of Real Estate's "Location, Location, Location" (ignore the subprime loans mess for now). The critical thing to note is that you cannot unravel "Context" from existing Web content without incorporating powerful disambiguation technology into an "Entity Extraction" process. Of course, you cannot even consider seriously pursuing any entity extraction and disambiguation endeavor without a lookup backbone that exposes "Named Entities" and their relationships to "Subject matter Concepts" (BTW - this is what UMBEL is all about). Thus, when looking at the broad subject of the Semantic Web, we can also look at "Context" as the vital point of confluence for the Data oriented (Linked Data) and the "Linguistic Meaning" oriented perspectives.
I am even inclined to state publicly that "Context" may ultimately be the foundation for a 4th "Web Interaction Dimension", where practical use of AI leverages a Linked Data Web substrate en route to exposing new kinds of value :-)
"Context" may also be the focal point of concise value proposition articulation to VCs as in: "My solution offers the ability to discover and exploit "Context" iteratively, at the rate of $X.XX per iteration, across a variety of market segments :-)
Here is the list:
For the time-challenged (i.e., those unable to view this post using its permalink / URI as a data source via the OpenLink RDF Browser, Zitgist Data Viewer, DISCO Hyperdata Browser, or Tabulator), the benefits of this post are as follows:
Put differently, I cost-effectively contribute to the GGG across all Web interaction dimensions (1.0, 2.0, 3.0) :-)
The great thing about the Linked Data Web is that it's much easier to discover and respond to these points of view before the ink dries :-) Ben certainly needs to take a look at the Semantic Web FAQ, pre- or post-assimilation of Daniel's response.
Evoluation is evolution devoid of the randomness of mutation: a state of being in which it is possible to evaluate and choose evolutionary paths.
Evoluation actually describes where we are today in relation to the World Wide Web; to the Linking Open Data community (LOD), it's taking the path towards becoming a Giant Global Graph of Linked Data; to the Web 2.0 community, it's simply a collection of Web Services and associated APIs; and to many others, it remains an opaque collection of interlinked documents.
The great thing about the Web is that it allows netizens to explore a plethora of paths without adversely affecting the paths of others. That said, controlling one's path may take mutation out of evolution, but we are still left with the requirement to adapt and eventually survive in a competitive environment. Thus, although we can evaluate and choose from the many paths the Web's evolution offers us, the path that delivers the most benefits ultimately dominates. :-)
Although the Web continues to shrink the planet by removing the restrictions of geographic location, meeting people face-to-face remains invaluable (*priceless, in MasterCard ad speak*). Naturally, meeting and chatting with as many LOD community members as possible was high up on my agenda.
As one of the co-chairs of the Linking Open Data Workshop (LODW), I had a 5 minute workshop opening slot during which I spoke about the following:
We have DBpedia as a major hub on the burgeoning Linked Data Web. When OpenLink offered to host DBpedia (a combination of Virtuoso DBMS Software and sizable backend Hardware infrastructure), it did so knowing that such an effort would emphatically address the "chicken and egg" conundrum that, prior to this undertaking, stifled the ability to demonstrate practical utility of HTTP based Linked Data.
Today, the Linked Data bootstrap mission has been accomplished.
Although DBpedia is a hub (ground zero of Linked Data), we have to put it into perspective in relation to a new set of needs and expectations moving forward. Today, DBpedia is a Sun at the heart of a Solar System within the Linked Data Galaxy. But unlike Space as we know it, in Cyberspace we can have connectivity and collaboration across Solar Systems: life exists elsewhere, and we are part of a collaborative collective unimpeded by the constraints of space travel, etc. Thus, expect to see the emergence of other Solar Systems accessible to DBpedia and its collections of planets (see the LOD diagram). Examples underway include UMBEL, which will serve Linked Data planets from OpenCyc (Subject Matter Concepts) and Yago (Named Entities), and Bio2RDF (which provides a powerful Bioinformatics-based Linked Data planet).
I urged the community to veer more aggressively towards developing and demonstrating practical Linked Data driven solutions that are aligned to well known problems. Of course, I encouraged all presenters to make this an integral part of their presentations :-)
The workshop was well attended and I found all the presentations engaging and full of enthusiasm.
As the sessions progressed, it became clear during a number of accompanying Q&A sessions that a new Linked Data exploitation frontier is emerging. The frontier in question takes the form of a Linked Data substrate capable of addressing the taxonomic needs of solutions aimed at automated Named Entity Extraction, Disambiguation, and Subject matter Concept alignment, transparently integrated with existing Web Content. Thus, we are moving beyond the minting and deployment of dereferencable URIs and RDF data sets, to automagically associating existing Web Content with Named Entities (People, Organizations, Places, Events, etc.) and Subject matter Concepts (Politics, Music, Sports, and others), while remaining true to the Linking Open Data Community creed, i.e., ensuring the Named Entity and Subject matter Concept URIs are available to user agents or users seeking to produce alternative data views (i.e., Mesh-ups).
I will get to part 2 of this report once the actual workshop session slides go live (*these are different from the pre-event PDF links*).
As I can't quite remix videos on the spur of the moment (yet), I would encourage you to watch the video and then click on the link to my FOAF Profile, then follow the "Linked Data" tab to see how Linked Data oriented platforms (in my case, OpenLink Data Spaces) that exist today actually deliver what's explained in the video.
"What You Know" (Data & Friend Networks) ultimately trumps "Who You Know" (Friend only Networks). The exploitation power of this reality is enhanced exponentially via the Linked Data Web once the implications of beaming SPARQL queries down specific URIs (entry points to Linked Data graphs) become clearer :-)
Daniel simplifies my post by using diagrams to depict the different paths for PHP based applications exposing Linked Data - especially those that already provide a significant amount of the content that drives Web 2.0.
If all the content in Web 2.0 information resources is distillable into discrete data objects endowed with HTTP based IDs (URIs), with zero "RDF handcrafting tax", what do we end up with? A Giant Global Graph of Linked Data; the Web as a Database.
So, what used to apply exclusively within enterprise settings re. Oracle, DB2, Informix, Ingres, Sybase, Microsoft SQL Server, MySQL, PostgreSQL, Progress OpenEdge, Firebird, and others, now applies to the Web. The Web becomes the "Distributed Database Bus" that connects database records across disparate databases (or Data Spaces). These databases manage and expose records that are remotely accessible "by reference" via HTTP.
As I've stated at every opportunity in the past, Web 2.0 is the greatest thing that ever happened to the Semantic Web vision :-) Without the "Web 2.0 Data Silo Conundrum" we wouldn't have the cry for "Data Portability" that brings a lot of clarity to some fundamental Web 2.0 limitations that end-users ultimately find unacceptable.
In the late '80s, the SQL Access Group (now part of X/Open) addressed a similar problem with RDBMS silos within the enterprise, which led to the SAG CLI that exists today as Open Database Connectivity.
In a sense, we now have WODBC (Web Open Database Connectivity): Web Services based CLIs and/or traditional back-end DBMS CLIs (ODBC, JDBC, ADO.NET, OLE-DB, or Native), a Query Language (SPARQL), and a Wire Protocol (the HTTP-based SPARQL Protocol), delivering Web infrastructure equivalents of SQL and RDA, but much better, and with much broader scope for delivering profound value due to the Web's inherent openness. Today's PHP, Python, Ruby, Tcl, Perl, or ASP.NET developer is the enterprise 4GL developer of yore, without enterprise confinement. We could even be talking about 5GL development once Linked Data interaction is meshed with dynamic languages (delivering higher levels of abstraction at both the language and data-interaction levels). Even the underlying schemas and basic designs will evolve from (solely) Closed World to a mesh of Closed & Open World view schemas.
In the form above (the norm), WordPress data can be injected into the Linked Data Web via RDFization middleware such as the Virtuoso Sponger (built into all Virtuoso instances) and Triplr. The downside of this approach is that the blog owner doesn't necessarily possess full control over their contributions to the emerging Giant Global Graph of Linked Data.
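To make the RDFization step concrete: middleware like the Sponger essentially turns application records into triples with minted subject URIs. A toy illustration of that idea follows (this is not the Virtuoso Sponger's or the Meta Schema Language's actual machinery; the base URI, table layout, and predicate choices are hypothetical):

```python
def row_to_triples(base, table, pk, row, predicate_map):
    """Map one SQL row to N-Triples lines: mint a subject URI from the
    table name and primary key, then emit one triple per mapped column."""
    subject = f"<{base}/{table}/{row[pk]}>"
    triples = []
    for column, predicate in predicate_map.items():
        value = str(row[column]).replace('"', '\\"')  # escape quotes in literals
        triples.append(f'{subject} <{predicate}> "{value}" .')
    return triples

# A WordPress-style post record and a column-to-predicate mapping
# (Dublin Core predicates chosen purely for illustration).
post = {"ID": 42, "post_title": "Hello world!", "post_author": "admin"}
triples = row_to_triples(
    "http://example.com/wordpress", "posts", "ID", post,
    {"post_title": "http://purl.org/dc/elements/1.1/title",
     "post_author": "http://purl.org/dc/elements/1.1/creator"})
```

The declarative RDF Views approach described next achieves the same effect inside the DBMS, so the SQL data is exposed as RDF without materializing or copying anything.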
Another route to Linked Data exposure is via Virtuoso's Meta Schema Language for producing RDF Views over ODBC/JDBC-accessible Data Sources, which enables the following setup:
Alternatively, you can also exploit Virtuoso as the SQL DBMS, RDF DBMS, Application Server, and Linked Data Deployment platform:
How do I map the WordPress SQL Schema to RDF using Virtuoso?
Read the Meta Schema Language guide or simply apply our "WordPress SQL Schema to RDF" script to your Virtuoso hosted instance. Of course, there are other mappings that cover other PHP applications deployed via Virtuoso:
Trent Adams, Steve Greenberg, and I also had a podcast chat about Web Data Portability and Accessibility (Linked Data). I also remixed John Breslin's "Data Portability & Me" presentation to produce "Data Accessibility & Me".
The podcast interviews and presentations provide contributions to the broadening discourse about Open Data Access / Connectivity on the Web.
The list is nice, but actual execution can be challenging. For instance, when writing a blog post or constructing a WikiWord, would you have enough disposable time to go searching for these URIs? Or would you compromise and continue to inject "Literal" values into the Web, leaving it to the reasoning-endowed human reader to connect the dots?
Anyway, OpenLink Data Spaces is now equipped with a Glossary system that allows me to manage terms, the meanings of terms, and the hyper-linking of phrases and words matching my terms. The great thing about all of this is that everything I do is scoped to my Data Space (my universe of discourse); I don't break or impede the other meanings of these terms outside my Data Space. The Glossary system can be shared with anyone I choose to share it with, and even better, it makes my upstreaming (rules based replication) style of blogging even more productive :-)
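The glossary-driven linking idea above can be sketched in a few lines (the ODS Glossary system's internals are not documented here; the term-to-URI table and the first-occurrence rule are illustrative assumptions):

```python
import re

# Hypothetical glossary scoped to one Data Space: term literal -> term URI.
GLOSSARY = {
    "Linked Data": "http://example.com/glossary/LinkedData",
    "Data Space": "http://example.com/glossary/DataSpace",
}

def autolink(html_text, glossary):
    """Wrap the first occurrence of each glossary term in a hyperlink,
    leaving other occurrences (and unknown terms) untouched."""
    for term, uri in glossary.items():
        pattern = re.compile(re.escape(term))
        html_text = pattern.sub(f'<a href="{uri}">{term}</a>', html_text, count=1)
    return html_text

post = "Linked Data lives in a Data Space."
linked = autolink(post, GLOSSARY)
```

Because the glossary table is scoped to one Data Space, two people can attach entirely different URIs (meanings) to the same literal term without colliding, which is exactly the point made above.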
Remember, on the Linked Data Web, who you know doesn't matter as much as what you are connected to, directly or indirectly. Jason Kolb covers this issue in his post People as Data Connectors, and so does Frederick Giasson via a recent post titled Networks are everywhere. For instance, this blog post (or the entire Blog) is a bona fide RDF Linked Data Source; you can use it as the Data Source of a SPARQL Query to find things that aren't even mentioned in this post, since all you are doing is beaming a query through my Data Space (a container of Linked Data Graphs). On that note, let's re-watch Jon Udell's "On-Demand Blogosphere" screencast from 2006 :-)
ReadWriteWeb, via Alex Iskold's post, has delivered another iteration of their "Guide to Semantic Technologies".
If you look at the title of this post (and their article), they seem to be accurately providing a guide to Semantic Technologies, so no qualms there. If, on the other hand, this is supposed to be a guide to the "Semantic Web" as prescribed by TimBL, then they are completely missing the essence of the whole subject, and demonstrably so, I may add, since the entities "ReadWriteWeb" and "Alex Iskold" are only describable today via the attributes of the documents they publish, i.e., their respective blogs and hosted blog posts.
Preoccupation with Literal objects, as described above, implies we can only take what "ReadWriteWeb" and "Alex Iskold" say "Literally" (grep, regex, and XPath/XQuery are the only tools for searching deeper in this Literal realm); we have no sense of what makes them tick or where they come from, no history (bar "About Page" blurb), and no data connections beyond anchored text (more pointers to opaque data sources) in posts and blogrolls. The only connection between this post and them is my deliberate use of the same literal text in the title of this post.
TimBL's vision as espoused via the "Semantic Web" vision is about the production, consumption, and sharing of Data Objects via HTTP based Identifiers called URIs/IRIs (Hyperdata Links / Linked Data). It's how we use the Web as a Distributed Database where (as Jim Hendler once stated with immense clarity): I can point to records (entity instances) in your database (aka Data Space) from mine. Which is to say that if we can all point to data entities/objects (not just data entities of type "Document") using these Location, Value, and Structure independent Object Identifiers (courtesy of HTTP) we end up with a much more powerful Web, and one that is closer to the "Federated and Open" nature of the Web.
As I stated in a prior post, if you or your platform of choice aren't producing dereferencable URIs for your data objects, you may be Semantic (the data model predates the Web), but there is no "World Wide Web" in what you are doing.
I am Kingsley Idehen, a Person who authors this weblog. I also share bookmarks gathered over the years across an array of subjects via my bookmark data space. I also subscribe to a number of RSS/Atom/RDF feeds, which I share via my feeds subscription data space. Of course, all of these data sources have Tags which are collectively exposed via my weblog tag-cloud, feeds subscriptions tag-cloud, and bookmarks tag-cloud data spaces.
As I don't like repeating myself, and I hate wasting my time or the time of others, I simply share my Data Space (a collection of all of my purpose-specific data spaces) via the Web, so that others (friends, family, employees, partners, customers, project collaborators, competitors, co-opetitors, etc.) can intentionally or serendipitously discover relevant data en route to creating new information (perspectives) that is hopefully exposed to others via the Web.
Bottom line, the Semantic Web is about adding the missing "Open Data Access & Connectivity" feature to the current Document Web (we have to go beyond regex, grep, XPath, XQuery, full-text search, and other literal-scraping approaches). The Linked Data Web of dereferencable data object URIs is the critical foundation layer that makes this feasible.
Remember, it's not about "Applications"; it's about Data, and actually freeing Data from the "tyranny of Applications". Unfortunately, applications inadvertently create silos (esp. on the Web), since entity data modeling, open data access, and other database technology realm matters remain of secondary interest to many application developers.
Final comment: RDF facilitates Linked Data on the Web, but not all RDF is endowed with de-referencable URIs (a major source of confusion and misunderstanding). Thus, you can have RDF Data Source Providers that simply project RDF data silos via Web Services APIs, if the RDF output emanating from a Web Service doesn't provide out-bound pathways to other data via de-referencable URIs. Of course the same also applies to Widgets that present you with all the things they've discovered without exposing de-referencable URIs for each item.
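To make the silo-vs-linked distinction above concrete, here is a minimal sketch (all names and URIs illustrative, not a real RDF toolkit): a graph whose triple objects are only literals offers nothing to de-reference, while a graph whose objects include HTTP URIs provides out-bound pathways to follow.

```python
# Hypothetical sketch: distinguishing RDF that links out from RDF that doesn't.
# Triples are (subject, predicate, object); objects that are HTTP URIs
# provide out-bound, de-referencable pathways; literal objects do not.

def outbound_pathways(triples):
    """Return the set of object values that act as out-bound data links."""
    return {o for (_, _, o) in triples if o.startswith("http://")}

# A "silo" graph: every object is a literal string -- nothing to follow.
silo = [
    ("ex:post1", "dc:title", "A Guide to Semantic Technologies"),
    ("ex:post1", "dc:creator", "Alex Iskold"),
]

# A linked-data graph: an object is itself a de-referencable identifier.
linked = [
    ("http://example.org/post1", "dc:creator", "http://example.org/people/alex"),
    ("http://example.org/post1", "dc:title", "A Guide to Semantic Technologies"),
]

print(len(outbound_pathways(silo)))    # 0 -- a data silo
print(len(outbound_pathways(linked)))  # 1 -- one pathway to follow
```

The same check applies to a Widget's output: no URIs among the objects, no onward journey for the User Agent.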
BTW - my final comments above aren't in any way incongruent with devising successful business models for the Web. As you may or may not know, OpenLink is not only a major platform provider for the Semantic Web (expressed in our UDA, Virtuoso, OpenLink Data Spaces, and OAT products), we are also actively seeding Semantic Web (tribe: Linked Data, of course) startups. For instance, Zitgist, which now has Mike Bergman as its CEO alongside Frederick Giasson as CTO. Of course, I cannot do Zitgist justice via a footnote in a blog post, so I will expand further in a separate post.
If you look at the title of this post (and their article), they seem to be accurately providing a guide to Semantic Technologies, so no qualms there. If, on the other hand, this is supposed to be a guide to the "Semantic Web" as prescribed by TimBL, then they are completely missing the essence of the whole subject -- and demonstrably so, I may add, since the entities "ReadWriteWeb" and "Alex Iskold" are only describable today via the attributes of the documents they publish, i.e., their respective blogs and hosted blog posts.
Preoccupation with Literal objects as described above implies we can only take what "ReadWriteWeb" and "Alex Iskold" say "Literally" (grep, regex, and XPath/XQuery are the only tools for searching deeper in this Literal realm). We have no sense of what makes them tick or where they come from, no history (bar "About Page" blurb), and no data connections beyond anchored text (more pointers to opaque data sources) in posts and blogrolls. The only connection between this post and them is my deliberate use of the same literal text in the Title of this post.
Yes, integration is hard, but I do profoundly believe that what's been happening on the Web over the last 10 or so years also applies to the Enterprise, and by this I absolutely do not mean "Enterprise 2.0" since "2.0" and productive agility do not compute in my realm of discourse.
Large collections of RSS feeds, Wikiwords, Shared Bookmarks, Discussion Forums, etc., when disconnected at the data level (i.e., hosted in pages with no access to the "data behind"), simply offer information deluge and inertia (there are only so many hours for processing opaque information sources in a given day).
Enterprises fundamentally need to process information efficiently as part of a perpetual assessment of their relative competitive Strengths, Weaknesses, Opportunities, and Threats (SWOT), in existing and/or future markets. Historically, IT acquisitions have run counter to the aforementioned quest for "Agility", due to the predominance of the "rip and replace" approach to technology acquisition that repeatedly creates and perpetuates information silos across Application, Database, Operating System, and Development Environment boundaries. The sequence of events typically occurs as follows:
In the early to mid 90's (pre ubiquitous Web), operating system, programming language, and development framework independence inside the enterprise was technically achievable via ODBC (due to its platform independence). That said, DBMS specific ODBC channels alone couldn't address the holistic requirements associated with Conceptual Views of disparate data sources, hence the need for Data Access Virtualization via Virtual Database Engine technology.
Just as is the case on the Web today, with the emergence of the "Linked Data" meme, enterprises now have a powerful mechanism for exploiting the Data Integration benefits associated with generating Data Objects from disparate data sources, endowed with HTTP based IDs (URIs).
Conceptualizing access to data exposed via Database APIs, SOA based Web Services (SOAP style Web Services), Web 2.0 APIs (REST style Web Services), XML Views of SQL Data (SQLX), pure XML, etc., is a problem area addressed by RDF aware middleware (RDFizers, e.g., the Virtuoso Sponger).
Here are examples of what SQL Rows exposed as RDF Data Objects (identified using HTTP based URIs) would look like outside or behind a corporate firewall:
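As a hedged sketch of the idea (the table, columns, and base URI here are illustrative, not taken from any actual deployment), a SQL row can be exposed as an RDF Data Object by minting an HTTP-based URI for the row and turning each column into one Subject+Predicate+Object triple:

```python
# Illustrative sketch: exposing a SQL row as an RDF "Data Object" with an
# HTTP-based ID. Table name, columns, and base URI are all hypothetical.

def row_to_triples(base_uri, table, pkey, row):
    """Mint a URI for the row and map each non-key column to an (S, P, O) triple."""
    subject = f"{base_uri}/{table}/{row[pkey]}"
    return [(subject, f"{base_uri}/schema/{table}#{col}", val)
            for col, val in row.items() if col != pkey]

row = {"customer_id": 42, "name": "Acme Corp", "country": "US"}
triples = row_to_triples("http://example.org", "customers", "customer_id", row)
for t in triples:
    print(t)
```

Behind a firewall the base URI would simply resolve on the intranet; outside it, on the public Web -- the mapping itself is identical.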
What's Good for the Web Goose (Personal Data Space URIs) is good for the Enterprise Gander (Enterprise Data Space URIs).
The aforementioned qualification is increasingly necessary for the following reasons:
The terms GGG, Linked Data, Data Web, Web of Data, and Web 3.0 (when I use this term) all imply URI driven Open Data Access for the Web Database (maybe call this ODBC for the Web) -- the ability to point to records across data spaces without any adverse effect on the remote data spaces. It's really important to note that none of the aforementioned terms has anything to do with the "Linguistic Meaning of blurb". Building a smarter document exposed via a URL, without exposing descriptive data links, doesn't provide open access to the data behind information sources.
As human beings we are all endowed with reasoning capability. But we can't reason without access to data. A dearth of openly accessible structured data is the source of many ills in cyberspace and across society in general. Today we still have Subjectivity reigning over Objectivity due to the prohibitive costs of open data access.
We can't cost-effectively pursue objectivity without cost-effective infrastructure for creating alternative views of the data behind information sources (e.g., Web Pages). More Objectivity and less Subjectivity is what the next Web Frontier is about. At OpenLink we simply use the moniker: Analysis for All! Everyone becomes a data analyst in some form, and even better, the analyses are easily accessible to anyone connected to the Web. Of course, you will be able to share special analyses with your private network of friends and family, or if you so choose, not at all :-)
To recap, it's important to note that Linked Data is the foundation layer of the Semantic Web vision. It not only facilitates open data access, it also enables data integration (Meshing as opposed to Mashing) across disparate data schemas.
As demonstrated by DBpedia and the Linked Data Solar system emerging around it, if you URI everything, then everything is Cool.
Linked Data and Information Silos are mutually exclusive concepts. Thus, you cannot produce a web accessible Information Silo and then refer to it as "Semantic Web" technology. Of course, it might be very Semantic, but it's fundamentally devoid of critical "Semantic Web" essence (DNA).
My acid test for any Semantic Web solution is simply this (using a Web User Agent or Client):
Here is the Acid test against my Data Space:
*On* the ubiquitous Web of "Linked Documents", HREF means (by definition and usage): Hypertext Reference to an HTTP accessible Data Object of Type: "Document" (an information resource). Of course we don't make the formal connection of Object Type when dealing with the Web on a daily basis, but whenever you encounter the "resource not found" condition notice the message: HTTP/1.0 404 Object Not Found, from the HTTP Server tasked with retrieving and returning the resource.
*In* the Web of "Linked Data", a complementary addition to the current Web of "Linked Documents", HREF is used to reference Data Objects of a variety of "Types", not just "Documents". The way this is achieved is by using Data Object Identifiers (URIs / IRIs generated by the Linked Data deployment platform) in the strict sense, i.e., Data Identity (URI) is separated from Data Address (URL). Thus, you can reference a Person Data Object (aka an instance of a Person Class) in your HREF, and the HTTP Server returns a Description of the Data Object via a Document (again, an information resource). A document containing the Description of a Data Object typically contains HREFs to other Data Objects that expose the Attributes and Relationships of the initial Person Data Object, and it is this collection of Data Objects that is technically called a "Graph" -- which is what RDF models.
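The Identity-vs-Address separation above can be sketched as follows (a toy illustration, not a real Linked Data server; the URIs and predicate names are hypothetical): de-referencing a Person's URI yields a Document describing the Person -- the collection of triples about that Subject -- rather than the Person itself.

```python
# Illustrative sketch: de-referencing a Data Object's Identity (URI) yields a
# Description Document -- all the triples about that Subject -- not the Object.

def describe(entity_uri, store):
    """Return the 'description document': every triple whose Subject is the URI."""
    return [t for t in store if t[0] == entity_uri]

store = [
    ("http://example.org/person/kidehen#this", "rdf:type", "foaf:Person"),
    ("http://example.org/person/kidehen#this", "foaf:name", "Kingsley Idehen"),
    ("http://example.org/person/kidehen#this", "foaf:made",
     "http://example.org/blog/post1"),
]

# The result is an information resource; the foaf:made triple is an HREF to
# another Data Object, which is how the Graph extends outward.
doc = describe("http://example.org/person/kidehen#this", store)
```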
What I describe above is basic stuff for anyone that's familiar with Object Database or Distributed Objects technology and concepts.
The Linked Document Web is a collection of physical resources that traverse the Web Information Bus in a palatable format, i.e., documents. Thus, Document Object Identity and Document Object Data Address can be the same thing, i.e., a URL can serve as the ID/URI of a Document Data Object.
The Linked Data Web, on the other hand, is a Distributed Object Database, and each Data Object must be uniquely defined, otherwise we introduce ambiguity that ultimately taints the Database itself (making it incomprehensible to reasoning-challenged machines). Thus we must have unique Object IDs (URIs / IRIs) for People, Places, Events, and other things that aren't Documents. Once we follow the time-tested rules of Identity, People can then be associated with the things they create (blog posts, web pages, bookmarks, wikiwords, etc.). RDF is about expressing these graph model relationships, while RDF serialization formats enable information resources to transport these data-object-link-laden descriptions to requesting User Agents.
Put in more succinct terms, all documents on the Web are compound documents in reality (e.g., most contain at least an image these days). The Linked Data Web is about a Web where Data Object IDs (URIs) enable us to distill source data from the information contained in a compound document.
The degree of unobtrusiveness of new technology, concepts, or new applications of existing technology, is what ultimately determines eventual uptake and meme virulence (network effects). For a while, the Semantic Web meme was mired in confusion and general misunderstanding due to a shortage of practical use case scenario demos.
The emergence of the SPARQL Query Language has provided critical infrastructure for a number of products, projects, and demos that now make the utility of the Semantic Web vision much clearer via the simplicity of Linked Data, as exemplified by the following:
The goal of this effort is standardization of approaches (syntax and methodology) for mapping Relational Data Model instance data to RDF (Graph Data Model).
Every record in a relational table/view/stored procedure (Table Valued Functions/Procedures) is declaratively morphed into an Entity (instance of a Class associated with a Schema/Ontology). The derived entities become part of a graph that exposes relationships and relationship traversal paths that have lower JOIN Costs than attempting the same thing directly via SQL. In a nutshell, you end up with a conceptual interface atop a logical data layer that enables a much more productive mechanism for exploring homogeneous and/or heterogeneous data without confinement at the DB instance, SQL DBMS type, host operating system, local area network, or wide area network levels.
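The morphing-and-traversal idea above can be sketched in miniature (the tables, column names, and minted IDs here are all illustrative): once rows become entities whose foreign keys are links, a relationship is followed by dereference rather than computed by JOIN.

```python
# Minimal sketch (hypothetical schema): morphing relational rows into graph
# entities, then traversing relationships directly instead of via SQL JOINs.

orders = [{"order_id": 1, "customer_id": 42}, {"order_id": 2, "customer_id": 7}]
customers = [{"customer_id": 42, "name": "Acme"}, {"customer_id": 7, "name": "Initech"}]

# Each row becomes an entity keyed by a minted ID; foreign keys become links.
graph = {}
for c in customers:
    graph[f"customer/{c['customer_id']}"] = {"name": c["name"], "orders": []}
for o in orders:
    graph[f"order/{o['order_id']}"] = {"customer": f"customer/{o['customer_id']}"}
    graph[f"customer/{o['customer_id']}"]["orders"].append(f"order/{o['order_id']}")

# Traversal: follow the link from an order to its customer's name.
name_for_order_1 = graph[graph["order/1"]["customer"]]["name"]   # "Acme"
```

The point is the shape of the access path: a constant-cost hop along a pre-materialized relationship, in either direction, rather than a per-query JOIN.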
Just as we have to mesh the Linked Data and Document Webs unobtrusively, it's also important that the same principles apply to the exposure of RDBMS hosted data as RDF based Linked Data.
We all know that a large amount of data driving the IT engines of most enterprises resides in Relational Databases. And contrary to recent RDBMS vs RDF database misunderstandings espoused (hopefully inadvertently) by some commentators, Relational Database engines aren't going away anytime soon. Meshing Relational (logical) and Graph (conceptual) data models is a natural progression along an evolutionary path towards: Analysis for All. By the way, there is a parallel evolution occurring in other realms, such as Microsoft's ADO.NET Entity Framework.
To unobtrusively expose existing data sources as RDF Linked Data. The links that follow provide examples:
BTW - Benjamin Nowack penned an interesting post titled: Semantic Web Aliases, that covers a variety of labels used to describe the Semantic Web. The great thing about this post is that it provides yet another demonstration-in-the-making for the virtues of Linked Data :-)
Labels are harmless when their sole purpose is the creation of routes of comprehension for concepts. Unfortunately, Labels aren't always constructed with concept comprehension in mind, most of the time they are artificial inflectors and deflectors servicing marketing communications goals.
Anyway, irrespective of actual intent, I've endowed all of the labels from Bengee's post with URIs as my contribution to the important disambiguation effort re. the Semantic Web:
As per usual, this post is best appreciated when processed via a Linked Data aware user agent.
Daniel Lewis also penned an interesting post in response to Ian's, that actually triggered this post.
I think definition time has long expired re. the Web's many interaction dimensions, evolutionary stages, and versions.
On my watch it's simply demo / dog-food time. Or as Dan Brickley states: Just Show It.
Below, I've created a tabulated view of the various lanes on the Web's Information Super Highway. Of course, this is a Linked Data demo should you be interested in the universe of data exposed via the links embedded in this post :-)
| | Web 1.0 | Web 2.0 | Web 3.0 |
| --- | --- | --- | --- |
| Desire | Information Creation & Retrieval | Information Creation, Retrieval, and Extraction | Distillation of Data from Information |
| | Information Linkage (Hypertext) | Information Mashing (Mash-ups) | Linked Data Meshing (Hyperdata) |
| Enabling Protocol | HTTP | HTTP | HTTP |
| Markup | | | |
| Basic Data Unit | Resource (Data Object) of type "Document" | Resource (Data Object) of type "Document" | Resource (Data Object) that may be one of a variety of Types: Person, Place, Event, Music, etc. |
| Basic Data Unit Identity | Resource URL (Web Data Object Address) | Resource URL (Web Data Object Address) | Unique Identifier (URI) that is independent of the actual Resource (Web Data Object) Address. Note: an Identifier by itself has no utility beyond identifying a place around which actual data may be clustered. |
| Query or Search | Full Text Search patterns | Full Text Search patterns | Structured Querying via SPARQL |
| Deployment | Web Server (Document Server) | Web Server + Web Services Deployment modules | Web Server + Linked Data Deployment modules (Data Server) |
| Auto-discovery | `<link rel="alternate"..>` | `<link rel="alternate"..>` | `<link rel="alternate" \| "meta"..>`, basic and/or transparent content negotiation |
| Target User | Humans | Humans & text extraction and manipulation oriented agents (Scrapers) | Agents with varying degrees of data processing intelligence and capacity |
| Serendipitous Discovery Quotient (SDQ) | Low | Low | High |
| Pain | Information Opacity | Information Silos | Data Graph Navigability (Quality) |
Now I can simply state the following using Linked Data (hyperdata) links:
OpenLink Software's product portfolio comprises the following product families. We no longer have to explain (repeatedly) why our drivers exist in Express, Lite, and Multi-Tier Edition formats, or why you ultimately need Multi-Tier Drivers over Single-Tier Drivers (Express or Lite Editions), since you ultimately need high-performance, data encryption, and policy based security across each of the data access driver formats.
A while back, I wrote a post titled: Why we need Linked Data. The aim of the post was to bring attention to the implications of exponential growth of User Generated Content (typically, semi-structured and unstructured data) on the Web. The growth in question is occurring within a fixed data & information processing timeframe (i.e. there will always be 24hrs in a day), which sets the stage for Information Overload as expressed in a recent post from ReadWriteWeb titled: Visualizing Social Media Fatigue.
The emerging "Web of Linked Data" augments the current "Web of Linked Documents" by providing a structured data corpus partitioned by containers I prefer to call: Data Spaces. These spaces enable Linked Data aware solutions to deliver immense value, such as complex data graph traversal, starting from document beachheads, that exposes relevant data within a fraction of the time it would take to achieve the same thing using traditional document web methods such as full text search patterns, scraping, mashing, etc.
Remember, our DNA based data & information system far exceeds that of any inorganic system when it comes to reasoning, but it remains immensely incapable of accurately and efficiently processing huge volumes of data & information -- irrespective of data model.
The Idea behind the Semantic Web has always been about an evolution of the Web into a structured data collective comprised of interlinked Data items and Data Containers (Data Spaces). Of course we can argue forever about the Semantics of the solution (ironically), but we can't shy away from the impending challenges that "Information Overload" is about to unleash on our limited processing time and capabilities.
For those looking for a so-called "killer application" for the Semantic Web, I would urge you to align this quest with the "Killer Problem" of our times, because when you do so you will see that all routes lead to: Linked Data that leverages existing Web Architecture.
Once you understand the problem, you will hopefully understand that we all need some kind of "Data Junction Box" that provides a "Data Access Focal Point" for all of the data we splatter across the net as we sign up for the next greatest and latest Web X.X hosted service, or as we work on a daily basis with a variety of tools within enterprise Intranets.
BTW - these "Data Junction Boxes" will also need to be unobtrusively bound to our individual Identities.
OpenLink Data Spaces (ODS) now officially supports:
- Attention Profiling Markup Language (APML).
- Meaning of a Tag (MOAT) in conjunction with Simple Knowledge Organisation System (SKOS) and Social-Semantic Cloud of Tags (SCOT).
- OAuth - an Open Authentication Protocol
Which means that OpenLink Data Spaces support all of the main standards being discussed in the DataPortability Interest Group!
APML Example:
All users of ODS automatically get a dynamically created APML file, for example: APML profile for Kingsley Idehen
The URI for an APML profile is: http://myopenlink.net/dataspace/<ods-username>/apml.xml
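The URL pattern above is a simple template; as a trivial sketch (the helper function is mine, not part of ODS), substituting an ODS username yields the profile location:

```python
# Illustrative helper (not an ODS API): build the APML profile URL for an
# ODS user from the documented pattern.

def apml_profile_url(ods_username):
    return f"http://myopenlink.net/dataspace/{ods_username}/apml.xml"

print(apml_profile_url("kidehen"))
# http://myopenlink.net/dataspace/kidehen/apml.xml
```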
Meaning of a Tag Example:
All users of ODS automatically have tag cloud information embedded inside their SIOC file, for example: SIOC for Kingsley Idehen on the Myopenlink.net installation of ODS.
But even better, MOAT has been implemented in the ODS Tagging System. This has been demonstrated in a recent test blog post by my colleague Mitko Iliev, the blog post comes up on the tag search: http://myopenlink.net/dataspace/imitko/weblog/Mitko%27s%20Weblog/tag/paris
Which can be put through the OpenLink Data Browser:
OAuth Example:
OAuth Tokens and Secrets can be created for any ODS application. To do this:
- log in to the MyOpenlink.net beta service, the Live Demo ODS installation, an EC2 instance, or your local installation
- then go to "Settings"
- and then you will see "OAuth Keys"
- you will then be able to choose the applications that you have instantiated, and generate the token and secret for that app.
Related Document (Human) Links
- OpenLink Data Spaces Official Page
- OpenLink Software Page
- OpenLink Data Spaces Wikipedia Page
- Attention Profiling Markup Language Project Website
- Meaning of a Tag Project Website
- Simple Knowledge Organisation Systems Project Website
- Social-Semantic Cloud of Tags Project Website
- OAuth Protocol Website
- DataPortability.org Website
- Semantically Interlinked Online Communities Project Website
Remember (as per my most recent post about ODS), ODS is about unobtrusive fusion of Web 1.0, 2.0, and 3.0+ usage and interaction patterns. Thanks to a lot of recent standardization in the Semantic Web realm (e.g., SPARQL), we now employ the MOAT, SKOS, and SCOT ontologies as vehicles for Structured Tagging.
This is how we take a key Web 2.0 feature (think 2D in a sense) and bend it to create a Linked Data Web (Web 3.0) experience, unobtrusively (see earlier posts re. Dimensions of the Web). Thus, nobody has to change how they tag or where they tag; just expose ODS to the URLs of your Web 2.0 tagged content, and it will produce URIs (Structured Data Object Identifiers) and a linked data graph for your Tags Data Space (née Tag Cloud). ODS will construct a graph which exposes tag-subject association, tag concept alignment / intended meaning, and tag frequencies, that ultimately delivers "relative disambiguation" of intended Tag Meaning (i.e., you can easily discern the tagger's meaning via the Tag's actual Data Space, which is associated with the tagger). In a nutshell, the dynamics of relevance matching, ranking, and the like change immensely, without futile timeless debates about matters such as:
We can just get on with demonstrating Linked Data value using what exists on the Web today. This is the approach we are deliberately taking with ODS.
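The MOAT-style "relative disambiguation" described above can be sketched very simply (the taggers, tag, and meaning URIs below are illustrative; this is not the MOAT API): the same tag text, scoped to each tagger's Data Space, can resolve to different intended-meaning URIs.

```python
# Hypothetical sketch of MOAT-style tag meaning: a tag's meaning is scoped to
# the tagger's Data Space, so identical tag text can denote different concepts.

tag_meanings = {
    ("imitko", "paris"): "http://dbpedia.org/resource/Paris",            # the city
    ("another_user", "paris"): "http://dbpedia.org/resource/Paris_Hilton",  # the person
}

def meaning_of(tagger, tag):
    """Resolve (tagger, tag) to an intended-meaning URI, or None if unmapped."""
    return tag_meanings.get((tagger, tag))
```

Because the meaning is a de-referencable URI rather than a bare literal, a user agent can follow it for further disambiguating data.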
Tip: This post is best viewed via an RDF aware User Agent (e.g. a Browser or Data Viewer). I say this because the permalink of this post is a URI in a Linked Data Space (My Blog) comprised of more data than meets the eye (i.e. what you see when you read this post via a Document Web Browser) :-)
There are quite a few reasons to use OpenLink Data Spaces (ODS). Here are 10 of the reasons why I use ODS:
- Its native support of DataPortability Recommendations such as RSS, Atom, APML, Yadis, OPML, Microformats, FOAF, SIOC, OpenID and OAuth.
- Its native support of Semantic Web Technologies such as: RDF and SPARQL/SPARUL for querying.
- Everything in ODS is an Object with its own URI, this is due to the underlying Object-Relational Architecture provided by Virtuoso.
- It has all the social media components that you could need, including: blogs, wikis, social networks, feed readers, CRM and a calendar.
- It is expandable by installing pre-configured components (called VADs), or by re-configuring a LAMP application to use Virtuoso. Some examples of current VADs include: MediaWiki, Wordpress and Drupal.
- It works with external webservices such as: Facebook, del.icio.us and Flickr.
- Everything within OpenLink Data Spaces is Linked Data, which provides more meaningful information than just plain structural information. This meaningful information could be used for complex inferencing systems, as ODS can be seen as a Knowledge Base.
- ODS builds bridges between the existing static-document based web (aka "Web 1.0"), the more dynamic, services-oriented, social and/or user-orientated webs (aka "Web 2.0"), and the web which we are just going into, which is more data-orientated (aka "Web 3.0" or "Linked Data Web").
- It is fully supportive of Cloud Computing, and can be installed on Amazon EC2.
- It's released free under the GNU General Public License (GPL). (Note: it is technically dual-licensed, as it sits on top of the Virtuoso Universal Server, which has both Commercial and GPL licensing.)
The features above collectively provide users with a Linked Data Junction Box that may reside within corporate intranets or "out in the clouds" (Internet). You can consume, share, and publish data in a myriad of formats using a plethora of protocols, without any programming. ODS is simply about exposing the data from your Web 1.0, 2.0, 3.0 application interactions in structured form, with Linking, Sharing, and ultimately Meshing (not Mashing) in mind.
Note: Although ODS is equipped with a broad array of Web 2.0 style Applications, you do not need to use native ODS apps in order to exploit its power. It binds to anything that supports the relevant protocols and data formats.
If you want to explore who I know, what I read, and what I've tagged (amongst other things), all you have to do is:
Some Tools that help you comprehend what I am saying:
How Do I create the missing Bitmap Indexes?
Go to the HTML based Virtuoso Conductor, iSQL command line interface, or an ODBC / JDBC / ADO.NET / OLE DB client and execute:
CREATE BITMAP index RDF_QUAD_POGS on DB.DBA.RDF_QUAD (P,O,G,S);
CREATE BITMAP index RDF_QUAD_PSOG on DB.DBA.RDF_QUAD (P,S,O,G);
CREATE BITMAP index RDF_QUAD_SOPG on DB.DBA.RDF_QUAD (S,O,P,G);
Jason recently moved to Massachusetts, which led to me pinging him about our earlier blogosphere encounter and the emergence of a Data Portability Community. I also informed him about the fact that TimBL, myself, and a number of other Semantic Web technology enthusiasts frequently meet on the 2nd Tuesday of each month at the MIT hosted Cambridge Semantic Web Gatherings, to discuss, demonstrate, and debate all aspects of the Semantic Web. Luckily (for both of us), Jason attended the last event, and we got to meet each other in person.
Following our face to face meeting in Cambridge, a number of follow-on conversations ensued covering Linked Data and practical applications of the Semantic Web vision. Jason writes about our exchanges in a recent post titled: The Semantic Web. His passion for Data Portability enabled me to use OpenID and FOAF integration to connect the Semantic Web and Data Portability via the Linked Data concept.
During our conversations, Jason also alluded to the fact that he had already encountered OpenLink Software while working with our ODBC Drivers (part of our UDA product family) for IBM Informix (Single-Tier or Multi-Tier Editions) a few years ago (interesting random connection).
As I've stated in the past, I've always felt that the Semantic Web vision will materialize by way of a global epiphany. The countdown to this inevitable event started at the birth of the blogosphere, ironically, and accelerated more recently through the emergence of Web 2.0 and Social Networking, even more ironically :-)
The blogosphere started the process of Data Space coalescence via RSS/Atom based semi-structured data enclaves; Web 2.0 propagated Web Service usage en route to creating service-provider-controlled data and information silos; and Social Networking brought attention to the fact that User Generated Data wasn't actually owned or controlled by the Data Creators.
The emergence of "Data Portability" has created a palatable moniker for a clearly defined, and slightly easier to understand, problem: the meshing of Data and Identity in cyberspace, i.e., individual points of presence in cyberspace, in the form of "Personal Data Spaces in the Clouds" (think: doing really powerful stuff with .name domains). In a sense, this is the critical inflection point between the document centric "Web of Linked Documents" and the data centric "Web of Linked Data". There is absolutely no other way to solve this problem in a manner that alleviates the imminent challenges presented by information overload -- resulting from the exponential growth of user generated data across the Internet and enterprise Intranets.
A query language for the burgeoning Structured & Linked Data Web (aka Semantic Web / Giant Global Graph). Like SQL for the Relational Data Model, it provides a query language for the Graph based RDF Data Model.
It's also a REST or SOAP based Web Service that exposes SPARQL access to RDF Data via an endpoint.
In addition, it's also a Query Results Serialization format that includes XML and JSON support.
It brings important clarity to the notion of the "Web as a Database" by transforming existing Web Sites, Portals, and Web Services into a bona fide corpus of Mesh-able (rather than Mash-able) Data Sources. For instance, you can perform queries that join one or more of the aforementioned data sources in exactly the same manner (albeit with different syntax) as you would one or more SQL Tables.
-- SPARQL equivalent of SQL SELECT * against my personal data space hosted FOAF file
SELECT DISTINCT ?s ?p ?o FROM <http://myopenlink.net/dataspace/person/kidehen> WHERE {?s ?p ?o}
-- SPARQL against my social network -- Note: my query will be beamed across all of my contacts, and the contacts in the social networks of my contacts, as long as they are all HTTP URI based within each data space
PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT DISTINCT ?Person FROM <http://myopenlink.net/dataspace/person/kidehen> WHERE {?s a foaf:Person; foaf:knows ?Person}
Note: you can use the basic SPARQL Endpoint, SPARQL Query By Example, or SPARQL Query Builder Demo tool to experiment with the demonstration queries above.
SPARQL is implemented by RDF Data Management Systems (Triple or Quad Stores) just as SQL is implemented by Relational Database Management Systems. The aforementioned data management systems will typically expose SPARQL access via a SPARQL endpoint.
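The endpoint mechanics can be sketched in a few lines; this is a minimal illustration (the endpoint URL is hypothetical, and the "format" parameter name is an assumption that varies by implementation -- the protocol itself only standardizes "query"):

```python
from urllib.parse import urlencode

def sparql_protocol_url(endpoint, query, results_format="application/sparql-results+json"):
    """Build a SPARQL Protocol GET URL: the query travels as a URL
    parameter, alongside the desired results serialization (XML or JSON)."""
    return endpoint + "?" + urlencode({"query": query, "format": results_format})

url = sparql_protocol_url(
    "http://example.org/sparql",  # hypothetical endpoint
    "SELECT DISTINCT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10",
)
```

This is why a whole query, endpoint, and results format can be condensed into the single SPARQL Protocol URLs used throughout this post.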
A SPARQL implementors' Testimonial page accompanies the SPARQL press release. In addition, there is a growing collection of implementations on the ESW Wiki page for SPARQL-compliant RDF Triple & Quad Stores.
Yes! SPARQL facilitates an unobtrusive manifestation of a Linked Data Web by way of natural extension of the existing Document Web, i.e., these Web enclaves co-exist in symbiotic fashion.
As DBpedia very clearly demonstrates, Linked Data makes the Semantic Web demonstrable and much easier to comprehend. Without SPARQL there would be no mechanism for Linked Data deployment, and without Linked Data there is no mechanism for Beaming Queries (directly or indirectly) across the Giant Global Graph of data hosted by Social Networks, Shared Bookmark Services, Weblogs, Wikis, RSS/Atom/OPML feeds, Photo Galleries, and other Web accessible Data Sources (Data Spaces).
Information overload and Data Portability are two of the most pressing and imminent challenges affecting every individual connected to the global village exposed by the Internet and World Wide Web. I wrote an earlier post titled: Why We Need Linked Data that shed light on frequently overlooked realities about the Document Web.
The real Killer application of the Semantic Web (imho) is Linked Data (or Hyperdata), just as the killer application of the Document Web was Linked Documents (Hyperlinks). Linked Data enables human users (indirectly) and software agents (directly in response to human instruction) to traverse Web Data Spaces (Linked Data enclaves within the Giant Global Graph).
Semantic Web applications (conduits between humans and agents) that take advantage of Linked Data include:
DBpedia - General Knowledge sourced from Wikipedia and a host of other Linked Data Spaces.
Various Linked Data Browsers: Zitgist Data Viewer, OpenLink RDF Browser, DISCO Browser, and TimBL's Tabulator.
zLknks - Linked Data Lookup technology for Web Content Publishing systems (note: more to come on this in a future post).
OpenLink Data Spaces - a solution for Data Portability via a Linked Data Junction Box for Web 1.0 ((X)HTML Document Webs), 2.0 (XML Web Services based Content Publishing, Content Syndication, and Aggregation), and 3.0 (Linked Data) Data Spaces. Thus, via my URI (when viewed through a Linked Data Browser/Viewer) you can traverse my Data Space (i.e my Linked Data Graph) generated by the following activities:
Virtuoso - a Universal Server Platform that includes RDF Data Management, RDFization Middleware, SQL-RDF Mapping, RDF Linked Data Deployment, alongside a hybrid/multi-model, virtual/federated data service in a single product offering.
BTW - There is a Linked Data Workshop at this year's World Wide Web conference. Also note the Healthcare & Life Science Workshop, a related realm for Linked Data technology and Semantic Web best practices.

Senator Barack Obama is a beacon of change within the Democratic Party while Senator Hillary Clinton is status quo.
According to the data in the GovtTrack.us data space:
Senator Barack Obama is a rank-and-file Democrat according to GovTrack's analysis of his track record in Congress, whereas Senator Hillary Clinton is a radical Democrat according to the same GovTrack analysis of her track record in Congress.
Who do we believe? The GovtTrack.us performance data, old media pundits, or postulations of the candidates? GovtTrack.us is a new approach to candidate vetting. It provides data in traditional Document Web and Linked Data Web forms, placing analytic power in the hands of the citizen.
Here are insights into the track records of Senators Hillary Clinton and Barack Obama via the Zitgist Linked Data Viewer:
Note: I am not aligned with any political party or candidate; this is just a demonstration of Linked Data with a high degree of poignancy relative to the US primary elections.
So, unlike Scoble, I am able to make my Facebook Data portable without violating Facebook rules (no data caching outside the Facebook realm) by doing the following:
In a nutshell, my Linked Data Space enables you to reference data in my data space via Object Identifiers (URIs), and in some cases the Object IDs and Graphs are constructed on the fly via RDFization middleware.
Here are my URIs that provide different paths to my Facebook Data Space:
To conclude, 2008 is clearly the inflection year during which we will finally unshackle Data and Identity from the confines of "Web Data Silos" by leveraging the HTTP-, SPARQL-, and RDF-induced virtues of Linked Data.
Related Posts:
Writing a JDBC Driver for SPARQL is a little overkill. OpenOffice.org simply needs to make XML or Web Data (HTML, XHTML, and XML) bona fide data sources within its "Pivot Table" functionality realm. All that would then be required is a SPARQL SELECT query transported via the SPARQL Protocol, with results sent back using the SPARQL XML results serialization format (all part of a single SPARQL Protocol URL).
Excel successfully consumes the following information resource URI: http://tinyurl.com/yvoccj (a tiny url for a SPARQL SELECT against my FOAF file).
Alternatively, and currently achievable, you could simply use SPASQL (SPARQL within SQL) using a DBMS engine that supports SQL, SPARQL, and SPASQL, e.g., Virtuoso.
Virtuoso SPASQL support is exposed via its ODBC and/or JDBC drivers. Thus you can do things such as:
BTW - My New Year's Resolution: get my act together and shrink the ever-increasing list of "simple & practical Virtuoso use case demos" on my todo list, which now stretches all the way back to 2006 :-(
Here goes:
In addition, in one week, courtesy of the Web and the UK Semantic Web Gatherings in Bristol and Oxford, I discover, interview, and employ Daniel :-) Imagine how long this would have taken to pull off via the Document Web, assuming I would even have discovered Daniel.
As with all things these days, the Web and Internet change everything, which includes talent discovery and recruitment.
A global Social Graph that is a mesh of Linked Data enables the processes of recruitment, marketing, and other elements of business management to be condensed down to sending powerful beams across the aforementioned Graph :-) The only variable pieces are the traversal paths exposed to your beam via the beam's entry point URI. In my case, I have a single URI that exposes a Graph of critical paths for the Blogosphere (i.e., data spaces of RSS/Atom feeds). Thus, I can discover whether your profile matches the requirements associated with an opening at OpenLink Software (most of the time) before you do :-)
BTW - I just noticed that John Breslin described ODS as social-graph++ in his recent post, titled: Tales from the SIOC-o-sphere, part 6. In a funny way, this reminds me of a post from the early blogosphere days (circa 2003) about platforms and Weblog APIs, covering ODS (then exposed via the Blog Platform realm of Virtuoso).
"The phrase Open Social implies portability of personal and social data. That would be exciting but there are entirely different protocols underway to deal with those ideas. As some people have told me tonight, it may have been more accurate to call this "OpenWidget" - though the press wouldn't have been as good. We've been waiting for data and identity portability - is this all we get?"
[Source: Read/Write Web's Commentary & Analysis of Google's OpenSocial API]
..Perhaps the world will read the terms of use of the API, and realize this is not an open API; this is a free API, owned and controlled by one company only: Google. Hopefully, the world will remember another time when Google offered a free API and then pulled it. Maybe the world will also take a deeper look and realize that the functionality is dependent on Google hosted technology, which has its own terms of service (including adding ads at the discretion of Google), and that building an OpenSocial application ties Google into your application, and Google into every social networking site that buys into the Dream. Hopefully the world will remember. Unlikely, though, as such memories are typically filtered in the Great Noise....
[Source: Poignant commentary excerpt from Shelly Power's Blog (as always)]
The "Semantic Data Web" vision has always been about "Data & Identity" portability across the Web. It's been that and more from day one.
In a nutshell, we continue to exhibit varying degrees of Cognitive Dissonance re the following realities:
The Data Web is about Presence over Eyeballs due to the following realities:
This is why we need to inject a mesh of Linked Data into the existing Web. This is what the often misunderstood vision of the "Semantic Data Web" or "Web of Data" or "Web of Structured Data" is all about.
As stated earlier (point 10 above), "Data is forever" and there is only more of it to come! Sociality and associated Social Networking oriented solutions are at best a speck in the Web's ocean of data once you comprehend this reality.
Note: I am writing this post as an early implementor of GData and an implementor of RDF Linked Data technology and a "Web Purist".
Now that the OpenSocial APIs are public, OpenSocial implementation and support across our relevant product families -- Virtuoso (i.e., the Sponger middleware-for-RDF component), OpenLink Data Spaces (Data Space Controller / Services), and the OpenLink Ajax Toolkit (i.e., OAT widgets and libraries) -- is a triviality.
The concern I have, and the problem that remains mangled in the vast realms of Web Architecture incomprehension, is the fact that GData and GData based APIs cannot deliver Structured Linked Data in line with the essence of the Web without introducing "lock-in" that ultimately compromises the "Open Purity" of the Web. Facebook and Google's OpenSocial response to the Facebook juggernaut (i.e. open variant of the Facebook Activity Dashboard and Social Network functionality realms, primarily), are at best icebergs in the ocean we know as the "World Wide Web". The nice and predictable thing about icebergs is that they ultimately melt into the larger ocean :-)
On a related note, I had the pleasure of attending the W3C's RDF and DBMS Integration Workshop last week. The event was well attended by organizations with knowledge, experience, and a vested interest in addressing the issues associated with exposing non-RDF data (e.g., SQL) as RDF, and the imminence of data and/or information overload, covered in different ways via the following presentations (download links):
Jon Udell recently penned a post titled: The Fourth Platform. The post arrives at a spookily coincidental time (this happens quite often between Jon and me, as demonstrated last year during our podcast, the "Fourth" in his Innovators Podcast series).
The platform that Jon describes is "Cloud Based" and comprised of Storage and Computation. I would like to add Data Access and Management (native and virtual) under the fourth platform banner with the end product called: "Cloud based Data Spaces".
As I write, we are releasing a Virtuoso AMI (Amazon Machine Image) labeled virtuoso-dataspace-server. This edition of Virtuoso includes the OpenLink Data Spaces layer and all of the OAT applications we've been developing for a while.
There's more to come!
First off, I am going to focus on the Semantic Data Web aspect of the overall Semantic Web vision (a continuum) as this is what we have now. I am also writing this post as a deliberate contribution to the discourse swirling around the real topic: Semantic Web Value Proposition.
We are in the early stages of the long anticipated Knowledge Economy. That being the case, it would be safe to assume that information access, processing, and dissemination are of utmost importance to individuals and organizations alike. You don't produce knowledge in a vacuum! Likewise, you can't produce Information in a vacuum; you need Data.
Increasingly, Blogs, Wikis, Shared Bookmarks, Photo Galleries, Discussion Forums, Shared Calendars, and the like have become invaluable tools for individual and organizational participation in Web enabled global discourse (where a lot of knowledge is discovered). These tools are typically associated with Web 2.0, implying Read-Write access via Web Services, centralized application hosting, and data lock-in (silos).
The reality expressed above is a recipe for "Information Overload" and complete annihilation of one's effective pursuit and exploitation of knowledge, due to "Time Scarcity" (note: disconnecting is not an option). Information abundance is inversely related to available processing time (for humans in particular). In my case, for instance, I was actively subscribed to over 500 RSS feeds in 2003. As of today, I've simply stopped counting, and that's just my Weblog Data Space. Then add to that all of the discussions I track across blogs, wikis, message boards, mailing lists, traditional Usenet discussion forums, and the like, and I think you get the picture.
Beyond information overload, Web 2.0 data is "Semi-Structured": its dominant data containers ((X)HTML, RSS, and Atom documents and data streams, etc.) lack semantics that formally expose individual data items as distinct entities, endowed with unambiguous naming / identification, descriptive attributes (a type of property/predicate), and relationships (a type of property/predicate).
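To make the contrast concrete, here is a minimal sketch (plain Python, with illustrative names and URIs, not any particular vocabulary's authoritative terms) of the same blog post modeled first as an opaque semi-structured fragment and then as formally exposed entities:

```python
# Semi-structured: the entity boundaries live only in markup conventions.
rss_item = "<item><title>Why We Need Linked Data</title></item>"

# Structured: every datum is an explicit (subject, predicate, object) triple,
# with the subject named by a URI. All names here are illustrative.
post = "http://example.org/blog/post/42"
triples = [
    (post, "rdf:type",         "sioc:Post"),
    (post, "dc:title",         "Why We Need Linked Data"),
    (post, "sioc:has_creator", "http://example.org/person/kidehen"),
]

# A software agent can now discern the entity and its attributes directly.
distinct_entities = {s for (s, _, _) in triples}
```

The point is that the second form gives machines unambiguous names, attributes, and relationships to work with, where the first gives them only markup.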
Solution: Devise a standard for Structured Data Semantics that is compatible with the Web Information BUS.
Produce structured data (entities, entity types, entity relationships) from Web 1.0 and Web 2.0 resources that already exist on the Web, such that individual entities, their attributes, and relationships are accessible and discernible to software agents (machines).
Once the entities are individually exposed, the next requirement is a mechanism for selective access to these entities i.e. a query language.
Semantic Data Web Technologies that facilitate the solution described above include:
Structured Data Standards: Use of URIs or IRIs for uniquely identifying physical things (HTML documents, image files, multimedia files, etc.) and abstract things (people, places, music, and so on).
Entity Access & Querying: SPARQL Query Language - the SQL analog of the Semantic Data Web, enabling query constructs that target named entities, entity attributes, and entity relationships.
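A SPARQL basic graph pattern is, at heart, triple matching with variables; the toy matcher below (plain Python, illustrative data, no RDF library) sketches that idea:

```python
def match(triples, s=None, p=None, o=None):
    """Return all triples matching a pattern; None plays the role of a
    SPARQL variable (e.g. ?s), while a non-None value must match exactly."""
    return [(ts, tp, to) for (ts, tp, to) in triples
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

data = [
    ("ex:kidehen", "rdf:type",   "foaf:Person"),
    ("ex:kidehen", "foaf:knows", "ex:daniel"),
    ("ex:daniel",  "rdf:type",   "foaf:Person"),
]

# Analogous to: SELECT ?who WHERE { ex:kidehen foaf:knows ?who }
known = [o for (_, _, o) in match(data, s="ex:kidehen", p="foaf:knows")]
```

A real SPARQL engine adds joins across patterns, FILTERs, and named graphs, but the entity/attribute/relationship targeting described above is this same pattern matching at scale.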
Organizations are rife with a plethora of business systems that are built atop a myriad of database engines, sourced from a variety of DBMS vendors. A typical organization would have a different database engine, from a specific DBMS vendor, underlying critical business applications such as: Human Resource Management (HR), Customer Relationship Management (CRM), Accounting, Supply Chain Management etc. In a nutshell, you have DBMS Engines, and DBMS Schema heterogeneity permeating the IT infrastructure of organizations on a global scale, making Data & Information Integration the biggest headache across all IT driven organizations.
Solution: Alleviation of the pain (costs) associated with Data & Information Integration.
Semantic Data Web offerings: A dexterous data model (RDF) that enables the construction of conceptual views of disparate data sources across an organization, based on existing Web architecture components such as HTTP and URIs.
Existing middleware solutions that facilitate the exposure of SQL DBMS data as RDF based Structured Data include:
BTW - There is an upcoming W3C Workshop covering the integration of SQL and RDF data.
The Semantic Data Web is here; its value delivery vehicle is the URI. The URI is a conduit to interlinked Structured Data (RDF based Linked Data) derived from existing data sources on the World Wide Web, alongside data continuously injected into the Web by organizations worldwide. Ironically, the Semantic Data Web is the only platform that crystallizes the "Information at Your Fingertips" vision without development environment, operating system, application, or database lock-in. You simply click on a Linked Data URI and the serendipitous exploration and discovery of data commences.
The unobtrusive emergence of the Semantic Data Web is a reflection of the soundness of the underlying Semantic Web vision.
If you are excited about Mash-ups then you are a Semantic Web enthusiast and beneficiary in the making, because you only "Mash" (brute-force data extraction and interlinking) because you can't "Mesh" (natural data extraction and interlinking). Likewise, if you are a social-networking, open social-graph, or portable social-network enthusiast, then you are also a Semantic Data Web beneficiary and enthusiast, because your "values" (yes, the values associated with the properties that define you, e.g., your interests) are the fundamental basis for portable, open social-networking, which is what the Semantic Data Web hands to you on a platter without compromise (i.e., without data lock-in or loss of data ownership).
Some practical examples of Semantic Data Web prowess:
Both browsers should lead you to the posts from Danny, Nova, and Tim. In both cases the URI of this post is a pointer to structured data (in my Blog Data Space) if your user agent (browser or other Web client) requests an RDF representation of this post via its HTTP request payload (which the browsers do via the "Accept:" headers).
As you can see the Data Web is actually here! Without RDF generation upheaval (or Tax).
My Comments:
Hyperdata is short for Hyperlinked Data :-) The same applies to Linked Data. Thus, we have two literal labels for the same core concept. HTTP is the enabling protocol for hyperlinking both Documents and associated Structured Data via the World Wide Web (Web for short): data links associate Structured Data contained in, or hosted by, Documents on the Web.
RDFa, eRDF, GRDDL, SPARQL Query Language, SPARQL Protocol (SOAP or REST service), SPARQL Results Serializations (XML or JSON) collectively provide a myriad of unobtrusive routes to structured data embedded within, or associated with, existing Web Documents.
As Danny already states, ontologies are not prerequisites for producing structured data using the RDF Data Model. They simply aid the ability to express one's self clearly (i.e. no repetition or ambiguity) across a broad audience of machines (directly) and their human masters (indirectly).
Using the crux of this post as the anecdote: The Semantic Data Web would simplify the process of claiming and/or proving that Linked Data and Hyperdata describe the same concept. It achieves this by using Triples (Subject, Predicate, Object) expressed in various forms (N3, Turtle, RDF/XML, etc.) to formalize claims in a form palatable to electronic agents (machines) operating on behalf of Humans. In a nutshell, this increases human productivity by completely obliterating the erstwhile exponential costs of discovering data, information, and knowledge.
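As a toy illustration of such a formalized claim (the concept URIs are illustrative, not authoritative identifiers), the equivalence can be written as a single owl:sameAs triple, which an agent may treat as symmetric:

```python
# The claim "Hyperdata and Linked Data name the same concept", as one triple.
claim = ("http://example.org/concept/Hyperdata",
         "owl:sameAs",
         "http://example.org/concept/LinkedData")

# owl:sameAs is a symmetric property, so an agent is entitled
# to infer the inverse statement from the asserted one.
inferred = (claim[2], claim[1], claim[0])
```

One asserted triple plus a handful of such inference rules is how machines settle naming disputes that would otherwise cost humans endless debate.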
BTW - for full effect, paste the permalink URI of this post (below) into an RDF Browser such as:
Since I am aggressively tracking RDFa developments, I decided to quickly view Ivan's FOAF-in-RDFa file via the OpenLink RDF Browser. The full implications are best understood when you click on each of the Browser's Tabs -- each providing a different perspective on this interesting addition to the Semantic Data Web (note: the Fresnel Tab which demonstrates declarative UI templating using N3).
The OpenLink RDF Browser is a Rich Internet Application built using OAT (OpenLink Ajax Toolkit). In my case, I am deploying the RDF Browser from a Virtuoso instance, which implies that the Browser is able to use the Virtuoso Sponger Middleware (exposed as a REST Service at the Virtuoso instance endpoint: /proxy); which includes an RDFa Cartridge comprised of a metadata extractor and an RDF Schema / OWL Ontology mapper. That's it!
]]>"Reasonable Humans" continue to struggle with the congruency of social-networking and the Semantic Web. Funnily enough, our less capable counterparts (in the reasoning department) don't have such struggles. Here is my little interaction with an IRC Bot called "Phenny" (possibly mother of the Agent Web or Web of Agents or Web 4.0):
kidehen: .w web
phenny: web n. 1: An intricate network suggesting something that was formed by weaving or interweaving
phenny: web n. 2: An intricate trap that entangles or ensnares its victim
phenny: web v. 1: Construct or form a web, as if by weaving.
kidehen: .w network
phenny: network n. 1: An interconnected system of things or people
phenny: network n. 2: (broadcasting) a communication system consisting of a group of broadcasting stations that all transmit the same programs
phenny: network v. 1: Communicate with and within a group.
kidehen: .w social
phenny: social n. 1: A party of people assembled to promote sociability and communal activity
phenny: social a. 1: Relating to human society and its members
phenny: social a. 2: Living together or enjoying life in communities or organized groups.
kidehen: .w semantic
phenny: semantic a. 1: Of or relating to meaning or the study of meaning.
kidehen: .w graph
phenny: graph n. 1: A visual representation of the relations between certain quantities plotted with reference to a set of axes
phenny: graph v. 1: Represent by means of a graph
phenny: graph v. 2: Plot upon a graph.
Note: Wikipedia's Graph definition is the one that applies to RDF :-)
To conclude, the whole Semantic Data Web vision and Open social-networking go hand-in-hand. You can't attain "Open social-networking" without appreciating the underlying core concepts (Structured Data, Graphs, and Triples). There is no way around the reality that all networks are graphs. Of course, this doesn't imply that solution users (end-users) have to master graph theory en route to network participation and exploitation. It simply implies that solution developers need some graph data model appreciation, or a graph data model aware platform, when developing social-networking solutions that seek to leverage the Web.
chimezie: .w tautology
phenny: tautology n. 1: (logic) a statement that is necessarily true
phenny: tautology n. 2: Useless repetition.
chimezie: .ety tautology
phenny: "1579, from L.L. tautologia 'representation of the same thing' (c.350), from Gk. tautologia, from tautologos 'repeating what has been said,' from tauto 'the same' + -logos 'saying,' related to legein 'to say' (see lecture)." - http://etymonline.com/?term=tautology
That led me to the following sequence (preceding the initial IRC session dump in this post):
kidehen: .w conflagration
kidehen: .w imbroglio
phenny: conflagration n. 1: A very intense and uncontrolled fire.
phenny: imbroglio n. 1: An intricate and confusing interpersonal or political situation
phenny: imbroglio n. 2: A very embarrassing misunderstanding.
kidehen: .w buzzword
phenny: buzzword n. 1: Stock phrases that have become nonsense through endless repetition.
In a sense, proposing the Semantic Data Web as a solution to open social-networking challenges more often than not results in a "No Semantic Web here" imbroglio -- the shortest path to a buzzword-fueled conflagration :-)
The abstract of my Semantic Web Strategies keynote contains a reference to the acronym MLD, but it doesn't really expose what MLD is (i.e., the acronym's source isn't clearly identified in the abstract's opening paragraph). Thus, I am attempting to fix the aforementioned anomaly via this blog post :-)
Market Leadership Discipline (MLD) is defined as follows: A strategy adopted by a company for attaining leadership in a given marketplace.
MLD strategies usually take one of the following forms:
MLD is a critical component of Enterprise Agility.
New Semantic Data Web related features and enhancements include:
A dynamically generated Web Page comprised of Semantic Data Web style data links (formally typed links) and traditional Document Web links (generic links lacking type specificity).
Linked Data Pages will ultimately enable Facebook users to inject their public data into the Semantic Data Web as RDF based Linked Data. For instance, my Facebook Profile & Photo albums data is now available as RDF, without paying a cent of RDF handcrafting tax, thanks to the Virtuoso Sponger (middleware for producing RDF from non-RDF data sources), which is now equipped with a new RDFizer Cartridge for the Facebook Query Language (FQL) and RESTful Web Service.
Demo Notes:
When you click on a link in DLD pages, you will be presented with a lookup that exposes the different interaction options associated with a given URI. Examples include:
Remember, the Facebook URLs (links to web pages) are being converted, on the fly, into RDF based Structured Data (a graph model database), i.e., Entity Sets that possess formally defined characteristics (attributes) and associations (relationships).
Note: This all happens because the OAT based RDF Browser simply makes a call to the Virtuoso Sponger's REST service, which is exposed at the endpoint "/proxy" (note: this is standard with all Virtuoso installations).
This article, like the one from Mike, and our soon-to-be-released Linked Data Deployment white paper, collectively address the main topic without inadvertent distraction by the misnomer "non-information resource". For instance, the OAI article uses the term Generic Resource instead of Non-information Resource.
The Semantic Data Web is here, but we need to diffuse this reality across a broader spectrum of Web communities, so as to avoid unnecessary uptake inertia that can arise due to basic incomprehension of key concepts such as Linked Data deployment.
Note: I make no reference to "non information" resource, since a non-information resource is a data resource that may or may not contain 100% structured data. Also note that even when structured, the format may not be RDF.
On a different, but related, thread, Mike Bergman recently penned a post titled: What is the Structured Web?. Both of these public contributions shed light on the "Information BUS" essence of the World Wide Web by describing the evolving nature of the payload shuttled by the BUS.
Middleware infrastructure for shuttling "Information" between endpoints using a messaging protocol.
The Web is the dominant Information BUS within the Network Computer we know as the "Internet". It uses HTTP to shuttle information payloads between "Data Sources" and "Information Consumers" - which is what happens when we interact with the Web via user agents / clients (e.g., browsers).
Information is HTTP-transported streams of contextualized data; hence the terms "Information Resource" and "Non-Information Resource" when reading material related to httpRange-14 and Web Architecture. For example, an (X)HTML document is a specific data context (representation) that enables us to perceive, or comprehend, a data stream originating from a Web Server as a Web Page. On the other hand, if the payload lacks contextualized data, a fundamental Web requirement, then the resource is referred to as a "Non-Information" resource. Of course, there is really no such thing as a "Non-Information" resource, but with regard to Web Architecture it's the short way of saying: "the Web transmits Information only". That said, I prefer to refer to these "Non-Information" resources as "Data Sources", a term well understood in the world of Data Access Middleware (ODBC, JDBC, OLE DB, ADO.NET, etc.) and Database Management Systems (Relational, Object-Relational, Object, etc.).
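The decision a server makes between these representations is driven by HTTP content negotiation; here is a crude sketch (the media types are real, but the logic is simplified and ignores q-value weighting):

```python
def choose_representation(accept_header):
    """Pick an RDF serialization when the client's Accept header asks
    for one, else fall back to HTML. Simplified: ignores q= weighting."""
    offered = ("application/rdf+xml", "text/turtle")
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip()
        if media_type in offered:
            return media_type
    return "text/html"
```

A browser sending Accept: text/html gets the Web Page, while an RDF-aware agent sending Accept: application/rdf+xml gets the data source view of the very same URI.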
Examples of Information Resource and Data Source URIs:
Explanation: The Information Resource is a conduit to the entity identified by the Data Source URI (an entity in my RDF Data Space that is the Subject or Object of one or more Triple based statements). The triples in question can be represented as an RDF resource when transmitted over the Web via an Information Resource that takes the form of a SPARQL REST Service URL or a physical RDF based Information Resource URL.
Prior to the emergence of the Semantic Data Web, the payloads shuttled across the Web Information BUS comprised primarily of the following:
The Semantic Data Web simply adds RDF to the payload formats that travel the Web Information BUS. RDF addresses formal data structure, which XML doesn't cover since it is semi-structured (distinct data entities aren't formally discernible). In a nutshell, an RDF payload is basically a conceptual model database packaged as an Information Resource. It's comprised of granular data items called "Entities" that expose fine-grained property values, individual and/or group characteristics (attributes), and relationships (associations) with other Entities.
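The "conceptual model database" view can be sketched by folding a flat triple stream back into per-entity records (plain Python; the URIs and property names are illustrative):

```python
from collections import defaultdict

triples = [
    ("ex:kidehen", "foaf:name",  "Kingsley Idehen"),
    ("ex:kidehen", "foaf:knows", "ex:daniel"),
    ("ex:daniel",  "foaf:name",  "Daniel"),
]

# entity URI -> attribute -> list of values (an attribute may repeat).
entities = defaultdict(lambda: defaultdict(list))
for s, p, o in triples:
    entities[s][p].append(o)
```

Each key in `entities` is one of the granular data items described above: an entity exposing its attributes and its associations with other entities.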
The Web is in the final stages of the 3rd phase of its evolution, a phase characterized by the shuttling of structured data payloads (RDF) alongside less data oriented payloads (HTML, XHTML, XML, etc.). As you can see, Linked Data and Structured Data are both terms used to describe the addition of more data centric payloads to the Web. Thus, you could view the process of creating a Structured Web of Linked Data as follows:
The Semantic Data Web is an evolution of the current Web (an Information Space) that adds structured data payloads (RDF) to current, less data oriented, structured payloads (HTML, XHTML, XML, and others).
The Semantic Data Web is increasingly seen as an inevitability because it's rapidly reaching the point of critical mass (i.e. network effect kick-in). As a result, Data Web emphasis is moving away from: "What is the Semantic Data Web?" To: "How will Semantic Data Web make our globally interconnected village an even better place?", relative to the contributions accrued from the Web thus far. Remember, the initial "Document Web" (Web 1.0) bootstrapped because of the benefits it delivered to blurb-style content publishing (remember the term electronic brochure-ware?). Likewise, in the case of the "Services Web" (Web 2.0), the bootstrap occurred because it delivered platform independence to Web Application Developers - enabling them to expose application logic behind Web Services. It is my expectation that the Data Integration prowess of the Data Web will create a value exchange realm for data architects and other practitioners from the database and data access realms.
A vital component of the new Virtuoso release is the finalization of our SQL to RDF mapping functionality -- enabling the declarative mapping of SQL Data to RDF. Additional technical insight covering other new features (delivered and pending) is provided by Orri Erling, as part of a series of post-Banff posts.
A majority of the world's data (especially in the enterprise realm) resides in SQL Databases. In addition, Open Access to the data residing in said databases remains the biggest challenge to enterprises for the following reasons:
Enterprises have known from the beginning of modern corporate times that data access, discovery, and manipulation capabilities are inextricably linked to the "Real-time Enterprise" nirvana (hence my use of 0.0 before this becomes 3.0).
In my experience, as someone who's operated in the data access and data integration realms since the late '80s, I've painfully observed enterprises pursue, but never successfully attain, full control over enterprise data (the prized asset of any organization) such that data-, information-, and knowledge-workers are just a click away from commencing coherent, platform- and database-independent data drill-downs and/or discovery that transcend intranet, internet, and extranet boundaries -- serendipitous interaction with relevant data, without compromise!
Okay, situation analysis done; we move on...
At our most recent (12th June) monthly Semantic Web Gathering, I unveiled to TimBL and a host of other attendees a simple, but powerful, demonstration of how Linked Data, as an aspect of the Semantic Data Web, can be applied to enterprise data integration challenges.
The vision of data, information, or knowledge at your fingertips is nigh! Thanks to the infrastructure provided by the Semantic Data Web (URIs, RDF Data Model, variety of RDF Serialization Formats[1][2][3], and Shared Data Dictionaries / Schemas / Ontologies [1][2][3][4][5]) it's now possible to Virtualize enterprise data from the Physical Storage Level, through the Logical Data Management Levels (Relational), up to a Concrete Conceptual Model (Graph) without operating system, development environment or framework, or database engine lock-in.
We produce a shared ontology for the CRM and Business Reporting Domains. I hope this experiment clarifies how this is quite achievable by converting XML Schemas to RDF Data Dictionaries (RDF Schemas or Ontologies). Stay tuned :-)
Also watch TimBL amplify and articulate Linked Data value in a recent interview.
To deliver a mechanism that facilitates the crystallization of this reality is a contribution of boundless magnitude (as we shall all see in due course). Thus, it is easy to understand why even Her Majesty, the Queen of England, simply had to get in on the act and appoint TimBL to the "British Order of Merit" :-)
Note: All of the demos above now work with IE & Safari (a "remember what Virtuoso is epiphany") by simply putting Virtuoso's DBMS hosted XSLT engine to use :-) This also applies to my earlier collection of demos from the Hello Data Web and other Data Web & Linked Data related demo style posts.
Of course, this also enables me to provide yet another Semantic Data Web demo in the form of additional viewing perspectives for the aforementioned FAQ (just click to see):
Lee also embarked on a similar embellishment effort re. the SPARQL Query Language FAQ thereby enabling me to also offer alternative viewing perspectives along similar lines:
Note: the enhanced hyperlink (typed data link) lookup presents options to perform an Explore (all data about the Subject across Domains in the data space, i.e., data links to and from the Subject) or a Dereference (specific data in the Subject's Domain, i.e., data links originating from the Subject).
I built these Linked Data Pages by simply doing the following:
The items that follow attempt to demonstrate the point by way of SIOC (Semantically-Interlinked Online Communities Ontology) and MO (Music Ontology) domain exploration:
Linked Data or Dynamic Data Web Pages:
Semantic Web Browser Sessions:
Key point: if you are modeling People, Communities, Organizations, Documents, and other entities in the People, Organizations, Documents, etc. Data Space, don't forget to FOAF-FOAF-FOAF it Up! :-)
Naturally, this triggered an obvious opportunity to demonstrate the prowess of Linked Data on the Semantic Web. What follows is a quick dump of what I sent to the foaf-dev mailing list:
Here are a variety of FOAF Views built using:
Enabling you to explore along the following lines:
Last week, John Breslin published a post that contained a very nice presentation of what is best described as "Objects of Our Sociality". The presentation provides insight into the elements that collectively drive the creation of People & Data networks (communities). The presentation certainly unveils the often forgotten fact that although People & Data network construction is always socially driven, our intentions aren't always amorous :-)
At the core of the Semantic Data Web vision is the desire to leverage the "network effects" that communities provide, while exponentially reducing the cost of knowledge creation, discovery, and exchange in the process.
In short, the Semantic Data Web ultimately enables us to collectively do our bit for a greater good! Thus, quoting TimBL, "you do your bit and others will do theirs" :-)
The XBRL Ontology Project seeks to address the obvious need to bring structured financial data into the emerging Semantic Data Web as articulated in this excerpt from the inaugural mailing list post:
Read on..."
The parallel evolution of XBRL and the Semantic Web is one of the more puzzling present-day technology paradoxes:
The Semantic Web expresses a vision about a Web of Data connected by formal meaning (Context). Congruently, XBRL espouses a vision whereby formally defined Financial Data is accessible via the Web (and other networks). Sadly, we have an abundance of XBRL Taxonomies and pretty wide global adoption of the XBRL standard, but not a single RDFS Schema or OWL Ontology, derived from said taxonomies, in sight!
As the company's founder, it was quite compelling to read a third party article that accurately navigates and articulates the depth of work that we've undertaken since that seminal moment in 1997 when we decided to extend our product portfolio beyond the Universal Data Access Drivers family.
Of course I also take this opportunity to slip in another Semantic Data Web demo :-) Thus, take a look at this mother of all blog posts from Mike via the following:
Note: In both cases above, you use the "Explore" or "Dereference" options of the Data Link (typed hyperlink) to traverse the RDF data that has been materialized "on the fly" courtesy of Virtuoso's in-built RDF Middleware (called the Sponger).
BTW - I am assembling a collection of interesting DBpedia based Dynamic pages that showcase the depth of knowledge available from Wikipedia. If you're a current or future technology entrepreneur (or VC trying to grok the Semantic Web) then you certainly need to look at:
Now that broader understanding of the Semantic Data Web is emerging, I would like to revisit the issue of "Data Spaces".
A Data Space is a place where Data resides. It isn't inherently bound to a specific Data Model (Concept Oriented, Relational, Hierarchical, etc.). Neither is it implicitly an access point to Data, Information, or Knowledge (the perception is purely determined through the experiences of the user agents interacting with the Data Space).
A Web Data Space is a Web accessible Data Space.
Real world example:
Today we increasingly perform one or more of the following tasks as part of our professional and personal interactions on the Web:
John Breslin has a nice animation depicting the creation of Web Data Spaces that drives home the point.
Web Data Space Silos
Unfortunately, what isn't as obvious to many netizens is the fact that each of the activities above results in the creation of data that is put into some context by you, the user. Even worse, you eventually realize that the service providers aren't particularly willing to, or capable of, giving you unfettered access to your own data. Of course, this isn't always by design, as the infrastructure behind the service can make this a nightmare from security and/or load balancing perspectives. Irrespective of cause, we end up creating our own "Data Spaces" all over the Web without a coherent mechanism for accessing and meshing these "Data Spaces".
What are Semantic Web Data Spaces?
Data Spaces on the Web that provide granular access to RDF Data.
What's OpenLink Data Spaces (ODS) About?
Short History
In anticipation of the "Web Data Silo" challenge (an issue that we tackled within internal enterprise networks for years), we commenced the development (circa 2001) of a distributed collaborative application suite called OpenLink Data Spaces (ODS). The project was never released to the public since the problems associated with the deliberate or inadvertent creation of Web Data silos hadn't really materialized (silos only emerged in concrete form after the emergence of the Blogosphere and Web 2.0). In addition, there wasn't a clear standard Query Language for the RDF based Web Data Model (i.e., the SPARQL Query Language didn't exist).
Today, ODS is delivered as a packaged solution (in Open Source and Commercial flavors) that alleviates the pain associated with Data Space Silos that exist on the Web and/or behind corporate firewalls. In either scenario, ODS simply allows you to create Open and Secure Data Spaces (via its suite of applications) that expose data via SQL, RDF, and XML oriented data access and data management technologies. Of course, it also enables you to integrate transparently with existing 3rd party data space generators (Blogs, Wikis, Shared Bookmarks, Discussion, etc. services) by supporting industry standards that cover:
Thus, by installing ODS on your Desktop, Workgroup, Enterprise, or public Web Server, you end up with a very powerful solution for creating Open Data access oriented presence on the "Semantic Data Web" without incurring any of the typically assumed "RDF Tax".
Naturally, ODS is built atop Virtuoso and of course it exploits Virtuoso's feature-set to the max. It's also beginning to exploit functionality offered by the OpenLink Ajax Toolkit (OAT).
Well, I'll have a crack at helping him out, i.e., defining the Semantic Data Web in simple terms with linked examples :-)
Tip: Watch the recent TimBL video interview re. the Semantic Data Web before, during, or after reading this post.
Here goes!
The popular Web is a "Web of Documents". The Semantic Data Web is a "Web of Data". Going down a level, the popular web connects documents across the web via hyperlinks. The Semantic Data Web connects data on the web via hyperlinks. Next level, hyperlinks on the popular web have no inherent meaning (lack context beyond: "there is another document"). Hyperlinks on the Semantic Data Web have inherent meaning (they possess context: "there is a Book" or "there is a Person" or "this is a piece of Music" etc..).
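The difference between an untyped hyperlink and a typed data link can be sketched concretely. A minimal illustration in Python; `foaf:knows` is a real FOAF property, and the Person URIs should be treated as illustrative:

```python
# An untyped Web 1.0 hyperlink only says "there is another document":
untyped_link = ("http://example.com/page-a", "http://example.com/page-b")

# A typed Data Web link is a triple whose predicate carries the meaning.
typed_link = (
    "http://www.w3.org/People/Berners-Lee/card#i",   # subject: a Person
    "http://xmlns.com/foaf/0.1/knows",               # predicate: "knows"
    "http://www.w3.org/People/Connolly/#me",         # object: another Person
)

def link_meaning(link):
    """Return what a user agent can infer from the link alone."""
    if len(link) == 2:
        return "there is another document"
    subject, predicate, obj = link
    # The predicate's local name is the inherent meaning of the link
    return f"{subject} --{predicate.rsplit('/', 1)[-1]}--> {obj}"
```

An untyped link gives a user agent nothing to reason with; a typed link lets it answer "how are these two things connected?" mechanically.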
Very simple example:
Click the traditional web document URLs for Dan Connolly and Tim Berners-Lee. Then attempt to discern how they are connected. Of course you will see some obvious connections by reading the text, but you won't easily discern other data driven connections. Basically, this is no different to reading about either individual in a print journal, bar the ability to click on hyperlinks that open up other pages. The Data Extraction process remains labour intensive :-(
Repeat the exercise using the traditional web document URLs as Data Web URIs; this time around, paste the hyperlinks above into an RDF aware Browser (in this case the OpenLink RDF Browser). Note, we are making a subtle but critical change, i.e., the URLs are now being used as Semantic Data Web URIs (a small-big-deal kind of thing).
If you're impatient or simply strapped for time (aren't we all these days), simply take a look at these links:
Note: There are other RDF Browsers out there such as:
All of these RDF Browsers (or User Agents) demonstrate the same core concepts in subtly different ways.
If I haven't lost you, proceed to a post I wrote a few weeks ago titled: Hello Data Web (Take 3 - Feel the "RDF" Force).
If you've made it this far, simply head over to DBpedia for a lot of fun :-)
Note Re. my demos: we make use of SVG in our RDF Browser which makes them incompatible with IE (6 or 7) and Safari. That said, Firefox (1.5+), Opera 9.x, WebKit (Open Source Safari), and Camino work fine.
Note to Scoble:
All the Blogs, Wikis, Shared Bookmarks, Image Galleries, Discussion Forums and the like are Semantic Web Data Spaces. The great thing about all of this is that, through RSS 2.0's wild popularity, the Blogosphere has done what I postulated a while back: the Semantic Web would be self-annotating, and so it has come to be :-)
To prove the point above: paste your blog's URL into the OpenLink RDF Browser and see it morph into a Semantic Data Web URI (a pointer to Web Data that you've created) once you click the "Query" button (click on the TimeLine tab for full effect). The same applies to del.icio.us, Flickr, Googlebase, and basically any REST style Web Service as per my RDF Middleware post.
Lazy Semantic Web Callout:
If you're a good animator (pro or hobbyist), please produce an animation of a document going through a shredder. The strips that emerge from the shredder represent the granular data that was once the whole document. The same thing is happening on the Web right now, we are putting photocopies of (X)HTML documents through the shredder (in a good way) en route to producing granular items of data that remain connected to the original copy while developing new and valuable connections to other items of Web Data.
That's it!
From my perspective on things I prefer to align my articulation of the changes that are occurring across our industry (courtesy of the Internet Inflection) to the MVC pattern.
Re. the Web Versions (or Dimensions of Interaction):
The same applies to evolution of Openness:
In the (C)ontroller realm where the focal point is Application Logic, data access issues aren't obvious (*I recall my battles with Richard Stallman re. the appropriate Open Source License variant for iODBC during the embryonic years of database and data access technology on Linux*). Data is an enigma in this realm, unfortunately. This implies that "Data Lock-in" occurs deliberately, but in most cases, inadvertently when we make Application Logic the focal point of everything. Another example is Web 2.0 in which the norm (unfortunately) is to suck in your data, and then refuse to give you complete ownership over how it is used (including the fact that you may want to share it elsewhere).
Open Data is a really big deal, which is why the SWEO supported Linking Open Data Project is a very big deal. The good news is that this movement is gathering momentum at an exponential rate :-)
"There is a potential problem with republication of transformed data, in that right away there may be inconsistency with the original source data. Here provenance tracking (probably via named graphs) becomes a must-have. The web data space itself can support very granular separation. Whatever, data integration is a hard problem. But if you have a uniform language for describing resources, at least it can be possible."
Alex James also chimes in with valuable insights in his post: Sampling the global data model, where he concludes:
"Exactly, we need to use projected views, or conceptual models. See, a projected view can be thought of as a conceptual model that has some mapping to a *sampling* of the global data model.
The benefits of introducing this extra layer are many and varied: Simplicity, URI predictability, Domain Specificity and the ability to separate semantics from lower level details like data mapping.
Unfortunately if you look at today's ORMs you will quickly notice that they simply map directly from Object Model to Data Model in one step.
This naïve approach provides no place to manage the mapping to a conceptual model that sampling the world's data requires.
What we need to solve the problems Stefano sees is to bring together the world of mapping and semantics. And the place they will meet is simply the Conceptual Model."
Data Integration challenges arise because the following facts hold true all of the time (whether we like it or not):
- Data Heterogeneity is a fact of life at the intranet and internet levels
- Data is rarely clean
- Data Integration prowess is ultimately measured by pain alleviation
- At some point human participation is required, but the trick is to move human activity up the value chain
- Glue code size and Data Integration success are inversely related
- Data Integration is best addressed via "M" rather than "C" (if we use the MVC pattern as a guide; "V" is dead on arrival for the scrapers out there)
In 1997 we commenced the Virtuoso Virtual DBMS Project that morphed into the Virtuoso Universal Server; a fusion of DBMS functionality and Middleware functionality in a single product. The goal of this undertaking remains the alleviation of the costs associated with Data Integration Challenges by Virtualizing Data at the Logical and Conceptual Layers.
The Logical Data Layer has been concrete for a while (e.g., Relational DBMS Engines); what hasn't reached the mainstream is the Concrete Conceptual Model, but this is changing fast courtesy of the activity taking place in the realm of RDF.
RDF provides an Open and Standards compliant vehicle for developing and exploiting Concrete Conceptual Data Models that ultimately move the Human aspect of the "Data Integration alleviation quest" higher up the value chain.
Some Definitions (as per usual):
RDF Middleware (as defined in this context) is about producing RDF from non RDF Data Sources. This implies that you can use non RDF Data Sources (e.g. (X)HTML Web Pages, (X)HTML Web Pages hosting Microformats, and even Web Services such as those from Google, Del.icio.us, Flickr etc..) as Semantic Web Data Source URIs (pointers to RDF Data).
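Conceptually, such middleware derives SPO triples from non RDF sources. Here is a toy sketch; the real Virtuoso Sponger is far richer, and the page content here is invented, though the Dublin Core property URIs are real vocabulary terms:

```python
from html.parser import HTMLParser

class TitleAndLinkSponger(HTMLParser):
    """Toy RDF middleware: derive SPO triples from an (X)HTML page.
    Only lifts <title> and <a href>; real middleware also handles
    microformats, Web Service responses, embedded metadata, etc."""
    def __init__(self, page_uri):
        super().__init__()
        self.page_uri = page_uri
        self.triples = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # each outbound link becomes a dcterms:references triple
                self.triples.append(
                    (self.page_uri, "http://purl.org/dc/terms/references", href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.triples.append(
                (self.page_uri, "http://purl.org/dc/terms/title", data.strip()))

sponger = TitleAndLinkSponger("http://example.com/post")
sponger.feed("<html><head><title>Hello Data Web</title></head>"
             "<body><a href='http://example.com/other'>other</a></body></html>")
```

The page URI becomes the Subject of every derived triple, which is precisely what lets a non RDF page function as a Semantic Web Data Source URI.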
In this post I would like to provide a similar perspective on this ability to treat non RDF Data as RDF, from an RDF Browser perspective.
First off, what's an RDF Browser?
An RDF Browser is a piece of technology that enables you to browse RDF Data Sources by way of Data Link Traversal. The key difference between this approach and traditional browsing is that Data Links are typed (they possess inherent meaning and context) whereas traditional links are untyped (although universally we have been trained to treat them as links to blurb in the form of (X)HTML pages, or what is popularly called "Web Content").
There are a number of RDF Browsers that I am aware of (note: pop me a message directly or by way of a comment to this post if you have a browser that I am unaware of), and they include (in order of creation and availability):
Each of the browsers above can consume the services of Triplr or the Virtuoso Sponger en route to unveiling RDF Data that is traversable via URI dereferencing (HTTP GETing the data exposed by the Data Pointer). Thus you can cut & paste the following into each of the aforementioned RDF Browsers:
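Dereferencing a Data Web URI boils down to an HTTP GET that asks for an RDF representation via content negotiation. A minimal Python sketch; it only constructs the request (no network round trip), and the Accept value shown is just one of several RDF serialization media types:

```python
import urllib.request

def make_dereference_request(uri):
    """Build (but do not send) an HTTP GET asking for RDF/XML."""
    return urllib.request.Request(
        uri, headers={"Accept": "application/rdf+xml"})

# DBpedia resource URI used purely as an example of a dereferenceable URI
req = make_dereference_request("http://dbpedia.org/resource/Berlin")
```

An RDF-aware user agent sends exactly this kind of request for each Data Link it traverses, then parses the returned triples into its view.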
Since we are all time challenged (naturally!) you can also just click on these permalinks for the OAT RDF Browser demos:
Quick Definitions:
Reasons for the distinction:
Examples:
So what? You may be thinking.
For starters, I can quite easily Mesh data from Googlebase (which emits RSS 2.0 or Atom) and other data sources with the Mapping Services from Yahoo!
I can achieve this in minutes without writing a single line of code. I can do it because of the Data Model prowess of RDF (self-describing instance-data), the data interchange and transformation power of XML and XSLT respectively, the inherent power of XML based Web Services (REST or SOAP), and of course, having a Hybrid Server product like Virtuoso at my disposal that delivers a cross platform solution for exploiting all of these standards coherently.
I can share the self-describing data source that serves my Meshup. Try reusing the data presented by a Mashup via the same URL that you used to locate the Mashup to get my drift.
Demo Links:
What does this all mean?
"Context" is the catalyst of the burgeoning Data Web (Semantic Web Layer - 1). It's the emerging appreciation of "Context" that is driving the growing desire to increment Web versions from 2.0 to 3.0. It is also the very same "Context" that has been a preoccupation of the Semantic Web vision since its inception.
The journey towards a more Semantic Web is all inclusive (all "ANDs" and no "ORs" re. participation).
The Semantic Web is self-annotating. Web 2.0 has provided a huge contribution to the self-annotation effort: on the Web we now have Data Spaces for Bookmarks (e.g. del.icio.us), Image Galleries (e.g. Flickr), Discussion Forums (remember those comments associated with blog posts? ditto the pingbacks and trackbacks?), People Profiles (FOAF, XFN, del.icio.us, and those crumbling walled-gardens around many Social Networks), and more...
A Web without granular access to Data is simply not a Web worth having (think about the menace of click-fraud and spam).
Web 3.0 & Marketwatch. Excerpted below:
In Web 3.0, I predict, we are going to start seeing roll-ups. We will see a trunk that emerges from the Context, be it film (Netflix), music (iTunes), cooking / food, working women, single parents, … and assembles the Web 3.0 formula that addresses the whole set of needs of a consumer in that Context. Imagine:
-I am a petite woman, dark skinned, dark haired, brown eyed. I have a distinct personal style, and only certain designers resonate with it (Context).
-I want my personal SAKS Fifth Avenue which carries clothes by those designers, in my size (Commerce).
-I want my personal Vogue, which covers articles about that Style, those Designers, and other emerging ones like them (Content).
I want to exchange notes with others of my size-shape-style-psychographic and discover what else looks good. I also want the recommendation system tell me what they’re buying (Community)
There’s also some basic principles of what looks good based on skin tone, body shape, hair color, eye color … I want the search engine to be able to filter and match based on an algorithm that builds in this knowledge base (Personalization, Vertical Search).
Now, imagine the same for a short, fat man, who doesn’t really have a sense of what to wear. And he doesn’t have a wife or a girl-friend. Before Web 3.0, he could go to the personal shopper at Nordstrom.
With Web 3.0, the internet will be his Personal Shopper.
(Via Read/Write Web.)
Web 3.0: When Web Sites Become Web Services: "
.....As more and more of the Web is becoming remixable, the entire system is turning into both a platform and the database. Yet, such transformations are never smooth. For one, scalability is a big issue. And of course legal aspects are never simple.'
But it is not a question of if web sites become web services, but when and how. APIs are a more controlled, cleaner and altogether preferred way of becoming a web service. However, when APIs are not available or sufficient, scraping is bound to continue and expand. As always, time will be the best judge; but in the meanwhile we turn to you for feedback and stories about how your businesses are preparing for 'web 3.0'.
We are hitting a little problem re. Web 3.0 and Web 2.0, naturally :-) Web 2.0 is one of several (present and future) Dimensions of Web Interaction that turns Web Sites into Web Services Endpoints; a point I've made repeatedly [1] [2] [3] [4] across the blogosphere, in addition to my early, futile attempts to make Wikipedia's Web 2.0 article meaningful (circa 2005), as per the Wikipedia Web 2.0 Talk Page excerpt below:
Web 2.0 is a web of executable endpoints and well formed content. The executable endpoints and well formed content are accessible via URIs. Put differently, Web 2.0 is a web defined by URIs for invoking Web Services and/or consuming or syndicating well formed content.
Hopefully, someone with more time on their hands will expand on this (I am kinda busy).
BTW - Web 2.0 being a platform doesn't distinguish it in any way from Web 1.0. They are both platforms; the difference comes down to platform focus and mode of experience.
Web 3.0 is about Data Spaces: Points of Semantic Web Presence that provide granular access to Data, Information, and Knowledge via Conceptual Data Model oriented Query Languages and/or APIs.
The common denominator across all the current and future Web Interaction Dimensions is HTTP. Their differences are as follows:
Examples of Web 3.0 Infrastructure:
Web 3.0 is not purely about Web Sites becoming Web Services endpoints. It is about the "M" (Data Model) taking its place in the MVC pattern as applied to the Web Platform.
I will repeat myself yet again:
The Devil is in the Details of the Data Model. Data Models make or break everything. You ignore data at your own peril. No amount of money in the bank will protect you from Data Ignorance! A bad Data Model will bring down any venture or enterprise, the only variable is time (where time is directly related to your increasing need to obtain, analyze, and then act on data, over repetitive operational cycles, that have ever decreasing intervals).
This applies to the Real-time enterprise of Information and/or knowledge workers and Real-time Web Users alike.
BTW - Data Makes Shifts Happen (spotter: Sam Sethi).
The examples above combine OAT and Exhibit. OAT handles the binding to SPARQL.
Here is a pure OAT variation of the prior examples that includes an enhanced anchor (hyperlink) feature that enables a variety of traversal behaviors and actions against the same RDF Data:
Note: Use the "dereference option" (retrieve/get data associated with URI) for maximum effect. The "explore" is useful after you've dereferenced a few URIs. Also note that columns are resizable, like those in a spreadsheet, which also implies dynamic sorting capability.
PREFIX dbpedia: <http://dbpedia.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?name ?birth ?death
FROM <http://dbpedia.org>
WHERE {
  ?person dbpedia:birthplace <http://dbpedia.org/resource/Berlin> .
  ?person dbpedia:birth ?birth .
  ?person foaf:name ?name .
  ?person dbpedia:death ?death .
  FILTER (?birth < "1900-01-01"^^xsd:date and bif:contains (?name, 'otto')) .
}
ORDER BY ?name
You can test further using our SPARQL Endpoint for DBpedia or via the DBPedia bound Interactive SPARQL Query Builder or just click *Here* for results courtesy of the SPARQL Protocol (REST based Web Service).
Note: This is in-built functionality as Virtuoso has possessed Full Text Indexing since 1998-99. This capability applies to physical and virtual graphs managed by Virtuoso.
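Calling such a SPARQL endpoint via the SPARQL Protocol amounts to an HTTP GET with the query carried as a URL parameter. A minimal Python sketch that only constructs the URL (no request is sent); note that the `format` parameter is a common endpoint convention rather than part of the core protocol, which normally negotiates results via the Accept header:

```python
from urllib.parse import urlencode

# Public DBpedia endpoint mentioned in the text
ENDPOINT = "http://dbpedia.org/sparql"

query = """
SELECT ?name WHERE {
  ?person <http://xmlns.com/foaf/0.1/name> ?name .
} LIMIT 10
"""

def sparql_protocol_url(endpoint, query,
                        fmt="application/sparql-results+json"):
    """Encode a SPARQL query as a REST-style GET URL."""
    return endpoint + "?" + urlencode({"query": query, "format": fmt})

url = sparql_protocol_url(ENDPOINT, query)
```

This is all the "REST based Web Service" amounts to: any HTTP client can retrieve query results from the endpoint with nothing more than a well-formed URL.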
As per usual, there is more to come, as we now have a nice intersection point for SPARQL and XQuery/XPath since Triple Objects (the Literal variety) can take the form of XML Schema based Complex Types :-) A point I alluded to in my podcast interview with Jon Udell last year (*note: the mechanical-turk-based transcript is bad*). The point I made went something like this: "...you use SPARQL to traverse the typed links and then use XPath/XQuery for further granular access to the data if well-formed..."
Anyway, the podcast interview led to this InfoWorld article titled: Unified Data Theory.
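The SPARQL-then-XPath idea can be sketched as follows: once a typed-link traversal yields a triple whose literal Object is well-formed XML, ordinary XML tooling takes over for granular access. All data and URIs here are invented for illustration:

```python
import xml.etree.ElementTree as ET

# A triple whose Object is an XML-typed literal (invented example data).
triple = (
    "http://example.com/order/1",                     # subject
    "http://example.com/schema#details",              # predicate
    "<order><item sku='A1'>Widget</item></order>",    # object: XML literal
)

# Step 1 (conceptually SPARQL's job): the typed link has led us to the triple.
subject, predicate, xml_literal = triple

# Step 2 (XPath/XQuery's job): drill into the well-formed literal.
root = ET.fromstring(xml_literal)
item_name = root.find("./item").text
```

The two query languages meet cleanly because the triple boundary is also the boundary between graph traversal and document navigation.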
Web 2.0 commentators such as Mike Arrington, and as mentioned above,Tim O'Reilly, both blogged about the imminent release of Freebase earlier today. Although I haven't looked at this database yet, it is crystal clear to me that it is one of many Web Databases to come. Others that I am personally familiar with, and involved in, include: DBpedia (Wikipedia as a true Database) and Zitgist (soon to be unveiled).
All of these databases mark the crystallization of the "Data Web" and the imminence of what is increasingly referred to as Web 3.0.
I certainly hope that all web 3.0 Database Providers keep the data Open, adhere to Web Best Practice recipes for sharing and publishing data, and generally make the process of data, information, and knowledge discovery via the Web much easier.
Using Solvent to extract data from structured pages: "
I’ve put together a short tutorial on Solvent, a very nice web page parsing utility. It is still a little rough around the edges, but I wanted to throw it out there and continue working on it since there isn’t a whole lot of existing documentation.
"(Via Wing Yung.)
After reading the interesting post above I quickly (and quite easily) knocked together a "Dynamic Data Web Page for Major League Baseball" using data from the Virtuoso hosted edition of dbpedia. Just click on the "Explore" option whenever you click on a URI of interest. Enjoy!
Linking personal posted content across communities: "
With the help of Kingsley, Uldis and I have been looking at how SIOC can be used to link the content that a single person posts to a number of community sites. The picture below shows an example of stuff that I've created on Flickr, YouTube, etc. through my various user identities on those sites (these match some SIOC types that we want to add to a separate module). We can also say that each Web 2.0 content item is a user-contributed post, with some attached or embedded content (e.g. a file or maybe just some metadata). This is part of a new discussion on the sioc-dev mailing list, and we'd value your contributions.
Edit: The inner layer is a person (semantically described in FOAF), the next layer is their user accounts (described in FOAF, SIOC) and the outer layer is the posted content - text, files, associated metadata - on community sites (again described using SIOC).
No Tags"(Via John Breslin - Cloudlands.)
The point that John is making about the Data Web and Interlinked Data Spaces exposed via URIs (e.g., Personal URIs) crystallizes a number of very important issues about the Data Web that may remain unclear. I am hoping that digesting the post excerpt above, in conjunction with the items below, aids the pursuit of clarity and comprehension about the all-important Data Web (Semantic Web - Layer 1):
Examples of some of these principles in practice:
And of course there is more to come, such as Grandma's Semantic Web Browser, which is coming from Zitgist LLC (pronounced: Zeitgeist), a joint venture of OpenLink Software and Frederick Giasson.
Here is what I was able to knock together using my SPARQL QBE (without writing the SPARQL by hand):
Note: Just select the "Explore" option when the link-lookup window appears in response to you clicking on any of the links. That said, if you are using the Firefox Linkification extension the page will not work properly (as per this discussion about disabling Linkification) :-(
BTW - I have a comments page, so don't be shy about showing me how you could produce this kind of data driven web page much quicker than I have :-)
Warning: IE6 and Safari (use Webkit instead) cannot process these pages due to the use of Ajax.
Rich Internet Applications ultimately enable intelligent processing of self-describing databases originating from data servers as demonstrated by these examples:
In this third take on my introduction to the Data Web I would like to share a link with you (a Dynamic Start Page in Web 2.0 parlance) with a Data Web twist: You do not have to preset the Start Page Data Sources (this is a small-big thing, if you get my drift, hopefully!).
Here are some Data Web based Dynamic Start Pages that I have built for some key players from the Semantic Web realm (in random order):
"These are RDF prepped Data Sources...", you might be thinking, right? Well here is the reminder: The Data Web is a Global Data Generation and Integration Effort. Participation may be active (Semantic Web & Microformats Communities), or passive (web sites, weblogs, wikis, shared bookmarks, feed subscriptions, discussion forums, mailing lists, etc.). Irrespective of participation mode, RDF instance data can be generated from close to anything (I say this because I plan to add binary files holding metadata to this mix shortly). Here are examples of Dynamic Start Pages for non RDF Data Sources:
What about Microformats, you may be wondering? Here goes:
Let's carry on.
How about some traditional Web Sites? Here goes:
And before I forget, here is My Data Web Start Page .
Due to the use of Ajax in the Data Web Start Pages, IE6 and Safari will not work. For Mac OS X users, Webkit works fine. Ditto re. IE7 on Windows.
Play Date: What is that thing on the Wall?
My Son: Security Alarm.
Play Date: How does it work?
My Son: If you click on that top button and then open the door, I will have to enter a code when we come back in or the alarm will go off.
Play Date: What is the code?
My Son: I can't tell you that!
Play Date: Why not?
My Son: You might come and steal something from our house!
Play Date: No I won't!
My Son: Well, you might tell someone that might come and steal something from our house! Or that person could tell someone who could tell someone that would steal from our house.
LOL!! Of course! At the same time I am left wondering: how come a majority of adults don't quite see the need for granular access to Web Data in a manner that enables computers and humans to collectively arrive at similar decisions?
Putting Data in context en route to producing actionable knowledge is a transient endeavor that engages a myriad of human senses. We demonstrate comprehension of this fact in our daily existence as social creatures (at a very early age as depicted above). That said, we seem to forget this fact when engaging the Web: If we can't see it then it can't be valuable.
BTW - I just received a ping about the "Sensory Web" (which is just another way of describing a Data Driven Web experience from my vantage point.)
In the popular M-V-C pattern you don't see the "M", but the "M" will kill you if you get it wrong (it is the FORCE)! Come to think of it, the pattern could have been coined V-C-M or C-M-V, but isn't for obvious reasons :-)
RDF is the vehicle that enables us to tap into the Data aspect of the Web. We started off with pages of blurb linked via hypertext (Web 1.0) and then looked to "Keywords" for some kind of data access; we then isolated some "Verbs" and discovered another dimension of Web Interaction (Web 2.0) but looked to these "Verbs" for data access, which left us with Mashups; and now we are starting to extract "Nouns" and "Adjectives" from sentences (Subject, Predicate, Object - Triples) associated with resources on the Web (Data Web / Web 3.0 / Semantic Web Layer 1), which provides a natural data access substrate for Meshups (natural joining of disparate data from a plethora of data sources) while providing the foundation layer for the Semantic Web.
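The "Nouns and Adjectives" idea above reduces to a simple mechanic: every datum is one Subject-Predicate-Object triple, and querying is pattern matching over a set of such triples. A minimal sketch in Python (the post and person URIs are made-up illustrations; the SIOC and Dublin Core namespaces are the real ones referenced elsewhere in this post):

```python
# A minimal sketch: each datum is one (subject, predicate, object) triple.
SIOC = "http://rdfs.org/sioc/ns#"
DCT = "http://purl.org/dc/elements/1.1/"

graph = {
    ("http://example.org/post-1", "rdf:type", SIOC + "Post"),
    ("http://example.org/post-1", DCT + "title", "Hello Data Web"),
    ("http://example.org/post-1", SIOC + "has_creator", "http://example.org/kidehen"),
}

def match(g, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a SPARQL-style variable."""
    return {t for t in g
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

# "What do we know about post-1?" -- three persisted observations
print(len(match(graph, s="http://example.org/post-1")))  # 3
```

Each `match` call is the moral equivalent of one SPARQL triple pattern; a real engine joins many such patterns over billions of triples.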
For those who need use-cases that demonstrate tangible value re. the Semantic Web, here are some projects to note courtesy of the Semantic Web Education and Outreach (SWEO) interest group:
XMP and microformats revisited: "
Yesterday I exercised poetic license when I suggested that Adobe's Extensible Metadata Platform (XMP) was not only the spiritual cousin of microformats like hCalendar but also, perhaps, more likely to see widespread use in the near term. My poetic license was revoked, though, in a couple of comments:
Mike Linksvayer: How someone as massively clued-in as Jon Udell could be so misled as to describe XMP as a microformat is beyond me.
Danny Ayers: Like Mike I don't really understand Jon's references to microformats - I first assumed he meant XMP could be replaced with a uF.
Actually, I'm serious about this. If I step back and ask myself what are the essential qualities of a microformat, it's a short list:
Mike notes:
XMP is embedded in a binary file, completely opaque to nearly all users; microformats put a premium on (practically require) colocation of metadata with human-visible HTML.
Yes, I understand. And as someone who is composing this blog entry as XHTML, in emacs, using a semantic CSS tag that will enable me to search for quotes by Mike Linksvayer and find the above fragment, I'm obviously all about metadata coexisting with human-readable HTML. And I've been applying this technique since long before I ever heard the term microformats -- my own term was originally microcontent.
(Via Jon Udell.)
I believe Jon is acknowledging the fact that the propagation of metadata in "Binary based" Web data sources is no different to the microformats based propagation that is currently underway in full swing across the "Text based" Web data sources realm. He is reiterating the fact that the Web is self-annotating (exponentially) by way of Metadata Embedding. And yes, what he describes is similar to Microformats in substance and propagation style :-)
Here is what I believe Jon is hoping to see:
My little "Hello Data Web!" meme was about demonstrating a view that Danny has sought for a while: unobtrusive meshing of microformats and RDF via GRDDL and SPARQL binding that simply eliminates the often perceived "RDF Tax". Danny, Jon, myself, and many others have always understood that making the Data Web (Web of RDF Instance Data) more of a Force (Star Wars style) is the key to unravelling the power of the "Web as a Database". Of course, we also tend to describe our nirvana in different ways that sometimes obscure the fundamental commonality of vision that we all share.
Personally, I believe everyone should simply "feel the force" or observe "the bright and dark sides of the force" that is RDF. When this occurs en masse there will be a global epiphany (similar to what happened around the time of the initial unveiling of the Web of Hypertext). Jon's meme brings the often overlooked realm of binary based metadata sources into the general discourse.
Binary Files as bona fide Data Web URIs (i.e. Metadata Sources) are much closer than you think :-) I should have my "Hello Data Web of Binary Data Sources" unveiled very soon!
Once you grasp the concept of entering values into the "Default Data Source URI field", take a look at: http://programmableweb.com and other URIs (hint: scroll through the results grid to the QEDWiki demo item)
What I am demonstrating is how existing Web Content hooks transparently into the "Data Web". Zero RDF Tax :-) Everything is good!
Note: Please look to the bottom of the screen for the "Run Query" Button. Remember, it's not quite Grandma's UI but should do for Infonauts etc.. A screencast will follow.
OAT: OpenAjax Alliance Compliant Toolkit: "
Ondrej Zara and his team at OpenLink Software have created an OpenLink Software JS Toolkit, known as OAT. It is a full-blown JS framework, suitable for developing rich applications with a special focus on data access.
OAT works standalone, offers a vast number of widgets, and has some rarely seen features, such as on-demand library loading (which reduces the total amount of downloaded JS code).
OAT is one of the first JS toolkits to show full OpenAjax Alliance conformance: see the appropriate wiki page and conformance test page.
There is a lot to see with this toolkit:
You can see some of the widgets in a Kitchen sink application
Sample data access applications:
OAT is Open Source and GPL'ed over at SourceForge, and the team has recently managed to incorporate our OAT data access layer as a module in the Dojo datastore.
(Via Ajaxian Blog.)
This is a corrected version of the initial post. Unfortunately, the initial post was inadvertently littered with invalid links :-( Also, since the original post we have released OAT 1.2 that includes integration of our iSPARQL QBE into the OAT Form Designer application.
Re. Data Access, it is important to note that OAT's Ajax Database Connectivity layer supports data binding to the following data source types:
OAT also includes a number of prototype applications that are completely developed using OAT Controls and Libraries:
Note: Pick "Local DSN" from page initialization dialog's drop-down list control when prompted
The Data in Fred's post is based on FOAF Ontology instance data generated from a myriad of Data Sources.
The Semantic Web is about granular exposure of the underlying web-of-data that fuels the World Wide Web. It models "Web Data" using a Directed Graph Data Model (back-to-the-future: Network Model Database) called RDF.
In line with contemporary database technology thinking, the Semantic Web also seeks to expose Web Data to architects, developers, and users via a concrete Conceptual Layer that is defined using RDF Schema.
The abstract nature of Conceptual Models implies that actual instance data (Entities, Attributes, and Relationships/Associations) occurs by way of "Logical to Conceptual" schema mapping and data generation that can involve a myriad of logical data sources (SQL, XML, Object databases, traditional web content, RSS/Atom feeds etc.). Thus, by implication, it is safe to assume that the Semantic Web's construction is basically a Data Integration and exposure effort. This is the point that Stefano alludes to in the blog post excerpts that follow:
The semantic web is really just data integration at a global scale. Some of this data might end up being consistent, detailed and small enough to perform symbolic reasoning on, but even if this is the case, that would be such a small, expensive and fragile island of knowledge that it would have the same impact on the world as calculus had on deciding to invade Iraq.
The biggest problem we face right now is a way to 'link' information that comes from different sources that can scale to hundreds of millions of statements (and hundreds of thousands of equivalences). Equivalences and subclasses are the only things that we have ever needed of OWL and RDFS, we want to 'connect' dots that otherwise would be unconnected. We want to suggest people to use whatever ontology pleases them and then think of just mapping it against existing ones later. This is easier to bootstrap than to force them to agree on a conceptualization before they even know how to start!
Additional insightful material from Stefano:
Benjamin Nowack also chimes into this conversation via his simple guide to understanding Data, Information, and Knowledge in relation to the Semantic Web.
Our loosely coupled webs of hypertext, services, and data present an intriguing realm of perpetually expanding and contracting clusters (aka conversations as exemplified by digg swarms). The only issue we have today is that you cannot perceive the aforementioned realm through the lenses of the Hypertext- or Interactive-Web or the API oriented Services-Web. Which is why we need a new frontier in the web innovation continuum. A frontier that unveils, with clarity, the somewhat unperceived realm of "People and Data Networks" en route to simplifying "Network Effects" exploitation: spotting, connecting to, and constructing conversation clusters.
Once again, this is what the Semantic Web facilitates by delivering a Data Model that exposes these "People & Data Networks". When you write a blog post, comment on a blog post, share bookmarks, tag resources, share and tag photos, etc., you are contributing links and nodes to this network :-)
A new and exciting project in the Semantic Web area: The Music Ontology, by Frederic Giasson (PTSW, TalkDigger).
Its goal is to provide a vocabulary to describe Artists, Releases, Songs and so on in RDF. It is mainly based on the MusicBrainz Metadata Vocabulary, but with new improvements such as defining relationships between artists and links to external services. And, most importantly, a lot of triples from the current MusicBrainz database should be available in a few weeks. A mailing-list has been launched for discussions and improvements.
I was waiting for this kind of vocabulary (and data) for some time (as I never took time to look at the MBz database export), especially to easily find all covers of a given song. From another point of view, I'll be happy to use it to represent - and query - various releases of a given record (using the mo:other_release_of property), especially for vinyl records with reissues (so what about a mo:reissue property?) with different colors, inner sleeve ...
Well, finally, what about converting the FLEX book to RDF to query this huge punk and hardcore database (and use its URIs for want-lists)?
"(Via Alexandre Passant - Terraces.)
SPARQL (query language for the Semantic Web) basically enables me to query a collection of typed links (predicates/properties/attributes) in my Data Space (ODS based of course) without breaking my existing local bookmarks database or the one I maintain at del.icio.us.
I am also demonstrating how Web 2.0 concepts such as Tagging mesh nicely with the more formal concepts of Topics in the Semantic Web realm. The key to all of this is the ability to generate RDF Data Model Instance Data based on Shared Ontologies such as SIOC (from DERI's SIOC Project) and SKOS (again showing that Ontologies and Folksonomies are complementary).
This demo also shows that Ajax works well in the Semantic Web realm (or web dimension of interaction 3.0), especially when you have a toolkit with Data Aware controls (for SQL, RDF, and XML) such as OAT (OpenLink Ajax Toolkit). For instance, we've successfully used this to build a Visual Query Building Tool for SPARQL (alpha) that really takes a lot of the pain out of constructing SPARQL Queries (there is much more to come on this front re. handling of DISTINCT, FILTER, ORDER BY etc..).
For now, take a look at the SPARQL Query dump generated by this SIOC & SKOS SPARQL QBE Canvas Screenshot.
You can cut and paste the queries that follow into the Query Builder or use the screenshot to build your variation of this query sample. Alternatively, you can simply click on *This* SPARQL Protocol URL to see the query results in a basic HTML Table. And one last thing, you can grab the SPARQL Query File saved into my ODS-Briefcase (the WebDAV repository aspect of my Data Space).
Note the following SPARQL Protocol Endpoints:
My beautified Version of the SPARQL Generated by QBE (you can cut and paste into "Advanced Query" section of QBE) is presented below:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX dct: <http://purl.org/dc/elements/1.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?forum_name ?owner ?post ?title ?link ?url ?tag
FROM <http://myopenlink.net/dataspace>
WHERE
  {
    ?forum a sioc:Forum ;
           sioc:type "bookmark" ;
           sioc:id ?forum_name ;
           sioc:has_member ?owner .
    ?owner sioc:id "kidehen" .
    ?forum sioc:container_of ?post .
    ?post dct:title ?title .
    OPTIONAL { ?post sioc:link ?link } .
    OPTIONAL { ?post sioc:links_to ?url } .
    OPTIONAL { ?post sioc:topic ?topic .
               ?topic a skos:Concept ;
                      skos:prefLabel ?tag } .
  }
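As an aside on the "SPARQL Protocol URL" mentioned above: the protocol is just the query text shipped to an endpoint as an HTTP GET parameter. A minimal sketch using only the Python standard library (the endpoint path and the "format" parameter are illustrative assumptions; check your endpoint's documentation for its exact knobs):

```python
from urllib.parse import urlencode

# Hypothetical endpoint; any SPARQL Protocol service follows the same shape.
ENDPOINT = "http://myopenlink.net/sparql"

query = """
PREFIX sioc: <http://rdfs.org/sioc/ns#>
SELECT ?post WHERE { ?post a sioc:Post } LIMIT 10
"""

# The protocol is a plain HTTP GET with the query in the "query" parameter;
# a "format" parameter (where supported) selects HTML table, XML, JSON, etc.
url = ENDPOINT + "?" + urlencode({"query": query, "format": "text/html"})
print(url.startswith("http://myopenlink.net/sparql?query="))  # True
```

Pasting such a URL into a browser is exactly what the *This* SPARQL Protocol URL link above does.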
Unmodified dump from the QBE (this will be beautified automatically in due course by the QBE):
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX sioc: <http://rdfs.org/sioc/ns#> PREFIX dc: <http://purl.org/dc/elements/1.1/> PREFIX dct: <http://purl.org/dc/elements/1.1/> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?var8 ?var9 ?var13 ?var14 ?var24 ?var27 ?var29 ?var54 ?var56
WHERE
  {
    graph ?graph
      {
        ?var8 rdf:type sioc:Forum .
        ?var8 sioc:container_of ?var9 .
        ?var8 sioc:type "bookmark" .
        ?var8 sioc:id ?var54 .
        ?var8 sioc:has_member ?var56 .
        ?var9 rdf:type sioc:Post .
        OPTIONAL {?var9 dc:title ?var13} .
        OPTIONAL {?var9 sioc:links_to ?var14} .
        OPTIONAL {?var9 sioc:link ?var29} .
        ?var9 sioc:has_creator ?var37 .
        OPTIONAL {?var9 sioc:topic ?var24} .
        ?var24 rdf:type skos:Concept .
        OPTIONAL {?var24 skos:prefLabel ?var27} .
        ?var56 rdf:type sioc:User .
        ?var56 sioc:id "kidehen" .
      }
  }
Current missing items re. Visual QBE for SPARQL are:
Quick Query Builder Tip: You will need to import the following (using the Import Button in the Ontologies & Schemas side-bar);
Browser Support: The SPARQL QBE is SVG based and currently works fine with the following browsers; Firefox 1.5/2.0, Camino (Cocoa variant of Firefox for Mac OS X), Webkit (Safari pre-release / advanced sibling), Opera 9.x. We are evaluating the use of the Adobe SVG plugin re. IE 6/7 support.
Of course this should be a screencast, but I am in the middle of a plethora of things right now :-)
I came across a pretty deep comments trail about the aforementioned items on Fred Wilson's blog (aptly titled: A VC) under the subject heading: Web 3.0 Is The Semantic Web.
Contributions to the general Semantic Web discourse by way of responses to valuable questions and commentary contributed by a Semantic Web skeptic (Ed Addison who may be this Ed Addison according to Google):
Ed, responses to your points re. Semantic Web Materialization:

<< 1) ontologies can be created and maintained by text extractors and crawlers >>
Ontologies will be developed by Humans. This process has already commenced and far more landscape has been covered than you may be aware of. For instance, there is an Ontology for Online Communities with Semantics factored in. More importantly, most Blogs, Wikis, and other "points of presence" on the Web are already capable of generating Instance Data for this Ontology by way of the underlying platforms that drive these things. The Ontology is called: SIOC (Semantically-Interlinked Online Communities).
<< 2) the entire web can be marked up, semantically indexed, and maintained by spiders without human assistance >>
Most of it can, and already is :-) Human assistance should, and would, be on an "exception" basis -- a better use of human time (IMHO). We do not need to annotate the Web manually when this labor intensive process can be automated (see my earlier comments).
<< 3) inference over the semantic web does not require an extremely deep heuristic search down multiple, redundant, cyclical pathways with many islands that are disconnected >>
When you have a foundation layer of RDF Data (generated in the manner I've discussed above), you then have a substrate that's far more palatable to Intelligent Reasoning. Note, the Semantic Web is made of many layers. The critical layer at this juncture is the Data-Web (Web of RDF Data). Note, when I refer to RDF I am not referring to RDF/XML the serialization format, I am referring to the Data Model (a Graph).
<< 4) the web becomes smart enough to eliminate websites or data elements that are incorrect, misleading, false, or just plain lousy >>
The Semantic Web vision is not about eliminating Web Sites (The Hypertext-Document-Web). It is simply about adding another dimension of interaction to the Web. This is just like the Services-Web dimension as delivered by Web 2.0.
We are simply evolving within an innovation continuum. There is no mutual exclusivity about any of the Web Dimensions since they collectively provide us with a more powerful infrastructure for building and exploiting "collective wisdom".
As for the Data-Web experiment part of this post, I would expect to see this post exposed as another contribution to the Data-Web via the PingTheSemanticWeb notification service :-) Implying, that all the relevant parts of this conversation are in a format (Instance Data for the SIOC Ontology) that is available for further use in a myriad of forms.
Web Me2.0 -- Exploding the Myth of Web 2.0:"Many people have told me this week that they think 'Web 2.0' has not been very impressive so far and that they really hope for a next-generation of the Web with some more significant innovation under the hood -- regardless of what it's called. A lot of people found the Web 2.0 conference in San Francisco to be underwhelming -- there was a lot of self-congratulation by the top few brands and the companies they have recently bought, but not much else happening. Where was all the innovation? Where was the focus on what's next? It seemed to be a conference mainly about what happened in the last year, not about what will happen in the coming year. But what happened last year is already so 'last year.' And frankly Web 2.0 still leaves a lot to be desired. The reason Tim Berners-Lee proposed the Semantic Web in the first place is that it will finally deliver on the real potential and vision of the Web. Not that today's Web 2.0 sucks completely -- it only sort of sucks. It's definitely useful and there are some nice bells and whistles we didn't have before. But it could still suck so much less!"
Web 2.0 is (not was) a piece of the overall Web puzzle. The Data Web (so called Web 3.0) is another critical piece of this puzzle, especially as it provides the foundation layer (Layer 1) of the Semantic Web.
Web 2.0 was never about "Open Data Access", "Flexible Data Models", or "Open World" meshing of disparate data sources built atop disparate data schemas (see: Web 2.0's Open Data Access Conundrum). It was simply about "Execution and APIs". I've already written about "Web Interaction Dimensions", but you can also look at the relationship of the currently perceived dimensions through the M-V-C programming pattern:
Another point to note, Social Networking is hot, but nearly every social network that I know (and I know and use most of them) suffers from an impedance mismatch between the service(s) they provide (social networks) and their underlying data models (in many cases Relational as opposed to Graph). Networks are about Relationships (N-ary), and you cannot effectively exploit the deep potential of "Network Effects" (Wisdom of Crowds, Viral Marketing etc..) without a complementary data model; you simply can't.
Finally, the Data Web is already here. I promised a long time ago (Internet Time) that the manifestation of the Semantic Web would occur unobtrusively, meaning we will wake up one day and realize we are using critical portions of the Semantic Web (i.e. Data-Web) without even knowing it. Guess what? It's already happening. Simple case in point: you may have started to notice the emergence of SIOC gems in the same way you may have observed those RSS 2.0 gems at the dawn of Web 2.0. What I am implying here is that the real questions we should be asking are: Where is the Semantic Web Data? How easy or difficult will it be to generate? And where are the tools? My answers are presented below:
Next stop, less writing, more demos; these are long overdue! At least from my side of the fence :-) I need to produce a few step-by-step screencasts that demonstrate how Web 2.0 meshes nicely with the Data-Web.
Here are some (not so end-user friendly) examples of how you can use SPARQL (Data-Web's Query Language) to query Web 2.0 Instance Data projected through the SIOC Ontology:
Note: You can use the online SPARQL Query Interface at: http://demo.openlinksw.com/isparql.
Other Data-Web Technology usage demos include:
Amongst the numerous comments about this subject, I felt most compelled to respond to the commentary from Tim O'Reilly (based on his proximity to Web 2.0 etc..) in relation to his view that the NYT's Web 3.0 is simply the Collective Intelligence Harnessing aspect of his Web 2.0 meme.
My response is dumped semi-verbatim below:
Tim,
A few things:
- We are in an innovation continuum
- The Web as a medium of innovation will evolve forever
- Different commentators have different views about monikers associated with these innovations
- To say Web 3.0 (aka the Data Web or Semantic Web - Layer 1) is what Web 2.0's collective intelligence is all about is a little inaccurate (IMHO); Web 2.0 doesn't provide "Open Data Access"
- Web 2.0 is a "Web of Services" primarily, a dimension of "Web Interaction" defined by interaction with Services
- Web 3.0 ("Data Web" or "Web of Databases" or "Semantic Web - Layer 1") is a Web dimension that provides "Open Data Access" that will be exemplified by the transition from "Mash-ups" (brute force data joining) to "Mesh-ups" (natural data joining)
The original "Web of Hypertext" or "Interactive Web", the current "Web of Services", and the emerging "Data Web" or "Web of Databases" collectively provide dimensions of interaction in the innovation continuum called the Web.
There are many more dimensions to come. Monikers come and go, but the retrospective "Long Shadow" of Innovation is ultimately timeless.
"Mutual Inclusivity" is a critical requirement for truly perceiving these "Web Interaction Dimensions" ("Participation" if I recall). "Mutual Exclusivity", on the other hand, simply leads to obscuring reality with Versionitis, as exemplified by the ongoing Web 1.0 vs 2.0 vs 3.0 debates.
BTW - I enjoyed reading Nick Carr's take on the Web 3.0 meme, especially his "tongue in cheek" power-grab for the rights to all "Web 3.0" Conferences etc. :-)
From the ISWC event (#swig) I just located a presentation by TimBL titled: Semantic Web & Web 2.0
Key Excerpt: Web 2.0 and the Semantic Web work "Well Apart" and "Great Together".
We have lately been busy with RDF scalability. We work with the 8000 university LUBM data set, a little over a billion triples. We can load it in 23h 46m on a box with 8G RAM. With 16G we probably could get it in 16h.
The resulting database is 75G, i.e. 74 bytes per triple, which is not bad. It will shrink a little more if explicitly compacted by merging adjacent partly filled pages. See Advances in Virtuoso Triple Storage for an in-depth treatment of the subject.
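The bytes-per-triple figure is easy to sanity-check (assuming 75G means GiB, and taking "a little over a billion triples" to be roughly 1.088 billion, an assumed count):

```python
db_bytes = 75 * 2**30          # 75 GiB on disk (assumption: G = GiB)
triples = 1_088_000_000        # "a little over a billion" (assumed count)

# All-in cost per triple: both indices on the triples table, the
# IRI-ID-to-URI dictionaries, etc.
bytes_per_triple = db_bytes / triples
print(round(bytes_per_triple))  # 74
```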
The real question of RDF scalability is finding a way of having more than one CPU on the same index tree without their hitting the prohibitive penalty of waiting for a mutex. The sure solution is partitioning, which would probably have to be by range of the whole key. But before we go to so much trouble, we'll look at dropping a couple of critical sections from index random access. Also, some kernel parameters may be adjustable, like a spin count before calling the scheduler when trying to get an occupied mutex. Still, we should not waste too much time on platform specifics. We'll see.
We just updated the Virtuoso Open Source cut. The latest RDF refinements are not in, so maybe the cut will have to be refreshed shortly.
We are also now applying the relational to RDF mapping discussed in Declarative SQL Schema to RDF Ontology Mapping to the ODS applications.
There is a form of the mapping in the VOS cut on the net but it is not quite ready yet. We must first finish testing it through mapping all the relational schemas of the ODS apps before we can really recommend it. This is another reason for a VOS update in the near future.
We will be looking at the query side of LUBM after the ISWC 2006 conference. So far, we find queries compile OK for many SIOC use cases with the cost model as it is now. A more systematic review of the cost model for SPARQL will come when we get to the queries.
We put some ideas about inferencing in the Advances in Triple Storage paper. The question is whether we should forward chain such things as class subsumption and subproperties. If we build these into the SQL engine used for running SPARQL, we probably can do these as unions at run time with good performance and better working set due to not storing trivial entailed triples. Some more thought and experimentation needs to go into this.
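The run-time alternative sketched above -- answering class-membership queries via unions over the subclass closure instead of storing entailed triples -- looks roughly like this (the class hierarchy and instance data are illustrative):

```python
# Map each class to its direct subclasses (illustrative hierarchy).
subclasses = {
    "Agent": ["Person", "Organization"],
    "Person": ["Student"],
}

def closure(cls):
    """All classes subsumed by cls, including itself (transitive)."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(subclasses.get(c, []))
    return seen

# Instance data stores only the asserted (most specific) types;
# no entailed rdf:type triples are ever materialized.
types = {"alice": "Student", "acme": "Organization"}

def instances_of(cls):
    """Run-time union over the subclass closure."""
    cs = closure(cls)
    return {e for e, t in types.items() if t in cs}

print(sorted(instances_of("Agent")))  # ['acme', 'alice']
```

The trade-off is exactly the one named in the post: the union costs a little at query time, but the store stays smaller and the working set behaves better.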
Geonames announced the release of its Geonames ontology v1.2. The new ontology has a few enhancements. It introduces the notion of linked data and makes a clear distinction between URIs intended for linking documents and URIs intended for linking ontology concepts.
Different types of geospatial data are of different spatial granularity. Data of different spatial granularity may relate to each other by the containment relation. For example, countries contain states, states contain cities, and so on. Some geospatial data are of similar spatial granularity (e.g., two cities that are near each other, or two countries that neighbor each other). To support the knowledge representation of these relationships, the ontology introduced three new properties: childrenFeatures, nearbyFeatures and neighbouringFeatures.
In the Semantic Web, both ontology concepts and physical web documents are linked by URI. Sometimes in applications, it’s useful to make clear whether the use of a URI is intended for linking documents or for linking ontology concepts. The new Geonames ontology introduced a URI convention for identifying the intended usage of a URI. This convention also simplifies the discovery of geospatial data using Geonames web services.
Here is an example:
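A sketch of the convention as I understand it (the feature ID below is illustrative): a URI ending in "/" names the ontology concept (the Feature itself), while appending "about.rdf" names the RDF document describing that Feature.

```python
# Geonames URI convention (as I understand it); the ID 2643743 is
# used purely for illustration.
def concept_uri(geonames_id):
    """URI for the Feature concept itself (note the trailing slash)."""
    return "http://sws.geonames.org/%s/" % geonames_id

def document_uri(geonames_id):
    """URI for the RDF document that describes the Feature."""
    return concept_uri(geonames_id) + "about.rdf"

print(concept_uri(2643743))   # http://sws.geonames.org/2643743/
print(document_uri(2643743))  # http://sws.geonames.org/2643743/about.rdf
```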
Other interesting ontology properties include wikipediaArticle and locationMap. The former links a Feature instance to a Web article on Wikipedia, and the latter links a Feature instance to a digital map Web page.
For additional information about Geonames ontology v1.2, see Marc’s post at the Geonames blog.
"(Via Geospatial Semantic Web Blog.)
A declarative language adapted from SPARQL's graph pattern language (N3/Turtle) for mapping SQL Data to RDF Ontologies. We currently refer to this as a Graph Pattern based RDF VIEW Definition Language.
It provides an effective mechanism for exposing existing SQL Data as virtual RDF Data Sets (Graphs), avoiding the data duplication associated with generating physical RDF Graphs from SQL Data en route to persistence in a dedicated Triple Store.
Enterprise applications (traditional and web based) and most Web Applications (Web 1.0 and Web 2.0) sit atop relational databases, implying that SQL/RDF model and data integration is an essential element of the burgeoning "Data Web" (Semantic Web - Layer 1) comprehension and adoption process.
In a nutshell, this is a quick route for non-disruptive exposure of existing SQL Data to SPARQL-supporting RDF Tools and Development Environments.
CREATE GRAPH IRI("http://myopenlink.net/dataspace")
CREATE IRI CLASS odsWeblog:feed_iri "http://myopenlink.net/dataspace/kidehen/weblog/MyFeeds" ( in memb varchar not null, in inst varchar not null)
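Conceptually, the mapping treats each SQL row as a triple generator: a subject IRI is minted from key columns (the IRI CLASS idea above), and the remaining columns become predicate/object pairs. A rough Python illustration of that idea (the table shape, `ods:` prefix, and column names are assumptions for the sketch, not the actual ODS schema; the real work happens declaratively inside Virtuoso):

```python
# Illustrative relational rows (e.g., an ODS weblog feeds table).
rows = [
    {"memb": "kidehen", "inst": "weblog", "title": "MyFeeds"},
]

def feed_iri(row):
    # Mirrors the IRI CLASS idea: mint a subject IRI from key columns.
    return "http://myopenlink.net/dataspace/%s/%s/MyFeeds" % (row["memb"], row["inst"])

def as_triples(row):
    s = feed_iri(row)
    return [
        (s, "rdf:type", "ods:Feed"),      # ods: is an assumed prefix
        (s, "dct:title", row["title"]),
    ]

# The "view": triples exist only as a projection of the rows.
triples = [t for r in rows for t in as_triples(r)]
print(len(triples))  # 2
```

The point of the declarative language is that this projection is evaluated at SPARQL query time, so no physical triples are ever stored.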
What is Blogosphere 2.0 anyway?
Blog clusters that incorporate the "Open Data Access" dimension into their usage pattern via content exported as RDF Data Sets or Virtual RDF Data Sets (as demonstrated by the OpenLink Data Spaces SIOC Reference). In either scenario, the RDF rendition of blog content is accessible for ad-hoc querying via SPARQL (btw - check out this cool SPARQL FAQ).
The really fascinating thing about "Blogosphere 2.0" is that the transition from "Blogosphere 1.0" is going to be transparent! The "Open Data Access" will actually do the talking etc..
We have made new benchmarks with loading the 47 million triples of the Wikipedia links data set. So far, our best result is 40 minutes with a dual core Xeon with 8G memory. This comes to about 18000 triples per second with between 1.2 and 2 CPU cores busy, slightly depending on configuration parameters. Our previous best result was with a dual 1.6GHz SPARC with 7700 triples per second on loading the 2M triple Wordnet data set.
These are memory based speeds. We have implemented an automatic background compaction for database tables and have tried the Wikipedia load with and without. The CPU cost of the compaction was about 10% with a slight gain in real time due to less IO.
But the real deal remains IO. With the compaction on, we got 91 bytes per triple, all included, i.e. two indices on the triples table, dictionaries from IRI IDs to URIs etc. The compaction is rather simple, it just detects adjacent dirty pages about to be written to disk and sees if the set of contiguous dirty pages would fit on fewer pages than they now take. If so, it rewrites the pages and frees the ones left over. It does not touch clean pages. With some more logic it could also compact clean pages, provided the result did not have more dirty pages than the initial situation. With more aggressive compaction we will get about 75 bytes per triple. We will try this.
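The compaction heuristic described above can be sketched in a few lines: scan for runs of adjacent dirty pages and rewrite a run whenever its combined contents fit on fewer pages, never touching clean pages. An illustrative Python model (the page size and fill levels are made up for the sketch):

```python
PAGE_SIZE = 8192  # illustrative page size in bytes

def compact(pages):
    """pages: list of (bytes_used, is_dirty). Returns the new page count.
    Only runs of adjacent *dirty* pages are coalesced; clean pages are
    never touched (rewriting them would dirty them for no IO gain)."""
    out, run = [], []

    def flush():
        if run:
            used = sum(b for b, _ in run)
            needed = -(-used // PAGE_SIZE)  # ceiling division
            out.extend([(PAGE_SIZE, True)] * (needed - 1))
            out.append((used - PAGE_SIZE * (needed - 1), True))
            run.clear()

    for p in pages:
        if p[1]:
            run.append(p)      # extend the current dirty run
        else:
            flush()            # a clean page breaks the run
            out.append(p)
    flush()
    return len(out)

# Three half-full dirty pages fit on two; the clean page is untouched.
print(compact([(4000, True), (4000, True), (4000, True), (4000, False)]))  # 3
```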
But the real gains will come from index compression with bitmaps. For the Wikipedia data set, this will cut one of the indices to about a third of its current size. This is also the index with the more random access, so the benefit is compounded in terms of working set. At that point we will be looking at about 50 bytes per triple. We will see next week how this works with the LUBM RDF benchmark.
Recent Virtuoso Developments: "
We have been extensively working on virtual database refinements. There are many SQL cost model adjustments to better model distributed queries, and we now support direct access to Oracle and Informix statistics system tables. Thus, when you attach a table from one or the other, you automatically get up-to-date statistics. This helps Virtuoso optimize distributed queries. Also the documentation is updated as concerns these, with a new section on distributed query optimization.
On the applications side, we have been keeping up with the SIOC RDF ontology developments. All ODS applications now make their data available as SIOC graphs for download and SPARQL query access.
What is most exciting however is our advance in mapping relational data into RDF. We now have a mapping language that makes arbitrary legacy data in Virtuoso or elsewhere in the relational world RDF queriable. We will put out a white paper on this in a few days.
Also, we have some innovations in mind for optimizing the physical storage of RDF triples. We keep experimenting, now with our sights set on the high end of triple storage, towards billion-triple data sets. We are experimenting with a new, more space-efficient index structure for better working set behavior. Next week will yield the first results.
Anyway, we now have OpenID support in OpenLink Data Spaces (ODS), which coincides nicely with the growing support for OpenID across the web.
The beauty of OpenID support in ODS is that I now have a URL that meshes with my identity (at least in line with what I have chosen to share with the public via the Web). For instance, http://www.openlinksw.com/dataspace/kidehen@openlinksw.com is my OpenID as well as my personal URI (look closer at this link and you have a map of my Data Space).
To really understand what I am getting at here you should open up My OpenID URL using one of the following:
To be continued....
Hopefully, I can expand further :-)
For additional clarity re. my comments above, you can also look at the SPARQL & SIOC Usecase samples document for our OpenLink Data Spaces platform. Bottom line: the Semantic Web and SPARQL aren't BORING. In fact, quite the contrary, since they are essential ingredients of a more powerful Web than the one we work with today!
Enjoy the rest of John's post:
Creating connections between discussion clouds with SIOC:
(Extract from our forthcoming BlogTalk paper about browsers for SIOC.)
SIOC provides a unified vocabulary for content and interaction description: a semantic layer that can co-exist with existing discussion platforms. Using SIOC, various linkages are created between the aforementioned concepts, which allow new methods of accessing this linked data, including:
- Virtual Forums. These may be a gathering of posts or threads which are distributed across discussion platforms, for example, where a user has found posts from a number of blogs that can be associated with a particular category of interest, or an agent identifies relevant posts across a certain timeframe.
- Distributed Conversations. Trackbacks are commonly used to link blog posts to previous posts on a related topic. By creating links in both directions, not only across blogs but across all types of internet discussions, conversations can be followed regardless of what point or URI fragment a browser enters at.
- Unified Communities. Apart from creating a web page with a number of relevant links to the blogs or forums or people involved in a particular community, there is no standard way to define what makes up an online community (apart from grouping the people who are members of that community using FOAF or OPML). SIOC allows one to simply define what objects are constituent parts of a community, or to say to what community an object belongs (using sioc:has_part / part_of): users, groups, forums, blogs, etc.
- Shared Topics. Technorati (a search engine for blogs) and BoardTracker (for bulletin boards) have been leveraging the free-text tags that people associate with their posts for some time now. SIOC allows the definition of such tags (using the subject property), but also enables hierarchical or non-hierarchical topic definition of posts using sioc:topic when a topic is ambiguous or more information on a topic is required. Combined with other Semantic Web vocabularies, tags and topics can be further described using the SKOS organisation system.
- One Person, Many User Accounts. SIOC also aims to help with the issue of multiple identities by allowing users to define that they hold other accounts, or that their accounts belong to a particular personal identity (via foaf:holdsOnlineAccount or sioc:account_of). Therefore, all the posts or comments made by a particular person using their various associated user accounts across platforms could be identified.
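As a toy illustration of the "One Person, Many User Accounts" linkage: only the property names come from the text; the data set and the tiny query helper below are invented for illustration.

```python
# A person holds several accounts; each account created posts on a
# different platform. Triples are plain (subject, predicate, object)
# tuples; identifiers are invented.
triples = {
    ("person:john", "foaf:holdsOnlineAccount", "user:john_blog"),
    ("person:john", "foaf:holdsOnlineAccount", "user:jdoe_forum"),
    ("user:john_blog",  "sioc:creator_of", "post:a"),
    ("user:jdoe_forum", "sioc:creator_of", "post:b"),
}

def posts_by(person):
    """Follow foaf:holdsOnlineAccount, then sioc:creator_of, to gather
    every post a person made under any of their accounts."""
    accounts = {o for s, p, o in triples
                if s == person and p == "foaf:holdsOnlineAccount"}
    return {o for s, p, o in triples
            if s in accounts and p == "sioc:creator_of"}
```

Here `posts_by("person:john")` yields both `post:a` and `post:b`, even though they were made under different accounts on different platforms.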
A phase in the evolution of web usage patterns that emphasizes Web Services based interaction between "Web Users" and "Points of Web Presence" over traditional "Web Users" and "Web Sites" based interaction. Basically, a transition from visual site interaction to presence based interaction.
BTW - Dare Obasanjo also commented about Web usage patterns in his post titled The Two Webs, where he concluded that we have a dichotomy along the lines of HTTP-for-APIs (2.0) and HTTP-for-Browsers (1.0), which Jon Udell evolved into HTTP-Services-Web and HTTP-Interactive-Web during our recent podcast conversation.
With definitions in place, I will resume my quest to unveil the aforementioned Web 2.0 Data Access Conundrum:
As you can see from the above, Open Data access isn't genuinely compatible with Web 2.0.
We can also look at the same issue by way of the popular M-V-C (Model View Controller) pattern. Web 2.0 is all about the "V" and "C" with a modicum of "M" at best (data access, open data access, and flexible open data access are completely separate things). The "C" items represent application logic exposed by SOAP or REST style web services etc. I'll return to this later in this post.
What about Social Networking, you must be thinking? Isn't this a Web 2.0 manifestation? Not at all (IMHO). The Web was developed / invented by Tim Berners-Lee to leverage the "Network Effects" potential of the Internet for connecting People and Data. Social Networking, on the other hand, is simply one of several ways by which we construct network connections. I am sure we all accept the fact that connections are built for many other reasons beyond social interaction. That said, we also know that through social interactions we actually develop some of our most valuable relationships (we are social creatures after all).
The Web 2.0 Open Data Access impedance reality is ultimately going to be the greatest piece of tutorial and use case material for the Semantic Web. I take this position because it is human nature to seek Freedom (in unadulterated form), which implies the following:
Web 2.0 by definition and use case scenarios is inherently incompatible with the above due to the lack of Flexible and Open Data Access.
If we take the definition of Web 2.0 (above) and rework it with an appreciation of Flexible and Open Data Access, you would arrive at something like this:
A phase in the evolution of the web that emphasizes interaction between "Web Users" and "Web Data", facilitated by Web Services based APIs and an Open & Flexible Data Access Model.
In more succinct form:
A pervasive network of people connected by data or data connected by people.
Returning to M-V-C and looking at the definition above, you now have a complete "M", which is enigmatic in Web 2.0 and is the essence of the Semantic Web (Data and Context).
To make all of this possible, a palatable Data Model is required. The model of choice is the graph-based RDF Data Model, not to be mistaken for the RDF/XML serialization, which is just that: a data serialization that conforms to the aforementioned RDF data model.
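A minimal sketch of the model/serialization distinction: the same in-memory triple set can be written out and read back in a wire format without the model itself changing. N-Triples is hand-rolled here for brevity (RDF/XML would make the same point); a real application would use an RDF library rather than this string handling.

```python
# One graph (the model), serialized and parsed back. Triples are
# (subject, predicate, object) tuples in N-Triples term syntax.
graph = {
    ("<http://example.org/me>",
     "<http://xmlns.com/foaf/0.1/name>",
     '"Kingsley"'),
}

def to_ntriples(g):
    # serialize the model into the N-Triples wire format
    return "\n".join(f"{s} {p} {o} ." for s, p, o in sorted(g))

def from_ntriples(text):
    # parse the wire format back into the model
    out = set()
    for line in text.splitlines():
        s, p, o = line.rstrip(" .").split(" ", 2)
        out.add((s, p, o))
    return out

# round-trip: the serialization changes form, the model does not
assert from_ntriples(to_ntriples(graph)) == graph
```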
The Enterprise Challenge
Web 2.0 cannot and will not make valuable inroads into the enterprise, because enterprises live and die by their ability to exploit data. Weblogs, Wikis, Shared Bookmarking Systems, and other Web 2.0 distributed collaborative application profiles are only valuable if the data is available to the enterprise for meshing (not mashing).
A good example of how enterprises will exploit data by leveraging networks of people and data (social networks in this case) is shown in this nice presentation by Accenture's Institute for High Performance Business titled: Visualizing Organizational Change.
Web 2.0 commentators (for the most part) continue to ponder the use of Web 2.0 within the enterprise while forgetting the congruency between enterprise agility and the exploitation of people & data networks (the very issue emphasized in this original Web vision document by Tim Berners-Lee). Even worse, they remain challenged or spooked by the Semantic Web vision because they do not understand that Web 2.0, due to its Open Data Access challenges, is fundamentally a Semantic Web precursor. Web 2.0 is one of the greatest demonstrations of why we need the Semantic Web right now.
Finally, juxtapose the items below and you may even get a clearer view of what I am attempting to convey about the virtues of Open Data Access and the inflective role it plays as we move beyond Web 2.0:
Information Management Proposal - Tim Berners-Lee
Visualizing Organizational Change - Accenture Institute of High Performance Business
This is fundamentally an animation demonstrating Semantic Web exploitation in the classic "a picture speaks a thousand words" manner. It also illustrates (yet again) the important Data Space(s) aspect of creating Semantic Web presence.
Finally, the Web 2.0 usage pattern tries to espouse what's demonstrated in this animation via data-context-challenged interactions (due to its "Walled Garden" and "Data Silo" approach to Data Access etc..). The Semantic Web (as per numerous posts on the subject) on the other hand achieves this via data-context-aware interactions (as will be exemplified via meshups).
]]>One of the great things about the moderate “open data access” that we have today (courtesy of the blogosphere) is the fact that you can observe the crystallization of new thinking, and/or new appreciation of emerging ideas, in near real-time. Of course, when we really hit the tracks with the Semantic Web this will be in “conditional real-time” (i.e. you choose and control your scope and sensitivity to data changes etc..).
For instance, by way of feed subscriptions, I stumbled upon a series of posts by Jason Kolb that basically articulate what I (and others who believe in the Semantic Web vision) have been attempting to convey in a myriad of ways via posts and commentary etc..
Here are the links to the 4 part series by Jason:
Continuing from our recent podcast conversation, Jon Udell sheds further insight into the essence of our conversation via a "Strategic Developer" column article titled: Accessing the web of databases.
Below, I present an initial dump of a DataSpace FAQ that hopefully sheds light on the DataSpace vision espoused during my podcast conversation with Jon.
What is a DataSpace?
A moniker for Web-accessible atomic containers that manage and expose Data, Information, Services, Processes, and Knowledge.
What would you typically find in a Data Space? Examples include:
How do Data Spaces and Databases differ?
Data Spaces are fundamentally problem-domain-specific database applications. They offer functionality that you would instinctively expect of a database (e.g. ACID data management), with the additional benefit of being data model and query language agnostic. Data Spaces are for the most part DBMS Engine and Data Access Middleware hybrids, in the sense that ownership and control of data is inherently loosely coupled.
How do Data Spaces and Content Management Systems differ?
Data Spaces are inherently more flexible; they support multiple data models and data representation formats. Content management systems do not possess the same degree of data model and data representation dexterity.
How do Data Spaces and Knowledgebases differ?
A Data Space cannot dictate the perception of its content. For instance, what I may consider knowledge relative to my Data Space may not be such to a remote client that interacts with it from a distance. Thus, defining my Data Space purely as a Knowledgebase introduces constraints that reduce its broader effectiveness to third-party clients (applications, services, users, etc.). A Knowledgebase is based on a Graph Data Model, resulting in significant impedance for clients that are built around alternative models. To reiterate, Data Spaces support multiple data models.
What Architectural Components make up a Data Space?
Where can I see a DataSpace along the lines described, in action?
Just look at my blog, and take the journey as follows:
What about other Data Spaces?
There are several, and I will attempt to categorize them along the lines of the query methods available:
Type 1 (Free Text Search over HTTP):
Google, MSN, Yahoo!, Amazon, eBay, and most Web 2.0 plays.
Type 2 (Free Text Search and XQuery/XPath over HTTP)
A few blogs and Wikis (Jon Udell's and a few others)
What about Data Space aware tools?
]]>
I was compelled to go back to the RSS 2.0 imbroglio when I came across Dave Winer's comments re. "the SEC attempting to reinvent RSS 2.0..." in response to Jon Udell's recent XBRL article.
Although I don't believe in complex entry points into complex technology realms, I do subscribe to the approach where developers deal with the complexity associated with a problem domain while hiding said complexity from ambivalent end-users via coherent interfaces -- which does not always imply User Interface.
XBRL is a great piece of work that addresses the complex problem domain of Financial Reporting. The only thing it's missing right now is an Ontology that facilitates an RDF Data Model based view of XBRL Schema and Instance Data, which ultimately makes XBRL data available to RDF query languages such as SPARQL. This line of thought implies, for instance, an XML Schema to OWL Ontology mapping for Schema Data (as explained in a white paper by the VSIS Group at the University of Hamburg), leaving the Instance Data to be generated in a myriad of ways that includes XML to RDF and/or XML->SQL->RDF.
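The XML-to-RDF instance-data route mentioned above can be caricatured in a few lines: each child element of an instance document becomes one triple about the document's subject. The element names and the `ex:` prefix below are invented; a real XBRL mapping needs namespace, context, and unit handling well beyond this sketch.

```python
# Hypothetical, minimal XML -> RDF extraction: one triple per child
# element of the instance document.
import xml.etree.ElementTree as ET

doc = "<report id='acme-2006'><revenue>100</revenue><costs>60</costs></report>"

root = ET.fromstring(doc)
subject = "ex:" + root.get("id")  # invented IRI scheme
triples = [(subject, "ex:" + child.tag, child.text) for child in root]
# triples == [('ex:acme-2006', 'ex:revenue', '100'),
#             ('ex:acme-2006', 'ex:costs', '60')]
```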
As I stated in an earlier post: we should not mistake ambivalence for lack of intelligence. Assuming "Simple" is always right is another way of subscribing to this profound misconception. You know, assuming the world was flat (as opposed to a geoid) was quite palatable at some point in the history of mankind; I wonder what would have happened if we had held on to that point of view to this day because of its "Simplicity"?
]]>
OAT offers a broad Javascript-based, browser-independent widget set
for building data source independent rich internet applications that are usable across a broad range of Ajax-capable web browsers.
OAT supports binding to the following data sources via its Ajax Database Connectivity layer:
SQL Data via XML for Analysis (XMLA)
Web Data via SPARQL, GData, and OpenSearch Query Services
Web Services specific Data via service specific binding to SOAP and REST style web services
The toolkit includes a collection of powerful rich internet application prototypes, including: SQL Query By Example, Visual Database Modeling, and a data-bound Web Form Designer.
Project homepage on sourceforge.net:
http://sourceforge.net/projects/oat
Source Code:
http://sourceforge.net/projects/oat/files
Live demonstration:
http://www.openlinksw.com/oat/
SEMANTIC KNIGHT: More from Semantic Knight vs. Web Hacker Duel. A nice antidote to lots of self-righteous talk in the aftermath of the TBL-Norvig encounter. Thanks York.
None shall pass without formally defining the ontological meta-semantic thingies of their domain something-or-others!
HACKER:
What?
SEMANTIC KNIGHT:
None shall pass without using all sorts of semantic meta-meta-meta-stuff that we will invent Real Soon Now!
HACKER:
I have no quarrel with you, good Sir Knight, but I must get my work done on the Web. Stand aside!
(Via Valentin Zacharias.)
]]>
Google vs Semantic Web: "Google exec challenges Berners-Lee 'At the end of the keynote, however, things took a different turn. Google Director of Search and AAAI Fellow Peter Norvig was the first to the microphone during the Q&A session, and he took the opportunity to raise a few points.
'What I get a lot is: 'Why are you against the Semantic Web?' I am not against the Semantic Web. But from Google's point of view, there are a few things you need to overcome, incompetence being the first,' Norvig said. Norvig clarified that it was not Berners-Lee or his group that he was referring to as incompetent, but the general user.'
Related: Google Base -- summing up."
(Via More News.)
When will we drop the ill-conceived notion that end-users are incompetent?
Has it ever occurred to software developers and technology vendors that "incompetent", "dumb", and other contemptuous end-user adjectives simply reflect the inability of most technology products to surmount end-user "Interest Activation Thresholds"?
Interest Activation Threshold (IAT)? What's That?
I have a fundamental personal belief that all human beings are intelligent. Our ability to demonstrate intelligence, or be perceived as intelligent, is directly proportional to our interest level in a given context. In short, we have "Ambivalence Quotients" (AQs) just as we have "Intelligence Quotients" (IQs).
An interested human being is an inherently intelligent entity. The abstract nature of human intelligence also makes locating the IQ and AQ on/off buttons a mercurial quest at the best of times.
Technology end-users exhibit high AQs most of the time, due to the inability of most technology products to truly engage, and ultimately stimulate genuine interest, by surmounting the IAT and reducing the AQ.
Ironically, when a technology vendor is lagging behind its competitors in the "features arms race", it is commonplace to hear the familiar excuse: "our end-users aren't asking for this feature".
Note To Google:
Ambivalence isn't incompetence. If end-users were genuinely incompetent, how is it that they run rings around your page rank algorithms by producing Google-friendly content at the expense of valuable context? What about the deteriorating value of AdSense due to click fraud? Likewise, the continued erosion of the value of your once exemplary keyword-based search service? As we all know, necessity is the mother of invention, so when users develop high AQs because there is nothing better, we end up with a forced breach of the IAT; which is why the issues that I mention remain long-term challenges for you. Ironically, the so-called "incompetents" are already outsmarting you, and you don't seem to comprehend this reality or its inevitable consequences.
Finally, how are you going to improve value without integrating the Semantic Web vision into your R&D roadmap? I can tell you categorically that you have little or no wiggle room re. this matter, especially if you want to remain true to your "don't be evil" mantra. My guess is that you will incorporate Semantic Web technologies sooner rather than later (Google Co-op is a big clue). I would even go as far as predicting a Google-hosted SPARQL Query Endpoint alongside your GData endpoints within the next 6-12 months (if even that long). I believe that your GData protocol (like the rest of Web 2.0) will ultimately accelerate your appreciation of the data model dexterity that RDF brings to the loosely coupled knowledge networks espoused by the Semantic Web vision.
Google & Semantic Web Paradox
The Semantic Web vision has the RDF graph data model at its core (and for good reason), but even more confusing for me, as I process Google sentiments about the Semantic Web, is the fact that RDF's actual creator (Ramanathan Guha aka. Guha) currently works at Google. There's a strange disconnect here IMHO.
If I recall correctly, Google wants to organize the world's data and information, leaving the knowledge organization to someone else, which is absolutely fine. What is increasingly irksome is the current tendency to use corporate stature to generate Fear, Uncertainty, and Doubt when the subject matter is the Semantic Web.
BTW - I've just read Frederick Giasson's perspective on the Google Semantic Web paradox which ultimately leads to the same conclusions regarding Google's FUD stance when dealing with matters relating to the Semantic Web.
I wonder if anyone is tracking the google hits for "fud google semantic web"?
Here is a dump of the post titled: Intermediate RDF Loading Results:
Following from the post about a new Multithreaded RDF Loader, here are some intermediate results and action plans based on my findings.
The experiments were made on a dual 1.6GHz Sun SPARC with 4G RAM and 2 SCSI disks. The data sets were the 48M triple Wikipedia data set and the 1.9M triple Wordnet data set. 100% CPU means one CPU constantly active. 100% disk means one thread blocked on the read system call at all times.
Starting with an empty database, loading the Wikipedia set took 315 minutes, amounting to about 2500 triples per second. After this, loading the Wordnet data set with cold cache and 48M triples already in the table took 4 minutes 12 seconds, amounting to 6838 triples per second. Loading the Wikipedia data had CPU usage up to 180% but over the whole run CPU usage was around 50% with disk I/O around 170%. Loading the larger data set was significantly I/O bound while loading the smaller set was more CPU bound, yet was not at full 200% CPU.
The RDF quad table was indexed on GSPO and PGOS. As one would expect, the bulk of I/O was on the PGOS index. We note that the pages of this index were on the average only 60% full. Thus the most relevant optimization seems to be to fill the pages closer to 90%. This will directly cut about a third of all I/O plus will have an additional windfall benefit in the form of better disk cache hit rates resulting from a smaller database.
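The "cut about a third of all I/O" claim follows from simple arithmetic: the page count is inversely proportional to the average fill factor, so raising it from 60% to 90% removes a third of the pages.

```python
# Back-of-envelope check: pages needed = payload / fill_factor,
# so the saving from denser pages is 1 - (old_fill / new_fill).
data = 1.0                   # total index payload, arbitrary units
pages_at_60 = data / 0.60    # pages needed at 60% average fill
pages_at_90 = data / 0.90    # pages needed at 90% average fill
saving = 1 - pages_at_90 / pages_at_60
print(round(saving, 3))      # 0.333: about a third fewer pages, hence less I/O
```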
The most practical way of having full index pages in the case of unpredictable random insert order will be to take sets of adjacent index leaf pages and compact the rows so that the last page of the set goes empty. Since this is basically an I/O optimization, this should be done when preparing to write the pages to disk, hence concerning mostly old dirty pages. Insert and update times will not be affected since these operations will not concern themselves with compaction. Thus the CPU cost of background compaction will be negligible in comparison with writing the pages to disk. Naturally this will benefit any relational application as well as free text indexing. RDF and free text will be the largest beneficiaries due to the large numbers of short rows inserted in random order.
Looking at the CPU usage of the tests, locating the place in the index where to insert, which by rights should be the bulk of the time cost, was not very significant: only about 15%. Thus there are many unused possibilities for optimization, for example rewriting in C some parts of the loader currently done as stored procedures. Also the thread usage of the loader, with one thread parsing and mapping IRI strings to IRI IDs and six threads sharing the inserting, could be refined for better balance, as we have noted that the parser thread sometimes forms a bottleneck. Doing the updating of the IRI name to IRI ID mapping on the insert thread pool would produce some benefit.
Anyway, since the most important test was I/O bound, we will first implement some background index compaction and then revisit the experiment. We expect to be able to double the throughput of the Wikipedia data set loading.
More Thoughts on ORDBMS Clients, .NET and RDF:
Continuing on from the previous post... If Microsoft opens the right interfaces to independent developers, we see many exciting possibilities for using ADO .NET 3 with Virtuoso.
Microsoft quite explicitly states that their thrust is to decouple the client-side representation of data as .NET objects from the relational schema in the database. This is a worthy goal.
But we can also see other possible applications of the technology when we move away from strictly relational back ends. This can go in two directions: towards object-oriented databases, and towards making applications for the Semantic Web.
In the OODBMS direction, we could equate Virtuoso table hierarchies with .NET classes and create a tighter coupling between client and database, going, as it were, in the other direction from Microsoft's intended decoupling. For example, we could do typical OODBMS tricks such as prefetch of objects based on storage clustering. The simplest case of this is like virtual memory, where the request for one byte brings in the whole page or group of pages. The basic idea is that what is created together probably gets used together, and if all objects are modeled as subclasses (subtables) of a common superclass, then, regardless of instance type, what is created together (has consecutive ids) will indeed tend to cluster on the same page. These tricks can deliver good results in very navigational applications like GIS or CAD. But these are rather specialized things, and we do not see OODBMS making any great comeback.
But what is more interesting and more topical at present is making clients for the RDF world. There, the OWL Ontology Language could be used to generate the .NET classes, and the DBMS could, when returning URIs serving as subjects of triples, include specified predicates on these subjects, enough to allow instantiating .NET instances as 'proxies' of these RDF objects. Of course, only predicates for which the client has a representation are relevant; thus some client-server handshake is needed at the start. What data could be prefetched is like the intersection of a concise bounded description and what the client has classes for. The rest of the mapping would be very simple, with IRIs becoming pointers, multi-valued predicates becoming lists, and so on. IRIs for which the RDF type was not known or inferable could be left out or represented as a special class with name-value pairs for its attributes; same with blank nodes.
In this way, .NET's considerable UI capabilities could be directly exploited for visualizing RDF data, given only that the data complies reasonably well with a known ontology.
If a SPARQL query returned a resultset, IRI-typed columns would be returned as .NET instances, and the server would prefetch enough data for filling them in. For a SPARQL CONSTRUCT, a collection object could be returned with the objects materialized inside. If the interfaces allow passing an Entity SQL string, these could possibly be specialized to allow for a SPARQL string instead. LINQ might have to be extended to allow for SPARQL-type queries, though.
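The mapping sketched above (IRIs becoming object references, multi-valued predicates becoming lists, predicates without a client-side representation dropped) can be illustrated in a few lines. Python stands in for .NET here, and the class, its `known` attribute set, and all property names are invented for illustration.

```python
# Hypothetical proxy materialization: a Proxy object absorbs only the
# triples about its IRI whose predicates the client knows about, and
# stores every value in a list (predicates may be multi-valued).
class Proxy:
    known = {"name", "knows"}           # what the client has attributes for

    def __init__(self, iri, triples):
        self.iri = iri
        for s, p, o in triples:
            if s != iri or p not in self.known:
                continue                # unknown predicates are left out
            vals = getattr(self, p, [])
            vals.append(o)
            setattr(self, p, vals)      # multi-valued predicates -> lists

triples = [
    ("ex:alice", "name",  "Alice"),
    ("ex:alice", "knows", "ex:bob"),
    ("ex:alice", "knows", "ex:carol"),
    ("ex:alice", "shoeSize", "38"),     # no client-side representation: dropped
]
alice = Proxy("ex:alice", triples)
```

After materialization, `alice.knows` holds both related IRIs as a list, while the unknown `shoeSize` predicate never reaches the object.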
Many of these questions will be better answerable as we get more details on Microsoft's forthcoming ADO .NET release. We hope that sufficient latitude exists for exploring all these interesting avenues of development.
I shopped for everything except food on eBay. When working with foreign-language documents, I used translations from Babel Fish. (This worked only so well. After a Babel Fish round-trip through Italian, the preceding sentence reads, 'That one has only worked therefore well.') Why use up space storing files on my own hard drive when, thanks to certain free utilities, I can store them on Gmail's servers? I saved, sorted, and browsed photos I uploaded to Flickr. I used Skype for my phone calls, decided on books using Amazon's recommendations rather than 'expert' reviews, killed time with videos at YouTube, and listened to music through customizable sites like Pandora and Musicmatch. I kept my schedule on Google Calendar, my to-do list on Voo2do, and my outlines on iOutliner. I voyeured my neighborhood's home values via Zillow. I even used an online service for each stage of the production of this article, culminating in my typing right now in Writely rather than Word. (Being only so confident that Writely wouldn't somehow lose my work -- or as Babel Fish might put it, 'only confident therefore' -- I backed it up into Gmail files.)" Interesting article; Tim O'Reilly's response is here.
(Via Valentin Zacharias (Student).)
Tim O'Reilly's response provides the following hierarchy for Web 2.0, based on what he calls "Web 2.0-ness":
Level 3: The application could ONLY exist on the net, and draws its essential power from the network and the connections it makes possible between people or applications. These are applications that harness network effects to get better the more people use them. EBay, craigslist, Wikipedia, del.icio.us, Skype, (and yes, Dodgeball) meet this test. They are fundamentally driven by shared online activity. The web itself has this character, which Google and other search engines have then leveraged. (You can search on the desktop, but without link activity, many of the techniques that make web search work so well are not available to you.) Web crawling is one of the fundamental Web 2.0 activities, and search applications like Adsense for Content also clearly have Web 2.0 at their heart. I had a conversation with Eric Schmidt, the CEO of Google, the other day, and he summed up his philosophy and strategy as "Don't fight the internet." In the hierarchy of web 2.0 applications, the highest level is to embrace the network, to understand what creates network effects, and then to harness them in everything you do.
Level 2: The application could exist offline, but it is uniquely advantaged by being online. Flickr is a great example. You can have a local photo management application (like iPhoto) but the application gains remarkable power by leveraging an online community. In fact, the shared photo database, the online community, and the artifacts it creates (like the tag database) is central to what distinguishes Flickr from its offline counterparts. And its fuller embrace of the internet (for example, that the default state of uploaded photos is "public") is what distinguishes it from its online predecessors.
Level 1: The application can and does exist successfully offline, but it gains additional features by being online. Writely is a great example. If you want to do collaborative editing, its online component is terrific, but if you want to write alone, as Fallows did, it gives you little benefit (other than availability from computers other than your own.)
Level 0: The application has primarily taken hold online, but it would work just as well offline if you had all the data in a local cache. MapQuest, Yahoo! Local, and Google Maps are all in this category (but mashups like housingmaps.com are at Level 3.) To the extent that online mapping applications harness user contributions, they jump to Level 2.
So, in a sense, we have near-conclusive confirmation that Web 2.0 is simply about APIs (typically service-specific Data Silos or Walled Gardens) with little concern, understanding, or interest in truly open data access across the burgeoning "Web of Databases" -- or the Web of "Databases and Programs" that I prefer to describe as "Data Spaces".
Thus, we can truly begin to conclude that Web 3.0 (Data Web) is the addition of Flexible and Open Data Access to Web 2.0; where the Open Data Access is achieved by leveraging Semantic Web deliverables such as the RDF Data Model and the SPARQL Query Language :-)
]]>GeoRSS & Geonames for Philanthropy: "
I heard about Kiva.ORG in a BusinessWeek podcast. After visiting its website, I think there are a few places where GeoRSS (in the RDF/A syntax) and Geonames can be used to enhance the site's functionality.
It's a microfinance website for people in developing countries. Its business model sits at the intersection of peer-to-peer financing and philanthropy. The goal is to help developing-country businesses borrow small loans from a large group of Web users, so that they can avoid paying high interest to banks.
For example, a person in Uganda can request a $500 loan and use it for buying and selling more poultry. One or more lenders (anyone on the Web) may decide to grant loans to that person in increments as tiny as $25. After a few years, that person will pay back the loans to the lenders.
I went to the website and discovered the site has a relatively weak search and browsing interface. In particular, there is no way to group loan requests based on geographical locations (e.g., countries, cities and regions).
I took a look at individual loan pages. Each page actually has a standard way of describing location information -- e.g., Location: Mbale, Uganda.
It should be relatively easy to add GeoRSS points (in the RDF/A syntax) to describe this location information (an alternative may be using Microformat Geo or W3C Geo). Once the location information is annotated, one can imagine building a map mashup to display loan requests in a geospatial perspective. One can also build search engines to support spatial queries such as "find me all loans from Mbale".
Since Kiva.ORG webmasters may not be GIS experts, it would be nice if we could find ways to automatically geocode location information and describe it using GeoRSS. This automatic geocoding procedure can be developed using Geonames's web services: take a string such as "Mbale" or "Uganda" and send it to Geonames's search service. The procedure will get back a JSON or XML description of the location, which includes latitude and longitude. This can then be used to annotate the location information in a Kiva loan page.
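The automatic geocoding procedure described above can be sketched in Python. The endpoint path, "username" parameter, and JSON field names below follow my understanding of the public Geonames search service and should be treated as assumptions; the canned response stands in for a live call to the service.

```python
import json
from urllib.parse import urlencode

def geonames_search_url(place, username="demo"):
    # Build a Geonames search-service URL for a place name.
    # (Endpoint path and 'username' parameter are assumptions here.)
    return "http://api.geonames.org/searchJSON?" + urlencode(
        {"q": place, "maxRows": 1, "username": username})

def extract_point(geonames_json):
    # Pull latitude/longitude out of a Geonames JSON response and
    # format it as a GeoRSS-style "lat lng" point string.
    doc = json.loads(geonames_json)
    first = doc["geonames"][0]
    return "%s %s" % (first["lat"], first["lng"])

# A canned response, standing in for a live call to the service:
sample = '{"geonames": [{"name": "Mbale", "lat": "1.08209", "lng": "34.17503"}]}'
print(extract_point(sample))  # → 1.08209 34.17503
```

The resulting "lat lng" string is the shape a GeoRSS point expects, so the extracted value could be dropped straight into a loan page's annotations.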
Can you think of other ways to help Kiva.ORG become more "geospatially intelligent"?
You can learn more about Kiva.ORG at its website and listen to this podcast.
In an initial response to these developments, Orri Erling, Virtuoso's Program Manager, shares valuable insights from past Object-Relational technology developments and the associated deliverables challenges. As Orri notes, the Virtuoso team suspended ORM and ORDBMS work at the onset of the Kubl-Virtuoso transition due to the lack of standardized client-side functionality exposure points.
My hope is that Microsoft's efforts trigger community-wide activity that results in a collection of interfaces that make scenarios such as generating .NET-based Semantic Web Objects possible (where the S in an S->P->O RDF triple becomes a bona fide .NET class instance generated from OWL).
To be continued since the interface specifics re. ADO.NET 3.0 remain in flux...
]]>Note to Tim:
Is the RDF.net domain deal still on? I know it's past 1st Jan 2006, but do bear in mind that the critical issue of a broadly supported RDF Query Language only took significant shape approximately 13 months ago (in the form of SPARQL), and this is all so critical to the challenge you posed in 2003.
RDF.net could become a point of semantic-web-presence through which the benefits of SPARQL compliant Triple|Quad Stores, Shared Ontologies, and SPARQL Protocol are unveiled in their well intended glory :-).
]]>Standards as social contracts: "Looking at Dave Winer's efforts in evangelizing OPML, I try to draw some rough lines into what makes a de-facto standard. De Facto standards are made and seldom happen on their own. In this entry, I look back at the history of HTML, RSS, the open source movement and try to draw some lines as to what makes a standard.
"(Via Tristan Louis.)
I posted a comment to the Tristan Louis' post along the following lines:
The analysis is spot on re. the link between de facto standardization and bootstrapping. Likewise, the clear linkage between bootstrapping and connected communities (a variation of the social networking paradigm).
Dave built a community around an XML content syndication and subscription usecase demo that we know today as the blogosphere. Superficially, one may conclude that the Semantic Web vision has suffered to date from the lack of a similar bootstrap effort. Whereas in reality, we are dealing with "time and context" issues that are critical to the base understanding upon which a "Dave Winer" style bootstrap for the Semantic Web would occur.
Personally, I see the emergence of Web 2.0 (esp. the mashups phenomenon) as the "time and context" seeds from which the Semantic Web bootstrap will sprout. I see shared ontologies such as FOAF and SIOC leading the way (they are the RSS 2.0's of the Semantic Web IMHO).
]]>Hiding Ontology from the Semantic Web Users: "
Ontology is a key foundation of the Semantic Web. Without ontology, it will be difficult for applications to share knowledge and reason over information that is published on the Web. However, it is a serious mistake to think that the Semantic Web is simply a collection of ontologies.
Last week I was invited to be on a panel discussion at the Humans and the Semantic Web Workshop. I talked a bit about the Geospatial Semantic Web and its associated research issues. Overall the workshop went very well. You can read about the notes from the workshop here.
Some of my new thoughts after the workshop are as follows.
I was asked the question, "What are user-related issues that Semantic Web developers must pay attention to?" I think building Semantic Web applications is similar to building database applications. There are a few things we can learn from our past experience in building database applications.
When building database-driven applications, we store information in SQL databases, and we use SQL to access, manipulate, and manage this information. When building Semantic Web applications, we express ontologies and information in RDF, and use RDF query languages (e.g. SPARQL) to access and manipulate this information.
When building database-driven applications, we hide complexity from the end-users. For example, we almost never expose raw SQL statements to the end users, or ask users to process the raw result sets returned from an SQL engine. We always provide intuitive interfaces for accessing and representing information.
When building Semantic Web applications, we should also hide complexity from the end-users. Users shouldn't need to see or edit RDF statements. Users shouldn't need to be fluent in SPARQL queries or able to parse graphs that are returned by a SPARQL engine.
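This "hide the query language" principle can be sketched in a few lines. Everything here is illustrative: find_mbox, the FOAF query, and the stub endpoint are hypothetical names, but the shape -- a plain function in front, with the query text kept as an internal detail -- is the point.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

def _query_for_mbox(name):
    # Internal detail: callers of find_mbox never see this string.
    return ('PREFIX foaf: <%s> '
            'SELECT ?mbox WHERE { ?s foaf:name "%s" . ?s foaf:mbox ?mbox }'
            % (FOAF, name))

def find_mbox(name, execute):
    # Return the mailbox for a person, given any callable that can
    # run a SPARQL SELECT and return rows of bindings (dicts).
    rows = execute(_query_for_mbox(name))
    return rows[0]["mbox"] if rows else None

# A stub executor standing in for a real SPARQL endpoint:
def fake_endpoint(q):
    return [{"mbox": "mailto:alice@example.org"}] if '"Alice"' in q else []

print(find_mbox("Alice", fake_endpoint))  # → mailto:alice@example.org
```

The end-user calls find_mbox with a name; the SPARQL text, the graph, and the result-set handling stay behind the interface, exactly as raw SQL stays behind a database-driven application's UI.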
Semantic Web developers should spend more time on building functional capabilities that solve real world problems and improve people's productivity. It's important to remember that "the Semantic Web != ontologies".
" ]]>This particular inflection and, ultimately, transistion is going to occur at Warp Speed!
]]>
A quick FYI:
Virtuoso has offered a DBMS-hosted filesystem via WebDAV for a number of years, but the implications of this functionality have remained unclear for just as long. Thus, we developed (a few years ago) and released (recently) an application layer above Virtuoso's WebDAV storage realm called "The OpenLink Briefcase" (née oDrive). This application allows you to view items uploaded by content type and/or kind (People, Business Cards, Calendars, Business Reports, Office Documents, Photos, Blog Posts, Feed Channels/Subscriptions, Bookmarks, etc.). It also includes automatic metadata extraction (where feasible) and indexing. Naturally, as an integral part of our "OpenLink Data Spaces" (ODS) product offering, it supports GData, URIQA, SPARQL (note: WebDAV metadata is sync'ed with Virtuoso's RDF Triplestore), SQL, and WebDAV itself.
You can explore the power of this product via the following routes:
"We all know that structured data is boring and useless; while unstructured data is sexy and chock full of value. Well, only up to a point, Lord Copper. Genuinely unstructured data can be a real nuisance - imagine extracting the return address from an unstructured letter, without letterhead and any of the formatting usually applied to letters. A letter may be thought of as unstructured data, but most business letters are, in fact, highly-structured." ....Duncan Pauly, founder and chief technology officer of Coppereye add's eloquent insight to the conversation:
"The labels "structured data" and "unstructured data" are often used ambiguously by different interest groups; and often used lazily to cover multiple distinct aspects of the issue. In reality, there are at least three orthogonal aspects to structure:* The structure of the data itself.
* The structure of the container that hosts the data.
* The structure of the access method used to access the data.
These three dimensions are largely independent and one does not need to imply another. For example, it is absolutely feasible and reasonable to store unstructured data in a structured database container and access it by unstructured search mechanisms."
Data understanding and appreciation is dwindling at a time when the reverse should be happening. We are supposed to be in the throes of the "Information Age", but for some reason this appears to have no correlation with data and "data access" in the minds of many -- as reflected in the broadly contradictory positions taken re. unstructured data vs structured data: structured is boring and useless while unstructured is useful and sexy...
The difference between "Structured Containers" and "Structured Data" is clearly misunderstood by most (an unfortunate fact).
For instance, all DBMS products are "Structured Containers" aligned with one or more data models (typically one). These products have been limited by proprietary data access APIs and underlying data-model specificity when used in the "Open-world" model that is at the core of the World Wide Web. This confusion also carries over to the misconception that Web 2.0 and the Semantic/Data Web are mutually exclusive.
But things are changing fast, and the concept of multi-model DBMS products is beginning to crystallize. On our part, we have finally released the long-promised "OpenLink Data Spaces" application layer that has been developed using our Virtuoso Universal Server. We have structured, unified storage containment exposed to the data web cloud via endpoints for querying or accessing data using a variety of mechanisms that include GData, OpenSearch, SPARQL, XQuery/XPath, SQL, etc.
To be continued....
]]>Apple patent application for cascade feature for creating records in a database:
" On June 22, the US Patent & Trademark Office revealed Apple’s patent application titled ‘Cascade feature for creating records in a database,’ originally filed in December 2004. The present invention relates to databases and, more particularly, to providing a cascade feature for a database program which can serve as an... [ read more ]"
(Via Macsimum News.)
It's one thing to not know, or have no demonstrable interest in, the enterprise corporate market (the land of database technology utilization); it's a completely different matter when the lack of technology advances in this realm amounts to advertising one's ignorance about database matters so publicly.
I would like to assume that this patent is dead on arrival since there should be an army of DBMS vendors Triggered by this attempt to CASCADE DELETE years of existing prior art LOL!!
The attempt to use Model Independence as the patentable variation of "DBMS Cascade Functionality" prior art doesn't wash. CASCADE functionality is old news in the real DBMS world! What next? Patent application for mixing SQL and SPARQL in 2009?
There is a gradual sense that we are now making the Conceptual View of Data Real, across the board, and obviously there would be a clear need to apply CASCADE technology in this context. But the fact that you realize this now (Apple!) simply doesn't make it novel in any shape or form.
]]>Despite page ranking and other techniques, the scale of the Internet is straining available commercial search engines to deliver truly relevant content. This observation is not new, but its relevance is growing. Similarly, the integration and interoperability challenges facing enterprises have never been greater. One approach to address these needs, among others, is to adopt semantic Web standards and technologies.
The image is compelling: targeted and unambiguous information from all relevant sources, served in usable bite-sized chunks. It sounds great; why isn't it happening?
There are clues -- actually, reasons -- why semantic Web technology is not being embraced in a broad-scale way. I have spoken elsewhere as to why enterprises or specific organizations will be the initial adopters and promoters of these technologies. I still believe that to be the case. The complexity and lack of a network effect ensure that semantic Web stuff will not initially arise from the public Internet.
Parallels with Knowledge Management
]]>A pre-print from Tim Finin and Li Ding entitled "Search Engines for Semantic Web Knowledge"1 presents a thoughtful and experienced overview of the challenges posed to conventional search by semantic Web constructs. The authors base much of their observations on their experience with the Swoogle semantic Web search engine over the past two years. They also used Swoogle, whose index contains information on over 1.3M RDF documents, to generate statistics on the semantic Web's size and growth in the paper.
Among other points, the authors note these key differences and challenges from conventional search engines:
The authors particularly note the challenge of indexing as repositories grow to actual Internet scales.
Though not noted, I would add to this list the challenge of user interfaces. Only a small percentage of users, for example, use Google's more complicated advanced search form. In its full-blown implementation, semantic Web search variations could make the advanced Google form look like child's play.
1Tim Finin and Li Ding, "Search Engines for Semantic Web Knowledge," a pre-print to be published in the Proceedings of XTech 2006: Building Web 2.0, May 16, 2006, 19 pp. A PDF of the paper is available for download.
" ]]>]]>Two graphs that explain most IT dysfunction (Part I): "
Inspired by reading about other peopleâs blogging weaknesses, Iâve decided to finally get this one off the back burner and post it. Iâm pretty sure that this isnât original, but I started thinking about this way back in 1996 (pre-social-bookmarking) and Iâve lost my pointer to whatever influenced it. Anybody who can set me straight- Iâd appreciate it.
So here goes.
There are two graphs which, when seen together, explain a hell of a lot about various forms of dysfunction that you see in the technology world.
In this first graph, X represents relative "technical expertise" and Y represents the "perceived benefit" in the introduction of a new technology:
The summary is that technical neophytes (A) tend to see high potential benefit in new technologies, while people who have a bit of technology experience (B) grow increasingly cynical about technology claims and can rattle off the names of technologies that they have seen over-hyped and that have under-delivered. The interesting thing, though, is that as people become really expert in technology (C), their view of the potential benefits of new technology starts to increase again. At the far right of this scale I'm talking about the real experts -- the alpha-geeks of the world.
In the second graph, X again represents technical expertise, but Y represents the "perceived risk" associated with the introduction of a new technology:
Here the curve is inverted, but the basic pattern is the same. The neophytes (A) are blissfully unaware of the things that can go wrong with the introduction of a new technology. The tech-savvy (B) are battle-scarred and have seen (and possibly caused) countless disasters. The alpha-geeks (C) have also seen their share of problems, but they have also learned from their mistakes and know how to avoid them in the future. The alpha-geeks understand how to manage the risk.
Now things get interesting when you map these two dynamics against each other:
You see that neophytes in group A have essentially the same world view as the alpha-geeks in group C, but for completely different reasons. The trouble starts when you realize that most senior executives, venture capitalists and members of the popular press are in group A. At the other extreme, most R&D groups, architecture groups, independent consultancies, technology pundits, etc. are in group C. There are a few problems with this:
- People in group A will often talk to and solicit advice from people in group C
- There are relatively few people in group C
- Most of the people who actually have to implement new technologies are in group B.
So you can start to see the problem.
In Part II I'll talk some more about group B and I'll discuss some of the classic patterns that emerge when A, B and C try to work with each other.
"
Virtuoso extends its SQL3 implementation with syntax for integrating SPARQL into queries and subqueries. Thus, as part of a SQL SELECT query or subquery, one can write the SPARQL keyword and a SPARQL query as part of query text processed by Virtuoso's SQL Query Processor.
Using Virtuoso's Command line or the Web Based ISQL utility type in the following (note: "SQL>" is the command line prompt for the native ISQL utility):
SQL> sparql select distinct ?p where { graph ?g { ?s ?p ?o } };
Which will return the following:
p
varchar
----------
http://example.org/ns#b
http://example.org/ns#d
http://xmlns.com/foaf/0.1/name
http://xmlns.com/foaf/0.1/mbox
...
SQL> select distinct subseq (p, strchr (p, '#')) as fragment from (sparql select distinct ?p where { graph ?g { ?s ?p ?o } } ) as all_predicates where p like '%#%' ;
fragment
varchar
----------
#query
#data
#name
#comment
...
You can pass parameters to a SPARQL query using a Virtuoso-specific syntax extension. '??' or '$?' indicates a positional parameter similar to '?' in standard SQL. '??' can be used in graph patterns or anywhere else where a SPARQL variable is accepted. The value of a parameter should be passed in SQL form, i.e. it should be a number or an untyped string. An IRI ID cannot be passed, but an absolute IRI can. Using this notation, a dynamic SQL capable client (ODBC, JDBC, ADO.NET, OLEDB, XMLA, or others) can execute parametrized SPARQL queries using the parameter-binding concepts that are commonplace in dynamic SQL. This implies that existing SQL applications and development environments (PHP, Ruby, Python, Perl, VB, C#, Java, etc.) are capable of issuing SPARQL queries via their existing SQL-bound data access channels against RDF data stored in Virtuoso.
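Conceptually, positional binding works like the sketch below. Virtuoso performs the actual binding inside its SQL processor via the client's normal parameter mechanism, so this text-substitution function (and its value-rendering rules) is only an illustration of the idea, not how the server handles '??'.

```python
def bind_positional(sparql_text, params):
    # Replace each '??' placeholder, left to right, with the value
    # supplied for that position. Illustrative only: a real client
    # binds parameters through its data-access API, not by text
    # substitution, and the rendering rules here are simplified.
    out = sparql_text
    for value in params:
        if isinstance(value, str):
            # Render absolute IRIs as <...>, other strings as literals.
            rendered = '<%s>' % value if value.startswith('http') else '"%s"' % value
        else:
            rendered = str(value)
        out = out.replace('??', rendered, 1)
    return out

q = 'sparql select ?s where { graph ?g { ?s ?? ?? } }'
print(bind_positional(q, ['http://example.org/ns#p', 4]))
# → sparql select ?s where { graph ?g { ?s <http://example.org/ns#p> 4 } }
```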
Note: This is the Virtuoso equivalent of a recently published example using Jena (a Java based RDF Triple Store).
Create a Virtuoso function by executing the following:
SQL> create function param_passing_demo ()
{
  declare stat, msg varchar;
  declare mdata, rset any;
  exec ('sparql select ?s where { graph ?g { ?s ?? ?? }}',
        stat, msg,
        vector ('http://www.w3.org/2001/sw/DataAccess/tests/data/Sorting/sort-0#int1', 4), -- Vector of two parameters
        10,     -- Max. result-set rows
        mdata,  -- Variable for handling result-set metadata
        rset    -- Variable for handling query result-set
       );
  return rset[0][0];
}

Test the new "param_passing_demo" function by executing the following:
SQL> select param_passing_demo ();
Which returns:
callret
VARCHAR
_______________________________________________________________________________
http://www.w3.org/2001/sw/DataAccess/tests/data/Sorting/sort-0#four

1 Rows. -- 00000 msec.
A SPARQL ASK query can be used as an argument of the SQL EXISTS predicate.
create function sparql_ask_demo () returns varchar
{
  if (exists (sparql ask where { graph ?g { ?s ?p 4 }}))
    return 'YES';
  else
    return 'NO';
};
Test by executing:
SQL> select sparql_ask_demo ();
Which returns:
_________________________
YES
]]>
Sometimes, an application will be making a SPARQL query, using the results from a previous query or using some RDF term found through the other Jena APIs.
SQL has prepared statements - they allow an SQL statement to take a number of parameters. The application fills in the parameters and executes the statement.
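For comparison, here is the SQL side of that idea using Python's built-in sqlite3 module: the statement is written once with a '?' placeholder, and the application supplies the parameter value at execution time.

```python
import sqlite3

# An in-memory table standing in for application data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (uri TEXT, title TEXT)")
conn.execute("INSERT INTO docs VALUES ('http://example.org/d1', 'SPARQL Tutorial')")

# The statement is prepared with a placeholder; the parameter is
# bound at execution time, so the same statement can be reused.
stmt = "SELECT uri FROM docs WHERE title = ?"
row = conn.execute(stmt, ("SPARQL Tutorial",)).fetchone()
print(row[0])  # → http://example.org/d1
```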
One way is to resort to doing this in SPARQL by building a complete, new query string, parsing it and executing it. But it takes a little care to handle all cases like quoting special characters; you can at least use some of the many utilities in ARQ for producing strings, such as FmtUtils.stringForResource (it's not in the application API but in the util package currently).
Queries in ARQ can be built programmatically but it is tedious, especially when the documentation hasn't been written yet.
Another way is to use query variables and bind them to initial values that apply to all query solutions. Consider the query:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?doc { ?doc dc:title ?title }
It gets documents and their titles.
Executing a query in a program might look like:
import com.hp.hpl.jena.query.* ;

Model model = ... ;
String queryString = StringUtils.join("\n", new String[]{
    "PREFIX dc: <http://purl.org/dc/elements/1.1/>",
    "SELECT ?doc { ?doc dc:title ?title }" }) ;
Query query = QueryFactory.create(queryString) ;
QueryExecution qexec = QueryExecutionFactory.create(query, model) ;
try {
  ResultSet results = qexec.execSelect() ;
  for ( ; results.hasNext() ; ) {
    QuerySolution soln = results.nextSolution() ;
    Resource doc = soln.getResource("doc") ;  // ?doc binds resources, not literals
  }
} finally {
  qexec.close() ;
}
Suppose the application knows the title it's interested in -- can it use this to get the document?
The value of ?title is made a parameter to the query and fixed by an initial binding. All query solutions will be restricted to pattern matches where ?title is that RDF term.
QuerySolutionMap initialSettings = new QuerySolutionMap() ;
initialSettings.add("title", node) ;
and this is passed to the factory that creates QueryExecution's:
QueryExecution qexec = QueryExecutionFactory.create(query, model, initialSettings) ;
It doesn't matter if the node is a literal, a resource with URI or a blank node. It becomes a fixed value in the query, even a blank node, because it's not part of the SPARQL syntax, it's a fixed part of every solution.
This gives named parameters to queries, enabling something like SQL prepared statements, except with named parameters rather than positional ones.
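A rough Python analogue of the idea: substitute named variables with fixed terms before matching, so that every solution carries the binding. The function and data shapes below are a sketch of the concept only, not of ARQ's implementation.

```python
def apply_initial_binding(triple_patterns, initial):
    # Replace each '?var' whose name appears in the initial-binding
    # map with its fixed term; other variables stay free.
    bound = []
    for (s, p, o) in triple_patterns:
        bound.append(tuple(
            initial.get(t[1:], t) if isinstance(t, str) and t.startswith('?') else t
            for t in (s, p, o)))
    return bound

patterns = [('?doc', 'dc:title', '?title')]
print(apply_initial_binding(patterns, {'title': '"SPARQL Tutorial"'}))
# → [('?doc', 'dc:title', '"SPARQL Tutorial"')]
```

As in the Jena case, the query text itself never changes; only the binding map does, which keeps the code clearer than assembling query strings.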
This can make a complex application easier to structure and clearer to read. It's better than bashing strings together, which is error prone, inflexible, and does not lead to clear code.
(Via ARQtick.)
]]>I added the missing piece regarding the "Virtuoso Conductor" (the Web based Admin UI for Virtuoso) to the original post below. I also added a link to our live SPARQL Demo so that anyone interested can start playing around with SPARQL and SPARQL integrated into SQL right away.
Another good thing about this post is the vast amount of valuable links that it contains. To really appreciate this point simply visit my Linkblog (excuse the current layout :-) - a Tab if you come in via the front door of this Data Space (what I used to call My Weblog Home Page).
]]>"Free" Databases: Express vs. Open-Source RDBMSs: "Open-source relational database management systems (RDBMSs) are gaining IT mindshare at a rapid pace. As an example, BusinessWeek's February 6, 2006 ' Taking On the Database Giants ' article asks 'Can open-source upstarts compete with Oracle, IBM, and Microsoft?' and then provides the answer: 'It's an uphill battle, but customers are starting to look at the alternatives.'
There's no shortage of open-source alternatives to look at. The BusinessWeek article concentrates on MySQL, which BW says 'is trying to be the Ikea of the database world: cheap, needs some assembly, but has a sleek, modern design and does the job.' The article also discusses Postgre[SQL] and Ingres, as well as EnterpriseDB, an Oracle clone created from PostgreSQL code*. Sun includes PostgreSQL with Solaris 10 and, as of April 6, 2006, with Solaris Express.**
*Frank Batten, Jr., the investor who originally funded Red Hat, invested a reported $16 million into Great Bridge with the hope of making a business out of providing paid support to PostgreSQL users. Great Bridge stayed in business only 18 months, having missed an opportunity to sell the business to Red Hat and finding that selling $50,000-per-year support packages for an open-source database wasn't easy. As Batten concluded, 'We could not get customers to pay us big dollars for support contracts.' Perhaps EnterpriseDB will be more successful with a choice of $5,000, $3,000, or $1,000 annual support subscriptions.
**Interestingly, Oracle announced in November 2005 that Solaris 10 is 'its preferred development and deployment platform for most x64 architectures, including x64 (x86, 64-bit) AMD Opteron and Intel Xeon processor-based systems and Sun's UltraSPARC(R)-based systems.'
There is a surfeit of reviews of current MySQL, PostgreSQL and -- to a lesser extent -- Ingres implementations. These three open-source RDBMSs come with their own or third-party management tools. These systems compete against free versions of commercial (proprietary) databases: SQL Server 2005 Express Edition (and its MSDE 2000 and 1.0 predecessors), Oracle Database 10g Express Edition, IBM DB2 Express-C, and Sybase ASE Express Edition for Linux where database size and processor count limitations aren't important. Click here for a summary of recent InfoWorld reviews of the full versions of these four databases plus MySQL, which should be valid for Express editions also. The FTPOnline Special Report article, 'Microsoft SQL Server Turns 17,' that contains the preceding table is here (requires registration).
SQL Server 2005 Express Edition SP-1 Advanced Features
SQL Server 2005 Express Edition with Advanced Features enhances SQL Server 2005 Express Edition (SQL Express or SSX) dramatically, so it deserves special treatment here. SQL Express gains full text indexing and now supports SQL Server Reporting Services (SSRS) on the local SSX instance. The SP-1 with Advanced Features setup package, which Microsoft released on April 18, 2006, installs the release version of SQL Server Management Studio Express (SSMSE) and the full version of Business Intelligence Development Studio (BIDS) for designing and editing SSRS reports. My 'Install SP-1 for SQL Server 2005 and Express' article for FTPOnline's SQL Server Special Report provides detailed, illustrated installation instructions for and related information about the release version of SP-1. SP-1 makes SSX the most capable of all currently available Express editions of commercial RDBMSs for Windows.
OpenLink Software's Virtuoso Open-Source Edition
OpenLink Software announced an open-source version of its Virtuoso Universal Server commercial DBMS on April 11, 2006. On the initial date of this post, May 2, 2006, Virtuoso Open-Source Edition (VOS) was virtually under the radar as an open-source product. According to this press release, the new edition includes:
- SPARQL compliant RDF Triple Store
- SQL-200n Object-Relational Database Engine (SQL, XML, and Free Text)
- Integrated BPEL Server and Enterprise Service Bus
- WebDAV and Native File Server
- Web Application Server that supports PHP, Perl, Python, ASP.NET, JSP, etc.
- Runtime Hosting for Microsoft .NET, Mono, and Java
VOS only lacks the virtual server and replication features that are offered by the commercial edition. VOS includes a Web-based administration tool called the "Virtuoso Conductor". According to Kingsley Idehen's Weblog, 'The Virtuoso build scripts have been successfully tested on Mac OS X (Universal Binary Target), Linux, FreeBSD, and Solaris (AIX, HP-UX, and True64 UNIX will follow soon). A Windows Visual Studio project file is also in the works (ETA some time this week).'
InfoWorld's Jon Udell has tracked Virtuoso's progress since 2002, with an additional article in 2003 and a one-hour podcast with Kingsley Idehen on April 26, 2006. A major talking point for Virtuoso is its support for Atom 0.3 syndication and publication, Atom 1.0 syndication and (forthcoming) publication, and future support for Google's GData protocol, as mentioned in this Idehen post. Yahoo!'s Jeremy Zawodny points out that the 'fingerprints' of Adam Bosworth, Google's VP of Engineering and the primary force behind the development of Microsoft Access, 'are all over GData.' Click here to display a list of all OakLeaf posts that mention Adam Bosworth.
One application for the GData protocol is querying and updating the Google Base database independently of the Google Web client, as mentioned by Jeremy: 'It's not about building an easier onramp to Google Base. ... Well, it is. But, again, that's the small stuff.' Click here for a list of posts about my experiences with Google Base. Watch for a future OakLeaf post on the subject as the GData APIs gain ground.
Open-Source and Free Embedded Database Contenders
Open-source and free embedded SQL databases are gaining importance as the number and types of mobile devices and OSs proliferate. Embedded databases usually consist of Java classes or Windows DLLs that are designed to minimize file size and memory consumption. Embedded databases avoid the installation hassles, heavy resource usage and maintenance cost associated with client/server RDBMSs that run as an operating system service.
Andrew Hudson's December 2005 'Open Source databases rounded up and rodeoed' review for The Inquirer provides brief descriptions of one commercial and eight open-source database purveyors/products: Sleepycat, MySQL, PostgreSQL, Ingres, InnoBase, Firebird, IBM Cloudscape (a.k.a. Derby), Genezzo, and Oracle. Oracle Sleepycat* isn't an SQL database, Oracle InnoDB* is an OEM database engine that's used by MySQL, and Genezzo is a multi-user, multi-server distributed database engine written in Perl. These special-purpose databases are beyond the scope of this post.
* Oracle purchased Sleepycat Software, Inc. in February 2006 and purchased Innobase OY in October 2005. The press release states: 'Oracle intends to continue developing the InnoDB technology and expand our commitment to open source software.'
Derby is an open-source release by the Apache Software Foundation of the Cloudscape Java-based database that IBM acquired when it bought Informix in 2001. IBM offers a commercial release of Derby as IBM Cloudscape 10.1. Derby is a Java class library with a relatively light footprint (2 MB), which makes it suitable for client/server synchronization with the IBM DB2 Everyplace Sync Server in mobile applications. The IBM DB2 Everyplace Express Edition isn't open source or free*, so it doesn't qualify for this post. The same is true for the corresponding Sybase SQL Anywhere components.**
* IBM DB2 Everyplace Express Edition with synchronization costs $379 per server (up to two processors) and $79 per user. DB2 Everyplace Database Edition (without DB2 synchronization) is $49 per user. (Prices are based on those when IBM announced version 8 in November 2003.)
** Sybase's iAnywhere subsidiary calls SQL Anywhere 'the industry's leading mobile database.' A Sybase SQL Anywhere Personal DB seat license with synchronization to SQL Anywhere Server is $119; the cost without synchronization wasn't available from the Sybase Web site. Sybase SQL Anywhere and IBM DB2 Everyplace perform similar replication functions.
Sun's Java DB, another commercial version of Derby, comes with the Solaris Enterprise Edition, which bundles Solaris 10, the Java Enterprise System, developer tools, desktop infrastructure and N1 management software. A recent Between the Lines blog entry by ZDNet's David Berlind waxes enthusiastic over the use of Java DB embedded in a browser to provide offline persistence. RedMonk analyst James Governor and eWeek's Lisa Vaas wrote about the use of Java DB as a local data store when Tim Bray announced Sun's Derby derivative and Francois Orsini demonstrated Java DB embedded in the Firefox browser at the ApacheCon 2005 conference.
Firebird is derived from Borland's InterBase 6.0 code, the first commercial relational database management system (RDBMS) to be released as open source. Firebird has excellent support for SQL-92 and comes in three versions: Classic, SuperServer and Embedded, for Windows, Linux, Solaris, HP-UX, FreeBSD and Mac OS X. The embedded version has a 1.4-MB footprint. Release Candidate 1 for Firebird 2.0 became available on March 30, 2006, and is a major improvement over earlier versions. Borland continues to promote InterBase, now at version 7.5, as a small-footprint, embedded database with commercial Server and Client licenses.
SQLite is a featherweight C library for an embedded database that implements most SQL-92 entry- and transitional-level requirements (some through the JDBC driver) and supports transactions within a tiny 250-KB code footprint. Wrappers support a multitude of languages and operating systems, including Windows CE, SmartPhone, Windows Mobile, and Win32. SQLite's primary SQL-92 limitations are lack of nested transactions, inability to alter a table design once committed (other than with RENAME TABLE and ADD COLUMN operations), and lack of foreign-key constraint enforcement. SQLite provides read-only views, triggers, and 256-bit encryption of database files. A downside is that the entire database file is locked while a transaction is in progress. SQLite uses file access permissions in lieu of GRANT and REVOKE commands. Using SQLite involves no license; its code is entirely in the public domain. The Mozilla Foundation's Unified Storage wiki says this about SQLite: 'SQLite will be the back end for the unified store [for Firefox]. Because it implements a SQL engine, we get querying 'for free', without having to invent our own query language or query execution system. Its code-size footprint is moderate (250k), but it will hopefully simplify much existing code so that the net code-size change should be smaller. It has exceptional performance, and supports concurrent access to the database. Finally, it is released into the public domain, meaning that we will have no licensing issues.'
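As a quick hands-on illustration (using Python's bundled sqlite3 module rather than any of the wrappers mentioned above), here is a sketch of a transaction plus the one kind of schema change SQLite does permit, ADD COLUMN:

```python
# Minimal sqlite3 sketch. sqlite3 ships with Python's standard library.
import sqlite3

conn = sqlite3.connect(":memory:")  # an in-memory database for the demo
cur = conn.cursor()
cur.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")

# Statements run inside an implicit transaction until commit().
cur.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
conn.commit()

# A supported schema change; most other ALTER TABLE forms are not.
cur.execute("ALTER TABLE notes ADD COLUMN tag TEXT")
cur.execute("UPDATE notes SET tag = 'demo'")
conn.commit()

rows = cur.execute("SELECT body, tag FROM notes").fetchall()
print(rows)  # -> [('hello', 'demo')]
```

The same pattern works against a file-backed database, where the whole-file locking noted above comes into play while a write transaction is open.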
Vieka Technology, Inc.'s eSQL 2.11 is a port of SQLite to Windows Mobile (Pocket PC and Smartphone) and Win32, and includes development tools for Windows devices and PCs, as well as a .NET native data provider. A conventional ODBC driver also is available. eSQL for Windows (Win32) is free for personal and commercial use; eSQL for Windows Mobile requires a license for commercial (for-profit or business) use.
HSQLDB isn't on most reviewers' radar, which is surprising because it's the default database for OpenOffice.org (OOo) 2.0's Base suite member. HSQLDB 1.8.0.1 is an open-source (BSD license) embedded Java database engine based on Thomas Mueller's original Hypersonic SQL Project. Using OOo's Base feature requires installing the Java 2 Runtime Environment (which is not open-source) or the presence of an alternative open-source engine, such as Kaffe. My prior posts about OOo Base and HSQLDB are here, here and here.
The HSQLDB 1.8.0 documentation on SourceForge states the following regarding SQL-92 and later conformance: 'HSQLDB 1.8.0 supports the dialect of SQL defined by SQL standards 92, 99 and 2003. This means where a feature of the standard is supported, e.g. left outer join, the syntax is that specified by the standard text. Many features of SQL92 and 99 up to Advanced Level are supported and there is support for most of SQL 2003 Foundation and several optional features of this standard. However, certain features of the Standards are not supported so no claim is made for full support of any level of the standards.'
Other less well-known embedded databases designed for or suited to mobile deployment are Mimer SQL Mobile and VistaDB 2.1. Neither product is open-source, and both require paid licensing; VistaDB requires a small up-front payment by developers but offers royalty-free distribution.
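The 'left outer join' cited in HSQLDB's conformance statement is a good example of standard syntax that portable code can rely on. The sketch below exercises that exact syntax; it runs against Python's bundled SQLite purely for convenience, and HSQLDB would accept the same statement:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE authors (id INTEGER, name TEXT);
CREATE TABLE posts (author_id INTEGER, title TEXT);
INSERT INTO authors VALUES (1, 'anna'), (2, 'ben');
INSERT INTO posts VALUES (1, 'hello');
""")

# Standard SQL syntax for an outer join; unmatched left rows yield NULLs.
rows = cur.execute("""
    SELECT a.name, p.title
    FROM authors a LEFT OUTER JOIN posts p ON p.author_id = a.id
    ORDER BY a.name
""").fetchall()
print(rows)  # -> [('anna', 'hello'), ('ben', None)]
```

Because the syntax follows the standard text, the query is portable across HSQLDB, SQLite, and most other SQL engines discussed in this post.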
Java DB, Firebird embedded, SQLite and eSQL 2.11 are contenders for lightweight PC and mobile device database projects that aren't Windows-only.
SQL Server 2005 Everywhere
If you're a Windows developer, SQL Server Mobile is the logical embedded database choice for mobile applications for Pocket PCs and Smartphones. Microsoft's April 19, 2006 press release delivered the news that SQL Server 2005 Mobile Edition (SQL Mobile or SSM) would gain a big brother: SQL Server 2005 Everywhere Edition.
Currently, the SSM client is licensed (at no charge) to run in production on devices with Windows CE 5.0, Windows Mobile 2003 for Pocket PC or Windows Mobile 5.0, or on PCs with Windows XP Tablet Edition only. SSM also is licensed for development purposes on PCs running Visual Studio 2005. Smart Device replication with SQL Server 2000 SP3 and later databases has been the most common application so far for SSM.
By the end of 2006, Microsoft will license SSE for use on all PCs running any Win32 version or the preceding device OSs. A version of SQL Server Management Studio Express (SSMSE), updated to support SSE, is expected to release by the end of the year. These features will qualify SSE as the universal embedded database for Windows client and smart-device applications.
For more details on SSE, read John Galloway's April 11, 2006 blog post and my 'SQL Server 2005 Mobile Goes Everywhere' article for the FTPOnline Special Report on SQL Server. (Via OakLeaf Systems.)
I would like to make an important clarification re. the GData Protocol and what is popularly dubbed as "Adam Bosworth's fingerprints." I do not believe in a single solution (a simple one for the sake of simplicity) to a deceptively complex problem. Virtuoso supports Atom 1.0 (syndication only at the current time) and Atom 0.3 (syndication and publication, which have been in place for years).
"In my fourth Friday podcast we hear from Kingsley Idehen, CEO of OpenLink Software. I wrote about OpenLink's universal database and app server, Virtuoso, back in 2002 and 2003. Earlier this month Virtuoso became the first mature SQL/XML hybrid to make the transition to open source. The latest incarnation of the product also adds SPARQL (a semantic web query language) to its repertoire. ..."
(Via Jon's Radio.)
BTW - the GData Protocol and Atom 1.0 publishing support will be delivered in both the Open Source and Commercial Edition updates to Virtuoso next week (very little work due to what's already in place).
I make the clarification above to eliminate the possibility of assuming mutual exclusivity of my perspective/vision and Adam's (Jon also makes this important point when he speaks about our opinions being on either side of a spectrum/continuum). I simply want to broaden the scope of this discussion. I am a profound believer in the Semantic Web / Data Web vision, and I predict that we will be querying the Googlebase via SPARQL in the not too distant future (this doesn't mean that netizens will be forced to master SPARQL, absolutely not! There will be conduit technologies that deal with such matters).
Side note: I actually last spoke with Adam at the NY Hilton in 2000 (the day I unveiled Virtuoso to the public for the first time, in person). We bumped into each other and I told him about Virtuoso (at the time the big emphasis was SQL to XML and the vocabulary we had chosen re. SQL extension...), and he told me about his departure from Microsoft and the commencement of his new venture (CrossGain, prior to his stint at BEA). What struck me even more was his interest in Linux and Open Source (bearing in mind this was about three or so weeks after he departed Microsoft).
If you are encountering Virtuoso for the first time via this post or Jon's, please make time to read the product history article on the Virtuoso Wiki (which is one of many Virtuoso based applications that make up our soon to be released OpenLink DataSpace offering).
That said, I better go listen to the podcast :-)
To assist with the general understanding of Virtuoso's SPARQL implementation, we have released an online version of the RDF DAWG SPARQL Test Suite (hosted by a live Virtuoso Demo & Tutorial Instance).
A powerful next-generation server product that implements otherwise distinct server functionality within a single server product. Think of Virtuoso as the server-software analog of a dual-core processor, where each core represents a traditional server functionality realm.
The Virtuoso History page tells the whole story.
90% of the aforementioned functionality has been available in Virtuoso since 2000 with the RDF Triple Store being the only 2006 item.
The Virtuoso build scripts have been successfully tested on Mac OS X (Universal Binary Target), Linux, FreeBSD, and Solaris (AIX, HP-UX, and Tru64 UNIX will follow soon). A Windows Visual Studio project file is also in the works (ETA some time this week).
Simple: there is no value in a product of this magnitude remaining the "best kept secret". That status works well for our competitors, but absolutely works against the legions of new-generation developers, systems integrators, and knowledge workers who need to be aware of what is actually achievable today with the right server architecture.
GPL version 2.
Dual licensing.
The Open Source version of Virtuoso includes all of the functionality listed above, while the Virtual Database (distributed heterogeneous join engine) and Replication Engine (across heterogeneous data sources) will be available only in the commercial version.
On SourceForge.
Of course!
Up until this point, the Virtuoso Product Blog has been a covert live demonstration of some aspects of Virtuoso (Content Management). My Personal Blog and the Virtuoso Product Blog are actual Virtuoso instances, and have been so since I started blogging in 2003.
Is there a product Wiki?
Sure! The Virtuoso Product Wiki is also an instance of Virtuoso demonstrating another aspect of the Content Management prowess of Virtuoso.
Yep! Virtuoso Online Documentation is hosted via yet another Virtuoso instance. This particular instance also attempts to demonstrate Free Text search combined with the ability to repurpose well formed content in a myriad of forms (Atom, RSS, RDF, OPML, and OCS).
The Virtuoso Online Tutorial Site has operated as a live demonstration and tutorial portal for a number of years. During the same timeframe (circa 2001) we also assembled a few Screencast-style demos (their look and feel certainly shows their age; updates are in the works).
BTW - We have also updated the Virtuoso FAQ and also released a number of missing Virtuoso White Papers (amongst many long overdue action items).
The Dublin Core Metadata Initiative is updating the RDF expression of DC and might add range restrictions to some properties. Mikael Nilsson wondered if we would use the Swoogle Semantic Web search engine to see what types of values are being used with DC properties.
This kind of query is just the ticket for Swoogle. Well, almost. The current web-based interface supports a limited number of query types. Many more can be asked if you use SQL directly to query Swoogle's underlying databases. We don't want to provide a direct SQL query service over the main Swoogle database because it's easy to ask a query that will take a looooooong time to answer and some could even crash the database server. We are planning to put up a second server with a copy of the database and will give Swoogle Power Users (SPUs) access to it.
We ran a simple SQL query to generate some initial data for Mikael showing all of the DC properties. For each one, we list all of the ranges that values were drawn from and the number of separate documents and triples for each combination. For example:
Property   | Range        | Documents | Triples
dc:creater | rdfs:Literal | 32        | 648
dc:creator | rdfs:Literal | 234655    | 2477665
dc:creator | wn:Person    | 2714      | 1138250
dc:creator | cc:Agent     | 4090      | 6359
dc:creator | foaf:Person  | 2281      | 5969
dc:creator | foaf:Agent   | 1723      | 3234
Notice that the first property in this partial table is an obvious typo. You can see the complete table as a PDF file or as an Excel spreadsheet.
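The query behind a table like this is a plain GROUP BY aggregation. Swoogle's actual schema isn't public, so the table and column names below are hypothetical, but the shape of the SQL is the point:

```python
import sqlite3

# Hypothetical schema standing in for Swoogle's database; the real
# table and column names are not public.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE triples (property TEXT, range_class TEXT, doc TEXT);
INSERT INTO triples VALUES
  ('dc:creator', 'rdfs:Literal', 'd1'),
  ('dc:creator', 'rdfs:Literal', 'd1'),
  ('dc:creator', 'rdfs:Literal', 'd2'),
  ('dc:creator', 'foaf:Person', 'd3');
""")

# Count distinct documents and total triples per (property, range) pair.
rows = cur.execute("""
    SELECT property, range_class,
           COUNT(DISTINCT doc) AS documents,
           COUNT(*) AS triples
    FROM triples
    GROUP BY property, range_class
    ORDER BY triples DESC
""").fetchall()
print(rows)
```

Each output row corresponds to one line of the table above: a property, a range, the number of distinct documents, and the total triple count.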
[Tim Finin, UMBC ebiquity lab]
(Via Planet RDF.)
Techniques for presenting and managing syndication XML (feeds) are disclosed. In one embodiment, a user can modify how a feed is displayed, such as which content (and how much) is displayed, in what order, and how it is formatted. In another embodiment, a modification regarding how a feed is displayed is stored so that it can be used again at a later time. In yet another embodiment, a user can create a custom feed through aggregation and/or filtering of existing feeds. Aggregation includes, for example, merging the articles of multiple feeds to form a new feed. Filtering includes, for example, selecting a subset of articles of a feed based on whether they satisfy a search query. In yet another embodiment, a user can find articles by entering a search query into a search engine that searches feeds, which will identify one or more articles that satisfy the query.
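Stripped of patent language, the "aggregation" and "filtering" described in that abstract reduce to merging entry lists and selecting entries that match a query. A minimal sketch, with a hypothetical feed structure (not Apple's or anyone's API):

```python
# Feed aggregation and filtering as plain data manipulation.
# The feed entries here are hypothetical, for illustration only.
feed_a = [{"title": "RSS and the open web", "date": "2006-04-01"},
          {"title": "Patent roundup", "date": "2006-04-20"}]
feed_b = [{"title": "XQuery in practice", "date": "2006-04-10"}]

def aggregate(*feeds):
    """Merge the articles of multiple feeds, newest first."""
    merged = [entry for feed in feeds for entry in feed]
    return sorted(merged, key=lambda e: e["date"], reverse=True)

def filter_feed(feed, query):
    """Select the subset of articles whose title matches a search query."""
    return [e for e in feed if query.lower() in e["title"].lower()]

combined = aggregate(feed_a, feed_b)
result = [e["title"] for e in filter_feed(combined, "rss")]
print(result)  # -> ['RSS and the open web']
```

As the post goes on to argue, this is exactly the kind of XML search, aggregation, and transformation that XSLT, XPath, and XQuery were designed to express years earlier.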
Clearly, Apple doesn't seem to understand the world of XML, so let me give them a quick recap:
The Blogosphere is a Galaxy within Cyberspace comprised of Solar systems of Blogs that revolve around X-list bloggers, Topics, or more recently Tags; through the gravitational pull of links to RSS (today), Atom (in due course), and RDF (the future).
Unfortunately, Apple (a major latecomer to RSS) doesn't seem to understand that "RSS content search, aggregation and transformation" is practically the same thing as "XML search, aggregation and transformation", subject matter covered extensively by XML-based languages such as XSLT, XPath, XPointer, and XQuery.
Without XML there would be no RSS (as we know it today), and without RSS there would be no Blogosphere.
Repurposing Blogosphere content isn't a novel invention at all. Therefore, filing a patent along such lines is simply uncool by Apple's standards (like the inextricable binding of iWeb to .mac that was touted as innovative and open).
Final note: this blog is driven by a database engine that has understood XML for a long time. This blog has been my live demo of this fact since its inception. Here are a few things that it has done for a very long time (talking prior art here):
The Semantic Web is only the beginning and an enabling technology for realizing the dreams of Vannevar Bush, Doug Engelbart and Tim Berners-Lee: My current and future objective is the creation and wide dissemination of the next generation collaboration and augmentation infrastructure - the Social Semantic Desktop.
To ensure the loop is closed I have deliberately added the following references to this post: Vannevar Bush wrote the seminal article "As We May Think", in which he describes a theoretical analog computer called the "Memex", a World Wide Web precursor. This document was also a source of inspiration for Ted Nelson (discussed briefly in an earlier post re. the compatibility of his vision and that of Tim Berners-Lee).
]]>"Ok, my first attempt at a round-up (in response to Philâs observation of Planetary damage). Thanks to the conference thereâs loads more here than thereâs likely to be subsequent weeks, although itâs still only a fairly random sample and some of the links here are to heaps of other resourcesâ¦
Incidentally, if anyoneâs got a list/links for SemWeb-related blogs that arenât on Planet RDF, Iâd be grateful for a pointer. PS. Ok, I forget⦠are there any blogs that arenât on Daveâs list yet..?
Quote of the week:
In the Semantic Web, it is not the Semantic which is new, it is the Web which is new.
- Chris Welty, IBM (lifted from TimBL's slides)
I just noticed the article from Dan Zambonini, "Is Web 2.0 killing the Semantic Web?". From my perspective the article shows a misconception that people seem to have about the Semantic Web: the Semantic Web effort itself is not to provide applications (as the Web 2.0 meme indicates); rather, it provides standards to interlink applications.
Blog post title of the week:
Also... a new threat to Semantic Web developers has been discovered: typhoid!, and the key to the Web's full potential is... Tetris."
Hot from the Galway sportsdesk:
1st Prize: CONFOTO, appmosphere web applications, Germany
2nd Prize: FungalWeb, Concordia University, Canada
3rd Prize: Personal Publication Reader, Universität Hannover, Germany
CONFOTO is a browsing and annotation service for conference photos. It combines recent Web trends (tag-based categorization, interactive user interfaces, syndication) with the advantages of Semantic Web platforms (machine-understandable information, an extensible data model, the possibility to mix arbitrary RDF vocabularies).
Congrats bengee!!
(Benjamin had a string of bad luck just prior to the conference; there may still be glitches in the app - 'my sparql store exploded last week')
"(Via Raw.)
As you can see, Web 2.0 and the Semantic Web are mutually inclusive paradigms, as reemphasized via this additional "mashup" (I don't really like the word "mashup", especially as it isn't really different from "repurposing").
Stop whatever you are doing ...: "
... and go and read Tom Coates' explanation of his last project with the BBC. After 21 years working in broadcasting I reckon this is one of the coolest things to happen for a very, very long time.
The ramifications of this will go very deep indeed."
(Spotted Via The Obvious?.)
Yes, the ramifications are deep! Tom Coates' screencast demonstrates an internal variation of an activity that is taking place on many fronts (concurrently) across the NET. I tend to refer to this effort as "Self Annotation"; the very process that will ultimately take us straight to "Semantic Web". It is going to happen much quicker than anticipated because technology is taking the pain out of metadata annotation (e.g. what you do when you tag everything that is ultimately URI accessible). Technology is basically delivering what Jon Udell calls: "reducing the activation threshold".
Using my comments above for context placement, I suggest you take a look at, or re-read Jon Udell's post titled: Many Meanings of Metadata.
Once again, the Web 2.0 brouhaha (in every sense of the word) is a reaction to a critical inflection that ultimately transitions the "Semantic Web" from "Mirage" to "Nirvana". Put differently (with humor in mind solely!), Web 2.0 is what I tend to call a "John the Baptist" paradigm, and we all know what happened to him :-)
Web 2.0 is a conduit to a far more important destination. The tendency to treat Web 2.0 as a destination rather than a conduit has contributed to the recent spate of Bozo bit flipping posts all over the blogosphere (is this an attempt to behead John, metaphorically speaking?). Humor aside, a really important thing about the Web 2.0 situation is that when we make the quantum evolutionary leap (internet time, mind you) to the "Semantic Web" (or whatever groovy name we dig up for it in due course) we will certainly have a plethora of reference points (I mean Web 2.0 URIs) ensuring that we do not revisit the "Missing Link" evolutionary paradox :-)
BTW - You can see some examples of my contribution to the ongoing annotation process by looking at:
SIOC (Semantically Interlinked Online Communities) is an attempt to link online community sites and to use Semantic Web technologies to describe the information community sites have about their structure and contents, and to find related information and new connections between posts.
From the spec, main terms:
I think I probably linked to this before, but it’s come on apace. They’ve now got plugins for Drupal and WordPress, and from the look of it, a fair load more…
There’s obviously some intersection here with the Atom/OWL stuff, and for that matter hAtom. Heh, gonna be fun figuring out the equivalences.
(via Uldis)
"(Via Raw.)
Anyway, Marc's article is a very refreshing read because it provides a really good insight into the general landscape of a rapidly evolving Web alongside genuine appreciation of our broader timeless pursuit of "Openness".
To really help this document provide additional value, I have scraped the content of the original post and pasted it below, so that we can appreciate the value of the links embedded within the article (note: thanks to Virtuoso, I only had to paste the content into my blog; the extraction to my Linkblog and Blog Summary pages are simply features of my Virtuoso-based blog engine):
Breaking the Web Wide Open! (complete story)
Even the web giants like AOL, Google, MSN, and Yahoo need to observe these open standards, or they'll risk becoming the "walled gardens" of the new web and be coolio no more.
Editorial Note: Several months ago, AlwaysOn got a personal invitation from Yahoo founder Jerry Yang "to see and give us feedback on our new social media product, y!360." We were happy to oblige and dutifully showed up, joining a conference room full of hard-core bloggers and new, new media types. The geeks gave Yahoo 360 an overwhelming thumbs down, with comments like, "So the only services I can use within this new network are Yahoo services? What if I don't use Yahoo IM?" In essence, the Yahoo team was booed for being "closed web," and we heartily agreed. With Yahoo 360, Yahoo continues building its own "walled garden" to control its 135 million customers, an accusation also hurled at AOL in the early 1990s, before AOL migrated its private network service onto the web. As the Economist recently noted, "Yahoo, in short, has old media plans for the new-media era."
The irony to our view here is, of course, that today's AO Network is also a "closed web." In the end, Mr. Yang's thoughtful invitation and our ensuing disappointment in his new service led to the assignment of this article. It also confirmed our existing plan to completely revamp the AO Network around open standards. To tie it all together, we recruited the chief architect of our new site, the notorious Marc Canter, to pen this piece. We look forward to our reader feedback.
Breaking the Web Wide Open!
By Marc Canter
For decades, "walled gardens" of proprietary standards and content have been the strategy of dominant players in mainframe computer software, wireless telecommunications services, and the World Wide WebÂit was their successful lock-in strategy of keeping their customers theirs. But like it or not, those walls are tumbling down. Open web standards are being adopted so widely, with such value and impact, that the web giantsÂAmazon, AOL, eBay, Google, Microsoft, and YahooÂare facing the difficult decision of opening up to what they don't control.
The online world is evolving into a new open web (sometimes called the Web 2.0), which is all about being personalized and customized for each user. Not only open source software, but open standards are becoming an essential component.
Many of the web giants have been using open source software for years. Most of them use at least parts of the LAMP (Linux, Apache, MySQL, Perl/Python/PHP) stack, even if they aren't well-known for giving back to the open source community. For these incumbents that grew big on proprietary web services, the methods, practices, and applications of open source software development are difficult to fully adopt. And the next open source movements, which will be as much about open standards as about code, will be a lot harder for the incumbents to exploit.
While the incumbents use cheap open source software to run their back-end systems, their business models largely depend on proprietary software and algorithms. But in our view, a new slew of open software, open protocols, and open standards will confront the incumbents with the classic Innovator's Dilemma. Should they adopt these tools and standards, painfully cannibalizing their existing revenue for a new unproven concept, or should they stick with their currently lucrative model with the risk that eventually a bunch of upstarts eat their lunch?
Credit should go to several of the web giants who have been making efforts to "open up." Google, Yahoo, eBay, and Amazon all have Open APIs (Application Programming Interfaces) built into their data and systems. Any software developer can access and use them for whatever creative purposes they wish. This means that the API provider becomes an open platform for everyone to use and build on top of. This notion has expanded like wildfire throughout the blogosphere, so nowadays, Open APIs are pretty much required.
Other incumbents also have open strategies. AOL has got the RSS religion, providing a feedreader and RSS search in order to escape the "walled garden of content" stigma. Apple now incorporates podcasts, the "personal radio shows" that are latest rage in audio narrowcasting, into iTunes. Even Microsoft is supporting open standards, for example by endorsing SIP (Session Initiation Protocol) for internet telephony and conferencing over Skype's proprietary format or one of its own devising.
But new open standards and protocols are in use, under construction, or being proposed every day, pushing the envelope of where we are right now. Many of these standards are coming from startup companies and small groups of developers, not from the giants. Together with the Open APIs, those new standards will contribute to a new, open infrastructure. Tens of thousands of developers will use and improve this open infrastructure to create new kinds of web-based applications and services, to offer web users a highly personalized online experience.
A Brief History of Openness
At this point, I have to admit that I am not just a passive observer, full-time journalist, or "just some blogger", but an active evangelist and developer of these standards. It's the vision of "open infrastructure" that's driving my company and the reason why I'm writing this article. This article will give you some of the background on these standards, and what the evolution of the next generation of open standards will look like.
Starting back in the 1980s, establishing a software standard was a key strategy for any software company. My former company, MacroMind (which became Macromedia), achieved this goal early on with Director. As Director evolved into Flash, the world saw that other companies besides Microsoft, Adobe, and Apple could establish true cross-platform, independent media standards.
Then Tim Berners-Lee and Marc Andreessen came along, and changed the rules of the software business and of entrepreneurialism. No matter how entrenched and "standardized" software was, the rug could still get pulled out from under it. Netscape did it to Microsoft, and then Microsoft did it back to Netscape. The web evolved, and lots of standards evolved with it. The leading open source standards (such as the LAMP stack) became widely used alternatives to proprietary closed-source offerings.
Open standards are more than just technology. Open standards mean sharing, empowering, and community support. Someone floats a new idea (or meme) and the community runs with it -- with each person making their own contributions to the standard -- evolving it without a moment's hesitation about "giving away their intellectual property."
One good example of this was Dave Sifry, who built the Technorati blog-tracking technology inspired by the Blogging Ecosystem, a weekend project by young hacker Phil Pearson. Dave liked what he saw and he ran with it -- turning Technorati into what it is today.
Dave Winer has contributed enormously to this area of open standards. He defined and personally created several open standards and protocols -- such as RSS, OPML, and XML-RPC. Dave has also helped build the blogosphere through his enthusiasm and passion.
By 2003, hundreds of programmers were working on creating and establishing new standards for almost everything. The best of these new standards have evolved into compelling web services platforms -- such as del.icio.us, Webjay, or Flickr. Some have even spun off formal standards -- like XSPF (a standard for playlists) or instant messaging standard XMPP (also known as Jabber).
Today's Open APIs are complemented by standardized Schemas -- the structure of the data itself and its associated meta-data. Take for example a podcasting feed. It consists of: a) the radio show itself, b) information on who is on the show, what the show is about and how long the show is (the meta-data) and also c) API calls to retrieve a show (a single feed item) and play it from a specified server.
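To make that concrete, here is a minimal sketch of such a feed (the show, server, and URLs are hypothetical; the enclosure element is how RSS 2.0 attaches the retrievable media file to an item):

```xml
<rss version="2.0">
  <channel>
    <title>Example Radio Show</title>
    <link>http://example.org/show</link>
    <description>A weekly audio narrowcast.</description>
    <item>
      <!-- b) the meta-data: who is on, what it's about, how long -->
      <title>Episode 12: Open Standards, with a guest host</title>
      <description>45 minutes on Open APIs and schemas.</description>
      <pubDate>Mon, 05 Sep 2005 09:00:00 GMT</pubDate>
      <!-- a) + c) the show itself, retrievable from a specified server -->
      <enclosure url="http://example.org/audio/ep12.mp3"
                 length="21500000" type="audio/mpeg" />
    </item>
  </channel>
</rss>
```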
The combination of Open APIs, standardized schemas for handling meta-data, and an industry which agrees on these standards are breaking the web wide open right now. So what new open standards should the web incumbents -- and you -- be watching? Keep an eye on the following developments:
Identity
Attention
Open Media
Microcontent Publishing
Open Social Networks
Tags
Pinging
Routing
Open Communications
Device Management and Control
1. Identity
Right now, you don't really control your own online identity. At the core of just about every piece of online software is a membership system. Some systems allow you to browse a site anonymously -- but unless you register with the site you can't do things like search for an article, post a comment, buy something, or review it. The problem is that each and every site has its own membership system. So you constantly have to register with new systems, which cannot share data -- even if you'd want them to. By establishing a "single sign-on" standard, disparate sites can allow users to freely move from site to site, and let them control the movement of their personal profile data, as well as any other data they've created.
With Passport, Microsoft unsuccessfully attempted to force its proprietary standard on the industry. Instead, a world is evolving where most people assume that users want to control their own data, whether that data is their profile, their blog posts and photos, or some collection of their past interactions, purchases, and recommendations. As long as users can control their digital identity, any kind of service or interaction can be layered on top of it.
Identity 2.0 is all about users controlling their own profile data and becoming their own agents. This way the users themselves, rather than other intermediaries, will profit from their ID info. Once developers start offering single sign-on to their users, and users have trusted places to store their data -- places which respect the limits of, and provide access controls over, that data -- users will be able to access personalized services which will understand and use their personal data.
Identity 2.0 may seem like some geeky, visionary future standard that isn't defined yet, but by putting each user's digital identity at the core of all their online experiences, Identity 2.0 is becoming the cornerstone of the new open web.
The Initiatives:
Right now, Identity 2.0 is under construction through various efforts from Microsoft (the "InfoCard" component built into the Vista operating system and its "Identity Metasystem"), Sxip Identity, Identity Commons, Liberty Alliance, LID (NetMesh's Lightweight ID), and SixApart's OpenID.
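To give a flavor of how lightweight some of these proposals are: with OpenID's original HTML-based discovery, a user claims any URL as an identity simply by adding link elements to that page (the server URLs below are placeholders):

```html
<!-- this page's owner signs in via the named OpenID server -->
<link rel="openid.server" href="http://openid.example.com/server" />
<!-- optionally delegate this URL to an identity hosted elsewhere -->
<link rel="openid.delegate" href="http://alice.openid.example.com/" />
```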
More Movers and Shakers:
Identity Commons and Kaliya Hamlin, Sxip Identity and Dick Hardt, the Identity Gang and Doc Searls, Microsoft's Kim Cameron, Craig Burton, Phil Windley, and Brad Fitzpatrick, to name a few.
2. Attention
How many readers know what their online attention is worth? If you don't, Google and Yahoo doÂthey make their living off our attention. They know what we're searching for, happily turn it into a keyword, and sell that keyword to advertisers. They make money off our attention. We don't.
Technorati and friends proposed an attention standard, Attention.xml, designed to "help you keep track of what you've read, what you're spending time on, and what you should be paying attention to." AttentionTrust is an effort by Steve Gillmor and Seth Goldstein to standardize on how captured end-user performance, browsing, and interest data are used.
Blogger Peter Caputa gives a good summary of AttentionTrust: "As we use the web, we reveal lots of information about ourselves by what we pay attention to. Imagine if all of that information could be stored in a nice neat little xml file. And when we travel around the web, we can optionally share it with websites or other people. We can make them pay for it, lease it ... we get to decide who has access to it, how long they have access to it, and what we want in return. And they have to tell us what they are going to do with our Attention data."
So when you give your attention to sites that adhere to the AttentionTrust, your attention rights (you own your attention, you can move your attention, you can pay attention and be paid for it, and you can see how your attention is used) are guaranteed. Attention data is crucial to the future of the open web, and Steve and Seth are making sure that no one entity or oligopoly controls it.
Movers and Shakers:
Steve Gillmor, Seth Goldstein, Dave Sifry and the other Attention.xml folks.
3. Open Media
Proprietary media standards -- Flash, Windows Media, and QuickTime, to name a few -- helped liven up the web. But they are proprietary standards that try to keep us locked in, and they weren't created from scratch to handle today's online content. That's why, for many of us, an Open Media standard has been a holy grail. Yahoo's new Media RSS standard brings us one step closer to achieving open media, as do Ogg Vorbis audio codecs, XSPF playlists, or MusicBrainz. And several sites offer digital creators not only a place to store their content, but also to sell it.
Media RSS (being developed by Yahoo with help from the community) extends RSS and combines it with "RSS enclosures" -- adding metadata to any media item -- to create a comprehensive solution for media "narrowcasters." To gain acceptance for Media RSS, Yahoo knows it has to work with the community. As an active member of this community, I can tell you that we'll create Media RSS equivalents for RDF (an alternative subscription format) and Atom (yet another subscription format), so no one will be able to complain that Yahoo is picking sides in format wars.
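A sketch of what this looks like on the wire: a Media RSS item is an ordinary RSS 2.0 item with media-specific metadata layered on through the media: namespace (the URLs and values below are hypothetical):

```xml
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Surf video of the week</title>
      <!-- media:content carries the narrowcaster's metadata -->
      <media:content url="http://example.org/surf.mpg"
                     type="video/mpeg" duration="120">
        <media:title>Surfing at Half Moon Bay</media:title>
        <media:thumbnail url="http://example.org/surf.jpg" />
      </media:content>
    </item>
  </channel>
</rss>
```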
When Yahoo announced the purchase of Flickr, Yahoo founder Jerry Yang suggested that Yahoo was acquiring "open DNA" to turn Yahoo into an open standards player. Yahoo is showing what happens when you take a multi-billion dollar company and make openness one of its core values -- so Google, beware, even if Google does have more research fellows and Ph.D.s.
The open media landscape is far and wide, reaching from game machine hacks and mobile phone downloads to PC-driven bookmarklets, players, and editors, and it includes many other standardization efforts. XSPF is an open standard for playlists, and MusicBrainz is an alternative to the proprietary (and originally effectively stolen) database that Gracenote licenses.
Ourmedia.org is a community front-end to Brewster Kahle's Internet Archive. Brewster has promised free bandwidth and free storage forever to any content creators who choose to share their content via the Internet Archive. Ourmedia.org is providing an easy-to-use interface and community to get content in and out of the Internet Archive, giving ourmedia.org users the ability to share their media anywhere they wish, without being locked into a particular service or tool. Ourmedia plans to offer open APIs and an open media registry that interconnects other open media repositories into a DNS-like registry (just like the www domain system), so folks can browse and discover open content across many open media services. Systems like Brightcove and Odeo support the concept of an open registry, and hope to work with digital creators to sell their work to fulfill the financial aspect of the "Long Tail."
More Movers and Shakers:
Creative Commons, the Open Media Network, Jay Dedman, Ryanne Hodson, Michael Verdi, Eli Chapman, Kenyatta Cheese, Doug Kaye, Brad Horowitz, Lucas Gonze, Robert Kaye, Christopher Allen, Brewster Kahle, JD Lasica, and indeed, Marc Canter, among others.
4. Microcontent Publishing
Unstructured content is cheap to create, but hard to search through. Structured content is expensive to create, but easy to search. Microformats resolve the dilemma with simple structures that are cheap to use and easy to search.
The first kind of widely adopted microcontent is blogging. Every post is an encapsulated idea, addressable via a URL called a permalink. You can syndicate or subscribe to this microcontent using RSS or an RSS equivalent, and news or blog aggregators can then display these feeds in a convenient, readable fashion. But a blog post is just a block of unstructured text -- not a bad thing, but just a first step for microcontent. When it comes to structured data, such as personal identity profiles, product reviews, or calendar-type event data, RSS was not designed to maintain the integrity of the structures.
Right now, blogging doesn't have the underlying structure necessary for full-fledged microcontent publishing. But that will change. Think of local information services (such as movie listings, event guides, or restaurant reviews) that any college kid can access and use in her weekend programming project to create new services and tools.
Today's blogging tools will evolve into microcontent publishing systems, and will help spread the notion of structured data across the blogosphere. New ways to store, represent and produce microcontent will create new standards, such as Structured Blogging and Microformats. Microformats differ from RSS feeds in that you can't subscribe to them. Instead, Microformats are embedded into webpages and discovered by search engines like Google or Technorati. Microformats are creating common definitions for "What is a review or event? What are the specific fields in the data structure?" They can also specify what we can do with all this information.
OPML (Outline Processor Markup Language) is a hierarchical file format for storing microcontent and structured data. It was developed by Dave Winer of RSS and podcast fame.
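As an illustration of this "cheap structure," an hReview microformat wraps ordinary xHTML in agreed-upon class names so a search engine can recover the fields of a review (a rough sketch; the names and values are made up):

```html
<div class="hreview">
  <span class="reviewer vcard"><span class="fn">Jane Blogger</span></span>
  reviews <span class="item"><span class="fn">Luigi's Trattoria</span></span>:
  <span class="rating">4</span> out of 5.
  <div class="description"><p>Great pasta, slow service.</p></div>
</div>
```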
Events are one popular type of microcontent. OpenEvents is already working to create shared databases of standardized events, which would get used by a new generation of event portals -- such as Eventful/EVDB, Upcoming.org, and WhizSpark. The idea of OpenEvents is that event-oriented systems and services can work together to establish shared events databases (and associated APIs) that any developer could then use to create and offer their own new service or application. OpenReviews is still in the conceptual stage, but it would make it possible to provide open alternatives to closed systems like Epinions, and establish a shared database of local and global reviews. Its shared open servers would be filled with all sorts of reviews for anyone to access.
Why is this important? Because I predict that in the future, 10 times more people will be writing reviews than maintaining their own blog. The list of possible microcontent standards goes on: OpenJobpostings, OpenRecipes, and even OpenLists. Microsoft recently revealed that it has been working on an important new kind of microcontent: Lists -- so OpenLists will attempt to establish standards for the kind of lists we all use, such as lists of Links, lists of To Do Items, lists of People, Wish Lists, etc.
Movers and Shakers:
Tantek Çelik and Kevin Marks of Technorati, Danny Ayers, Eric Meyer, Matt Mullenweg, Rohit Khare, Adam Rifkin, Arnaud Leene, Seb Paquet, Alf Eaton, Phil Pearson, Joe Reger, Bob Wyman, among others.
5. Open Social Networks
I'll never forget the first time I met Jonathan Abrams, the founder of Friendster. He was arrogant and brash and he claimed he "owned" all his users, and that he was going to monetize them and make a fortune off them. This attitude robbed Friendster of its momentum, letting MySpace, Facebook, and other social networks take Friendster's place.
Jonathan's notion of social networks as a way to control users is typical of the Web 1.0 business model and its attitude towards users in general. Social networks have become one of the battlegrounds between old and new ways of thinking. Open standards for Social Networking will define those sides very clearly. Since meeting Jonathan, I have been working towards finding and establishing open standards for social networks. Instead of closed, centralized social networks with 10 million people in them, the goal is making it possible to have 10 million social networks that each have 10 people in them.
FOAF (which stands for Friend Of A Friend, and describes people and relationships in a way that computers can parse) is a schema to represent not only your personal profile's meta-data, but your social network as well. Thousands of researchers use the FOAF schema in their "Semantic Web" projects to connect people in all sorts of new ways. XFN is a microformat standard for representing your social network, while vCard (long familiar to users of contact manager programs like Outlook) is a microformat that contains your profile information. Microformats are baked into any xHTML webpage, which means that any blog, social network page, or any webpage in general can "contain" your social network in it -- and be used by any compatible tool, service or application.
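A quick sketch of both ideas in ordinary xHTML -- XFN encodes the human relationship in a rel attribute, while hCard (the microformat rendering of vCard) marks up the profile data; the names and URLs are made up:

```html
<!-- XFN: the rel attribute describes my relationship to Alice -->
<a href="http://example.org/~alice" rel="friend met colleague">Alice</a>

<!-- hCard: my own profile data, embedded in the page -->
<div class="vcard">
  <span class="fn">Bob Example</span>,
  <a class="url" href="http://example.org/~bob">homepage</a>
</div>
```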
PeopleAggregator is an earlier project now being integrated into open content management framework Drupal. The PeopleAggregator APIs will make it possible to establish relationships, send messages, create or join groups, and post between different social networks. (Sneak preview: this technology will be available in the upcoming GoingOn Network.)
All of these open social networking standards mean that inter-connected social networks will form a mesh that will parallel the blogosphere. This vibrant, distributed, decentralized world will be driven by open standards: personalized online experiences are what the new open web will be all aboutÂand what could be more personalized than people's networks?
Movers and Shakers:
Eric Sigler, Joel De Gan, Chris Schmidt, Julian Bond, Paul Martino, Mary Hodder, Drummond Reed, Dan Brickley, Randy Farmer, and Kaliya Hamlin, to name a few.
6. Tags
Nowadays, no self-respecting tool or service can ship without tags. Tags are keywords or phrases attached to photos, blog posts, URLs, or even video clips. These user- and creator-generated tags are an open alternative to what used to be the domain of librarians and information scientists: categorizing information and content using taxonomies. Tags are instead creating "folksonomies."
The recently proposed OpenTags concept would be an open, community-owned version of the popular Technorati Tags service. It would aggregate the usage of tags across a wide range of services, sites, and content tools. In addition to Technorati's current tag features, OpenTags would let groups of people share their tags in "TagClouds." Open tagging is likely to include some of the open identity features discussed above, to create a tag system that is resilient to spam, and yet trustable across sites all over the web.
OpenTags owes a debt to earlier versions of shared tagging systems, which include Topic Exchange and something called the k-collector -- a knowledge management tag aggregator -- from Italian company eVectors.
Movers & Shakers:
Phil Pearson, Matt Mower, Paolo Valdemarin, and Mary Hodder and Drummond Reed again, among others.
7. Pinging
Websites used to be mostly static. Search engines that crawled (or "spidered") them every so often did a good enough job to show reasonably current versions of your cousin's homepage or even Time magazine's weekly headlines. But when blogging took off, it became hard for search engines to keep up. (Google has only just managed to offer blog-search functionality, despite buying Blogger back in early 2003.)
To know what was new in the blogosphere, users couldn't depend on services that spidered webpages once in a while. The solution: a way for blogs themselves to automatically notify blog-tracking sites that they'd been updated. Weblogs.com was the first blog "ping service": it displayed the name of a blog whenever that blog was updated. Pinging sites helped the blogosphere grow, and more tools, services, and portals started using pinging in new and different ways. Dozens of pinging services and sites -- most of which can't talk to each other -- sprang up.
Matt Mullenweg (the creator of open source blogging software WordPress) decided that a one-stop service for pinging was needed. He created Ping-o-Matic -- which aggregates ping services and simplifies the pinging process for bloggers and tool developers. With Ping-o-Matic, any developer can alert all of the industry's blogging tools and tracking sites at once. This new kind of open standard, with shared infrastructure, is critical to the scalability of Web 2.0 services.
As Matt said: "There are a number of services designed specifically for tracking and connecting blogs. However it would be expensive for all the services to crawl all the blogs in the world all the time. By sending a small ping to each service you let them know you've updated so they can come check you out. They get the freshest data possible, you don't get a thousand robots spidering your site all the time. Everybody wins."
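The ping itself is tiny. Under the weblogUpdates convention, a blogging tool POSTs an XML-RPC call like the following to a ping service such as Ping-o-Matic (the blog name and URL are placeholders):

```xml
<?xml version="1.0"?>
<methodCall>
  <methodName>weblogUpdates.ping</methodName>
  <params>
    <!-- the blog's human-readable name, then its URL -->
    <param><value><string>My Example Blog</string></value></param>
    <param><value><string>http://example.org/blog</string></value></param>
  </params>
</methodCall>
```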
Movers and Shakers:
Matt Mullenweg, Jim Winstead, Dave Winer
8. Routing
Bloggers used to have to manually enter the links and content snippets of blog posts or news items they wanted to blog. Today, some RSS aggregators can send a specified post directly into an associated blogging tool: as bloggers browse through the feeds they subscribe to, they can easily specify and send any post they wish to "reblog" from their news aggregator or feed reader into their blogging tool. (This is usually referred to as "BlogThis.") As structured blogging comes into its own (see the section on Microcontent Publishing), it will be increasingly important to maintain the structural integrity of these pieces of microcontent when reblogging them.
The promising RedirectThis standard will provide a "BlogThis"-like capability while maintaining the integrity of the microcontent. RedirectThis will let bloggers and content developers attach a simple "PostThis" button to their posts. Clicking on that button will send that post to the reader/blogger's favorite blogging tool. This favorite tool is specified at the RedirectThis web service, where users register their blogging tool of choice. RedirectThis also helps maintain the integrity and structure of microcontent -- then it's just up to the user to prefer a blogging tool that also attains that lofty goal of microcontent integrity.
OutputThis is another nascent web services standard, designed to let bloggers specify what "destinations" they'd like to have as options in their blogging tool. As new destinations are added to the service, more checkboxes would get added to their blogging tool -- allowing them to route their published microcontent to additional destinations.
Movers and Shakers:
Michael Migurski, Lucas Gonze
9. Open Communications
Likely, you've experienced the joys of finding friends on AIM or Yahoo Messenger, or the convenience of Skyping with someone overseas. Not that you're about to throw away your mobile phone or BlackBerry, but for many, also having access to Instant Messaging (IM) and Voice over IP (VoIP) is crucial.
IM and VoIP are mainstream technologies that already enjoy the benefits of open standards. Entire industries are born -- right this second -- based around these open standards. Jabber has been an open IM technology for years -- in fact, as XMPP, it was officially dubbed a standard by the IETF. Although becoming an official IETF standard is usually the kiss of death, Jabber looks like it'll be around for a while, as entire generations of collaborative, work-group applications and services have been built on top of its messaging protocol. For VoIP, Skype is clearly the leading standard today -- though one could argue just how "open" it is (and defenders of the IETF's SIP standard often do). But it is free and user-friendly, so there won't be much argument from users about it being insufficiently open. Yet there may be a cloud on Skype's horizon: web behemoth Google recently released a beta of Google Talk, an IM client committed to open standards. It currently supports XMPP, and will support SIP for VoIP calls.
Movers and Shakers:
Jeremie Miller, Henning Schulzrinne, Jon Peterson, Jeff Pulver
10. Device Management and Control
To access online content, we're using more and more devices. BlackBerrys, iPods, Treos, you name it. As the web evolves, more and more different devices will have to communicate with each other to give us the content we want when and where we want it. No-one wants to depend on a single vendor anymore -- like, say, Sony -- for their laptop, phone, MP3 player, PDA, and digital camera just so it all works together. We need fully interoperable devices, and the standards to make that work. And to take full advantage of online content and innovative web services, those standards need to be open.
MIDI (musical instrument digital interface), one of the very first open standards in music, connected disparate vendors' instruments, post-production equipment, and recording devices. But MIDI is limited, and MIDI II has been very slow to arrive. Now a new standard for controlling musical devices has emerged: OSC (Open SoundControl). This protocol is optimized for modern networking technology and inter-connects music, video and controller devices with "other multimedia devices." OSC is used by a wide range of developers, and is being taken up in the mainstream MIDI marketplace.
Another open-standards-based device management technology is ZigBee, for building wireless intelligence and network monitoring into all kinds of devices. ZigBee is supported by many networking, consumer electronics, and mobile device companies.
   · · · · · ·
The Change to Openness
The rise of open source software and its "architecture of participation" are completely shaking up the old proprietary-web-services-and-standards approach. Sun Microsystems -- whose proprietary Java standard helped define the Web 1.0 -- is opening its Solaris OS and has even announced the apparent paradox of an open-source Digital Rights Management system.
Today's incumbents will have to adapt to the new openness of the Web 2.0. If they stick to their proprietary standards, code, and content, they'll become the new walled gardens -- places users visit briefly to retrieve data and content from enclosed data silos, but not where users "live." The incumbents' revenue models will have to change. Instead of being "owned," users will know they own themselves, and will expect a return on their valuable identity and attention. Instead of being locked into incompatible media formats, users will expect easy access to digital content across many platforms.
Yesterday's web giants and tomorrow's users will need to find a mutually beneficial new balance -- between open and proprietary, developer and user, hierarchical and horizontal, owned and shared, and compatible and closed.
Marc Canter is an active evangelist and developer of open standards. Early in his career, Marc founded MacroMind, which became Macromedia. These days, he is CEO of Broadband Mechanics, a founding member of the Identity Gang and of ourmedia.org. Broadband Mechanics is currently developing the GoingOn Network (with the AlwaysOn Network), as well as an open platform for social networking called the PeopleAggregator.
A version of the above post appears in the Fall 2005 issue of AlwaysOn's quarterly print blogozine, and ran as a four-part series on the AlwaysOn Network website.(Via Marc's Voice.)
[You don't expect me to work out the CSS right after making it semantic, do you?]
Shift to another universe. It's sometime in the late 1990s. Ramanathan Guha, Tim Bray, Dave Winer, Tantek Çelik, Dan Libby and Dan Connolly are sharing a jacuzzi*. As they sip Margaritas, their conversation goes like this:
So, we've got this idea for publishing content that's a bit like CDF, but we've made the system more of a service than just a desktop thing.
Sounds cool. Might be a good fit with this RDF thing I've been working on.
Hmm, Dan's stuff does sound cool, but with all due respect dude, RDF does seem a bit complicated. I really don't think the folks out in userland would get it. And they majored in graphs.
Maybe we could make it a bit more straightforward, you know, like put pointy brackets around it?
Straightforwardâs good. Better still, simple. They like simple.
But what about the rest of the Web, you know, like HTML?
Hmm, but how do we do the timestamping kind of thing, and wrap it up in a "microposty" way, the things that make this distribution mode work?
Yeah, metadata is cool. Keep the metadata.
Not cheap though. The Web must be cheap. Did Andreessen show you his pictures...?
..."Microposty"? You mean like my newsletter thing, but on the Web?
Yep, like Cool Diary Entry of the Day
But do we really need 1000 pages of spec for that?
â¦Incidentally, did you see my Box Model Hack?
Yup.
Yup.
Yup.
Yup. I explained that on DaveNet last year.
Hey! I've got it: "MyDigitalCocktail"...?
Hang on, that gives me an idea...
There was a tangible outcome to this conversation: a document format which supports content and unambiguous, explicit data and metadata, timestamping and much, much more. It's viewable in a regular browser. Can be syndicated; can be aggregated. Unlike forgetful RSS, archives are almost always retrievable using regular HTTP methods. In this universe there was no RSS. No syndication wars. No talking-at-cross-purposes conflict between docheads and dataheads, syntax fans and model fans. No-one had to publish simple data in Byzantine RDF/XML. No-one had to deal with doubly-escaped content and silent data loss. There was no need for any new format for business cards, calendars, blogs, link lists, reviews, pet profiles. XHTML with CSS was more than enough. DanL got the MyNetscape he wanted. Tim got the simple, tight format he wanted. Guha got the AI. Tantek got to do presentations in a cool black raincoat. DanC finally got his schedule on his Palm Pilot. Dave got the credit. MarcC got the parasols and a grass skirt none of the others would admit to having brought.
Shift back to this universe. Check out hAtom. It's not finished yet, but David's been methodically working through the (utterly sound) microformats process. Looks good to me.
* apologies for the imagery, but how else do you think Silicon Valley might seem to someone raised in the cowpat-coated hills of Derbyshire?
PS. Apologies to everyone mentioned. And before you suggest it, blogging *is* therapy.
(Via Raw.)
The future of the Web is Semantic:
A nice quick overview (if you don’t mind the RDF/XML approach) at IBM developerWorks: The future of the Web is Semantic
(Via Raw.)
just take a look at the oxymoronic Wikipedia 2.0 imbroglio to get my drift. In retrospect, I should have called on Esquire magazine to get the Web 2.0 article going :-) ). Anyway, back to Dare's analysis of Tim's 7 Web 2.0 litmus test items listed below:
And trimmed down to 3 by Dare:
- Services, not packaged software, with cost-effective scalability
- Control over unique, hard-to-recreate data sources that get richer as more people use them
- Trusting users as co-developers
- Harnessing collective intelligence
- Leveraging the long tail through customer self-service
- Software above the level of a single device
- Lightweight user interfaces, development models, AND business models
Well, I would like to summarize this a little further using a few excerpts from my numerous contributions to the Web 2.0 talk page on Wikipedia (albeit mildly revised; see strikeouts etc.):
- Exposes Web services that can be accessed on any device or platform by any developer or user. RSS feeds, RESTful APIs and SOAP APIs are all examples of Web services.
- Harnesses the collective knowledge of its user base to benefit users
- Leverages the long tail through customer self-service
Web 2.0 is a web of executable service invocation endpoints (those Web Services URIs) and well-formed content (all of that RSS, Atom, RDF, XHTML, etc. based Web Content out on the NET). The executable service invocation endpoints and well-formed content are accessible via URIs. Put in even simpler terms, Web 2.0 is an incarnation of the web defined by URIs for invoking Web Services and/or consuming or syndicating well-formed content.
Looks like I've self edited my own definition in the process. :-)
If you don't grok this definition then consider using it as a trigger for taking a closer look at the dynamics that genuinely differentiate Web 1.0 and Web 2.0.
In another Wikipedia "talk page" contribution (regarding "Web 2.0 Business Impact") I attempt to answer the question posed here, which should also shed light on the premise of my definition above: Web 1.0 was about web sites geared towards an interaction with human beings as opposed to computers. In a sense this mirrors the difference between HTML and XML.
A simple example (purchasing a book):
amazon.com provides value to you by enabling you to search and purchase the desired book online via the site http://www.amazon.com.
In the Web 1.0 era the process of searching for your desired book, and then eventually purchasing the book in question, required visible interaction with the site http://www.amazon.com. In today's Web 2.0 based Web the process of discovering a catalog of books, searching for your particular book of interest, and eventually purchasing the book, occurs via Web Services which Amazon has chosen to expose via an executable endpoint (the Web point of presence for exposing its Web Services).
Direct interaction via http://www.amazon.com is no longer required. A weblog can quite easily associate keywords, tags, and post categories with items in amazon.com's catalogs. In addition, weblogs can also act as entry points for consuming the amazon.com value proposition (making books available for purchase online), by enabling you to purchase a book directly from the weblog (assuming the blog owner is an amazon associate etc..). Now compare the impact of this kind of value discovery and consumption cycle driven by software to the same process driven by human interaction with a static or dynamic HTML page (Web 1.0 site).
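Schematically, such a service interaction is just a REST-style GET returning well-formed XML that any weblog or tool can consume; the endpoint and response shape below are illustrative, not Amazon's actual schema:

```xml
<!-- GET http://webservices.example.com/catalog?Operation=ItemSearch&Keywords=open+standards -->
<ItemSearchResponse>
  <Item>
    <ASIN>0000000000</ASIN>
    <Title>An Example Book on Open Standards</Title>
    <ListPrice currency="USD">24.95</ListPrice>
    <DetailPageURL>http://webservices.example.com/item/0000000000</DetailPageURL>
  </Item>
</ItemSearchResponse>
```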
To summarize, Web 2.0 is a reflection of the potential of XML expressed through the collective impact of Web Services (XML based distributed computing) and Well-formed Content (Blogosphere, Wikisphere, XHTML micro content etc.). The potential simply comes down to the ability to ultimately connect events, triggers, impulses (chatter, conversation, etc.), and data in general via URIs.
Let's never forget that XML is the reason why we have a blogosphere (RSS/Atom/RDF are applications of XML). Likewise, XML is also the reason why we have Web Services (doesn't matter what format).
As I have stated in the past, we must go through Web 2.0 en route to what is popularly referred to as the Semantic Web (it will be known by another name by the time we get there; 3.0 or 4.0, who knows or cares?). At the current time, the prerequisite activity of self annotation is in full swing on the current Web, thanks to the inflective effects of Web 2.0.
BTW - Would this URI to all Semantic Web related posts on my blog pass the Web 2.0 litmus test? Likewise, this URI to all Web 2.0 related posts? I wonder :-)
New release of Piggy Bank, the Semantic Web extension for Firefox. It harvests data as you browse (when you click a status bar indicator), which can later be searched and viewed in a faceted browser.
The docs have come along some too -
Piggy Bank can collect pure information in the following cases:
1. The web page has invisible link(s) to RDF data (encoded in RDF/XML or N3 formats).
2. The web page exports an RSS feed.
3. The address of the web page is a file:/ URL pointing to a directory.
4. Piggy Bank has a "screen scraper" [XSLT or Javascript] that can re-structure the web page HTML code into RDF data.
There's a tutorial on writing Javascript screenscrapers on the site, nice touch.
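The first collection case above can be made concrete with a small sketch. It relies on the common `<link rel="alternate">` autodiscovery convention for "invisible" RDF links; the sample page and its paths are invented for illustration.

```python
# Sketch of case 1: finding "invisible" RDF links in a page's <head>.
from html.parser import HTMLParser

RDF_TYPES = {"application/rdf+xml", "text/rdf+n3"}

class RDFLinkFinder(HTMLParser):
    """Collects hrefs of <link> elements that advertise RDF data."""
    def __init__(self):
        super().__init__()
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        if a.get("rel") == "alternate" and a.get("type") in RDF_TYPES:
            self.rdf_links.append(a.get("href"))

# Hypothetical page: one RDF link, one ordinary RSS link.
page = """<html><head>
  <link rel="alternate" type="application/rdf+xml" href="/data/page.rdf">
  <link rel="alternate" type="application/rss+xml" href="/index.rss">
</head><body>visible content</body></html>"""

finder = RDFLinkFinder()
finder.feed(page)
print(finder.rdf_links)   # -> ['/data/page.rdf']
```

A harvester like Piggy Bank would then fetch and parse the discovered URL; this sketch stops at discovery.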
I have also added an architecture diagram to accelerate comprehension (a picture speaks a thousand words...):
The value of the Internet as a repository of useful information is very low. Carl Shapiro in "Information Rules" suggests that the amount of actually useful information on the Internet would fit within roughly 15,000 books, which is about half the size of an average mall bookstore. To put this in perspective: there are over 5 billion unique, static & publicly accessible web pages on the www. Apparently only 6% of web sites have educational content (Maureen Henninger, "Don't just surf the net: Effective research strategies", UNSW Press). Even of the educational content only a fraction is of significant informational value.
..As Stanford students, Larry Page and Sergey Brin looked at the same problem -- how to impart meaning to all the content on the Web -- and decided to take a different approach. The two developed sophisticated software that relied on other clues to discover the meaning of content, such as which Web sites the information was linked to. And in 1998 they launched Google..
You mean noise ranking. Now, I don't think Larry and Sergey set out to do this, but Google page ranks are ultimately based on the concept of "Google Juice" (aka links). The value quotient of this algorithm is accelerating at internet speed (ironically, but naturally). Human beings are smarter than computers; we just process data (not information!) much slower, that's all. Thus, we can conjure up numerous ways to bubble up the Google link ranking algorithms in no time (as is the case today).
..What most differentiates Google's approach from Berners-Lee's is that Google doesn't require people to change the way they post content..
The Semantic Web doesn't require anyone to change how they post content either! It just provides a roadmap for intelligent content management and consumption through innovative products.
..As Sergey Brin told Infoworld's 2002 CTO Forum, "I'd rather make progress by having computers understand what humans write, than by forcing humans to write in ways that computers can understand." In fact, Google has not participated at all in the W3C's formulation of Semantic Web standards, says Eric Miller..
Semantic Content generated by next generation content managers will make more progress, and they certainly won't require humans to write any differently. If anything, humans will find the process quite refreshing as and when participation is required e.g. clicking bookmarklets associated with tagging services such as 'del.icio.us', 'de.lirio.us', or Unalog and others. But this is only the beginning, if I can click on a bookmarklet to post this blog post to a tagging service, then why wouldn't I be able to incorporate the "tag service post" into the same process that saves my blog post (the post is content that ends up in a content management system aka blog server)?
Yet Google's impact on the Web is so dramatic that it probably makes more sense to call the next generation of the Web the "Google Web" rather than the "Semantic Web."
Ah! So you think we really want the noisy "Google Web" as opposed to a federation of distributed Information- and Knowledgebases a la the "Semantic Web"? I don't think so somehow!
Today we are generally excited about "tagging" but somehow fail to see its correlation with the "Semantic Web". I have said this before, and I will say it again: the "Semantic Web" is going to be self-annotated by humans with the aid of intelligent and unobtrusive annotation technology solutions. These solutions will provide context and purpose by using our social essence as currency. The annotation effort will be subliminal; there won't be a "Semantic Web Day" parade or anything of the like. It will appear before us all, in all its glory, without any fanfare. Funnily enough, we might not even call it "The Semantic Web", who cares? But it will have the distinct attributes of being very "Quiet" and highly "Valuable"; with no burden on "how we write", but constructive burden on "why we write" as part of the content contribution process (less Google/Yahoo/etc. juice chasing for more knowledge assembly and exchange).
We are social creatures at our core. The Internet and Web have collectively reduced the connectivity hurdles that once made social network oriented solutions implausible. The eradication of these hurdles ultimately feeds the very impulses that trigger the critical self-annotation that is the basis of my fundamental belief in the realization of TBL's Semantic Web vision.
While I'm still trying to figure this out, you should read Shelley's original post, Steve Levy, Dave Sifry, and NZ Bear: You are Hurting Us and see whether you think the arguments against blogrolls are as wrong as I think they are.
"..The Technorati Top 100 is too much like Google in that "noise" becomes equated with "authority". Rather than provide a method to expose new voices, your list becomes nothing more than a way for those on top to further cement their positions. More, it can be easily manipulated with just the release of a piece of software.."
Here goes:
Blog Editing
I can use any editor that supports the following Blog Post APIs:
- Moveable Type
- Meta Weblog
- Blogger
Typically I use Virtuoso (which has an unreleased WYSIWYG blog post editor), Newzcrawler, ecto, Zempt, or w.bloggar for my posts. If a post is of interest to me, or relevant to our company or customers I tend to perform one of the following tasks:
- Generate a post using the "Blog This" feature of my blog editor
- Write a new post that was triggered by a previously read post etc.
Either way, the posts end up in our company wide blog server that is Virtuoso based (more about this below). The internal blog server automatically categorizes my blog posts, and automagically determines which posts to upstream to other public blogs that I author (e.g. http://kidehen.typepad.com) or co-author (e.g. http://www.openlinksw.com/weblogs/uda and http://www.openlinksw.com/weblogs/virtuoso). I write once and my posts are dispatched conditionally to multiple outlets.
RSS/Atom/RDF Aggregation & Reading
I discover, subscribe to, and view blog feeds using Newzcrawler (primarily), and from time to time for experimentation and evaluation purposes I use RSS Bandit, FeedDemon, and Bloglines. I am in the process of moving this activity over to Virtuoso completely due to the large number of feeds that I consume on a daily basis (scalability is a bit of a problem with current aggregators).
Blog Publishing
When you visit my blog you are experiencing the soon to be released Virtuoso Blog Publishing engine first hand, which is how WebDAV, SQLX, XQuery/XPath, and Free Text etc. come into the mix.
Each time I create a post internally, or subscribe to an external feed, the data ends up in Virtuoso's SQL Engine (this is how we handle some of the obvious scalability challenges associated with large subscription counts). This engine is SQL2000N based, which implies that it can transform SQL to XML on the fly using recent extensions to SQL in the form of SQLX (prior to the emergence of this standard we used the FOR XML SQL syntax extensions for the same result). It also has its own in-built XSLT processor (DB Engine resident), and validating XML parser (with support for XML Schema). Thus, my RSS/RDF/Atom archives, FOAF, BlogRoll, OPML, and OCS blog syndication gems are all live examples of SQLX documents that leverage Virtuoso's WebDAV engine for exposure to Blog Clients.
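As an illustration of the SQL-to-XML idea described above (standard Python standing in for Virtuoso's actual SQLX machinery, which this sketch does not reproduce), here is a minimal example that renders a relational result set as an RSS-style XML fragment. The table and its contents are hypothetical.

```python
# Sketch: transform a SQL result set into an RSS-like XML document,
# in the spirit of SQLX / FOR XML. Table and data are invented.
import sqlite3
import xml.etree.ElementTree as ET

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE posts (title TEXT, link TEXT)")
con.executemany("INSERT INTO posts VALUES (?, ?)",
                [("Web 2.0 notes", "http://example.com/1"),
                 ("SQLX in practice", "http://example.com/2")])

# Walk the rows and build <item> elements under a <channel>.
channel = ET.Element("channel")
for title, link in con.execute("SELECT title, link FROM posts"):
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link

xml_doc = ET.tostring(channel, encoding="unicode")
print(xml_doc)
```

In Virtuoso the equivalent transformation happens inside the database engine itself, so the XML "gems" are produced directly from live SQL data rather than by an external script like this one.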
Blog Search
When you search for blog posts using the basic or advanced search features of my blog, you end up interacting with one of the following methods of querying data hosted in Virtuoso: Free Text Search, XPath, or XQuery. The result sets produced by the search feature use SQLX to produce subscription gems (RSS/Atom/RDF/OpenSearch) and URIs that enable dynamic tracking of my posts using your search keywords.
BTW - the http://www.openlinksw.com/blog/~kidehen blog home page exists as a result of Virtuoso's Virtual Domain / Multi-Homing Web Server functionality. The entire site resides in an Object Relational DBMS, and I can take my DB file across Windows, Solaris, Linux, Mac OS X, FreeBSD, AIX, HP-UX, IRIX, and SCO UnixWare without missing a single beat! All I have to do is instantiate my Virtuoso server and my weblog is live.
The Internet Archive initiative is building up an amazing collection of content that includes this "must watch" movie about the somewhat forgotten HyperCard development environment.
As I watched the HyperCard movie I obtained clear reassurance that my vision of Web 2.0 as critical infrastructure for a future Semantic Web isn't unfounded. The solution building methodology espoused by HyperCard is exactly how Semantic Web applications will be built, and this will be done by orchestrating the componentry of Web 2.0.
When watching this clip make the following mental adjustments:
Web 2.0 is a reflection of the web taking its first major step out of the technology stone age (certainly the case relative to the HyperCard movie and "pre web" application development in general).
It finally dawned on me what OpenSearch does. Basically you tell it about different search engines by showing it how to query something in each, and get back an RSS return. Then when you search for some term, say foo+bar, it performs the search in all the engines you have configured it for. So it's a way to group a bunch of search engines together and command them all to look for the same thing. It is clever. It is something that hasn't been done before, to my knowledge. That's the good news. The bad news is that Amazon is a leading patent abuser. So as good as this idea is, it's bad for all the rest of us, unless they tell us that they're granting us some kind of license to use the idea. [via Scripting News]
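The grouping idea described above can be sketched in a few lines. OpenSearch engines publish a URL template containing a `{searchTerms}` placeholder, and a client substitutes one query into every configured template. The engine entries below are hypothetical.

```python
# Sketch of the OpenSearch fan-out idea: one query, many engines.
# The engine templates are made-up examples, not real services.
from urllib.parse import quote_plus

engines = {
    "books": "http://example.com/books?q={searchTerms}&format=rss",
    "news":  "http://example.org/find?query={searchTerms}",
}

def fan_out(term):
    """Return the concrete query URL for every configured engine."""
    encoded = quote_plus(term)
    return {name: tmpl.replace("{searchTerms}", encoded)
            for name, tmpl in engines.items()}

urls = fan_out("foo bar")
print(urls["books"])   # -> http://example.com/books?q=foo+bar&format=rss
```

A real client would then fetch each URL and merge the RSS results; this sketch stops at template expansion, which is the heart of the spec.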
Today is one of those days where one topic appears to be on the mind of many across cyberspace. You guessed right! It's that Web 2.0 thing again.
Paul Bausch brings Yahoo!'s most recent Web 2.0 contribution to our broader attention in this excerpt from his O'Reilly Network article:
I browse news, check stock prices, and get movie times with Yahoo! Even though I interact with Yahoo! technology on a regular basis, I've never thought of Yahoo! as a technology company. Now that Yahoo! has released a Web Services interface, my perception of them is changing. Suddenly having programmatic access to a good portion of their data has me seeing Yahoo! through the eyes of a developer rather than a user.
The great thing about this move by Yahoo! is twofold (IMHO):
The great thing about the Platform oriented Web 2.0 is the ability to syndicate your value proposition (aka products and services) instead of pursuing fallible email campaigns. It enables the auto-discovery of products and services by user agents (the content aspect). Web 2.0 also provides an infrastructure for user agents to enter into consumptive interactions with discrete or composite Web Services via published endpoints exposed by a platform (the execution aspect).
A scenario example:
You can obtain RSS feeds (electronic product catalogs) from Amazon today, although you have to explicitly locate these catalog-feeds since Amazon doesn't exploit feed auto-discovery within their domain.
If you use Firefox or another auto-discovery supporting RSS/Atom/RDF user agent, visit this URL; Firefox users should simply click on the little orange icon at the bottom right of the browser's window to see its RSS feed auto-discovery in action.
Anyway, once you have the feeds the next step is execution endpoints discovery within the Amazon domain (the conduits to Amazon's order processing system in this example). At the current time there isn't broad standardization of Web Services auto-discovery but it's certainly coming; WSIL is a potential front runner for small scale discovery while UDDI provides a heavier duty equivalent for larger scale tasks that include discovery and other related functionality realms.
Back to the example trail: by having the RSS/Atom/RDF feed data within the confines of a user agent (an Internet Application to be precise), nothing stops the extraction of key purchasing data from these feeds, plus your consumer data, en route to assembling an execution message (as prescribed by the schema of the service in question) for Amazon's order processing/shopping cart service. All of this happens without ever seeing/eye-balling the Amazon site (a prerequisite of Web 1.0, hence the dated term: Web Site).
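A toy version of that flow, with an invented catalog-feed snippet and an invented order schema (not Amazon's actual service messages): extract the purchasing data from a feed item and assemble an execution message.

```python
# Sketch: feed item -> execution message. The feed fields and the
# order message schema below are hypothetical illustrations.
import xml.etree.ElementTree as ET

feed_item = """<item>
  <title>RESTful Web Services</title>
  <asin>0596529260</asin>
  <price>39.99</price>
</item>"""

item = ET.fromstring(feed_item)

# Assemble a message for a (hypothetical) shopping cart service.
order_message = {
    "operation": "CartAdd",                      # invented operation name
    "asin": item.findtext("asin"),
    "quantity": 1,
    "expected_price": float(item.findtext("price")),
}
print(order_message["asin"])   # -> 0596529260
```

The point is that everything the user agent needs travels as well-formed XML, so no human ever has to eyeball the vendor's pages.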
To summarize: Web 2.0 enables you to syndicate your value proposition and then have it consumed via Web Services, leveraging computer, as opposed to human, interaction cycles. This is how I believe Web 2.0 will ultimately impact the growth rates (in most cases exponentially) of those companies that comprehend its potential.
It is clear that in comparison to the Web of the last century, the nature of data on the Web later in this decade will be very different in the following aspects:
- Volume of data is growing by orders of magnitude every year.
- Multimedia and sensor data are becoming more and more common.
- Spatio-temporal attributes of data are important.
- Different data sources provide information to form the holistic picture.
- Users are not concerned with the location of data source, as long as its quality and credibility is assured. They want to know the result of the data assimilation (the big picture of the event).
- Real-time data processing is the only way to extract meaningful information
- Exploration, not querying, is the predominant mode of interaction, which makes context and state critical.
- The user is interested in experience and information, independent of the medium and the source.
Effectively, the nature of the knowledge on the Web is changing very fast. It used to be mostly static text documents; now it will be a combination of live and static multimedia, including text, data and documents with spatio-temporal attributes. Considering these changes, will the search engines developed for static text documents be able to deal with the needs of the Web? [via E M E R G I C . o r g]
No, but this doesn't render them useless since we wouldn't be at this point without the likes of Google, Yahoo! et al. But building upon the data substrate that web data oriented search engines provide is where the next batch of Information access and Knowledge discovery solutions will carve out their space. The symbiotic relationship between Google (data) and Gurunet's Answers.com (Information and Knowledge) is one interesting example.
The Web is a distributed collection of databases that implement a variety of data storage models but are commonly accessible via protocols that rely on HTTP for transport (in-bound and out-bound messages) services. These databases are increasingly using well-formed XML for query result (data contextualization) persistence and URIs for permanent reference. "What Database?" you might ask; "What you once called your Web Site, Blog, Wiki, etc.", my timeless reply.
When you have the database that I describe above, and a collection of entry points from which discrete or composite Web Services can be invoked available from one or more internet domains, you end up with what I prefer to call a "Web 2.0" presence, or what Richard McManus describes as: "The Web as a Platform".
Here is a collection of posts I have made in the past relating to Web 2.0, note that this list is dynamic since this blog is Virtuoso based (predictably):
Free Text Search with XHTML results page (with Virtuoso generated URIs for RSS, Atom, and RDF): http://www.openlinksw.com/blog/search.vspx?blogid=127&q=web+2.0&type=text&output=html
It's also no secret that I believe that Virtuoso is a bleeding edge Web 2.0 technology platform (and more..). The URIs that I am exposing provide the foundation layer for other complementary Web initiatives such as the Semantic Web (Web 2.0 provides infrastructure for the Semantic Web, as time will show). They are also completely usable outside the realm of this blog.
BTW - Jon Udell is writing, experimenting with, and demonstrating similar concepts across feeds within his Web 2.0 domain.
These are indeed fun times!
Have RSS feeds killed the email star? silicon.com Feb 28 2005 12:58PM GMT
Anyway, back to cognitive dissonance. Could this be the reason for the following?
And more...
By Uche Ogbuji, IBM developerWorks
The world of XML and Web services is huge, and growing. developerWorks does much to map it out for you, but when you're looking for a schema or a public Web service to meet some pressing need, it's useful to have several key resources handy. This tip shows you how to comb through the enormous variety of Internet resources to find schemata and Web services using common search criteria. The best known source for finding public SOAP Web services is XMethods. It has a comprehensive list of SOAP services that you can sort by several criteria. It also provides a demo client so you can try out the services right from the index site. You can also keep track of the listings on XMethods programmatically using UDDI, RSS, and other means. Other sites that provide directories of Web services include RemoteMethods.com and Web Service List. A chronicle of interesting Web services is Web service of the Day.
One resource that straddles the Web services/Semantic Web divide is WSindex.org, a directory of Web services, XML, SOAP, UDDI, WSDL, and Semantic Web resources. This site is a hierarchical and searchable directory.
http://www-106.ibm.com/developerworks/xml/library/x-tiplkws.html
Email As A Platform
It looks like more people are starting to realize that email is more than it seems. Especially given the drastic increase in storage size of web-based email applications, more people are realizing that email is basically a personal database. People simply store information in their email, from contact information that was emailed to them to schedule information to purchase tracking from emailed receipts. Lots of people email messages to themselves, realizing that email is basically the best "permanent" filing system they have. That's part of the reason why good email search is so important. Of course, what the article doesn't discuss is the next stage of this evolution. If you have a database of important information, the next step is to build useful applications on top of it. In other words, people are starting to realize that email, itself, is a platform for personal information management.
Answers.com was launched a month ago, and its stock is practically on fire! Does this graph tell you anything about subject searches vs keyword searches?
The burgeoning Semantic Web will disrupt the search market in a big way (and for the better IMHO).
By Jeremy J. Carroll, MultiLingual Computing and Technology
The author gives a brief introduction to the Semantic Web and describes difficulties -- and occasionally solutions -- related to building multilingual Semantic Web sites and applications. The initial drivers for the Semantic Web came from metadata about web pages. Who wrote it?
When? Who owns the copyright? And so on. Conveying such metadata requires agreement about the key terms such as author and date. This agreement has been reached by the Dublin Core community. For example, they have an agreed definition for the term creator, generalizing author for use in metadata records. The Semantic Web does not, however, draw a sharp distinction between metadata about the page and data contained within the page. In both cases, the idea is to provide sufficient structure around the data to turn it into information and to connect the concepts used to express such information with concepts used by others so that this information can become knowledge that can be acted upon.
See also W3C Semantic Web:
http://www.w3.org/2001/sw/
I have always believed that self-annotation will ultimately drive the realization of the semantic web vision. GuruNet is an interesting effort that should lead down this path.
Here is GuruNet's answer to the question: What Is SQL?
The Web Services, XML, and RDF angles should be pretty obvious (I hope!).
BTW - GuruNet does have a sync latency issue re. Wikipedia that it will need to address sooner rather than later.
A quick head to blog dispatch of these thoughts (while they remain fresh):
Data is an expression of feedback; a statement (rightly or wrongly so) about an observation. If you think about it, didn't we use to capture observed data on paper in tabular form (rows and columns, which are analogous to Relational Database Tables and Columns)?
Information is data in context, or as I would prefer to say: contextualized data. Thus, information provides an understanding of data (provides insight about statements of observation). I also recall a myriad of context oriented hierarchical presentation forms: taxonomies and ontologies or conceptual schemas (nowadays expressed in a hierarchical tree form called XML and persisted for future reference in an XML aware database).
Knowledge isn't contextualized information, and it is certainly distinct from information (contrary to many dictionary definitions, as highlighted in this post by Amy Gahran). I prefer to define knowledge as the basis of what you can, will, would, should, or might do with information. In all cases we express our level of knowledge by the way we act on the information (or lack thereof) at our disposal. Think about brainstorming for a moment; you are trying to determine a path of action based on information at your disposal, and a typical action would be to draw conceptual or topic relationship maps (graphing, with direction driven by the information processing action) on a whiteboard or piece of paper. Expressing, sharing, processing, and persisting these concept and topic graphs is what the 'Graph Model' based semantic/knowledge database is all about.
Our industry has derived appropriate technology solution realms for Data, Information, and Knowledge Management (although we mix them up more often than not). Thus, there is room for Network, Hierarchical, SQL, XML (Semi-Structured Model), Object, Object-Relational, and Associative Model (graph based modeling of: source, verb, target; analogous to subject, predicate, object as per RDF).
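The source-verb-target (or subject-predicate-object) model mentioned above can be reduced to a minimal sketch: a set of triples plus a wildcard pattern query, where None matches anything. The example triples are illustrative, not drawn from any real vocabulary.

```python
# Sketch of the Associative / Graph Model: triples with pattern matching.
triples = {
    ("KnowledgeBase", "implements", "GraphModel"),
    ("GraphModel", "analogousTo", "RDF"),
    ("RDF", "uses", "SubjectPredicateObject"),
}

def query(s=None, p=None, o=None):
    """Return every triple matching the (s, p, o) pattern; None is a wildcard."""
    return [(ts, tp, to) for (ts, tp, to) in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

print(query(p="analogousTo"))   # -> [('GraphModel', 'analogousTo', 'RDF')]
```

Everything an RDF store does, from FOAF files to whiteboard-style concept maps, is elaboration on this one pattern-matched structure.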
We are spawning data, databases, infobases, knowledgebases, networks, and eventually agents, that will reflect the timeless relationships that exist across data, information, and knowledge.
Stickiness is a defining characteristic of Web 1.0. It's all about eyeballs (site visitors), which ultimately meant that all early Web business models ended up down the advertising route.
I always felt that Web 1.0 was akin to having a crowd of people at your reception area seeking a look at your corporate brochures, and then someone realizes that you could start selling AD space in these brochures in response to the growing crowd size and frequency of congregation. The long-term folly of this approach is now obvious, as many organizations forgot their core value propositions (expressed via product offerings) in the process and wandered blindly down the AD model cul-de-sac, and we all know what happened down there..
Web 2.0 is taking shape (the inflection is in its latter stages), and the defining characteristics of Web 2.0 are:
When you factor in all of the above, the real question is whether Google and others are equipped to exploit Web 2.0. "To some degree" is the best answer at the current time, as they have commenced the transition from "content only" web site to web platform (via the many Web Services initiatives that expose SOAP and REST interfaces to various services), but there is much more to this journey, and that's the devil in the "competitive landscape details".
From my obviously biased perspective, I think Virtuoso and Yukon+WinFS provide the server models for driving Web 2.0 points of presence (single server instances that implement multiple protocols). Thus, if Google, Yahoo! et al. aren't exploiting these or similar products, then they will be vulnerable over the long term to the competitive challenges that a Web 2.0 landscape will present.
In section 4.1, Human-friendly Syntax, you say "There must be a text-based form of the query language which can be read and written by users of the language", and you list the status as "pending".
As background for section 4.1, you may be interested in RDFQueryLangComparison1 (original text replaced with live link).
It shows how to write queries in a form that includes English meanings.
The example queries can be run by pointing a browser to www.reengineeringllc.com .
Perhaps importantly, given the intricacy of RDF for nonprogrammers, one can get an English explanation of the result of each query.
-- Dr. Adrian Walker of Internet Business Logic
The Semantic Web continues to take shape, and Infonauts (information-centric agents) are already emerging.
A great thing about the net is the "back to the future" nature of most Web and Internet technology. For instance we are now frenzied about Service Oriented Architecture (SOA), Event Driven Architecture (EDA), Loose Coupling of Composite Services etc. Basically rehashing the CORBA vision.
I see the Semantic Web playing a similar role in relation to artificial intelligence.
BTW - It still always comes down to data, and as you can imagine Virtuoso will be playing its usual role of alleviating the practical implementation and utilization challenges of all of the above :-)
Tim Berners-Lee provided a keynote at WWW2004 earlier this week, and Paul Ford provided a keynote breakdown from which I have scraped a poignant excerpt that helps me illuminate Virtuoso's role in the inevitable semantic web.
First off, I see the Semantic Web as a core component of Web 2.x (a minor upgrade of Web 2.0), and I see Virtuoso as a definitive Web 2.0 (and beyond) technology, hence the use today of the branding term "Universal Server". A term that I expect to become a common product moniker in the not too distant future.
The first challenge that confronts the semantic web is the creation of semantic content. How will the content be created? Ideally, this should come from data; at the end of the day this is a data contextualization process. The excerpt below from Paul's article highlights the point:
Rather than concerning themselves unduly with hewing to existing ontologies, Berners-Lee pushed developers to start using RDF and triples more aggressively. In particular, he wants to see existing databases exported as RDF, with ontologies created ad-hoc to match the structure of that data. Rather than using PHP scripts only to produce HTML, he suggested, create RDF as well. Then, when all of the RDF is aggregated, apply rules and see what happens. "Let's not fall back on handmade markup."
Data in existing databases does not have to be exported as RDF, especially if sensitivity to change is a specific contextual requirement. Naturally, the assumption is made that most databases don't have the ability to produce RDF, so an additional tool would be required to perform the data exports and transformation, and then a separate HTTP server makes this repurposed RDF data accessible over HTTP.
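The on-the-fly alternative can be sketched as follows: render rows from a live SQL database as N-Triples at request time, rather than exporting a static RDF dump. The table, its columns, and the URI scheme below are all hypothetical.

```python
# Sketch: live SQL rows served as RDF (N-Triples) on demand.
# Table, columns, and the example.com URI scheme are invented.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE parts (id INTEGER, name TEXT)")
con.executemany("INSERT INTO parts VALUES (?, ?)",
                [(1, "sprocket"), (2, "widget")])

BASE = "http://example.com/parts/"

def rows_as_ntriples():
    """Render the current table contents as N-Triples, one triple per field."""
    lines = []
    for pid, name in con.execute("SELECT id, name FROM parts"):
        subject = f"<{BASE}{pid}>"
        lines.append(f'{subject} <{BASE}schema#name> "{name}" .')
    return "\n".join(lines)

print(rows_as_ntriples())
```

Because the triples are generated at query time, an update to the `parts` table is reflected in the very next RDF request, which is exactly the change-sensitivity point made above.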
Later in the talk, he described a cascade of Semantic Web connections, postulating that one day, individuals may be able to follow links from a parts catalog to order status, from location to weather to taxes.
The final excerpt (above) outlines the kinds of interactions that the Semantic Web facilitates. The traversal from a "part catalog" to "order status", or from "location" to "weather" to "taxes", illustrates the roles that services and service orchestration will also play in the Semantic Web era.
Thus, we can safely deduce the following about the semantic web:
I would also like to conclude that what we know today, as the monolithic "point of presence" on the web called a "Web Site" (which infers browsing and page serving), is naturally going to morph into a different kind of "point of presence" that is capable of delivering the following from a single process:
This is what Virtuoso is all about, and why it is described as a "Universal Server"; a server instance that speaks many protocols, delivering a plethora of functionality (Database, Web Services Platform, Orchestration Engine, and more).
Web technologists who are still undecided about that big leap into RDF might take heart from the experience of my own developer, Derek.
The picture below is Derek as I found him just a short year ago in Denbigh, situated in the heart of the 'silicon valley' of North Wales. The poor bloke was on the rocks, desperate for even Access database or ActionScript work.
And now, after making the move to RDF, we see that Derek is a go-ahead semantic web executive with a large desk and two computers.
Now isn't that living proof that RDF does you (and your wallet) good, all you RDF sceptics!
Man this is interesting. Is Dave Winer getting into the idea of ontology merging? Read this comment and judge for yourself:
"The hierarchy itself is separate -- you (could) publish an OPML file of that hierarchy and put it in a public place."
In answering the question, "Why do it this way?" Dave gave an interesting response -- the atomization of a blog into feeds would allow users to merge the "my world" of their blog with content from the many "their worlds" on the net. Such merged topical hierarchies could then themselves be exported as an OPML file and a de facto statement of "this is my point of view, my information and other information from beyond my domain that I think is important."
There's a number of folk hard at work on this problem: the XFML folks, Paolo Valdemarin & Matt Mower with their k-collector, and our very own SWAD-E thesaurus project, which discusses mapping issues. Heck, we even discussed the issue in our requirements specification. All at different levels of sophistication and complexity. I suspect that something as simple as blog categorization won't run into the really hard thesaurus mapping problems, but that it won't be as straightforward as Dave's comments might lead one to think.
To expand on this theme, let's think about how people might want to share categories. Firstly, you might want to simply reuse someone else's categorization scheme. That's fine as a bootstrap, but what if you already have a scheme? What if the two schemes overlap? What happens to your previously categorized blog entries? You might, I suggest, want at least the ability to say "these two categories are the same".
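That "these two categories are the same" mapping can be sketched as a simple equivalence table applied while merging two blogs' category schemes (the category names and post identifiers below are invented):

```python
def merge_schemes(my_posts, their_posts, same_as):
    """Group posts from two blogs under canonical categories.
    same_as maps the other blog's category names onto ours."""
    merged = {}
    for category, posts in my_posts.items():
        merged.setdefault(category, []).extend(posts)
    for category, posts in their_posts.items():
        canonical = same_as.get(category, category)  # unmapped names pass through
        merged.setdefault(canonical, []).extend(posts)
    return merged

mine = {"semweb": ["post-1"], "music": ["post-2"]}
theirs = {"semantic-web": ["post-a"], "books": ["post-b"]}
merged = merge_schemes(mine, theirs, {"semantic-web": "semweb"})
```

The hard thesaurus-mapping problems start exactly where this sketch stops: when "the same" is only approximately true, or when the mapping must be negotiated across a whole community.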
Then there's the aggregation of categories. Without some sort of mapping, two blogs using different categorization schemes are just that -- two blogs.
A feasible approach is the decentralized ontology creation favoured by Topic Exchange and k-collector. Here, people suggest new topics/categories, and the (ever-growing) structure is shared among the community. A fine idea, but one I fear is not scalable beyond a very small community.
Finally, there is the idea of "semantic lenses" -- using two different categorization schemes to view the same content.
Oh, one other thing. Reading on, I note "Dave's idea is that supporting views other than reverse-chron gives new participants entry points into the data". I couldn't agree more; this is what I was trying to demonstrate with semantic view, navigation, and query on the semantic blogging prototype.
Planet RDF is an aggregate of the weblogs of software developers in and around the semantic web community. We hope both to take advantage of the community that exists, and also to foster more collaboration between independent developers.
Although by nature not always 100% focused on semantic web content, it provides a great snapshot of the work being done and new web sites of interest to those working on the semantic web.
The participant weblogs are sourced from Dave Beckett's Semantic Web bloggers list,
http://journal.dajobe.org/journal/2003/07/semblogs/, with a bit of additional editorial control to keep the web site focused loosely on topic. Send mail to Dave, dave.beckett@bristol.ac.uk, if you think you have a blog (with a valid RSS 1.0 feed, naturally) that we'd be interested in, and we'll check it out. For the technically curious: web standards are used as much as possible, and the usual eclectically invalid HTML input from weblogs has been cleaned up to be as near XHTML-valid as we could muster, both in the web page and in the aggregated RDF,
http://planetrdf.com/index.rdf. Planet RDF was developed by Matt Biddulph, Dave Beckett and Phil McCarthy.
The thing that most surprised me today in the SoftEdge panel on Social Software was the reaction to RSS. I should be clear that I am an RSS true believer. It seems to me that metadata as a byproduct of social software engines (be it blogging or social networking or whatever) is not only enviable, it is inevitable. RSS and FOAF and other yet-to-be-determined social software data protocols will become standards because it simply makes good sense for them to be standardized. Anyone paying attention to the unbelievable development and adoption curve of wireless can appreciate the immense value driven by standards -- and, in particular, standards that are truly standard. So it came as a bit of a shock to me that when I questioned the panelists on the implications of RSS and the Semantic Web, they were less sold on the inevitability of it all.
When asked the question of whether the proliferation of RSS and FOAF might make it possible for reader technology to be the next killer application in knowledge management, I got very strong reactions from both Reid Hoffman and Meg Hourihan. Reid stated that he did not believe that RSS was sufficiently robust to provide significant value at any level. Meg followed up with a general indictment of the semantic web, which she views merely as a geek utopia. I will admit that I'm a fan of Candide (particularly at the hands of Bernstein), but I hardly view myself as Pangloss. One need look no further than, for example, the tools that Oddpost has incorporated into its web email client to allow an integrated email and blog experience. Better yet, through a relatively simple web service, Oddpost can deliver an RSS feed of a particular Google News search so that you can keep track of keywords that are of interest to you without having to visit Google repeatedly to find out if your company or candidate or favorite band has been mentioned in today's news. The same is true of watch lists on Technorati. Rather than periodically check to see if someone has linked to your blog, Technorati will do the work for you and deliver the info to your inbox only when there is information to be delivered. These examples are just the tip of the iceberg, but they demonstrate the nascent power of RSS and related standards. I'll have to wait for another panel to have that argument with Reid and Meg.
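The keyword-watch idea behind those Google News and Technorati feeds can be sketched in a few lines; the feed XML, item titles, and URLs below are invented for illustration:

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for a fetched RSS 2.0 feed.
RSS = """<rss version="2.0"><channel>
  <item><title>Acme ships new widget</title><link>http://example.com/1</link></item>
  <item><title>Weather update</title><link>http://example.com/2</link></item>
</channel></rss>"""

def watch(feed_xml, keywords):
    """Return (title, link) for items whose title mentions any keyword."""
    hits = []
    for item in ET.fromstring(feed_xml).iter("item"):
        title = item.findtext("title", "")
        if any(k.lower() in title.lower() for k in keywords):
            hits.append((title, item.findtext("link")))
    return hits

hits = watch(RSS, ["Acme"])
```

A real watch-list service would fetch the feed on a schedule and notify you only when `hits` is non-empty -- exactly the "deliver the info only when there is information to be delivered" behaviour described above.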
Q: Amazon.com now runs sites and on-line operations for retailers such as Target and Toys 'R' Us. What's the future for that services business?

A: It's a rapidly growing part of our business. And that goes from [large] companies that are customers of that all the way down to individuals using our Web services to tap into the fundamental platform that is Amazon.com. They can build their own applications very effectively. It's almost closer to an ecosystem.

Q: So Amazon is becoming a kind of software platform a bit like Microsoft (MSFT)?

A: People are building stuff that surprises us. That's what's so interesting about this. We've built this big base of technology to serve ourselves, and now we're opening it up and letting people access it. They're taking these fundamental pieces and building completely new things that not only would we have never gotten around to but in some cases maybe never even have thought of. There are thousands of developers who are building applications using Amazon Web services. The sky's the limit on their creativity.

Q: What arises from all those efforts?

A: People will be able to build very powerful applications by hooking together a whole bunch of Web services from a whole bunch of different companies.

Q: What benefit is Amazon.com getting from this?

A: It's too early to say. It's certainly not a major source of revenue for us. But when people use our Web services, they give us credit for that. That turns out to be very helpful.

A few years ago the race was on to simply have a Web Site; then this requirement evolved into a requirement for a database-driven site. Today we are seeing the final stages of the Web 2.0 inflection, which will inevitably change the focus toward the need for a Point of Presence on the Web for exposing or invoking Web Services and/or syndicating or subscribing to XML-based content.
MedicineNet.com Announces Free RSS News Syndication Service URLwire Sep 24 2003 10:50AM ET
Back to the article. This is an essay by Joe Gregorio, who is so into auto-discovery that he deliberately stuffed his contact details in a FOAF file that you need to auto-discover using a FOAF auto-discovery-aware client (e.g. FOAFnaut, or the human brain :-) ). Anyway, here is an excerpt from his essay (a very good read).
Over a month ago Paul Ford published a great essay entitled How Google beat Amazon and Ebay to the Semantic Web. After reading it the first time I thought it was a great introduction to the Semantic Web, an idea I had been trying to wrap my head around ever since encountering RDF as it is baked into RSS 1.0. I had seen the light and bought into the promise of the Semantic Web.
Time passes...
When Dave Winer floated the idea of RSS 2.0, discussions ensued about the RDF in RSS 1.0. After spending some time badgering poor Bill Kearney for a concrete benefit of having RDF in RSS 1.0 and not getting a really satisfactory answer, I went back and read Paul Ford's essay again. I wanted to get that old religious feeling back. It didn't work. The magic was gone.
In the year 2000 the question of the shape and form of XML data was unclear to many, and reading the article below basically took me back in time to when we released Virtuoso 2.0 (we are now at release 3.0 commercially, with a 3.2 beta dropping any minute).
RSS is a great XML application, and it does a great job of demonstrating how XML -- the new data access foundation layer -- will galvanize the next-generation Web (I refer to this as Web 2.0).
RSS: INJAN (It's not just about news)
RSS is not just about news, according to Ian Davis on rss-dev.
He presents a nice list of alternatives, which I reproduce here (and to which I'd add, of course, bibliography management):
- Sitemaps: one of the S's in RSS stands for summary. A sitemap is a summary of the content on a site; the items are pages or content areas. This is clearly a non-chronological ordering of items. Is a hierarchy of RSS sitemaps implied here -- how would the linking between them work? How hard would it be to hack a web browser to pick up the RSS sitemap and display it in a sidebar when you visit the site?
- Small ads: also known as classifieds. These expire, so there's some kind of dynamic going on here, but the ordering of items isn't necessarily chronological. How to describe the location of the seller, or the condition of the item, or even the price? Not every ad is selling something -- perhaps it's to rent out a room.
- Personals: similar model to the small ads. No prices though (I hope). Comes with a ready-made vocabulary of terms that could be converted to an RDF schema. Probably should do that just for the hell of it anyway -- gsoh.
- Weather reports: how about a week's worth of weather in an RSS channel? If an item is dated in the future, should an aggregator display it before time? Alternate representations include maps of temperature and pressure etc.
- Auctions: again, related to small ads, but these are much more time-limited since there is a hard cutoff after which the auction is closed. The sequence of bids could be interesting -- would it make sense to thread them like a discussion so you can see the tactics?
- TV listings: this is definitely chronological but with a twist -- the items have durations. They also have other metadata such as cast lists, classification ratings, widescreen, stereo, program type. Some types have additional information such as director and production year.
- Top ten listings: top ten singles, books, DVDs, richest people, ugliest, rear of the year etc. Not chronological, but has a definite order. May update from day to day or even more often.
- Sales reporting: imagine if every department of a company reported their sales figures via RSS. Then the divisions aggregate the departmental figures and republish to the regional offices, who aggregate and add value up the chain. The chairman of the company subscribes to one super-aggregate feed.
- Membership lists / buddy lists: could I publish my buddy list from Jabber or other instant messengers? Maybe as an interchange format, or perhaps it could be used to look for shared contacts. Lots of potential overlap with FOAF here.
- Mailing lists: or in fact any messaging system such as Usenet. There are some efforts at doing this already (e.g. Yahoo! Groups), but we need more information -- threads; references; headers; links into archives.
- Price lists / inventory: the items here are products or services. No particular ordering, but it'd be nice to be able to subscribe to a catalog of products and prices from a company. The aggregator should be able to pick out price rises or bargains given enough history.
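Taking the first item on that list as an example, a non-news channel is just ordinary RSS with non-chronological items. A minimal sketch of a sitemap channel, with invented page names and URLs:

```python
import xml.etree.ElementTree as ET

def sitemap_channel(site_title, pages):
    """Build a minimal RSS 2.0 channel whose items are site pages, not news."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    for title, url in pages:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
    return ET.tostring(rss, encoding="unicode")

xml_out = sitemap_channel("Example Site", [
    ("Home", "http://example.com/"),
    ("About", "http://example.com/about"),
])
```

The same skeleton works for most of the other list entries; what changes is the vocabulary (prices, durations, bid histories), which is where RSS 1.0's RDF extensibility earns its keep.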
Thus, if we can comprehend RSS (the blog article below does a great job), we should be able to see the fundamental challenges before any organization seeking to exploit the potential of the imminent Web 2.0 inflection: how will you cost-effectively create XML data from existing data sources, without upgrading or switching database engines, operating systems, or programming languages? Put differently, how can you exploit this phenomenon without losing your ever-dwindling technology choices (believe me, choices are dwindling fast, but most are oblivious to this fact)?
Netscan is an interesting NNTP-based project, and it is pretty much along the same lines as what Virtuoso has provided (albeit with an inferior UI) for NNTP since 1999.
Using Virtuoso, the data presented by Netscan could very easily be presented as XML, which could then be further processed using XPath, XQuery, and XSLT, with the final result being RDF (since this is metadata after all -- another contribution to the Semantic Web).
Certainly an interesting proposition, or should I say vision, but I don't think this proposition does justice to some of the valid insights contained in this recent IDG interview with Tim O'Reilly. Here are some of Tim's quotes:
"Nobody is pointing out something that I think is way more significant: all of the killer apps of the Internet era: Amazon (.com, Inc), Google (Inc.), and Maps.yahoo.com. They run on Linux or FreeBSD, but they're not apps in the way that people have traditionally thought of applications, so they just don't get considered. Amazon is built with Perl on top of Linux. It's basically a bunch of open source hackers, but they're working for a company that's as fiercely proprietary as any proprietary software company."
Solutions are always more important than the technology that makes up the solutions, from a business-development perspective. The trouble is that the constituent parts of a solution ultimately affect the longevity of the solution (its future adaptability), hence the middleware and components segments of the software industry.
"With eBay it's even clearer. The fact is, it's the critical mass of marketplace buyers and sellers and all the information that people have put in that marketplace as a repository."
"So I think we're going to find more and more places where that happens, where somebody gets a critical mass of customers and data and that becomes their source of value. On that basis, I will predict that -- this is an outrageous prediction -- but eBay will buy Oracle someday. The value will have moved so much to people who are not now seen as software suppliers."
Reading this article, I can only assume that Tim realizes the inevitable: computing is, and always will be, about data -- creation, transformation, dissemination, and exploitation. That said, you don't maximize the opportunities that such a realization accords by acquiring the largest vendor of database software.
Being the largest database vendor doesn't imply dominance in any of the following areas:
I see the Internet as the Database (comprising various forms), and the Web as a dominant database segment within the Internet realm. Every Internet Point of Presence is really a point of Data interaction: Creation, Storage, Access, Dissemination, and Exploitation.
eBay can acquire a license from Oracle or any other database vendor and still be successful. All they need to do is come to the realization that, like Amazon and Google, they could become a very important Executable and Semantic Web platform -- by finally understanding that their home page isn't that important; it's the interactions with the site that matter. All of this is certainly achievable without acquiring Oracle.
In short, this applies to any organization that seeks to incorporate the Internet into its operational strategy (Business Development, Customer Services, Intranets, Extranets, etc.). I am inclined to believe that Software Commoditization (which has been with us for a very long time) is the new moniker for "it's all about data", or to quote Sam Ruby, "It's just data".
RSS feeds are everywhere, and they are changing the Web landscape fast. The Web is shifting from a distributed freeform database to a distributed semi-structured database.
Amazon.com RSS Feeds They never got around to it, so we set up 160+ separate RSS channels for darn near every type of product on Amazon.com for you. If you have any feedback for this new (free) service, please let us know immediately! We're looking to make it an outstanding and permanent part to your collection. Enjoy! (Chris) [via Lockergnome's Bits and Bytes]
Your Web Site is gradually becoming a database (what?). Yes, your Web Site needs to be driven by database software that can rapidly create RSS feeds from your organization's non-XML and XML data sources. Your web site needs to provide direct data access to users, bots, and Web Services.
Here is my blog database, for instance; you can query the XML data in this database using XQuery, XPath, and Web Services (if I decide to publish any of my XML Query Templates as Web Services).
Note the teaser here: each XML document is zero bytes! This is because these are live Virtuoso SQL-XML documents that produce a variety of XML documents on the fly, which means they retain a high degree of sensitivity to changes in the underlying databases supplying the data. I could have chosen to make these persistent XML docs with interval-based synchronization with the backend data sources (but I chose not to, for maximum effect).
As you can see, SQL and XML (Relational and Hierarchical models) engines can co-exist in a single server; ditto Object-Relational (which might be hidden from view but could be used in the SQL that serves the SQL-XML docs); ditto Full Text (see the search feature of this blog); and finally, ditto the directed-graph model for accessing my RDF data (more on this as the RDF data pool increases).
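The "query the XML data" idea can be sketched client-side with Python's ElementTree, which supports a limited XPath subset; the blog XML below is invented, and a server like Virtuoso would evaluate richer XPath/XQuery expressions server-side:

```python
import xml.etree.ElementTree as ET

# A stand-in for an XML document served live from a SQL-XML query.
BLOG = """<blog>
  <post id="1"><title>RDF and SQL</title><category>semweb</category></post>
  <post id="2"><title>Gig review</title><category>music</category></post>
</blog>"""

root = ET.fromstring(BLOG)
# ElementTree supports child-value predicates like [category='semweb'];
# a full XPath/XQuery engine would accept far richer expressions.
semweb_titles = [p.findtext("title")
                 for p in root.findall(".//post[category='semweb']")]
```

The point of the zero-byte live documents above is that a consumer issuing exactly this kind of query always sees the current state of the underlying database.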
Amazon has pretty much got it right!
The perennial question regarding Web Services has been how one defines Web Services in simple terms. My response has always been:
The ability to interact with a Web Point of Presence without visual navigation. A good example is the ability to send the "amazon.com" site a message in order to order a book, instead of physically navigating to the site.
This has been my definition since 2001, long before Amazon implemented its Web Services APIs.
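That "send the site a message" interaction boils down to constructing a machine-readable request. A minimal sketch -- the XML message shape and the ISBN are made up for illustration, and this is not Amazon's actual API:

```python
import xml.etree.ElementTree as ET

def order_message(isbn, quantity):
    """Build an XML order request -- the message you'd POST to a
    (hypothetical) web-service endpoint instead of clicking through pages."""
    order = ET.Element("order")
    ET.SubElement(order, "isbn").text = isbn
    ET.SubElement(order, "quantity").text = str(quantity)
    return ET.tostring(order, encoding="unicode")

msg = order_message("0123456789", 1)
# msg would be sent over HTTP; no visual navigation required.
```

The W3C definition quoted below says the same thing more formally: XML-based messages, conveyed by Internet protocols, against a described interface.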
In recent times I came across this post in the general blogosphere at Ecademy (sheer coincidence, I might add; I wasn't looking for it, but that's what this emerging semantic web experience is all about):
I thought I'd kick off that old chestnut - "What is a web service?" - again with the definition according to the W3C. They should know ... shouldn't they ...
A Web service is a software system identified by a URI, whose public interfaces and bindings are defined and described using XML. Its definition can be discovered by other software systems. These systems may then interact with the Web service in a manner prescribed by its definition, using XML based messages conveyed by Internet protocols.
http://www.w3.org/TR/2002/WD-ws-arch-20021114/#whatisws
Accurate, but kind of obscure for the non-technical reader.
Software companies always seek to reach the land of critical mass (this is the single destination of every software vendor), and critical mass implies the creation of an ecosystem served by the software vendor (Microsoft is king of critical mass, and this is the secret of their success!).
Amazon, as an eCommerce pioneer, has pretty much figured this out (their aggressive patent pursuit sometimes compromises this reality, and I certainly don't like that part of their behavior), and they have correctly used Web Services as the vehicle.
Google has pretty much figured this out too, and before Amazon I might add.
Amazon's Software Emerges As Valuable Product
I'm surprised that it's taken people this long to realize that the most valuable part of Amazon.com's business might not be their stores, but their ability to run stores for others. Amazon.com still has, by far, some of the best technology out there for running an e-commerce site. In the early days of e-commerce, any good online shopping innovation was quickly copied, but more recently it seems that no one has been able to keep up with Amazon's advancements. It's not clear if this is due to Amazon's patent-crazy nature, or if most others have simply given up the fight. Either way, Amazon is doing their best to capitalize on their technology lead, and it seems that there's no shortage of willing customers. [via Techdirt]
See this futuristic piece (How Google beat Amazon and eBay to the Semantic Web) that sheds some speculative light on how this could play out.
Key excerpt of relevance to us (as potential providers of an application that demonstrates RDF's value prop.):
It's not the syntax that makes the difference, it's the app. History supports this view. How many people tried to pry apart the obscure Excel file format on the Mac? Or the Lotus file format on the PC? Name all the market leaders of the past, and only the Web had both the killer app and a transparent format. Maybe the relationship is multiplicative. Maybe Excel would have been the Web if it had used an open file format that anyone could understand. What if you could have created a spreadsheet with BBEdit or a HyperTalk script? The mind boggles at the possibilities (it never happened, of course).
Even in Office 2003 there is a failure to really open things up.
An aside, Jean Paoli rushes into the room, jumping up and down and saying "That's what I'm doing that's what I'm doing."
Anyway, I don't see any killer apps in the RDF crowd. I see lots of people with strong opinions and not much software. Killer apps are not something you wish into existence. Lots of people have said that RDF models a relational database. Okay that tells me something important, the killer app is a relational database.
Ha Ha!
But we already have relational databases. They were new when I was a grad student, and that was a long time ago.
Yeah, but what we don't have is a relational database that incorporates RDF as part of the database technology evolution roadmap. Of course, many will get it (and FUD-emulate) when we unveil something via Virtuoso.
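The "RDF models a relational database" observation is easy to demonstrate: a single three-column table holds subject/predicate/object triples, and ordinary SQL plays the role of a graph query. A minimal sketch (the table layout and example triples are mine, not Virtuoso's actual storage scheme):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:book1", "dc:title", "Weaving the Web"),
    ("ex:book1", "dc:creator", "Tim Berners-Lee"),
    ("ex:book2", "dc:title", "Practical RDF"),
])

# The relational analogue of a one-subject RDF query:
# every predicate/object pair describing ex:book1.
rows = conn.execute(
    "SELECT p, o FROM triples WHERE s = ?", ("ex:book1",)
).fetchall()
```

The interesting engineering (and the part a naive triples table doesn't solve) is making such queries fast at scale and joining them back against existing relational data.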
How was this achieved?
This is my modified version of #upstream.xml
You also have to make the following change via the following Userland Radio menu path "Radio"->Window->Radio.root->user->radio->prefs->upstream->servers:
'serverCapabilities'->flError = true;
New Architecture:

    Blogging Clients
           |
    Local Radio Userland Web Server
           |
    Virtuoso Server (RSS, RDF, XML, SQL, etc. -- in one place for further use)
The end result is productive blogging, and reusable content storage in my Virtuoso knowledgebase.