The DBpedia + BBC Combo Linked Dataset is a preconfigured Virtuoso Cluster (4 Virtuoso Cluster Nodes, each comprised of one Virtuoso Instance; initial deployment is to a single Cluster Host, but license may be converted for physically distributed deployment), available via the Amazon EC2 Cloud, preloaded with the following datasets:
The BBC has been publishing Linked Data from its Web Data Space for a number of years. In line with best practices for injecting Linked Data into the World Wide Web (Web), the BBC datasets are interlinked with other datasets such as DBpedia and MusicBrainz.
Typical follow-your-nose exploration using a Web Browser (or even via sophisticated SPARQL query crawls) isn't always practical once you get past the initial euphoria that comes from comprehending the Linked Data concept. As your queries get more complex, the overhead of remote sub-queries increases its impact, until query results take so long to return that you simply give up.
Thus, maximizing the effects of the BBC's efforts requires Linked Data that shares locality in a Web-accessible Data Space -- i.e., where all Linked Data sets have been loaded into the same data store or warehouse. This holds true even when leveraging SPARQL-FED style virtualization -- there's always a need to localize data as part of any marginally-decent locality-aware cost-optimization algorithm.
This DBpedia + BBC dataset, exposed via a preloaded and preconfigured Virtuoso Cluster, delivers a practical point of presence on the Web for immediate and cost-effective exploitation of Linked Data at the individual and/or service specific levels.
Download Virtuoso installer archive(s). You must deploy the Personal or Enterprise Edition; the Open Source Edition does not support Shared-Nothing Cluster Deployment.
Set key environment variables and start the OpenLink License Manager, using this command (which may vary depending on your shell and install directory):
. /opt/virtuoso/virtuoso-enterprise.sh
Optional: To keep the default single-server configuration file and demo database intact, set the VIRTUOSO_HOME environment variable to a different directory, e.g.,
export VIRTUOSO_HOME=/opt/virtuoso/cluster-home/
Note: You will have to adjust this setting every time you shift between this cluster setup and your single-server setup. Either may be made your environment's default through the virtuoso-enterprise.sh and related scripts.
Set up your cluster by running the mkcluster.sh script. Note that initial deployment of the DBpedia + BBC Combo requires a 4-node cluster, which is the default for this script.
Start the Virtuoso Cluster with this command:
virtuoso-start.sh
Stop the Virtuoso Cluster with this command:
virtuoso-stop.sh
Navigate to your installation directory.
Download the combo dataset installer script -- bbc-dbpedia-install.sh.
For best results, set the downloaded script to fully executable using this command:
chmod 755 bbc-dbpedia-install.sh
Shut down any Virtuoso instances that may be currently running.
Optional: As above, if you have decided to keep the default single-server configuration file and demo database intact, set the VIRTUOSO_HOME environment variable appropriately, e.g.,
export VIRTUOSO_HOME=/opt/virtuoso/cluster-home/
Run the combo dataset installer script with this command:
sh bbc-dbpedia-install.sh
The combo dataset typically deploys to EC2 virtual machines in under 90 minutes; your time will vary depending on your network connection speed, machine speed, and other variables.
Once the script completes, perform the following steps:
Verify that the Virtuoso Conductor (HTTP-based Admin UI) is in place via:
http://localhost:[port]/conductor
Verify that the Virtuoso SPARQL endpoint is in place via:
http://localhost:[port]/sparql
Verify that the Precision Search & Find UI is in place via:
http://localhost:[port]/fct
Verify that the Virtuoso hosted PivotViewer is in place via:
http://localhost:[port]/PivotViewer
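As a quick sanity check of the SPARQL endpoint, you can issue a trivial query from the shell. This is a minimal sketch: 8890 is assumed here as the instance's default HTTP port, so substitute your own, and the format parameter is the Virtuoso-style results-format selector used throughout this guide:

curl -G "http://localhost:8890/sparql" \
     --data-urlencode "query=SELECT * WHERE { ?s ?p ?o } LIMIT 5" \
     --data-urlencode "format=text/csv"

Five rows of CSV back confirm that the endpoint is live and the combo dataset is queryable.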
A simple guide usable by any Perl developer seeking to exploit SPARQL without hassles.
SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.
SPARQL queries are typically transported as HTTP payloads. Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL-compliant data server and receive a payload for local processing.
#
# Demonstrating use of a single query to populate a
# Virtuoso Quad Store via Perl.
#
# HTTP URL is constructed accordingly, with the CSV query results
# format as the default via mime type.
#

use CGI qw/:standard/;
use LWP::UserAgent;
use Data::Dumper;
use Text::CSV_XS;

sub sparqlQuery {
  my $query=shift;
  my $baseURL=shift;
  my $format=shift;

  my %params=(
    "default-graph" => "",
    "should-sponge" => "soft",
    "query"         => $query,
    "debug"         => "on",
    "timeout"       => "",
    "format"        => $format,
    "save"          => "display",
    "fname"         => ""
  );

  # Build the URL-encoded query string
  my @fragments=();
  foreach my $k (keys %params) {
    push(@fragments, "$k=".CGI::escape($params{$k}));
  }
  my $sparqlURL="${baseURL}?".join("&", @fragments);

  my $ua = LWP::UserAgent->new;
  $ua->agent("MyApp/0.1 ");

  my $req = HTTP::Request->new(GET => $sparqlURL);
  my $res = $ua->request($req);
  my $str = $res->content;

  # Parse the CSV result rows
  my $csv = Text::CSV_XS->new();
  my @rows=();
  foreach my $line ( split(/^/, $str) ) {
    $csv->parse($line);
    push(@rows, [ $csv->fields() ]);
  }
  return \@rows;
}

# Setting Data Source Name (DSN)
my $dsn="http://dbpedia.org/resource/DBpedia";

# Virtuoso pragma instructing the SPARQL engine to perform an HTTP GET,
# using the IRI in the FROM clause as Data Source URL, en route to DBMS
# record inserts.
my $query="DEFINE get:soft \"replace\"\nSELECT DISTINCT * FROM <$dsn> WHERE {?s ?p ?o}";

# Generic (non-Virtuoso-specific) SPARQL alternative.
# Note: this will not add records to the DBMS:
# my $query="SELECT DISTINCT * FROM <$dsn> WHERE {?s ?p ?o}";

my $data=sparqlQuery($query, "http://localhost:8890/sparql/", "text/csv");

print "Retrieved data:\n";
print Dumper($data);
Retrieved data:
$VAR1 = [
          [ 's', 'p', 'o' ],
          [ 'http://dbpedia.org/resource/DBpedia',
            'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
            'http://www.w3.org/2002/07/owl#Thing' ],
          [ 'http://dbpedia.org/resource/DBpedia',
            'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
            'http://dbpedia.org/ontology/Work' ],
          [ 'http://dbpedia.org/resource/DBpedia',
            'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
            'http://dbpedia.org/class/yago/Software106566077' ],
          ...
CSV was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a Perl developer who already knows how to use Perl for HTTP-based data access. SPARQL just provides an added bonus to URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.
DBpedia is a community effort to provide a contemporary deductive database derived from Wikipedia content. Project contributions can be partitioned as follows:
Comprising the nucleus of the Linked Open Data effort, DBpedia also serves as a fulcrum for the burgeoning Web of Linked Data by delivering a dense and highly-interlinked lookup database. In its most basic form, DBpedia is a great source of strong and resolvable identifiers for People, Places, Organizations, Subject Matter, and many other data items of interest. Naturally, it provides a fantastic starting point for comprehending the fundamental concepts underlying TimBL's initial Linked Data meme.
Depending on your particular requirements, whether personal or service-specific, DBpedia offers the following:
OpenLink Software has preloaded the DBpedia 3.6 datasets into a preconfigured Virtuoso Cluster Edition database, and made the package available for easy installation.
The DBpedia+Virtuoso package provides a cost-effective option for personal or service-specific incarnations of DBpedia.
For instance, you may have a service that isn't best-served by competing with the rest of the world for ad-hoc query time and resources on the live instance, which operates under various restrictions that enable this ad-hoc query service to be provided at Web scale.
Now you can easily commission your own instance and quickly exploit DBpedia and Virtuoso's database feature set to the max, powered by your own hardware and network infrastructure.
Pre-requisites are simply:
To install the Virtuoso Cluster Edition simply perform the following steps:
Set key environment variables and start the OpenLink License Manager, using this command (which may vary depending on your shell):
. /opt/virtuoso/virtuoso-enterprise.sh
Run the mkcluster.sh script, which defaults to a 4-node cluster.
Optional: Set the VIRTUOSO_HOME environment variable if you want to keep cluster databases distinct from single-server databases, via a distinct root directory for database files (one that isn't adjacent to single-server database directories).
Start the Virtuoso Cluster with this command:
virtuoso-start.sh
Stop the Virtuoso Cluster with this command:
virtuoso-stop.sh
To install your personal or service specific edition of DBpedia simply perform the following steps:
Download the DBpedia dataset installer script -- dbpedia-install.sh.
chmod 755 dbpedia-install.sh
Set the VIRTUOSO_HOME environment variable, e.g., to the current directory, via this command (which may vary depending on your shell):
export VIRTUOSO_HOME=`pwd`
Run the DBpedia dataset installer script with this command:
sh dbpedia-install.sh
Once the installation completes (approximately 1 hour and 30 minutes from start time), perform the following steps:
Verify that the Virtuoso Conductor (HTTP-based Admin UI) is in place via:
http://localhost:[port]/conductor
Verify that the Precision Search & Find UI is in place via:
http://localhost:[port]/fct
Verify that the DBpedia resource description pages are in place via:
http://localhost:[port]/resource/DBpedia
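You can also verify the Linked Data deployment end-to-end from the shell. A sketch, assuming the instance listens on port 8890 and mirrors the content-negotiation behavior of the public dbpedia.org setup:

curl -I -L -H "Accept: application/rdf+xml" "http://localhost:8890/resource/DBpedia"

You should see a 303 redirect from the entity URI to a data document, followed by a 200 response carrying an RDF representation.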
A simple guide usable by any Javascript developer seeking to exploit SPARQL without hassles.
SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.
SPARQL queries are typically transported as HTTP payloads. Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL-compliant data server and receive a payload for local processing.
/*
 * Demonstrating use of a single query to populate a
 * Virtuoso Quad Store via Javascript.
 */

/* HTTP URL is constructed accordingly, with the JSON query results
   format as the default via mime type. */
function sparqlQuery(query, baseURL, format) {
  if(!format) format="application/json";
  var params={
    "default-graph": "",
    "should-sponge": "soft",
    "query": query,
    "debug": "on",
    "timeout": "",
    "format": format,
    "save": "display",
    "fname": ""
  };

  /* Build the URL-encoded query string */
  var querypart="";
  for(var k in params) {
    querypart+=k+"="+encodeURIComponent(params[k])+"&";
  }
  var queryURL=baseURL + '?' + querypart;

  /* Synchronous XMLHttpRequest, with a fallback for old IE */
  var xmlhttp;
  if (window.XMLHttpRequest) {
    xmlhttp=new XMLHttpRequest();
  } else {
    xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
  }
  xmlhttp.open("GET",queryURL,false);
  xmlhttp.send();
  return JSON.parse(xmlhttp.responseText);
}

/* Setting Data Source Name (DSN) */
var dsn="http://dbpedia.org/resource/DBpedia";

/* The Virtuoso pragma DEFINE get:soft "replace" instructs the Virtuoso
   SPARQL engine to perform an HTTP GET, using the IRI in the FROM clause
   as Data Source URL with regards to DBMS record inserts */
var query="DEFINE get:soft \"replace\"\nSELECT DISTINCT * FROM <"+dsn+"> WHERE {?s ?p ?o}";

var data=sparqlQuery(query, "/sparql/");
Place the snippet above into the <script> section of an HTML document to see the query result.
JSON was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a Javascript developer that already knows how to use Javascript for HTTP based data access within HTML. SPARQL just provides an added bonus to URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.
A simple guide usable by any PHP developer seeking to exploit SPARQL without hassles.
SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.
SPARQL queries are typically transported as HTTP payloads. Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL-compliant data server and receive a payload for local processing, e.g., local object binding in PHP.
#!/usr/bin/env php
<?php
#
# Demonstrating use of a single query to populate a
# Virtuoso Quad Store via PHP.
#

# HTTP URL is constructed accordingly with JSON query results format in mind.
function sparqlQuery($query, $baseURL, $format="application/json") {
  $params=array(
    "default-graph" => "",
    "should-sponge" => "soft",
    "query"         => $query,
    "debug"         => "on",
    "timeout"       => "",
    "format"        => $format,
    "save"          => "display",
    "fname"         => ""
  );

  # Build the URL-encoded query string
  $querypart="?";
  foreach($params as $name => $value) {
    $querypart=$querypart . $name . '=' . urlencode($value) . "&";
  }

  $sparqlURL=$baseURL . $querypart;

  return json_decode(file_get_contents($sparqlURL));
}

# Setting Data Source Name (DSN)
$dsn="http://dbpedia.org/resource/DBpedia";

# Virtuoso pragma instructing the SPARQL engine to perform an HTTP GET,
# using the IRI in the FROM clause as Data Source URL
$query="DEFINE get:soft \"replace\" SELECT DISTINCT * FROM <$dsn> WHERE {?s ?p ?o}";

$data=sparqlQuery($query, "http://localhost:8890/sparql/");

print "Retrieved data:\n" . json_encode($data);
?>
Retrieved data: {"head": {"link":[],"vars":["s","p","o"]}, "results": {"distinct":false,"ordered":true, "bindings":[ {"s": {"type":"uri","value":"http:\/\/dbpedia.org\/resource\/DBpedia"},"p": {"type":"uri","value":"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type"},"o": {"type":"uri","value":"http:\/\/www.w3.org\/2002\/07\/owl#Thing"}}, {"s": {"type":"uri","value":"http:\/\/dbpedia.org\/resource\/DBpedia"},"p": {"type":"uri","value":"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type"},"o": {"type":"uri","value":"http:\/\/dbpedia.org\/ontology\/Work"}}, {"s": {"type":"uri","value":"http:\/\/dbpedia.org\/resource\/DBpedia"},"p": {"type":"uri","value":"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type"},"o": {"type":"uri","value":"http:\/\/dbpedia.org\/class\/yago\/Software106566077"}}, ...
JSON was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a PHP developer that already knows how to use PHP for HTTP based data access. SPARQL just provides an added bonus to URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.
A simple guide usable by any Python developer seeking to exploit SPARQL without hassles.
SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.
SPARQL queries are typically transported as HTTP payloads. Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL-compliant data server and receive a payload for local processing, e.g., local object binding in Python.
#!/usr/bin/env python
#
# Demonstrating use of a single query to populate a
# Virtuoso Quad Store via Python.
#

import urllib, json

# HTTP URL is constructed accordingly with JSON query results format in mind.
def sparqlQuery(query, baseURL, format="application/json"):
    params={
        "default-graph": "",
        "should-sponge": "soft",
        "query": query,
        "debug": "on",
        "timeout": "",
        "format": format,
        "save": "display",
        "fname": ""
    }
    querypart=urllib.urlencode(params)
    response = urllib.urlopen(baseURL, querypart).read()
    return json.loads(response)

# Setting Data Source Name (DSN)
dsn="http://dbpedia.org/resource/DBpedia"

# Virtuoso pragma instructing the SPARQL engine to perform an HTTP GET,
# using the IRI in the FROM clause as Data Source URL
query="""DEFINE get:soft "replace"
SELECT DISTINCT * FROM <%s> WHERE {?s ?p ?o}""" % dsn

data=sparqlQuery(query, "http://localhost:8890/sparql/")

print "Retrieved data:\n" + json.dumps(data, sort_keys=True, indent=4)

#
# End
Retrieved data:
{
    "head": {
        "link": [],
        "vars": [ "s", "p", "o" ]
    },
    "results": {
        "bindings": [
            {
                "o": { "type": "uri", "value": "http://www.w3.org/2002/07/owl#Thing" },
                "p": { "type": "uri", "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" },
                "s": { "type": "uri", "value": "http://dbpedia.org/resource/DBpedia" }
            },
            ...
JSON was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a Python developer that already knows how to use Python for HTTP based data access. SPARQL just provides an added bonus to URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.
A simple guide usable by any Ruby developer seeking to exploit SPARQL without hassles.
SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.
SPARQL queries are typically transported as HTTP payloads. Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL-compliant data server and receive a payload for local processing, e.g., local object binding in Ruby.
#!/usr/bin/env ruby
#
# Demonstrating use of a single query to populate a
# Virtuoso Quad Store.
#

require 'net/http'
require 'cgi'
require 'csv'

#
# We opt for CSV based output since handling this format is
# straightforward in Ruby, by default.
# HTTP URL is constructed accordingly with CSV as query results format in mind.
def sparqlQuery(query, baseURL, format="text/csv")
  params={
    "default-graph" => "",
    "should-sponge" => "soft",
    "query"         => query,
    "debug"         => "on",
    "timeout"       => "",
    "format"        => format,
    "save"          => "display",
    "fname"         => ""
  }

  # Build the URL-encoded query string
  querypart=""
  params.each { |k,v|
    querypart+="#{k}=#{CGI.escape(v)}&"
  }
  sparqlURL=baseURL+"?#{querypart}"

  response = Net::HTTP.get_response(URI.parse(sparqlURL))

  return CSV::parse(response.body)
end

# Setting Data Source Name (DSN)
dsn="http://dbpedia.org/resource/DBpedia"

# Virtuoso pragma instructing the SPARQL engine to perform an HTTP GET,
# using the IRI in the FROM clause as Data Source URL
query="DEFINE get:soft \"replace\" SELECT DISTINCT * FROM <#{dsn}> WHERE {?s ?p ?o}"

# Assume use of a local installation of Virtuoso;
# otherwise, change the URL to that of a public endpoint,
# for example DBpedia: http://dbpedia.org/sparql
data=sparqlQuery(query, "http://localhost:8890/sparql/")

puts "Got data:"
p data

#
# End
Got data: [["s", "p", "o"], ["http://dbpedia.org/resource/DBpedia", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://www.w3.org/2002/07/owl#Thing"], ["http://dbpedia.org/resource/DBpedia", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://dbpedia.org/ontology/Work"], ["http://dbpedia.org/resource/DBpedia", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://dbpedia.org/class/yago/Software106566077"], ...
CSV was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a Ruby developer that already knows how to use Ruby for HTTP based data access. SPARQL just provides an added bonus to URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.
A declarative query language from the W3C for querying structured propositional data (in the form of 3-tuple [triples] or 4-tuple [quads] records) stored in a deductive database (colloquially referred to as triple or quad stores in Semantic Web and Linked Data parlance).
SPARQL is inherently platform independent. Like SQL, the query language and the backend database engine are distinct. Database clients capture SPARQL queries which are then passed on to compliant backend databases.
Like SQL for relational databases, it provides a powerful mechanism for accessing and joining data across one or more data partitions (named graphs identified by IRIs). The aforementioned capability also enables the construction of sophisticated Views, Reports (HTML or those produced in native form by desktop productivity tools), and data streams for other services.
Unlike SQL, SPARQL includes result serialization formats and an HTTP based wire protocol. Thus, the ubiquity and sophistication of HTTP is integral to SPARQL; i.e., client-side applications (user agents) only need to be able to perform an HTTP GET against a URL en route to exploiting the power of SPARQL.
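To make the wire-protocol point concrete, here is a minimal sketch using the public DBpedia endpoint (the format parameter is the Virtuoso-style results-format selector used in the code samples throughout this collection):

curl -G "http://dbpedia.org/sparql" \
     --data-urlencode "query=SELECT * WHERE { ?s ?p ?o } LIMIT 5" \
     --data-urlencode "format=application/json"

That single HTTP GET is the entire client-side protocol obligation.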
What follows is a very simple guide for using SPARQL against your own instance of Virtuoso:
Note: the data source URL doesn't even have to be RDF based -- which is where the Virtuoso Sponger Middleware comes into play (download and install the VAD installer package first) since it delivers the following features to Virtuoso's SPARQL engine:
Public SPARQL endpoints are emerging at an ever increasing rate. Thus, we've set up a DNS lookup service that provides access to a large number of SPARQL endpoints. Of course, this doesn't cover all existing endpoints, so if your endpoint is missing, please ping me.
Here is a collection of commands for using DNS-SD to discover SPARQL endpoints:
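A sketch only -- the _sparql._tcp service type and the lookup domain below are illustrative assumptions; substitute whatever the lookup service actually registers:

# Browse a domain for advertised SPARQL endpoints
dns-sd -B _sparql._tcp sparql.example.org

# Resolve one discovered instance to its host, port, and TXT record metadata
dns-sd -L "DBpedia" _sparql._tcp sparql.example.org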
The problems typically take the following form:
To start addressing these problems, here is a simple guide for generating and publishing Linked Data using Virtuoso.
Existing RDF data can be added to the Virtuoso RDF Quad Store via a variety of built-in data loader utilities.
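For example, the bulk loader can be driven from the isql command-line client. A minimal sketch: the directory path and target graph IRI are placeholders, and the ld_dir / rdf_loader_run calls assume Virtuoso's RDF bulk loader utilities are installed:

isql 1111 dba dba <<'SQL'
-- Register every N-Triples file in a directory against a target graph
ld_dir ('/opt/virtuoso/data', '*.nt', 'http://example.com/my-graph');
-- Load everything registered above into the Quad Store
rdf_loader_run ();
-- Persist the loaded data
checkpoint;
SQL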
Many options allow you to easily and quickly generate RDF data from other data sources:
Install the Faceted Browser VAD package (fct_dav.vad), which delivers the following:
Three simple steps allow you, your enterprise, and your customers to consume and exploit your newly deployed Linked Data --
http://<cname>[:<port>]/describe/?uri=<entity-uri>
<cname>[:<port>]
gets replaced by the host and port of your Virtuoso instance<entity-uri>
gets replaced by the URI you want to see described -- for instance, the URI of one of the resources you let the Sponger handle.
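Putting the pattern together (a sketch; the host, port, and the content-negotiation behavior of the /describe service are assumed here):

curl -L -H "Accept: application/rdf+xml" \
     "http://localhost:8890/describe/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FDBpedia"

Opening the same URL in a Web browser yields the HTML rendering of the entity description instead.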
How Does Linked Data Address This Problem? It provides critical infrastructure for the WebID Protocol that enables an innovative tweak of SSL/TLS.
What about OpenID? The WebID Protocol embraces and extends OpenID (in an open and positive way) via the WebID + OpenID Hybrid variant of the protocol -- the basic effect is that OpenID calls are re-routed to the WebID aspect, which simply removes Username and Password Authentication from the authentication challenge interaction pattern.
I ended up with what I can best describe as the Data 3.0 Manifesto: a manifesto for standards-compliant access to structured data object (or entity) descriptors.
Alex James (Program Manager for Entity Frameworks at Microsoft) put together something quite similar to this via his Base4 blog (around the Web 2.0 bootstrap time); sadly -- quoting Alex -- that post has gone where discontinued blogs and their host platforms go (deep, deep irony here).
It's also important to note that this manifesto is a variant of TimBL's Linked Data Design Issues meme re. Linked Data, but totally decoupled from RDF (the data representation formats aspect) and SPARQL, which -- in my world view -- remain implementation details.
SPARQL Endpoint: Linked Open Data Cache (8.5 Billion+ Quad Store, which includes data from Geonames and the Linked GeoData Project Data Sets).
In this post I provide a brief re-introduction to this essential aspect of Virtuoso.
This component of Virtuoso is known as the Virtual Database Engine (VDBMS). It provides transparent high-performance and secure access to disparate data sources that are external to Virtuoso. It enables federated access and integration of data hosted by any ODBC- or JDBC-accessible RDBMS, RDF Store, XML database, or Document (Free Text)-oriented Content Management System. In addition, it facilitates integration with Web Services (SOAP-based SOA RPCs or REST-fully accessible Web Resources).
In the most basic sense, you shouldn't need to upgrade your existing database engine version simply because your current DBMS and Data Access Driver combo isn't compatible with ODBC-compliant desktop tools such as Microsoft Access, Crystal Reports, BusinessObjects, Impromptu, or other ODBC-, JDBC-, ADO.NET-, or OLE DB-compliant applications. Simply place Virtuoso in front of your so-called "legacy database," and let it deliver the compliance levels sought by these tools.
In addition, it's important to note that today's enterprise, through application evolution, company mergers, or acquisitions, is often faced with disparately-structured data residing in any number of line-of-business-oriented data silos. Compounding the problem is the exponential growth of user-generated data via new social media-oriented collaboration tools and platforms. For companies to cost-effectively harness the opportunities accorded by the increasing intersection between line-of-business applications and social media, virtualization of data silos must be achieved, and this virtualization must be delivered in a manner that doesn't prohibitively compromise performance or completely undermine security at either the enterprise or personal level. Again, this is what you get by simply installing Virtuoso.
The VDBMS may be used in a variety of ways, depending on the data access and integration task at hand. Examples include:
You can make a single ODBC, JDBC, ADO.NET, OLE DB, or XMLA connection to multiple ODBC- or JDBC-accessible RDBMS data sources, concurrently, with the ability to perform intelligent distributed joins against externally-hosted database tables. For instance, you can join internal human resources data against internal sales and external stock market data, even when the HR team uses Oracle, the Sales team uses Informix, and the Stock Market figures come from Ingres (see the sketch after this list)!
You can construct RDF Model-based Conceptual Views atop Relational Data Sources. This is about generating HTTP-based Entity-Attribute-Value (E-A-V) graphs using data culled "on the fly" from native or external data sources (Relational Tables/Views, XML-based Web Services, or User Defined Types).
You can also derive RDF Model-based Conceptual Views from Web Resource transformations "on the fly" -- the Virtuoso Sponger (RDFizing middleware component) enables you to generate RDF Model Linked Data via a RESTful Web Service or within the process pipeline of the SPARQL query engine (i.e., you simply use the URL of a Web Resource in the FROM clause of a SPARQL query).
It's important to note that Views take the form of HTTP links that serve as both Data Source Names and Data Source Addresses. This enables you to query and explore relationships across entities (i.e., People, Places, and other Real World Things) via HTTP clients (e.g., Web Browsers) or directly via SPARQL Query Language constructs transmitted over HTTP.
As an alternative to RDF, Virtuoso can expose ADO.NET Entity Frameworks-based Conceptual Views over Relational Data Sources. It achieves this by generating Entity Relationship graphs via its native ADO.NET Provider, exposing all externally attached ODBC- and JDBC-accessible data sources. In addition, the ADO.NET Provider supports direct access to Virtuoso's native RDF database engine, eliminating the need for resource intensive Entity Frameworks model transformations.
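Picking up the distributed-join example from the first item above, here is a sketch of what it looks like from the isql command line. All table names and link qualifiers below are hypothetical, and it assumes the remote tables were previously linked into Virtuoso via ATTACH TABLE:

isql 1111 dba dba <<'SQL'
-- Join an Oracle-attached HR table against an Informix-attached Sales table;
-- Virtuoso decides how much of the join to push down to each remote engine.
SELECT e.full_name, s.total_sales
  FROM "ORA"."HR"."EMPLOYEES" e
  JOIN "INF"."SALES"."TOTALS" s ON (e.emp_id = s.emp_id);
SQL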
Here are some insights on DBpedia, from the perspective of someone intimately involved with the other three-quarters of the project.
A live Web accessible RDF model database (Quad Store) derived from Wikipedia content snapshots, taken periodically. The RDF database underlies a Linked Data Space comprised of: HTML (and most recently HTML+RDFa) based data browser pages and a SPARQL endpoint.
Note: DBpedia 3.4 now exists in snapshot (warehouse) and Live Editions (currently being hot-staged). This post is about the snapshot (warehouse) edition; I'll drop a different post about the DBpedia Live Edition, where a new Delta-Engine covers both extraction and database record replacement, in realtime.
As an idea under the moniker "DBpedia", it was conceptualized in late 2006 by researchers at the University of Leipzig (led by Soren Auer) and Freie University, Berlin (led by Chris Bizer). The first public instance of DBpedia (as described above) was released in February 2007. The official DBpedia coming-out party occurred at WWW2007, in Banff, during the inaugural Linked Data gathering, where it showcased the virtues and immense potential of TimBL's Linked Data meme.
OpenLink Software (developers of OpenLink Virtuoso and providers of Web Hosting infrastructure), the University of Leipzig, and Freie University, Berlin. In addition, there is a burgeoning community of collaborators and contributors responsible for DBpedia-based applications, cross-linked data sets, ontologies (OpenCyc, SUMO, UMBEL, and YAGO), and other utilities. Finally, DBpedia wouldn't be possible without the global content contribution and curation efforts of Wikipedians, a point typically overlooked (albeit inadvertently).
The steps are as follows:
In a nutshell, there are four distinct and vital components to DBpedia. Thus, DBpedia doesn't exist if all the project offered was a collection of RDF data dumps. Likewise, it doesn't exist without a fully populated SPARQL compliant Quad Store. And of course, it doesn't exist if the fully loaded SPARQL compliant Quad Store isn't up to the cocktail of challenges (query load and complexity) presented by live Web accessibility.
It remains a live exemplar for any individual or organization seeking to publish or exploit HTTP based Linked Data on the World Wide Web. Its existence continues to stimulate growth in both density and quality of the burgeoning Web of Linked Data.
In the most basic sense, simply browse the HTML-based resource descriptor pages en route to discovering erstwhile undiscovered relationships that exist across named entities and subject matter concepts / headings. Beyond that, simply look at DBpedia as a master lookup table in a Web-hosted distributed database setup, enabling you to mesh your local domain-specific details with DBpedia records via structured relations (triples, or 3-tuple records) comprised of HTTP URIs from both realms, e.g., via owl:sameAs relations.
Expanding on the Master-Details point above, you can use its rich URI corpus to alleviate tedium associated with activities such as:
Note to Web Programmers: Linked Data is about Data (Wine) and not about Code (Fish). Thus, it isn't a "programmer only zone"; far from it. More than anything else, it's inherently inclusive and spreads its participation net widely across: Data Architects, Data Integrators, Power Users, Knowledge Workers, Information Workers, Data Analysts, etc. Basically, everyone that can "click on a link" is invited to this particular party; remember, it is about "Linked Data" not "Linked Code", after all. :-)
Here is an example of a Linked Data value pyramid that I am stumbling across -- with some frequency -- these days (note: 1 being the pyramid apex):
Basically, Linked Data deployment (assigning de-referencable HTTP URIs to DBMS records, their attributes, and attribute values [optionally]) is occurring last. Even worse, this happens in the context of Linked Open Data oriented endeavors, resulting in nothing but confusion or inadvertent perpetuation of the overarching pragmatically challenged "Semantic Web" stereotype.
As you can imagine, hitting SPARQL as your introduction to Linked Data is akin to hitting SQL as your introduction to Relational Database Technology, neither is an elevator-style value prop. relay mechanism.
In the relational realm, killer demos always started with desktop productivity tools (spreadsheets, report-writers, SQL QBE tools, etc.) accessing relational data sources en route to unveiling the "Productivity" and "Agility" value prop. that such binding delivered; i.e., the desktop applications (clients) and the databases (servers) are distinct, but operate in a mutually beneficial manner, courtesy of data access standards such as ODBC (Open Database Connectivity).
In the Linked Data realm, learning to embrace and extend best practices from the relational DBMS realm remains a challenge. A lot of this has to do with hangovers from a misguided perception that RDF databases will somehow completely replace RDBMS engines, rather than complement them. Thus, you have a counterproductive variant of NIH (Not Invented Here) in play, taking us to the dreaded realm of: Break the Pot and You Own It (exemplified by the 11+ year Semantic Web Project comprehension and appreciation odyssey).
From my vantage point, here is how I believe the Linked Data value pyramid should be layered, especially when communicating the essential value prop.:
2009 is over. Yeah, sure, trueg, we know that, it has been over for a while now! Ok, ok, I am a bit late, but still I would like to get this one out - if only for my archive. So here goes.
Let's start with the major topic of 2009 (and also the beginning of 2010): the new Nepomuk database backend, Virtuoso. Everybody who used Nepomuk had the same problems: you either used the sesame2 backend, which depends on Java and steals all of your memory, or you were stuck with Redland, which had the worst performance and missed some SPARQL features, making important parts of Nepomuk like queries unusable. So more than a year ago I had the idea to use the one GPL'ed database server out there that supported RDF in a professional manner: OpenLink's Virtuoso. It has all the features we need, has very good performance, and scales up to dimensions we will probably never reach on the desktop (yeah, right, and 64k main memory will be enough forever!). So very early I started coding the necessary Soprano plugin which would talk to a locally running Virtuoso server through ODBC. But since I ran into tons of small problems (as always) and got sidetracked by other tasks, I did not finish it right away. OpenLink, however, was very interested in the idea of their server being part of every KDE installation (why wouldn't they ;)). So they not only introduced a lite-mode which makes Virtuoso suitable for the desktop, but also helped in debugging all the problems that I had left. Many test runs, patches, and a Virtuoso 5.0.12 release later, I could finally announce the Virtuoso integration as usable.
Then, at the end of last year, I dropped support for sesame2 and Redland. Virtuoso is now the only supported database backend. The reason is simple: Virtuoso is way more powerful than the rest - not only in terms of performance - and it is fully implemented in C(++) without any traces of Java. Maybe even more important is the integration of the full text index, which makes the previously used CLucene index unnecessary. Thus, we can finally combine full text and graph queries in one SPARQL query. This results in a cleaner API and way faster return of search results, since there is no need to combine the results from several queries anymore. A direct result of that is the new Nepomuk Query API, which I will discuss later.
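For instance, a combined query might look like this (a minimal sketch against a local Virtuoso SPARQL endpoint; bif:contains is Virtuoso's full-text extension, and the property and search literal here are illustrative):

curl -G "http://localhost:8890/sparql" \
     --data-urlencode 'query=SELECT ?resource ?label WHERE { ?resource <http://www.w3.org/2000/01/rdf-schema#label> ?label . ?label bif:contains "holiday" }'

Before Virtuoso, this required a CLucene full-text pass plus a separate graph query, with the results merged in client code.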
So now the only thing I am waiting for is the first bugfix release of Virtuoso 6, i.e. 6.0.1 which will fix the bugs that make 6.0.0 fail with Nepomuk. Should be out any day now. :)
Querying data in Nepomuk pre-KDE-4.4 could be done in one of two ways: 1. Use the very limited capabilities of the ResourceManager to list resources with certain properties or of a certain type; or 2. Write your own SPARQL query using ugly QString::arg replacements.
With the introduction of Virtuoso and its awesome power we can now do pretty much everything in one query. This allowed me to finally create a query API for KDE: Nepomuk::Query::Query and friends. I won't go into much detail here since I did that before.
All in all you should remember one thing: whenever you think about writing your own SPARQL query in a KDE application - have a look at libnepomukquery. It is very likely that you can avoid the hassle of debugging a query by using the query API.
The first nice effect of the new API (apart from me using it all over the place obviously) is the new query interface in Dolphin. Internally it simply combines a bunch of Nepomuk::Query::Term objects into a Nepomuk::Query::AndTerm. All very readable and no ugly query strings.
An important part of the Nepomuk research project was the creation of a set of ontologies for describing desktop resources and their metadata. After the Xesam project under the umbrella of freedesktop.org had been convinced to use RDF for describing file metadata they developed their own ontology. Thanks to Evgeny (phreedom) Egorochkin and Antonie Mylka both the Xesam ontology and the Nepomuk Information Elements Ontology were already very close in design. Thus, it was relatively easy to merge the two and be left with only one ontology to support. Since then not only KDE but also Strigi and Tracker are using the Nepomuk ontologies.
At the Gran Canaria Desktop Summit I met some of the guys from Tracker and we tried to come up with a plan to create a joint project to maintain the ontologies. This got off to a rough start as nobody really felt responsible. So I simply took the initiative and released the shared-desktop-ontologies version 0.1 in November 2009. The result was a s***-load of hate-mails and bug reports due to me breaking the KDE build. But in the end it was worth it. Now the package is established and other projects can start to pick it up to create data compatible with the Nepomuk system and Tracker.
Today the ontologies (and the shared-desktop-ontologies package) are maintained in the Oscaf project at Sourceforge. The situation is far from perfect but it is a good start. If you need specific properties in the ontologies or are thinking about creating one for your own application - come and join us in the bug tracker...
It was at the Akonadi meeting that Will Stephenson and myself got into talking about mimicking some Zeitgeist functionality through Nepomuk. Basically it meant gathering some data when opening and when saving files. We quickly came up with a hacky patch for KIO and KFileDialog which covered most cases and allowed us to track when a file was modified and by which application. This little experiment did not leave that state though (it will, however, this year) but another one did: Zeitgeist also provides a fuse filesystem which allows you to browse files by modification date. Well, whatever fuse can do, KIO can do as well. Introducing the timeline:/ KIO slave, which gives a calendar view onto your files.
Well, I thought I would mention the Tips And Tricks section I wrote for the techbase. It might not be a big deal but I think it contains some valuable information in case you are using Nepomuk as a developer.
This time around I had the privilege to mentor two students in the Google Summer of Code. Alessandro Sivieri and Adam Kidder did outstanding work on Improved Virtual Folders and the Smart File Dialog.
Adam's work led me to some heavy improvements in the Nepomuk KIO slaves myself, which I only finished this week (more details on that coming up). Alessandro continued his work on faceted file browsing in KDE and created:
Alessandro is following up on his work to make faceted file browsing a reality in 2010 (and KDE SC 4.5). Since it was too late to get faceted browsing into KDE SC 4.4 he is working on Sembrowser, a stand-alone faceted file browser which will be the grounds for experiments until the code is merged into Dolphin.
In 2009 I organized the first Nepomuk workshop in Freiburg, Germany. And also the second one. While I reported properly on the first one I still owe a summary for the second one. I will get around to that - sooner or later. ;)
Soprano gives us a nice command line tool to create a C++ namespace from an ontology file: onto2vocabularyclass. It produces nice convenience namespaces like Soprano::Vocabulary::NAO. Nepomuk adds another tool named nepomuk-rcgen. Both were a bit clumsy to use before. Now we have nice cmake macros which make it very simple to use both.
See the techbase article on how to use the new macros.
Without my knowledge (imagine that!) Andrew Lake created an amazing new media player named Bangarang - a Jamaican word for noise, chaos or disorder. This player is Nepomuk-enabled in the sense that it has a media library which lets you browse your media files based on the Nepomuk data. It remembers the number of times a song or a video has been played and when it was played last. It allows you to add details such as the TV series name, season, episode number, or actors that are in the video - all through Nepomuk (I hope we will soon get tvdb integration).
I am especially excited about this since finally applications not written or mentored by me start contributing Nepomuk data.
2009 was also the year of the first Gnome-KDE joint-conference. Let me make a bulletin for completeness and refer to my previous blog post reporting on my experiences on the island.
Well, that was by far not all I did in 2009 but I think I covered most of the important topics. And after all it is "just a blog entry" - there is no need for completeness. Thanks for reading.
Sticking with the TechCrunch layout, here is why all roads simply lead to Linked Data come 2010 and beyond:
As I've stated in the past (across a variety of mediums), you cannot build applications that have long term value without addressing the following issues:
The items above basically showcase the very essence of the HTTP URI abstraction that drives HTTP based Linked Data; which is also the basic payload unit that underlies REST.
I simply hope that the next decade marks a period of broad appreciation and comprehension of Data Access, Integration, and Management issues on the parts of: application developers, integrators, analysts, end-users, and decision makers. Remember, without structured Data we cannot produce or share Information, and without Information, we cannot produce or share Knowledge.
Enjoy!
]]>Enjoy!
]]>As the "Linked Data" meme has gained momentum you've more than likely been on the receiving end of dialog with Linked Open Data community members (myself included) that goes something like this:
"Do you have a URI", "Get yourself a URI", "Give me a de-referencable URI" etc..
And each time, you respond with a URL -- which to the best of your Web knowledge is a bona fide URI. But to your utter confusion you are told: Nah! You gave me a Document URI instead of the URI of a real-world thing or object etc..
Well, our everyday use of the Web is an unfortunate conflation of two distinct things which have Identity: Real World Objects (RWOs) and the Addresses/Locations of Documents (Information bearing Resources).
The "Linked Data" meme is about enhancing the Web by unobtrusively reintroducing its core essence: the generic HTTP URI, a vital piece of Web Architecture DNA. Basically, its about so realizing the full capabilities of the Web as a platform for Open Data Identification, Definition, Access, Storage, Representation, Presentation, and Integration.
People, Places, Music, Books, Cars, Ideas, Emotions etc..
A Uniform Resource Identifier. A global identifier mechanism for network addressable data items. Its sole function is Name oriented Identification.
The constituent parts of a URI (from URI Generic Syntax RFC) are depicted below:
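Using the canonical example from RFC 3986 (URI Generic Syntax):

foo://example.com:8042/over/there?name=ferret#nose
\_/   \______________/\_________/ \_________/ \__/
 |           |            |            |        |
scheme   authority       path        query   fragment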
A location oriented HTTP scheme based URI. The HTTP scheme introduces a powerful and inherent duality that delivers:
So far so good!
The kind of URI Linked Data aficionados mean when they use the term: URI.
An HTTP URI is an HTTP scheme based URI. Unlike a URL, this kind of HTTP scheme URI is devoid of any Web Location orientation or specificity. Thus, its inherent duality provides a more powerful level of abstraction. Hence, you can use this form of URI to assign Names/Identifiers to Real World Objects (RWOs). Even better, courtesy of the Identity/Address duality of the HTTP scheme, a single URI can deliver the following:
Data about Data. Put differently, data that describes other data in a structured manner.
The predominant model for metadata is the Entity-Attribute-Value + Classes & Relationships model (EAV/CR). A model that's been with us since the inception of modern computing (long before the Web).
The Resource Description Framework (RDF) is a framework for describing Web addressable resources. In a nutshell, it's a framework for adding Metadata bearing Information Resources to the current Web. It's comprised of:
The ubiquitous use of the Web is primarily focused on a Linked Mesh of Information bearing Documents. URLs rather than generic HTTP URIs are the prime mechanism for Web tapestry; basically, we use URLs to conduct Information -- which is inherently subjective -- instead of using HTTP URIs to conduct "Raw Data" -- which is inherently objective.
Note: Information is "data in context", it isn't the same thing as "Raw Data". Thus, if we can link to Information via the Web, why shouldn't we be able to do the same for "Raw Data"?
The meme simply provides a set of guidelines (best practices) for producing Web architecture friendly metadata. Meaning: when producing EAV/CR model based metadata, endow Subjects, their Attributes, and Attribute Values (optionally) with HTTP URIs. By doing so, a new level of Link Abstraction on the Web is possible, i.e., "Data Item to Data Item" level links (aka hyperdata links). Even better, when you de-reference a RWO hyperdata link, you end up with a negotiated representation of its metadata.
Linked Data is ultimately about an HTTP URI for each item in the Data Organization Hierarchy :-)
The HTTP URI is the secret sauce of the Web that is powerfully and unobtrusively reintroduced via the Linked Data meme (a classic back-to-the-future act). This powerful sauce possesses a unique power courtesy of its inherent duality, i.e., how it uniquely combines Data Item Identity (think keys in traditional DBMS parlance) with Data Access (e.g., access to negotiable representations of associated metadata).
As you can see, I've made no mention of RDF or SPARQL, and I can still articulate the inherent value of the "Linked Data" dimension that the "Linked Data" meme adds to the World Wide Web.
As per usual this post is a live demonstration of Linked Data (dog-food style) :-)
As a leading media organization, the BBC's use of Linked Data provides a clear beacon to other media players re. the imminence of a serious Linked Data induced sector inflection. In a nutshell, every Web Site has to evolve into a Linked Data Space: a location on the Web that provides granular access to discrete data items in line with the core principles of the Linked Data meme.
Remember, the essence of the Linked Data meme is simply this: you reference data items and access their metadata, in variety of formats via a single HTTP based URI. This approach to Web data publishing is compatible with any HTTP aware user agent (e.g., your Web Browser or tools & applications that provide abstracted access to HTTP).
There are a number of very powerful things available to end-users and developers alike.
The most powerful feature of our variant of the BBC's Linked Data Space is the exposure of Faceted Find (think Search++ and beyond). Thus, you can go to the home page of the service and commence data discovery and exploration via any of the following interfaces:
Once you are comfortable with at least one of the items above, you can exploit the system further by performing any of the following:
In line with the time-tested "embrace and extend" pattern, we provide Full Text search capability, but unlike Google, Yahoo!, Bing and other search engines, we don't use the "Page Rank" algorithm to sort results; instead, we use an "Entity Rank" algorithm, since we are dealing with an RDF based Graph model DBMS where links exist between entities across instance data and data dictionary (vocabularies, schemas, ontologies) boundaries. In addition, when you get results (by clicking "show values" or "show values with distinct counts") that list entities associated with a full text search pattern, we take a quantum leap beyond search engines by allowing you to use "Entity Type" and/or "Entity Properties" (all of these have HTTP URIs too) to set your own context for what you seek.
Much more to come in the form of BBC specific demo queries and tutorials :-)
The observations above triggered a discussion thread on Twitter that involved: @edsu, @iand, and moi. Naturally, it morphed into a live demonstration of: human vs machine, interpretation of claims expressed in the RDF graph.
It showcases (in Man vs Machine style) the issue of unambiguously discerning the meaning of the owl:sameAs claim expressed in the LCSH Linked Data Space.
From the Linked Data perspective, it may spook a few people to see owl:sameAs values such as: "info:lc/authorities/sh95000541", that cannot be de-referenced using HTTP.
It may confuse a few people or user agents that see URI de-referencing as not necessarily HTTP specific, thereby attempting to de-reference the URI.URN on the assumption that it's associated with a "handle system", for instance.
It may even confuse RDFizer / RDFization middleware that use owl:sameAs as a data provider attribution mechanism via hint/nudge URI values derived from original content / data URI.URLs that de-reference to nothing e.g., an original resource URI.URL plus "#this" which produces URI.URN-URL -- think of this pattern as "owl:shameAs" in a sense :-)
Simply bring OWL reasoning (inference rules and reasoners) into the mix, thereby negating human dialogue about interpretation which ultimately unveils a mesh of orthogonal view points. Remember, OWL is all about infrastructure that ultimately enables you to express yourself clearly i.e., say what you mean, and mean what you say.
The SPARQL queries against the Graph generated and automatically populated by the Sponger reveal -- without human intervention -- that "info:lc/authorities/sh95000541" is just an alternative name for <http://id.loc.gov/authorities/sh95000541#concept>, and that the graph produced by LCSH is self-describing enough for an OWL reasoner to figure this all out courtesy of the owl:sameAs property :-).
Hopefully, this post also provides a simple example of how OWL facilitates "Reasonable Linked Data".
The acronym stands for: Resource Description Framework. And that's just what it is.
RDF is comprised of a Data Model (EAV/CR Graph) and Data Representation Formats such as: N3, Turtle, RDF/XML etc.
RDF's essence is about: "Entities" and "Attributes" being URI based, while "Values" may be URI or Literals (typed or untyped) based.
URIs are Entity Identifiers.
Short for "Web of Linked Data" or "Linked Data Web".
A term coined by TimBL that describes an HTTP based "data access by reference pattern" that uses a single pointer or handle for "referring to" and "obtaining actual data about" an entity.
Linked Data uses the deceptively simple messaging scheme of HTTP to deliver a granular entity reference and access mechanism that transcends traditional computing boundaries such as: operating system, application, database engines, and networks.
Linked Data simply mandates the following re. RDF:
Note: by Entity I am also referring to: a resource (Web parlance), data item, data object, real-world object, or datum.
Linked Data is also about using URIs and HTTP's content negotiation feature to separate: presentation, representation, access, and identity of data items. Even better, content negotiation can be driven by user agent and/or data server based quality of service algorithms (representation preference order schemes).
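To make this concrete, here is a sketch using the public DBpedia Linked Data Space (any content-negotiation-capable Linked Data server behaves the same way); the same entity URI yields different representations depending on the Accept header:

# Ask for the entity description as RDF/XML
curl -L -H "Accept: application/rdf+xml" "http://dbpedia.org/resource/DBpedia"

# Ask for the same description in N3/Turtle form
curl -L -H "Accept: text/n3" "http://dbpedia.org/resource/DBpedia"

# A Web browser sends Accept: text/html and is redirected to the HTML page

The -L flag lets curl follow the 303 redirect that separates the identity of the entity from the address of the document describing it.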
To conclude, Linked Data is ultimately about the realization that: Data is the new Electricity, and its conductors are URIs :-)
Tip to governments of the world: we are in exponential times, the current downturn is but one side of the "exponential times ledger", the other side of the "exponential times ledger" is simply about unleashing "raw data" -- in structured form -- into the Web, so that "citizen analysts" can blossom and ultimately deliver the transparency desperately sought at every level of the economic value chain. Think: "raw data ready" whenever you ponder about "shovel ready" infrastructure projects!
]]>Your Life, Profession, Web, and Internet do not need to become mutually exclusive due to "information overload".
A platform or service that delivers a point of online presence that embodies the fundamental separation of: Identity, Data Access, Data Representation, Data Presentation, by adhering to Web and Internet protocols.
Typical post installation (Local or Cloud) task sequence:
I've just outlined a snippet of the capabilities of the OpenLink Data Spaces platform. A platform built using OpenLink Virtuoso, architected to deliver: open, platform independent, multi-model, data access and data management across heterogeneous data sources.
All you need to remember is your URI when seeking to interact with your data space.
At the current time we have loaded 100% of all the very large data sets from the LOD Cloud. As a result, we can start the process of exposing Linked Data virtues in a manner that's palatable to users, developers, and database professionals across the Web 1.0, 2.0, and 3.0 spectrums.
You can use the "Search & Find", "URI Lookup", or SPARQL endpoint associated with the LOD cloud hosting instance to perform the following tasks:
If you don't want to use the SPARQL based Web Service, or other Linked Data Web oriented APIs for interacting with the LOD cloud programmatically, you can simply use the powerful REST style Web Service that provides URL parameters for performing full text oriented "Search", entity oriented "Find" queries, and faceted navigation over the huge data corpus with results data returned in JSON and XML formats.
Amazon have agreed to add all the LOD Cloud data sets to their existing public data sets collective. Thus, the data sets we are loading will be available in "raw data" (RDF) format on the public data sets page via Named Elastic Block Storage (EBS) Snapshots; meaning, you can make an EC2 AMI (e.g., Linux, Windows, or Solaris based), install an RDF quad or triple store of choice into your AMI, and then simply load data from the LOD cloud based on your needs.
In addition to the above, we are also going to offer a Virtuoso 6.0 Cluster Edition based LOD Cloud AMI (as we've already done with DBpedia, MusicBrainz, NeuroCommons, and Bio2Rdf) that will enable you to simply instantiate a personal and service specific edition of Virtuoso with all the LOD data in place and fully tuned for performance and scalability; basically, you will simply press "Instantiate AMI" and a LOD cloud data space, in true Linked Data form, will be at your disposal within minutes (i.e., the time it takes the DB to start).
Work on the migration of the LOD data to EC2 starts this week. Thus, if you are interested in contributing an RDF based data set to the LOD cloud, now is the time to get your archive links in place (see: the ESW Wiki page for LOD Data Sets).
 | Web 1.0 | Web 2.0 | Web 3.0 |
Simple Definition | Interactive / Visual Web | Programmable Web | Linked Data Web |
Unit of Presence | Web Page | Web Service Endpoint | Data Space (named structured data enclave) |
Unit of Value Exchange | Page URL | Endpoint URL for API | Resource / Entity / Object URI |
Data Granularity | Low (HTML) | Medium (XML) | High (RDF) |
Defining Services | Search | Community (Blogs to Social Networks) | Find |
Participation Quotient | Low | Medium | High |
Serendipitous Discovery Quotient | Low | Medium | High |
Data Referencability Quotient | Low (Documents) | Medium (Documents) | High (Documents and their constituent Data) |
Subjectivity Quotient | High | Medium (from A-list bloggers to select source and partner lists) | Low (everything is discovered via URIs) |
Transclusence | Low | Medium (Code driven Mashups) | High (Data driven Meshups) |
What You See Is What You Prefer (WYSIWYP) | Low | Medium | High (negotiated representation of resource descriptions) |
Open Data Access (Data Accessibility) | Low | Medium (Silos) | High (no Silos) |
Identity Issues Handling | Low | Medium (OpenID) | High (FOAF+SSL) |
Solution Deployment Model | Centralized | Centralized with sprinklings of Federation | Federated with function specific Centralization (e.g. Lookup hubs like LOD Cloud or DBpedia) |
Data Model Orientation | Logical (Tree based DOM) | Logical (Tree based XML) | Conceptual (Graph based RDF) |
User Interface Issues | Dynamically generated static interfaces | Dynamically generated interfaces with semi-dynamic interfaces (courtesy of XSLT or XQuery/XPath) | Dynamic Interfaces (pre- and post-generation) courtesy of self-describing nature of RDF |
Data Querying | Full Text Search | Full Text Search | Full Text Search + Structured Graph Pattern Query Language (SPARQL) |
What Each Delivers | Democratized Publishing | Democratized Journalism & Commentary (Citizen Journalists & Commentators) | Democratized Analysis (Citizen Data Analysts) |
Star Wars Edition Analogy | Star Wars (original fight for decentralization via rebellion) | Empire Strikes Back (centralization and data silos make comeback) | Return of the JEDI (FORCE emerges and facilitates decentralization from "Identity" all the way to "Open Data Access" and "Negotiable Descriptive Data Representation") |
Naturally, I am not expecting everyone to agree with me. I am simply making my contribution to what will remain fascinating discourse for a long time to come :-)
Robin:
Web 3.0 is fundamentally about the World Wide Web becoming a structured database equipped with a formal data model (RDF, a moniker for an Entity-Attribute-Value with Classes & Relationships based Graph Model), a query language, and a protocol for handling diverse data representational requirements via negotiation.
Web 3.0 is about a Web that facilitates serendipitous discovery of relevant things; thereby making serendipitous discovery quotient (SDQ), rather than search engine optimization (SEO), the critical success factor that drives how resources get published on the Web.
Personally, I believe we are on the cusp of a major industry inflection re. how we interact with data hosted in computing spaces. In a nutshell, the conceptual model interaction based on real-world entities such as people, places, and other things (including abstract subject matter) will usurp traditional logical model interaction based on rows and columns of typed and/or untyped literal values exemplified by relational data access and management systems.
Labels such as "Web 3.0", "Linked Data", and "Semantic Web" are simply about the aforementioned model transition playing out on the World Wide Web and across private Linked Data Webs such as Intranets & Extranets, as exemplified by the emergence of the "Master Data Management" label/buzzword.
As was the case with Web Services re. Web 2.0, there is a critical piece of infrastructure driving the evolution in question, and in this case it comes down to the evolution of Hyperlinking.
We now have a new and complementary variant of Hyperlinking, commonly referred to as "Hyperdata", that sits alongside "Hypertext". Hyperdata, when used in conjunction with HTTP based URIs as Data Source Names (or Identifiers), delivers a potent and granular data access mechanism scoped down to the datum (object or record) level; which is much different from the document (record or entity container) level linkage that Hypertext accords.
In addition, the incorporation of HTTP into this new and enhanced granular Data Source Naming mechanism also addresses past challenges relating to separation of data, data representation, and data transmission protocols -- remember XDR woes familiar to all sockets level programmers -- courtesy of in-built content negotiation. Hence, via a simple HTTP GET --against a Data Source Name exposed by a Hyperdata link -- I can negotiate (from client or server sides) the exact representation of the description (entity-attribute-value graph) of an Entity / Data Object / Resource, dispatched by a data server.
For example, this is how a description of entity "Me" ends up being available in (X)HTML or RDF document representations (as you will observe when you click on that link to my Personal URI).
The foundation of what I describe above comes from:
Some live examples from DBpedia:
Today, I revisited the same article -- and to my shock and horror -- my comments do not exist (note: the site did accept my comments yesterday!). Even more frustrating for me, I now have to expend time I don't have re-writing my comments due to the depth and danger of the inaccuracies in this post re. RDF in general.
Please look into what happened to my comments. It's too early for me to conclude that subjective censorship is at play on the Web -- which isn't a hard copy journalistic platform where editors get away with such shenanigans. The Web is a sticky database, and outer joining is well and truly functional (meaning: exclusion and omission ultimately come back to bite via full outer join query results against the Web DB).
By the way, if you publish the comments I made to the post (yesterday), I will add a note to this post, accordingly.
Yes! David just confirmed to me via Twitter that this is yet another comment system related issue and absolutely no intent to censor etc. His words Twervatim :-)
For sake of clarity, I've itemized the inaccuracies and applied my correction comments (inline) accordingly:
Inaccuracy #1:
Resource Description Framework (RDF), a part of the XML story, provides interoperability between applications that exchange information.
Correction #1:
RDF and XML are not inextricably linked in any way. RDF couples a Data Model (EAV/CR style Graph) with associated markup and data serialization formats that include N3, Turtle, TriX, RDF/XML, etc.
Inaccuracy #2:
RDF uses XML to define a foundation for processing metadata and to provide a standard metadata infrastructure for both the Web and the enterprise.
Correction #2:
RDF/XML is an XML based markup and data serialization format. As a markup language it can be used for creating RDF model records/statements (using Subject, Predicate, Object or Entity, Attribute, Value). As a serialization format, it provides a mechanism for marshaling RDF data across data managers and data consumers.
Inaccuracy #3:
The difference between the two is that XML is used to transport data using a common format, while RDF is layered on top of XML defining a broad category of data.
Correction #3:
See earlier corrections above.
Inaccuracy #4:
When the XML data is declared to be of the RDF format, applications are then able to understand the data without understanding who sent it.
Correction #4:
You do not declare data to be of RDF format. RDF isn't a format; it is a data model (as stated above). You can "up lift" or map data from XML to RDF (hierarchical to graph model mapping). Likewise, you can "down shift" or map data from RDF to XML (example: SPARQL SELECT query patterns "down shift" to SPARQL Results XML, which isn't RDF/XML, while keeping access to graphs via URIs or Entity Identifiers that reside within the serialization).
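As a hedged sketch of the "down shift" (the vocabulary here is illustrative): a SELECT pattern projects graph data into rows, and the response document travels as SPARQL Results XML (application/sparql-results+xml) rather than RDF/XML, with any URIs in the result cells still usable as Entity Identifiers.

# Tabular projection of graph data; the serialized result document is
# SPARQL Results XML, not RDF/XML.
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?doc ?title
WHERE { ?doc dc:title ?title . }
LIMIT 10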
Inaccuracy #5:
RDF extends the XML model and syntax to be specified for describing either resources or a collection of information. (XML points to a resource in order to scope and uniquely identify a set of properties known as the schema.).
Correction #5:
See earlier comments.
The single accurate paragraph in this ebiz article lies right at the end and it states the following:
"I've always thought RDF has been underutilized for data integration, and it's really an old standard. Now that we're focused on both understanding and integrating data, perhaps RDF should make a comeback."
What is our "Search" and "Find" demonstration about? It is about how you use the "Description" of "Things" to unambiguously locate things in a database at Web Scale.
To our perpetual chagrin, we are trying to demonstrate an engine -- not UI prowess -- but the immediate response is to jump to the UI aesthetics.
Google, Yahoo, etc., offer a simple input form for full text search patterns, and they have a processing window for completing full text searches across Web Content indexed on their servers. Once the search patterns are processed, you get a page ranked result set (basically, a collection of Web pages that claim/state: we found N pages out of a document corpus of about M indexed pages).
Note: the estimate aspect of traditional search results is like "advertising small print". The user lives with the illusion that all possible documents on the Web (or even Internet) have been searched, whereas in reality even 25% of the possible total is a major stretch; the Web and Internet are fractal, scale-free networks, inherently growing at exponential rates ad infinitum across boundless dimensions of human comprehension.
The power of Linked Data ultimately comes down to the fact that the user constructs the path to what they seek via the properties of the "Things" in question. The routes are not hardwired since URI de-referencing (follow your nose pattern) is available to Linked Data aware query engines and crawlers.
We are simply trying to demonstrate how you can combine the best of full text search with the best of structured querying, while reusing familiar interaction patterns from Google/Yahoo. Thus, you start with full text search, get all the entities associated with the pattern, then use the entity types or entity properties to find what you seek.
You state in your post:
"To state the obvious caveat, the claim OpenLink is making about this demo is not that it delivers better search-term relevance, therefore the ranking of searching results is not the main criteria on which it is intended to be assessed."
Correct.
"On the other hand, one of the things they are bragging about is that their server will automatically cut off long-running queries. So how do you like your first page of results?".
Not exactly correct. We are performing aggregates using a configurable interactive time factor. Example: tell me how many entities of type: Person, with interest: Semantic Web, exist in this database within 2 seconds. Also understand that you could retry the same query and get different numbers within the same interactive time factor. It isn't your basic "query cut-off".
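Sketched as a query (the FOAF vocabulary and DBpedia topic URI are illustrative; the interactive time factor itself is a server-side setting, not part of the query text):

# Count Person entities with a Semantic Web interest; under an interactive
# time factor, the engine returns the best count reached within the window,
# so re-running the query can legitimately yield a different number.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT (COUNT(?s) AS ?personCount)
WHERE { ?s a foaf:Person ; foaf:interest <http://dbpedia.org/resource/Semantic_Web> . }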
"And on the other other hand, the big claim OpenLink is making about this demo is that the aggregate experience of using it is better than the aggregate experience of using "traditional" search. So go ahead, use it. If you can."
Yes, "Microsoft" was a poor example for sure, the example could have been pattern: "glenn mcdonald", which should demonstrate the fundamental utility of what we are trying to demonstrate i.e., entity disambiguation courtesy of entity properties and/or entity type filtering.
Compare Google's results for "Glenn McDonald" with those from our demo (which disambiguates "Glenn McDonald" via associated properties and/or types), assuming we both agree that your Web Site or Blog Home isn't the center of your entity graph or personal data space (i.e., data about you); so getting your home page to the top of the Google page rank offers limited value, in reality.
What are we bragging about? A little more than what you attempt to explain. Yes, we are showing that we can find stuff within a processing window, but understand the following:
I hope I've clarified what's going on with our demo. If not, pose your challenge via examples and I will respond with solutions, or simply cry out loud: "no mas!".
As for your "Mac OS X Leopard" comments, I can only say this: I emphasized that this is a demo, the data is pretty old, and the input data has issues (i.e., some of the input data is bad, as your example shows). The purpose of this demo is not about the text per se; it's about the size of the data corpus and faceted querying. We are going to have the entire LOD Cloud loaded into the real thing, and in addition to that our Sponger Middleware will be enabled, and then you can take issue with data quality as per your reference to "Cyndi Lauper" (btw - it takes one property filter to find information about her quickly using "dbpprop:name" after filtering for properties with text values).
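For the record, that one-filter route looks roughly like this (a sketch reconstructing the step as a query; the prefix follows DBpedia conventions, and bif:contains is Virtuoso's full-text extension):

# Find entities whose dbpprop:name text matches "Cyndi Lauper".
PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT DISTINCT ?s ?name
WHERE { ?s dbpprop:name ?name . ?name bif:contains 'Cyndi and Lauper' . }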
Of all things, this demo had nothing to do with UI and Information presentation aesthetics. It was all about combining full text search and structured queries (SPARQL behind the scenes) against a huge data corpus, en route to solving challenges associated with faceted browsing over large data sets. We have built a service that resides inside Virtuoso. The Service is naturally of the "Web Service" variety and can be used from any consumer / client environment that speaks HTTP (directly or indirectly).
To be continued ...
A data access driver/provider that provides conceptual entity oriented access to RDBMS data managed by Virtuoso. Naturally, it also uses Virtuoso's in-built virtual / federated database layer to provide access to ODBC and JDBC accessible RDBMS engines such as: Oracle (7.x to latest), SQL Server (4.2 to latest), Sybase, IBM Informix (5.x to latest), IBM DB2, Ingres (6.x to latest), Progress (7.x to OpenEdge), MySQL, PostgreSQL, Firebird, and others using our ODBC or JDBC bridge drivers.
It delivers an Entity-Attribute-Value + Classes & Relationships model over disparate data sources that are materialized as .NET Entity Framework Objects, which are then consumable via ADO.NET Data Object Services, LINQ for Entities, and other ADO.NET data consumers.
The provider is fully integrated into Visual Studio 2008 and delivers the same "ease of use" offered by Microsoft's own SQL Server provider, but across Virtuoso, Oracle, Sybase, DB2, Informix, Ingres, Progress (OpenEdge), MySQL, PostgreSQL, Firebird, and others. The same benefits also apply uniformly to Entity Frameworks compatibility.
Bearing in mind that Virtuoso is a multi-model (hybrid) data manager, this also implies that you can use .NET Entity Frameworks against all data managed by Virtuoso. Remember, Virtuoso's SQL channel is a conduit to Virtuoso's core; thus, RDF (courtesy of SPASQL as already implemented re. Jena/Sesame/Redland providers), XML, and other data forms stored in Virtuoso also become accessible via .NET's Entity Frameworks.
You can choose which entity oriented data access model works best for you: RDF Linked Data & SPARQL or .NET Entity Frameworks & Entity SQL. Either way, Virtuoso delivers a commercial grade, high-performance, secure, and scalable solution.
Note: When working with external or 3rd party databases, simply use the Virtuoso Conductor to link the external data source into Virtuoso. Once linked, the remote tables will simply be treated as though they are native Virtuoso tables leaving the virtual database engine to handle the rest. This is similar to the role the Microsoft JET engine played in the early days of ODBC, so if you've ever linked an ODBC data source into Microsoft Access, you are ready to do the same using Virtuoso.
"..There is evidence that they promote LINKED DATA at any expense without understanding the rationale behind other approaches...".
To answer the question above, Linked Data is always relevant as long as we are actually talking about "Data" which is simply the case all of the time, irrespective of interaction medium.
If XBRL can be disconnected in any way from Linked Data, I desperately would like to be enlightened (as per my comments to the post). Why wouldn't anyone desire the ability to navigate the linked data inherent in any financial report? Every item in an XBRL instance document is an entity, directly or indirectly related to other entities. Why "Mash" the data when you can harmonize XBRL data via a Generic Financial Dictionary (schema or ontology) such that descriptions of Balance Sheet, P&L, and other entities are navigable via their attributes and relationships? In short, why "Mash" (code based brute force joining across disparately shaped data) when you can "Mesh" (natural joining of structured data entities)?
"Linked Data" is about the ability to connect all our observations (data)? , perceptions (information), and inferences / conclusions (knowledge) across a spectrum of interaction media. And it just so happens that the RDF data model (Entity-Attribute-Vaue + Class Relationships + HTTP based Object Identifiers), a range of RDF data model serialization formats, and SPARQL (Query Language and Web Service combo) actually make this possible, in a manner consistent with the essence of the global space we know as the World Wide Web.
A community developed knowledgebase comprised of Bio Informatics data from across 30 or so public data sources. The standard deployment of Bio2Rdf includes a federation of SPARQL endpoints provided by project members and collaborators.
An Amazon EC2 hosted variant of the Bio2Rdf knowledgebase. In addition to providing a SPARQL endpoint, the data exposed by the Amazon AMI is published in compliance with Linked Data publishing best practices espoused by the Linking Open Data community (LOD).
The ability to instantiate a personal or service-specific variant of this powerful knowledgebase via the Amazon EC2 Cloud. Instead of a 22+ hour error-prone odyssey, you simply get down to the task of data analysis and integration within 1.5 hrs (when setting up your AMI for the first time).
Excerpted from the project home page:
The NeuroCommons project seeks to make all scientific research materials - research articles, annotations, data, physical materials - as available and as useable as they can be. We do this by both fostering practices that render information in a form that promotes uniform access by computational agents - sometimes called "interoperability". We want knowledge sources to combine meaningfully, enabling semantically precise queries that span multiple information sources.
In a nutshell, a great project that makes practical use of Linked Data Web technology in the areas of computational biology and neuroscience.
A pre-installed and fully tuned edition of Virtuoso that includes a fully configured Neurocommons Knowledgebase (in RDF Linked Data form) on Amazon's EC2 Cloud platform.
Generally, it provides a no-hassles mechanism for instantiating personal-, organization-, or service-specific instances of a very powerful research knowledgebase within approximately 1.15 hours, compared to the alternative of a lengthy rebuild from RDF source data, which takes 14 hours or more depending on machine hardware configuration and host operating system resources.
A pre-installed and fully tuned edition of Virtuoso that includes a fully configured DBpedia instance on Amazon's EC2 Cloud platform.
Generally, it provides a no-hassles mechanism for instantiating personal, organization, or service specific instances of DBpedia within approximately 1.5 hours, as opposed to a lengthy rebuild from RDF source data that takes between 8 and 22 hours depending on machine hardware configuration and host operating system resources.
From a Web Entrepreneur perspective it offers all of the generic benefits of a Virtuoso EC2 AMI plus the following:
Here are a few live examples of DBpedia resource URIs deployed and de-referencable via one of my EC2 based personal data spaces:
A pre-installed edition of Virtuoso for Amazon's EC2 Cloud platform.
From the DBMS engine perspective it provides you with one or more pre-configured instances of Virtuoso that enable immediate exploitation of the following services:
From a Middleware perspective it provides:
From the Web Server Platform perspective it provides an alternative to LAMP stack components such as MySQL and Apache by offering:
From the general System Administrator's perspective it provides:
Higher level user oriented offerings include:
For Web 2.0 / 3.0 users, developers, and entrepreneurs, it offers Distributed Collaboration Tools & Social Media realm functionality, courtesy of ODS, that includes:
Basically this is how it works.
DBpedia replica implies:
Tomorrow is the official go live day (due to last minute price changes), but you can instantiate a paid Virtuoso AMI starting now :-)
To be continued...
If we could just take "The Semantic Web" moniker for what it was -- a code name for an aspect of the Web -- and move on, things will get much clearer, fast!
Basically, what is/was the "Semantic Web" should really have been code named YODA ("You" Oriented Data Access), as a play on Yoda's appreciation of the FORCE (Fact ORiented Connected Entities) -- the power of intergalactic, interlinked, structured data, fashioned by the World Wide Web courtesy of the HTTP protocol.
As stated in an earlier post, the next phase of the Web is all about the magic of entity "You". The single most important item of reference to every Web user will be the Person Entity ID (URI). Just by remembering your Entity ID, you will have intelligent pathways across, and into, the FORCE that the Linked Data Web delivers. The quality of the pathways and increased density of the FORCE are the keys to high SDQ (tomorrow's SEO). Thus, the SDQ of URIs will ultimately be the unit determinant of value to Web Users, along the following personal lines, hence the critical platform questions:
While most industry commentators continue to ponder and pontificate about what "The Semantic Web" is (unfortunately), the real thing (the "FORCE") is already here, and self-enhancing rapidly.
Assuming we now accept that the FORCE is simply an RDF based Linked Data moniker, and that RDF Linked Data is all about the Web as a structured database, we should start to move our attention over to practical exploitation of this burgeoning global database. In doing so, we should not discard knowledge from the past, such as the many great examples available gratis from the Relational Database realm. For instance, we should start paying attention to the discovery, development, and deployment of high level tools such as query builders, report writers, and intelligence oriented analytic tools, none of which should -- at first point of interaction -- expose raw RDF or the SPARQL query language. Along similar lines of thinking, we also need development environments and frameworks that are counterparts to Visual Studio, ACCESS, File Maker, and the like.
From the RWW Top-Down category, which I interpret as technologies that produce RDF from non-RDF data sources, our product portfolio comprises the following: Virtuoso Universal Server, OpenLink Data Spaces, OpenLink Ajax Toolkit, and OpenLink Data Explorer (which includes Ubiquity commands).
Of course you could have simply looked up OpenLink Software's FOAF based Profile page (*note the Linked Data Explorer tab*), or simply passed the FOAF profile page URL to a Linked Data aware client application such as: OpenLink Data Explorer, Zitgist Data Viewer, Marbles, and Tabulator, and obtained information. Remember, OpenLink Software is an Entity of Type: foaf:Organization, on the burgeoning Linked Data Web :-)
Courtesy of Linked Data, we are now able to extend the "document to document" linking mechanism of the Web (Hypertext Linking) to more granular "entity to entity" level linking. And in doing so, we have a layer of abstraction that in one swoop alleviates all of the infrastructure oriented data access impediments of yore. I know this sounds simplistic, but rest assured, imbibing Linked Data's value proposition is really just that simple, once you engage solutions (e.g., Virtuoso) that enable you to deploy Linked Data across your enterprise.
Microsoft ACCESS, SQL Server, and Virtuoso all use the Northwind SQL DB Schema as the basis of the demonstration database shipped with each DBMS product. This schema is comprised of common IS/MIS entities that include: Customers, Contacts, Orders, Products, Employees etc.
What we all really want to do as data, information, and knowledge consumers and/or dispatchers, is be no more than a single "mouse click" away from relevant data/information/knowledge data access and/or exploration. Even better (but not always so obvious), we also want anyone in our network (company, division, department, cube-cluster) to inherit these data access efficiencies.
In this example, the Web Page about the Customer "ALKI" provides me with a myriad of exploration and data access paths e.g., when I click on the foaf:primarytopic property value link.
This simple example, via a single Web Page, should put to rest any doubts about the utility of Linked Data. Of course this is an old demo, but this time around the UI is minimalist as my prior attempts skipped a few steps i.e., starting from within a Linked Data explorer/browser.
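For instance, assuming the Northwind Customers table has been mapped to RDF with one URI minted per customer row (the URI below is a placeholder for illustration), the whole "ALKI" entity graph is a single DESCRIBE away:

# Returns the entity-attribute-value graph describing customer ALKI,
# including onward links to its Orders, Contacts, etc.
DESCRIBE <http://example.com/Northwind/Customers/ALKI#this>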
Important note: I haven't exported SQL into an RDF data warehouse; I am converting the SQL into RDF Linked Data on the fly, which has two fundamental benefits:
Enjoy!
CrunchBase: When we released the CrunchBase API, you were one of the first developers to step up and quickly released a CrunchBase Sponger Cartridge. Can you explain what a CrunchBase Sponger Cartridge is?
Me: A Sponger Cartridge is a data access driver for Web Resources that plugs into our Virtuoso Universal Server (DBMS and Linked Data Web Server combo amongst other things). It uses the internal structure of a resource and/or a web service associated with a resource, to materialize an RDF based Linked Data graph that essentially describes the resource via its properties (Attributes & Relationships).
CrunchBase: And what inspired you to create it?
Me: Bengee built a new space with your data, and we've built a space on the fly from your data which still resides in your domain. Either solution extols the virtues of Linked Data i.e. the ability to explore relationships across data items with high degrees of serendipity (also colloquially known as: following-your-nose pattern in Semantic Web circles).
Bengee posted a notice to the Linking Open Data Community's public mailing list announcing his effort. Bearing in mind the fact that we've been using middleware to mesh the realms of Web 2.0 and the Linked Data Web for a while, it was a no-brainer to knock something up based on the conceptual similarities between Wikicompany and CrunchBase. In a sense, a quadrant of orthogonality is what immediately came to mind re. Wikicompany, CrunchBase, Bengee's RDFization efforts, and ours.
Bengee created an RDF based Linked Data warehouse based on the data exposed by your API, which is exposed via the Semantic CrunchBase data space. In our case we've taken the "RDFization on the fly" approach, which produces a transient Linked Data View of the CrunchBase data exposed by your APIs. Our approach is in line with our world view: all resources on the Web are data sources, and the Linked Data Web is about incorporating HTTP into the naming scheme of these data sources so that the conventional URL based hyperlinking mechanism can be used to access a structured description of a resource, which is then transmitted using a range of negotiable representation formats. In addition, based on the fact that we house and publish a lot of Linked Data on the Web (e.g., DBpedia, PingTheSemanticWeb, and others), we've also automatically meshed CrunchBase data with related data in DBpedia and Wikicompany.
CrunchBase: Do you know of any apps that are using CrunchBase Cartridge to enhance their functionality?
Me: Yes, the OpenLink Data Explorer which provides CrunchBase site visitors with the option to explore the Linked Data in the CrunchBase data space. It also allows them to "Mesh" (rather than "Mash") CrunchBase data with other Linked Data sources on the Web without writing a single line of code.
CrunchBase: You have been immersed in the Semantic Web movement for a while now. How did you first get interested in the Semantic Web?
Me: We saw the Semantic Web as a vehicle for standardizing conceptual views of heterogeneous data sources via context lenses (URIs). In 1998, as part of our strategy to expand our business beyond the development and deployment of ODBC, JDBC, and OLE-DB data providers, we decided to build a Virtual Database Engine (see: Virtuoso History), and in doing so we sought a standards based mechanism for the conceptual output of the data virtualization effort. As of the time of the seminal unveiling of the Semantic Web in 1998, we were clear about two things in relation to the effects of the Web and Internet data management infrastructure inflections: 1) existing DBMS technology had reached its limits; 2) Web Servers would ultimately hit their functional limits. These fundamental realities compelled us to develop Virtuoso with an eye to leveraging the Semantic Web as a vehicle for completing its technical roadmap.
CrunchBase: Can you put into layman's terms exactly what RDF and SPARQL are and why they are important? Do they only matter for developers or will they extend past developers at some point and be used by website visitors as well?
Me: RDF (Resource Description Framework) is a Graph based Data Model that facilitates resource description using the Subject, Predicate, and Object principle. Associated with the core data model, as part of the overall framework, are a number of markup languages for expressing your descriptions (just as you express presentation markup semantics in HTML or document structure semantics in XML) that include: RDFa (simple extension of HTML markup for embedding descriptions of things in a page), N3 (a human friendly markup for describing resources), RDF/XML (a machine friendly markup for describing resources).
SPARQL is the query language associated with the RDF Data Model, just as SQL is a query language associated with the Relational Database Model. Thus, when you have RDF based structured and linked data on the Web, you can query against Web using SPARQL just as you would against an Oracle/SQL Server/DB2/Informix/Ingres/MySQL/etc.. DBMS using SQL. That's it in a nutshell.
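To make the parallel concrete (the vocabulary is illustrative): where SQL matches rows in a table, SPARQL matches patterns in a graph.

# "Give me the names of things known to be People" -- the graph-pattern
# counterpart of a simple SQL SELECT against a people table.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE { ?person a foaf:Person . ?person foaf:name ?name . }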
CrunchBase: On your website you wrote about "RDF and SPARQL as productivity boosters in everyday web development". Can you elaborate on why you believe that to be true?
Me: I think the ability to discern a formal description of anything via its discrete properties is of immense value re. productivity, especially when the capability in question results in a graph of Linked Data that isn't confined to a specific host operating system, database engine, application or service, programming language, or development framework. RDF Linked Data is about infrastructure for the true materialization of the "Information at Your Fingertips" vision of yore. Even though it's taken the emergence of RDF Linked Data to make the aforementioned vision tractable, the comprehension of the vision's intrinsic value has been clear for a very long time. Most organizations and/or individuals are quite familiar with the adage "Knowledge is Power"; well, there isn't any knowledge without accessible Information, and there isn't any accessible Information without accessible Data. The Web has always been grounded in accessibility to data (albeit via compound container documents called Web Pages).
Bottom line, RDF based Linked Data is about Open Data access by reference using URIs (HTTP based Entity IDs / Data Object IDs / Data Source Names), and as I said earlier, the intrinsic value is pretty obvious bearing in mind the costs associated with integrating disparate and heterogeneous data sources -- across intranets, extranets, and the Internet.
CrunchBase: In his definition of Web 3.0, Nova Spivack proposes that the Semantic Web, or Semantic Web technologies, will be the force behind much of the innovation that will occur during Web 3.0. Do you agree with Nova Spivack? What role, if any, do you feel the Semantic Web will play in Web 3.0?
Me: I agree with Nova. But I see Web 3.0 as a phase within the Semantic Web innovation continuum. Web 3.0 exists because Web 2.0 exists. Both of these Web versions express usage and technology focus patterns. Web 2.0 is about the use of Open Source technologies to fashion Web Services that are ultimately used to drive proprietary Software as Service (SaaS) style solutions. Web 3.0 is about the use of "Smart Data Access" to fashion a new generation of Linked Data aware Web Services and solutions that exploit the federated nature of the Web to maximum effect; proprietary branding will simply be conveyed via quality of data (cleanliness, context fidelity, and comprehension of privacy) exposed by URIs.
Here are some examples of the CrunchBase Linked Data Space, as projected via our CrunchBase Sponger Cartridge:
Key points:
SPASQL (SPARQL extension for SQL) enables the intelligent resource representation request handling and URI dereferencing, that underlies "Linked Data" (i.e., Hyperdata Linking) to occur in-process.
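As a sketch of what that looks like in practice (assuming a Virtuoso SQL session, e.g., via the isql command line): prefixing a statement with the SPARQL keyword hands it to the SPARQL engine, so graph queries execute in-process alongside SQL.

-- Issued over any SQL channel (isql, ODBC, JDBC); no out-of-process
-- SPARQL endpoint round trip is involved.
SPARQL SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5;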
My contribution to the developing discourse takes the form of a Q&A session. I've taken the questions posed and provided answers that express my particular points of view:
Q: Is the desktop of the future going to just be a web-hosted version of the same old-fashioned desktop metaphors we have today?
A: No, it's going to be a more Web Architecture aware and compliant variant exposed by appropriate metaphors.
Q: The desktop of the future is going to be a hosted web service
A: A vessel for exploiting the virtues of the Linked Data Web.
Q: The Browser is Going to Swallow Up the Desktop
A: Literally, of course not! Metaphorically, of course! And then the Browser metaphor will decompose into function specific bits of Web interaction amenable to orchestration by its users.
Q: The focus of the desktop will shift from information to attention
A: No! Knowledge, Information, and Data sharing courtesy of Hyperdata & Hypertext Linking.
Q: Users are going to shift from acting as librarians to acting as daytraders
A: They were Librarians at Web 1.0, Journalists at Web 2.0, Analysts in Web 3.0 (i.e., analyzing structured and interlinked data), and CEOs in Web 4.0 (i.e., getting Agents to do stuff intelligently en route to making decisions).
Q: The Webtop will be more social and will leverage and integrate collective intelligence
A: The Linked Data Web vessel will only require you to fill in your profile (once) and then serendipitous discovery and meshing of relevant data will simply happen (the serendipity quotient will grow in line with Linked Data Web density).
Q: The desktop of the future is going to have powerful semantic search and social search capabilities built-in
A: It is going to be able to "Find" rather than "Search" for stuff courtesy of the Linked Data Web.
Q: Interactive shared spaces will replace folders
A: Data Spaces and their URIs (Data Source Names) replace everything. You simply choose the exploration metaphor that best suits your space interaction needs.
Q: The Portable Desktop
A: Ubiquitous Desktop i.e. do the same thing (all answers above) on any device connected to the Web.
Q: The Smart Desktop
A: Vessels with access to Smart Data (Linked Data + Action driven Context sprinklings).
Q: Federated, open policies and permissions
A: More federation for sure; XMPP will become a lot more important, and OAuth will enable a resurgence of the federated aspects of the Web and Internet.
Q: The personal cloud
A: Personal Data Spaces plugged into Clouds (Intranet, Extranet, Internet).
Q: The WebOS
A: An operating system endowed with traditional Database and Host Operating system functionality such as: RDF Data Model, SPARQL Query Language, URI based Pointer mechanism, and HTTP based message Bus.
Q: Who is most likely to own the future desktop?
A: You! And all you need is a URI (an ID or Data Source Name for "Entity You") and a Profile Page (a place where "Entity You" is described by You).
You can get a feel for the future desktop by downloading and then installing the OpenLink Data Explorer plugin for Firefox, which allows you to switch viewing modes between Web Page and Linked Data behind the page. :-)
By coincidence, Glenn and I presented at this month's Cambridge Semantic Web Gathering.
I've provided a dump of Glenn's issues and my responses below:
RDF is a Graph based Data Model; it stands for Resource Description Framework. The metadata angle comes from its Meta Content Framework (MCF) origins. You can express and serialize data based on the RDF Data Model using Turtle, N3, TriX, N-Triples, and RDF/XML.
These are just appeasement:
- old query paradigm: fishing in dark water with superstitiously tied lures; only works well in carefully stocked lakes
- we don't ask questions by defining answer shapes and then hoping they're dredged up whole.
SPARQL, MQL, and Entity-SQL are Graph Model oriented Query Languages. Query Languages always accompany Database Engines. SQL is the Relational Model equivalent.
Noble attempt to ground the abstract, but:
- URI dereferencing/namespace/open-world issues focus too much technical attention on cross-source cases where the human issues dwarf the technical ones anyway
- FOAF query over the people in this room? forget it.
- link asymmetry doesn't scale
- identity doesn't scale
- generating RDF from non-graph sources: more appeasement, right where the win from actually converting could be biggest!
Innovative use of HTTP to deliver "Data Access by Reference" to the Linked Data Web.
When you have a Data Model, Database Engine, and Query Language, the next thing you need is a Data Access mechanism that provides "Data Access by Reference". ODBC and JDBC (amongst others) provide "Data Access by Reference" via Data Source Names. Linked Data is about the same thing (URIs are Data Source Names) with the following differences:
Hugely motivating and powerful idea, worthy of a superhero (Graphius!), but:
- giant and global parts are too hard, and starting global makes every problem harder
- local projects become unmanageable in global context (Cyc, Freebase data-modeling lists...).
And thus my plea, again. Forget "semantic" and "web", let's fix the database tech first:
- node/arc data-model, path-based exploratory query-model
- data-graph applications built easily on top of this common model; building them has to be easy, because if it's hard, they'll be bad
- given good database tech, good web data-publishing tech will be trivial!
- given good tools for graphs, the problems of uniting them will be only as hard as they have to be.
Giant Global Graph is just another moniker for a "Web of Linked Data" or "Linked Data Web".
Multi-Model Database technology that meshes the best of the Graph & Relational Models exists. In a nutshell, this is what Virtuoso is all about, and it's existed for a very long time :-)
Virtuoso is also a Virtual DBMS engine (so you can see Heterogeneous Relational Data via Graph Model Context Lenses). Naturally, it is also a Linked Data Deployment platform (or Linked Data Server).
The issue isn't the "Semantic Web" moniker per se., it's about how Linked Data (foundation layer of Semantic Web) gets introduced to users. As I said during the MIT Gathering: "The Web is experienced via Web Browsers primarily, so any enhancement to the Web must be exposed via traditional Web Browsers", which is why we've opted to simply add "View Linked Data Sources" to the existing set of common Browser options that includes:
By exposing the Linked Data Web option as described above, you enable the Web user to knowingly transition from the traditional rendered (X)HTML page view to the Linked Data View (i.e., the structured data behind the page). This simple "User Interaction" tweak makes the notion of exploiting a Structured Web somewhat clearer.
The Linked Data Web isn't a panacea. It's just an addition to the existing Web that enriches the things you can do with the Web. Its predominance, like any application feature's, will be subject to the degree to which it delivers tangible value or materializes internal and external opportunity costs.
Note: The Web isn't ubiquitous today because all its users grokked HTML Markup. Its ubiquity is a function of opportunity costs: there simply came a point in the Web bootstrap when nobody could afford the opportunity costs associated with being off the Web. The same thing will play out with Linked Data and the broader Semantic Web vision.
The LinqToRdf project is about binding LINQ to RDF. It sits atop Joshua Tauberer's C# based Semantic Web/RDF library, which has been out there for a while and works across Microsoft .NET and its open source variant, "Mono".
Historically, the Semantic Web realm has been dominated by RDF frameworks such as Sesame, Jena and Redland; which by their Open Source orientation, predominantly favor non-Windows platforms (Java and Linux). Conversely, Microsoft's .NET frameworks have sought to offer Conceptualization technology for heterogeneous Logical Data Sources via .NET's Entity Frameworks and ADO.NET, but without any actual bindings to RDF.
Interestingly, .NET already has a data query language that shares a number of similarities with SPARQL, called Entity-SQL, and a very innovative programming language called LINQ, which offers a blend of constructs for natural data access and manipulation across relational (SQL), hierarchical (XML), and graph (Object) models without the traditional object language->database impedance tensions of the past.
With regards to all of the above, we've just released a mini white paper that covers the exploitation of RDF-based Linked Data using .NET via LINQ. The paper offers an overview of LinqToRdf, plus enhancements we've contributed to the project (available in LinqToRdf v0.8). The paper includes real-world examples that tap into a MusicBrainz powered Linked Data Space, the Music Ontology, the Virtuoso RDF Quad Store, Virtuoso Sponger Middleware, and our RDFization Cartridges for MusicBrainz.
Enjoy!
The Web Universal Plug and Play (WUPnP) Cheatsheet:
Essentially, if you build an application and use the technologies suggested in the 'glue section', then your web application/service (whether it's front-end or back-end) will fit into many, many other web applications/services… and will therefore also be more manageable in the future! This is WUPnP.
Key technologies for making your services/applications as sticky as possible:
Web-based plug and play fun!
"(Via Daniel Lewis.)
As you can see from my recent post about how we've started the process of inoculating DBpedia against the potential dangers of "contextual incoherence", we are entering a newer era in the Semantic Web's evolution. My post and the one from Clark & Parsia both touch different aspects of the "Data Dictionary" for the Semantic Web issue.
Note: in my universe of discourse, a Data Dictionary manifests when the constraints and class hierarchies defined in an ontology (e.g. a web accessible shared ontology) are functionally bound to a data manager. Interestingly the binding can take the following forms:
The classification terminology I use above is very much off-the-cuff; its sole purpose is architectural distinction.
Anyway, it's really nice to see that we are entering an era re. the Semantic Web vision, where the virtues of reasoning are getting simpler to demonstrate and articulate.
In a nutshell, the point-point data integration era is coming to an end! The era of intelligent ontology based enterprise data integration is nigh!
Of course, there is much more to come on the practical utility front, so stay tuned as we work our way through the DBpedia inoculation program.
When the DBpedia & Yago integration took place last year (around WWW2007, Banff), there was a little, but costly, omission: nobody sought to load the Yago Class Hierarchy into Virtuoso's Inference Engine :-(
Anyway, the Class Hierarchy has now been loaded into Virtuoso's inference engine (as Virtuoso Inference Rules) and the following queries are now feasible using the live Virtuoso based DBpedia instance hosted by OpenLink Software:
# Find all Fiction Books whose "dbpedia:name" property has the literal value "The Lord of the Rings" (query body reconstructed).
DEFINE input:inference "http://dbpedia.org/resource/inference/rules/yago#"
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/property/>
PREFIX yago: <http://dbpedia.org/class/yago/>
SELECT DISTINCT ?s
WHERE { ?s a yago:Fiction106367107 .
        ?s dbpedia:name "The Lord of the Rings" . }
# Variant of the query using Virtuoso's Full Text Index extension via the bif:contains function/magic predicate
DEFINE input:inference "http://dbpedia.org/resource/inference/rules/yago#"
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/property/>
PREFIX yago: <http://dbpedia.org/class/yago/>
SELECT DISTINCT ?s ?n
FROM <http://dbpedia.org>
WHERE {
?s a yago:Fiction106367107 .
?s dbpedia:name ?n .
?n bif:contains 'Lord and Rings'
}
# Retrieve all individual instances of the Fiction class, which should include all Books (query body reconstructed).
DEFINE input:inference "http://dbpedia.org/resource/inference/rules/yago#"
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/property/>
PREFIX yago: <http://dbpedia.org/class/yago/>
SELECT DISTINCT ?s
WHERE { ?s a yago:Fiction106367107 . }
Note: you can also move the inference pragmas to the Virtuoso Server side, i.e., place the inference rules in a server instance config file, thereby negating the need to place "define input:inference 'http://dbpedia.org/resource/inference/rules/yago#'" pragmas directly in your SPARQL queries.
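A sketch of that server-side registration (the name of the graph holding the Yago class hierarchy is an assumption here), using Virtuoso's rdfs_rule_set function from a SQL session:

-- Binds the rule set name referenced by the DEFINE pragmas above to the
-- graph containing the Yago class hierarchy, server-wide.
rdfs_rule_set ('http://dbpedia.org/resource/inference/rules/yago#',
               'http://dbpedia.org/yago/classHierarchy');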
Of course, I also believe that Linked Data serves Web Data Integration across the Internet very well too, and that it will be beneficial to businesses in a big way. No individual or organization is an island; I think the Internet and Web have done a good job of demonstrating that thus far :-) We're all data nodes in a Giant Global Graph.
Daniel Lewis did shed light on the read-write aspects of the Linked Data Web, which is actually very close to the callout for a Wikipedia for Data. TimBL has been working on this via Tabulator (see the Tabulator Editing Screencast), Benjamin Nowack also added similar functionality to ARC, and of course we support the same SPARQL UPDATE into an RDF information resource via the RDF Sink feature of our WebDAV and ODS-Briefcase implementations.
As I can't quite remix Videos on the spur of the moment (yet), I would encourage you to watch the video and then click on the link to my FOAF Profile, then follow the "Linked Data" tab to see how Linked Data oriented platforms (in my case OpenLink Data Spaces) that exist today actually deliver what's explained in the video.
"What You Know" (Data & Friend Networks) ultimately trumps "Who You Know" (Friend only Networks). The exploitation power of this reality is enhanced exponentially via the Linked Data Web once the implications of beaming SPARQL queries down specific URIs (entry points to Linked Data graphs) become clearer :-)
Daniel simplifies my post by using diagrams to depict the different paths for PHP based applications exposing Linked Data - especially those that already provide a significant amount of the content that drives Web 2.0.
If all the content in Web 2.0 information resources is distillable into discrete data objects endowed with HTTP based IDs (URIs), with zero "RDF handcrafting Tax", what do we end up with? A Giant Global Graph of Linked Data; the Web as a Database.
So, what used to apply exclusively within enterprise settings re. Oracle, DB2, Informix, Ingres, Sybase, Microsoft SQL Server, MySQL, PostgreSQL, Progress OpenEdge, Firebird, and others, now applies to the Web. The Web becomes the "Distributed Database Bus" that connects database records across disparate databases (or Data Spaces). These databases manage and expose records that are remotely accessible "by reference" via HTTP.
As I've stated at every opportunity in the past, Web 2.0 is the greatest thing that ever happened to the Semantic Web vision :-) Without the "Web 2.0 Data Silo Conundrum" we wouldn't have the cry for "Data Portability" that brings a lot of clarity to some fundamental Web 2.0 limitations that end-users ultimately find unacceptable.
In the late '80s, the SQL Access Group (now part of X/Open) addressed a similar problem with RDBMS silos within the enterprise, which led to the SAG CLI that exists today as Open Database Connectivity.
In a sense we now have WODBC (Web Open Database Connectivity), comprised of Web Services based CLIs and/or traditional back-end DBMS CLIs (ODBC, JDBC, ADO.NET, OLE-DB, or Native), a Query Language (SPARQL Query Language), and a Wire Protocol (HTTP based SPARQL Protocol), delivering Web infrastructure equivalents of SQL and RDA -- but much better, and with much broader scope for delivering profound value due to the Web's inherent openness. Today's PHP, Python, Ruby, Tcl, Perl, or ASP.NET developer is the enterprise 4GL developer of yore, without enterprise confinement. We could even be talking about 5GL development once Linked Data interaction is meshed with dynamic languages (delivering higher levels of abstraction at the language and data interaction levels). Even the underlying schemas and basic design will evolve from (solely) Closed World to a mesh of Closed & Open World view schemas.
In the form above (the norm), WordPress data can be injected into the Linked Data Web via RDFization middleware such as the Virtuoso Sponger (built into all Virtuoso instances) and Triplr. The downside of this approach is that the blog owner doesn't necessarily possess full control over their contributions to the emerging Giant Global Graph of Linked Data.
Another route to Linked Data exposure is via Virtuoso's Metaschema Language for producing RDF Views over ODBC/JDBC accessible Data Sources, which enables the following setup:
Alternatively, you can also exploit Virtuoso as the SQL DBMS, RDF DBMS, Application Server, and Linked Data Deployment platform:
How Do I map the WordPress SQL Schema to RDF using Virtuoso?
Read the Meta Schema Language guide or simply apply our "WordPress SQL Schema to RDF" script to your Virtuoso hosted instance. Of course, there are other mappings that cover other PHP applications deployed via Virtuoso:
The list is nice, but actual execution can be challenging. For instance, when writing a blog post, or constructing a WikiWord, would you have enough disposable time to go searching for these URIs? Or would you compromise and continue to inject "Literal" values into the Web, leaving it to the reasoning endowed human reader to connect the dots?
Anyway, OpenLink Data Spaces is now equipped with a Glossary system that allows me to manage terms, meanings of terms, and hyper-linking of phrases and words associated with my terms. The great thing about all of this is that everything I do is scoped to my Data Space (my universe of discourse); I don't break or impede the other meanings of these terms outside my Data Space. The Glossary system can be shared with anyone I choose to share it with, and even better, it makes my upstreaming (rules based replication) style of blogging even more productive :-)
Remember, on the Linked Data Web, who you know doesn't matter as much as what you are connected to, directly or indirectly. Jason Kolb covers this issue in his post: People as Data Connectors, and so does Frederick Giasson via a recent post titled: Networks are everywhere. For instance, this blog post (or the entire Blog) is a bona fide RDF Linked Data Source; you can use it as the Data Source of a SPARQL Query to find things that aren't even mentioned in this post, since all you are doing is beaming a query through my Data Space (a container of Linked Data Graphs). On that note, let's re-watch Jon Udell's "On-Demand-Blogosphere" screencast from 2006 :-)
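As a sketch (with a placeholder URL standing in for this post's URI), a query whose data source is the post itself can surface every typed entity in the graph behind it:

# List the typed entities in the graph behind one blog post -- including
# things the rendered page never mentions by name.
SELECT DISTINCT ?entity ?type
FROM <http://example.org/blog/this-post>
WHERE { ?entity a ?type . }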
*On* the ubiquitous Web of "Linked Documents", HREF means (by definition and usage): Hypertext Reference to an HTTP accessible Data Object of Type: "Document" (an information resource). Of course we don't make the formal connection of Object Type when dealing with the Web on a daily basis, but whenever you encounter the "resource not found" condition notice the message: HTTP/1.0 404 Object Not Found, from the HTTP Server tasked with retrieving and returning the resource.
*In* the Web of "Linked Data", a complementary addition to the current Web of "Linked Documents", HREF is used to reference Data Objects of a variety of "Types", not just "Documents". The way this is achieved is by using Data Object Identifiers (URIs / IRIs that are generated by the Linked Data deployment platform) in the strict sense, i.e., Data Identity (URI) is separated from Data Address (URL). Thus, you can reference a Person Data Object (aka an instance of a Person Class) in your HREF and the HTTP Server returns a Description of the Data Object via a Document (again, an information resource). A document containing the Description of a Data Object typically contains HREFs to other Data Objects that expose the Attributes and Relationships of the initial Person Data Object, and it is this collection of Data Objects that is technically called a "Graph" -- which is what RDF models.
What I describe above is basic stuff for anyone that's familiar with Object Database or Distributed Objects technology and concepts.
The Linked Document Web is a collection of physical resources that traverse the Web Information Bus in palatable format, i.e., documents. Thus, Document Object Identity and Document Object Data Address can be the same thing, i.e., a URL can serve as the ID/URI of a Document Data Object.
The Linked Data Web, on the other hand, is a Distributed Object Database, and each Data Object must be uniquely defined; otherwise we introduce ambiguity that ultimately taints the Database itself (making it incomprehensible to reasoning challenged machines). Thus we must have unique Object IDs (URIs / IRIs) for People, Places, Events, and other things that aren't Documents. Once we follow the time tested rules of Identity, People can then be associated with the things they create (blog posts, web pages, bookmarks, wikiwords, etc.). RDF is about expressing these graph model relationships, while RDF serialization formats enable information resources to transport these data object links to requesting User Agents.
Put in more succinct terms, all documents on the Web are compound documents in reality (e.g. mast contain a least an image these days). The Linked Data Web is about a Web where Data Object IDs (URIs) enable us to distill source data from the information contained in a compound document.
The degree of unobtrusiveness of new technology, concepts, or new applications of existing technology, is what ultimately determines eventual uptake and meme virulence (network effects). For a while, the Semantic Web meme was mired in confusion and general misunderstanding due to a shortage of practical use case scenario demos.
The emergence of the SPARQL Query Language has provided critical infrastructure for a number of products, projects, and demos that now convey the utility of the Semantic Web vision much more clearly, via the simplicity of Linked Data, as exemplified by the following:
Daniel Lewis also penned an interesting post in response to Ian's, that actually triggered this post.
I think definition time has long expired re. the Web's many interaction dimensions, evolutionary stages, and versions.
On my watch it's simply demo / dog-food time. Or as Dan Brickley states: Just Show It.
Below, I've created a tabulated view of the various lanes on the Web's Information Super Highway. Of course, this is a Linked Data demo should you be interested in the universe of data exposed via the links embedded in this post :-)
|   | 1.0 | 2.0 | 3.0 |
|---|-----|-----|-----|
| Desire | Information Creation & Retrieval | Information Creation, Retrieval, and Extraction | Distillation of Data from Information |
|   | Information Linkage (Hypertext) | Information Mashing (Mash-ups) | Linked Data Meshing (Hyperdata) |
| Enabling Protocol | HTTP | HTTP | HTTP |
| Markup |   |   |   |
| Basic Data Unit | Resource (Data Object) of type "Document" | Resource (Data Object) of type "Document" | Resource (Data Object) that may be one of a variety of Types: Person, Place, Event, Music, etc. |
| Basic Data Unit Identity | Resource URL (Web Data Object Address) | Resource URL (Web Data Object Address) | Unique Identifier (URI) that is independent of the actual Resource (Web Data Object) Address. Note: an Identifier by itself has no utility beyond identifying a place around which actual data may be clustered. |
| Query or Search | Full Text Search patterns | Full Text Search patterns | Structured Querying via SPARQL |
| Deployment | Web Server (Document Server) | Web Server + Web Services Deployment modules | Web Server + Linked Data Deployment modules (Data Server) |
| Auto-discovery | <link rel="alternate"..> | <link rel="alternate"..> | <link rel="alternate" \| "meta"..>, basic and/or transparent content negotiation |
| Target User | Humans | Humans & text extraction and manipulation oriented agents (Scrapers) | Agents with varying degrees of data processing intelligence and capacity |
| Serendipitous Discovery Quotient (SDQ) | Low | Low | High |
| Pain | Information Opacity | Information Silos | Data Graph Navigability (Quality) |
Now I can simply state the following using Linked Data (hyperdata) links:
OpenLink Software's product portfolio is comprised of the following product families:

We no longer have to explain (repeatedly) why our drivers exist in Express, Lite, and Multi-Tier Edition formats, or why you ultimately need Multi-Tier Drivers over Single-Tier Drivers (Express or Lite Editions), since you ultimately need high performance, data encryption, and policy-based security across each of the data access driver formats.
]]>OpenLink Data Spaces (ODS) now officially supports:
- Attention Profiling Markup Language (APML).
- Meaning of a Tag (MOAT) in conjunction with Simple Knowledge Organisation System (SKOS) and Social-Semantic Cloud of Tags (SCOT).
- OAuth - an Open Authentication Protocol
Which means that OpenLink Data Spaces supports all of the main standards being discussed in the DataPortability Interest Group!
APML Example:
All users of ODS automatically get a dynamically created APML file, for example: APML profile for Kingsley Idehen
The URI for an APML profile is: http://myopenlink.net/dataspace/<ods-username>/apml.xml
Meaning of a Tag Example:
All users of ODS automatically have tag cloud information embedded inside their SIOC file, for example: SIOC for Kingsley Idehen on the Myopenlink.net installation of ODS.
But even better, MOAT has been implemented in the ODS Tagging System. This has been demonstrated in a recent test blog post by my colleague Mitko Iliev; the blog post comes up in the tag search: http://myopenlink.net/dataspace/imitko/weblog/Mitko%27s%20Weblog/tag/paris
Which can be put through the OpenLink Data Browser:
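Under the covers, a tag search like the one above reduces to a graph pattern over SIOC and SKOS data. Here is a minimal sketch, modeled on the ODS bookmark queries shown later in this space; the graph name and tag label are illustrative:

PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dct:  <http://purl.org/dc/elements/1.1/>

# Find posts whose tags carry the SKOS preferred label "paris".
SELECT DISTINCT ?post ?title
FROM <http://myopenlink.net/dataspace>   # illustrative graph name
WHERE {
  ?post a sioc:Post ;
        dct:title ?title ;
        sioc:topic ?topic .
  ?topic a skos:Concept ;
         skos:prefLabel "paris" .
}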
OAuth Example:
OAuth Tokens and Secrets can be created for any ODS application. To do this:
- you can log in to the MyOpenlink.net beta service, the Live Demo ODS installation, an EC2 instance, or your local installation
- then go to "Settings"
- and then you will see "OAuth Keys"
- you will then be able to choose the applications that you have instantiated and generate the token and secret for that app.
Related Document (Human) Links
- OpenLink Data Spaces Official Page
- OpenLink Software Page
- OpenLink Data Spaces Wikipedia Page
- Attention Profiling Markup Language Project Website
- Meaning of a Tag Project Website
- Simple Knowledge Organisation Systems Project Website
- Social-Semantic Cloud of Tags Project Website
- OAuth Protocol Website
- DataPortability.org Website
- Semantically Interlinked Online Communities Project Website
Remember (as per my most recent post about ODS), ODS is about unobtrusive fusion of Web 1.0, 2.0, and 3.0+ usage and interaction patterns. Thanks to a lot of recent standardization in the Semantic Web realm (e.g., SPARQL), we now employ the MOAT, SKOS, and SCOT ontologies as vehicles for Structured Tagging.
This is how we take a key Web 2.0 feature (think 2D in a sense) and bend it over to create a Linked Data Web (Web 3.0) experience, unobtrusively (see earlier posts re. Dimensions of Web). Thus, nobody has to change how they tag or where they tag; just expose ODS to the URLs of your Web 2.0 tagged content, and it will produce URIs (Structured Data Object Identifiers) and a linked data graph for your Tags Data Space (nee. Tag Cloud). ODS will construct a graph which exposes tag subject association, tag concept alignment / intended meaning, and tag frequencies, which ultimately deliver "relative disambiguation" of intended Tag Meaning (i.e., you can easily discern the tagger's meaning via the Tag's actual Data Space, which is associated with the tagger). In a nutshell, the dynamics of relevance matching, ranking, and the like change immensely, without futile, timeless debates about matters such as:
We can just get on with demonstrating Linked Data value using what exists on the Web today. This is the approach we are deliberately taking with ODS.
Tip: This post is best viewed via an RDF aware User Agent (e.g. a Browser or Data Viewer). I say this because the permalink of this post is a URI in a Linked Data Space (My Blog) comprised of more data than meets the eye (i.e. what you see when you read this post via a Document Web Browser) :-)
]]>There are quite a few reasons to use OpenLink Data Spaces (ODS). Here are 10 of the reasons why I use ODS:
- Its native support of DataPortability Recommendations such as RSS, Atom, APML, Yadis, OPML, Microformats, FOAF, SIOC, OpenID and OAuth.
- Its native support of Semantic Web Technologies such as: RDF and SPARQL/SPARUL for querying.
- Everything in ODS is an Object with its own URI; this is due to the underlying Object-Relational Architecture provided by Virtuoso.
- It has all the social media components that you could need, including: blogs, wikis, social networks, feed readers, CRM and a calendar.
- It is expandable by installing pre-configured components (called VADs), or by re-configuring a LAMP application to use Virtuoso. Some examples of current VADs include: MediaWiki, Wordpress and Drupal.
- It works with external webservices such as: Facebook, del.icio.us and Flickr.
- Everything within OpenLink Data Spaces is Linked Data, which provides more meaningful information than just plain structural information. This meaningful information could be used for complex inferencing systems, as ODS can be seen as a Knowledge Base.
- ODS builds bridges between the existing static-document based web (aka âWeb 1.0â), the more dynamic, services-oriented, social and/or user-orientated webs (aka âWeb 2.0â) and the web which we are just going into, which is more data-orientated (aka âWeb 3.0â or âLinked Data Webâ).
- It is fully supportive of Cloud Computing, and can be installed on Amazon EC2.
- It's released free under the GNU General Public License (GPL). (Note: it is technically dual-licensed, however, as it sits on top of the Virtuoso Universal Server, which has both Commercial and GPL licensing.)
The features above collectively provide users with a Linked Data Junction Box that may reside within corporate intranets or "out in the clouds" (Internet). You can consume, share, and publish data in a myriad of formats using a plethora of protocols, without any programming. ODS is simply about exposing the data from your Web 1.0, 2.0, and 3.0 application interactions in structured form, with Linking, Sharing, and ultimately Meshing (not Mashing) in mind.
Note: Although ODS is equipped with a broad array of Web 2.0 style Applications, you do not need to use native ODS apps in order to exploit its power. It binds to anything that supports the relevant protocols and data formats.
]]>If you want to explore who I know, what I read, and what I've tagged (amongst other things), all you have to do is:
Some Tools that help you comprehend what I am saying:
How Do I create the missing Bitmap Indexes?
Go to the HTML based Virtuoso Conductor, iSQL command line interface, or an ODBC / JDBC / ADO.NET / OLE DB client and execute:
-- Keyed on Predicate + Object first: serves "which Subjects hold this value for this property?" lookups
CREATE BITMAP INDEX RDF_QUAD_POGS ON DB.DBA.RDF_QUAD (P, O, G, S);
-- Keyed on Predicate + Subject first: serves "what values does this Subject hold for this property?" lookups
CREATE BITMAP INDEX RDF_QUAD_PSOG ON DB.DBA.RDF_QUAD (P, S, O, G);
-- Keyed on Subject + Object first: serves lookups where both ends of the triple are known
CREATE BITMAP INDEX RDF_QUAD_SOPG ON DB.DBA.RDF_QUAD (S, O, P, G);
A query language for the burgeoning Structured & Linked Data Web (aka Semantic Web / Giant Global Graph). Like SQL for the Relational Data Model, it provides a query language for the Graph based RDF Data Model.
It's also a REST or SOAP based Web Service that exposes SPARQL access to RDF Data via an endpoint.
In addition, it's also a Query Results Serialization format that includes XML and JSON support.
It brings important clarity to the notion of the "Web as a Database" by transforming existing Web Sites, Portals, and Web Services into a bona fide corpus of Mesh-able (rather than Mash-able) Data Sources. For instance, you can perform queries that join one or more of the aforementioned data sources in exactly the same manner (albeit with different syntax) as you would one or more SQL Tables.
-- SPARQL equivalent of SQL SELECT * against my personal data space hosted FOAF file
SELECT DISTINCT ?s ?p ?o FROM <http://myopenlink.net/dataspace/person/kidehen> WHERE {?s ?p ?o}
-- SPARQL against my social network -- Note: my SPARQL will be beamed across all of the contacts in the social networks of my contacts, as long as they are all HTTP URI based within each data space
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?Person
FROM <http://myopenlink.net/dataspace/person/kidehen>
WHERE { ?s a foaf:Person ; foaf:knows ?Person }
Note: you can use the basic SPARQL Endpoint, SPARQL Query By Example, or SPARQL Query Builder Demo tool to experiment with the demonstration queries above.
SPARQL is implemented by RDF Data Management Systems (Triple or Quad Stores) just as SQL is implemented by Relational Database Management Systems. The aforementioned data management systems will typically expose SPARQL access via a SPARQL endpoint.
A SPARQL implementors' Testimonial page accompanies the SPARQL press release. In addition, there is a growing collection of implementations on the ESW Wiki Page for SPARQL compliant RDF Triple & Quad Stores.
Yes! SPARQL facilitates an unobtrusive manifestation of a Linked Data Web by way of natural extension of the existing Document Web, i.e., these Web enclaves co-exist in symbiotic fashion.
As DBpedia very clearly demonstrates, Linked Data makes the Semantic Web demonstrable and much easier to comprehend. Without SPARQL there would be no mechanism for Linked Data deployment, and without Linked Data there is no mechanism for Beaming Queries (directly or indirectly) across the Giant Global Graph of data hosted by Social Networks, Shared Bookmark Services, Weblogs, Wikis, RSS/Atom/OPML feeds, Photo Galleries, and other Web accessible Data Sources (Data Spaces).
So, unlike Scoble, I am able to make my Facebook Data portable without violating Facebook rules (no data caching outside Facebook realm) by doing the following:
In a nutshell, my Linked Data Space enables you to reference data in my data space via Object Identifiers (URIs), and in some cases the Object IDs and Graphs are constructed on the fly via RDFization middleware.
Here are my URIs that provide different paths to my Facebook Data Space:
To conclude, 2008 is clearly the inflection year during which we will finally unshackle Data and Identity from the confines of "Web Data Silos" by leveraging the HTTP, SPARQL, and RDF induced virtues of Linked Data.
Related Posts:
Writing a JDBC Driver for SPARQL is a little overkill. OpenOffice.org simply needs to make XML or Web Data (HTML, XHTML, and XML) bona fide data sources within its "Pivot Table" functionality realm. All that would then be required is a SPARQL SELECT Query transported via the SPARQL Protocol, with results sent back using the SPARQL XML results serialization format (all part of a single SPARQL Protocol URL).
Excel successfully consumes the following information resource URI: http://tinyurl.com/yvoccj (a tiny url for a SPARQL SELECT against my FOAF file).
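For the curious, the general shape of such a SPARQL Protocol URL is easy to sketch. The endpoint below is hypothetical; the "query" parameter name comes from the SPARQL Protocol itself, while the results-format parameter name varies by implementation:

# Hypothetical endpoint; "query" carries the URL-encoded SPARQL SELECT,
# and an implementation-specific parameter requests the XML results serialization.
http://example.com/sparql?query=SELECT%20%3Fs%20%3Fp%20%3Fo%20WHERE%20%7B%3Fs%20%3Fp%20%3Fo%7D&format=application%2Fsparql-results%2Bxml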
Alternatively, and currently achievable, you could simply use SPASQL (SPARQL within SQL) using a DBMS engine that supports SQL, SPARQL, and SPASQL, e.g., Virtuoso.
Virtuoso SPASQL support is exposed via its ODBC and/or JDBC Drivers. Thus you can do things such as:
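For example, here is a minimal SPASQL sketch, assuming Virtuoso's convention of prefixing a statement with the SPARQL keyword (the graph and pattern are illustrative):

-- Sent over ODBC or JDBC like any other SQL statement; the SPARQL keyword
-- tells Virtuoso to treat the remainder as a SPARQL query, and the results
-- come back as an ordinary SQL result set.
SPARQL
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
FROM <http://myopenlink.net/dataspace/person/kidehen>
WHERE { ?s foaf:name ?name };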
BTW - My New Year's Resolution: get my act together and shrink the ever-increasing list of "simple & practical Virtuoso use case demos" on my todo list, which now spans all the way back to 2006 :-(
]]>"The phrase Open Social implies portability of personal and social data. That would be exciting but there are entirely different protocols underway to deal with those ideas. As some people have told me tonight, it may have been more accurate to call this "OpenWidget" - though the press wouldn't have been as good. We've been waiting for data and identity portability - is this all we get?"
[Source: Read/Write Web's Commentary & Analysis of Google's OpenSocial API]
..Perhaps the world will read the terms of use of the API, and realize this is not an open API; this is a free API, owned and controlled by one company only: Google. Hopefully, the world will remember another time when Google offered a free API and then pulled it. Maybe the world will also take a deeper look and realize that the functionality is dependent on Google hosted technology, which has its own terms of service (including adding ads at the discretion of Google), and that building an OpenSocial application ties Google into your application, and Google into every social networking site that buys into the Dream. Hopefully the world will remember. Unlikely, though, as such memories are typically filtered in the Great Noise....
[Source: Poignant commentary excerpt from Shelly Power's Blog (as always)]
The "Semantic Data Web" vision has always been about "Data & Identity" portability across the Web. Its been that and more from day one.
In a nutshell, we continue to exhibit varying degrees of Cognitive Dissonance re the following realities:
The Data Web is about Presence over Eyeballs due to the following realities:
This is why we need to inject a mesh of Linked Data into the existing Web. This is what the often misunderstood vision of the "Semantic Data Web" or "Web of Data" or "Web of Structured Data" is all about.
As stated earlier (point 10 above), "Data is forever", and there is only more of it to come! Sociality and associated Social Networking oriented solutions are at best a speck in the Web's ocean of data once you comprehend this reality.
Note: I am writing this post as an early implementor of GData and an implementor of RDF Linked Data technology and a "Web Purist".
OpenSocial implementation and support across our relevant product families - Virtuoso (i.e., the Sponger Middleware for RDF component), OpenLink Data Spaces (Data Space Controller / Services), and the OpenLink Ajax Toolkit (i.e., OAT Widgets and Libraries) - is a triviality now that the OpenSocial APIs are public.
The concern I have, and the problem that remains mangled in the vast realms of Web Architecture incomprehension, is the fact that GData and GData based APIs cannot deliver Structured Linked Data in line with the essence of the Web without introducing "lock-in" that ultimately compromises the "Open Purity" of the Web. Facebook, and Google's OpenSocial response to the Facebook juggernaut (i.e., an open variant of the Facebook Activity Dashboard and Social Network functionality realms, primarily), are at best icebergs in the ocean we know as the "World Wide Web". The nice and predictable thing about icebergs is that they ultimately melt into the larger ocean :-)
On a related note, I had the pleasure of attending the W3C's RDF and DBMS Integration Workshop last week. The event was well attended by organizations with knowledge, experience, and a vested interest in addressing the issues associated with exposing non-RDF data (e.g., SQL) as RDF, and with the imminence of data and/or information overload; these were covered in different ways via the following presentations:

Jon Udell recently penned a post titled: The Fourth Platform. The post arrives at a spookily coincidental time (this happens quite often between Jon and me, as demonstrated last year during our podcast; the "Fourth" in his Innovators Podcast series).
The platform that Jon describes is "Cloud Based" and comprised of Storage and Computation. I would like to add Data Access and Management (native and virtual) under the fourth platform banner with the end product called: "Cloud based Data Spaces".
As I write, we are releasing a Virtuoso AMI (Amazon Image) labeled: virtuoso-dataspace-server. This edition of Virtuoso includes the OpenLink Data Spaces Layer and all of the OAT applications we've been developing for a while.
There's more to come!
]]>First off, I am going to focus on the Semantic Data Web aspect of the overall Semantic Web vision (a continuum) as this is what we have now. I am also writing this post as a deliberate contribution to the discourse swirling around the real topic: Semantic Web Value Proposition.
We are in the early stages of the long anticipated Knowledge Economy. That being the case, it would be safe to assume that information access, processing, and dissemination are of utmost importance to individuals and organizations alike. You don't produce knowledge in a vacuum! Likewise, you can't produce Information in a vacuum: you need Data.
Increasingly, Blogs, Wikis, Shared Bookmarks, Photo Galleries, Discussion Forums, Shared Calendars, and the like have become invaluable tools for individual and organizational participation in Web enabled global discourse (where a lot of knowledge is discovered). These tools are typically associated with Web 2.0, implying Read-Write access via Web Services, centralized application hosting, and data lock-in (silos).
The reality expressed above is a recipe for "Information Overload" and complete annihilation of one's effective pursuit and exploitation of knowledge, due to "Time Scarcity" (note: disconnecting is not an option). Information abundance is inversely related to available processing time (for humans in particular). In my case, for instance, I was actively subscribed to 500+ RSS feeds in 2003. As of today, I've simply stopped counting, and that's just my Weblog Data Space. Then add to that all of the Discussions I track across Blogs, Wikis, message boards, mailing lists, traditional usenet discussion forums, and the like, and I think you get the picture.
Beyond information overload, Web 2.0 data is "Semi-Structured", by way of its dominant data containers ((X)HTML, RSS, and Atom documents and data streams, etc.) lacking semantics that formally expose individual data items as distinct entities, endowed with unambiguous naming / identification, descriptive attributes (a type of property/predicate), and relationships (a type of property/predicate).
Solution: Devise a standard for Structured Data Semantics that is compatible with the Web Information BUS.
Produce structured data (entities, entity types, entity relationships) from Web 1.0 and Web 2.0 resources that already exist on the Web, such that individual entities, their attributes, and relationships are accessible and discernible to software agents (machines).
Once the entities are individually exposed, the next requirement is a mechanism for selective access to these entities i.e. a query language.
Semantic Data Web Technologies that facilitate the solution described above include:
Structured Data Standards: Use of URIs or IRIs for uniquely identifying physical resources (HTML Documents, Image Files, Multimedia Files, etc.) and abstract ones (People, Places, Music, and other abstract things).
Entity Access & Querying:SPARQL Query Language - the SQL analog of the Semantic Data Web that enables query constructs that target named entities, entity attributes, and entity relationships
Organizations are rife with a plethora of business systems that are built atop a myriad of database engines, sourced from a variety of DBMS vendors. A typical organization would have a different database engine, from a specific DBMS vendor, underlying critical business applications such as: Human Resource Management (HR), Customer Relationship Management (CRM), Accounting, Supply Chain Management etc. In a nutshell, you have DBMS Engines, and DBMS Schema heterogeneity permeating the IT infrastructure of organizations on a global scale, making Data & Information Integration the biggest headache across all IT driven organizations.
Solution: Alleviation of the pain (costs) associated with Data & Information Integration.
Semantic Data Web offerings: A dexterous data model (RDF) that enables the construction of conceptual views of disparate data sources across an organization based on existing web architecture components such as HTTP and URIs.
Existing middleware solutions that facilitate the exposure of SQL DBMS data as RDF based Structured Data include:
BTW - There is an upcoming W3C Workshop covering the integration of SQL and RDF data.
The Semantic Data Web is here, and its value delivery vehicle is the URI. The URI is a conduit to Interlinked Structured Data (RDF based Linked Data) derived from existing data sources on the World Wide Web, alongside data continuously injected into the Web by organizations worldwide. Ironically, the Semantic Data Web is the only platform that crystallizes the "Information at Your Fingertips" vision without development environment, operating system, application, or database lock-in. You simply click on a Linked Data URI, and the serendipitous exploration and discovery of data commences.
The unobtrusive emergence of the Semantic Data Web is a reflection of the soundness of the underlying Semantic Web vision.
If you are excited about Mash-ups, then you are a Semantic Web enthusiast and beneficiary in the making, because you only "Mash" (brute force data extraction and interlinking) because you can't "Mesh" (natural data extraction and interlinking). Likewise, if you are a social-networking, open social-graph, or portable social-network enthusiast, then you are also a Semantic Data Web beneficiary and enthusiast, because your "values" (yes, the values associated with the properties that define you, e.g., your interests) are the fundamental basis for portable, open social-networking, which is what the Semantic Data Web hands to you on a platter without compromise (i.e., data lock-in or loss of data ownership).
Some practical examples of Semantic Data Web prowess:

My Comments:
Hyperdata is short for HyperLinked Data :-) The same applies to Linked Data. Thus, we have two literal labels for the same core Concept. HTTP is the enabling protocol for "Hyper-linking" Documents and associated Structured Data via the World Wide Web (Web for short), including the Data Links associated with Structured Data contained in, or hosted by, Documents on the Web.
RDFa, eRDF, GRDDL, SPARQL Query Language, SPARQL Protocol (SOAP or REST service), SPARQL Results Serializations (XML or JSON) collectively provide a myriad of unobtrusive routes to structured data embedded within, or associated with, existing Web Documents.
As Danny already states, ontologies are not prerequisites for producing structured data using the RDF Data Model. They simply aid the ability to express oneself clearly (i.e., no repetition or ambiguity) across a broad audience of machines (directly) and their human masters (indirectly).
Using the crux of this post as the anecdote: the Semantic Data Web would simplify the process of claiming and/or proving that Linked Data and Hyperdata describe the same concept. It achieves this by using Triples (Subject, Predicate, Object), expressed in various forms (N3, Turtle, RDF/XML, etc.), to formalize claims in a form palatable to electronic agents (machines) operating on behalf of Humans. In a nutshell, this increases human productivity by completely obliterating the erstwhile exponential costs of discovering data, information, and knowledge.
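For instance, the claim at hand reduces to a single triple. A minimal Turtle sketch, with hypothetical concept URIs standing in for the two labels:

@prefix owl: <http://www.w3.org/2002/07/owl#> .

# Hypothetical URIs; the point is the shape of the claim, not the names.
<http://example.org/concept/Hyperdata> owl:sameAs <http://example.org/concept/LinkedData> .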
BTW - for full effect, view this post in an RDF Browser (i.e., cut and paste the Permalink URI of this post, below, into an RDF Browser) such as:
"Reasonable Humans" continue to struggle with the congruency of social-networking and the Semantic Web. Funnily enough, our less capable counterparts (in the reasoning department) don't have such struggles. Here is my little interaction with an IRC Bot called "Phenny" (possibly mother of the Agent Web or Web of Agents or Web 4.0):
kidehen: .w web
phenny: web n. 1: An intricate network suggesting something that was formed by weaving or interweaving
phenny: web n. 2: An intricate trap that entangles or ensnares its victim
phenny: web v. 1: Construct or form a web, as if by weaving.
kidehen: .w network
phenny: network n. 1: An interconnected system of things or people
phenny: network n. 2: (broadcasting) a communication system consisting of a group of broadcasting stations that all transmit the same programs
phenny: network v. 1: Communicate with and within a group.
kidehen: .w social
phenny: social n. 1: A party of people assembled to promote sociability and communal activity
phenny: social a. 1: Relating to human society and its members
phenny: social a. 2: Living together or enjoying life in communities or organized groups.
kidehen: .w semantic
phenny: semantic a. 1: Of or relating to meaning or the study of meaning.
kidehen: .w graph
phenny: graph n. 1: A visual representation of the relations between certain quantities plotted with reference to a set of axes
phenny: graph v. 1: Represent by means of a graph
phenny: graph v. 2: Plot upon a graph.
Note: Wikipedia's Graph definition is the one that applies to RDF :-)
To conclude, the whole Semantic Data Web vision and Open social-networking go hand-in-hand. You can't attain "Open social-networking" without appreciating the underlying core concepts (Structured Data, Graphs, and Triples). There is no way around the reality that all networks are graphs. Of course, this doesn't imply that solution users (end-users) have to master graph theory en route to network participation and exploitation. It simply implies that solution developers need some graph data model appreciation, or a graph data model aware platform, when developing social-networking solutions that seek to leverage the Web.
chimezie: .w tautology
phenny: tautology n. 1: (logic) a statement that is necessarily true
phenny: tautology n. 2: Useless repetition.
chimezie: .ety tautology
phenny: "1579, from L.L. tautologia 'representation of the same thing' (c.350), from Gk. tautologia, from tautologos 'repeating what has been said,' from tauto 'the same' + -logos 'saying,' related to legein 'to say' (see lecture)." - http://etymonline.com/?term=tautology
That led me to the following sequence (preceding the initial IRC session dump in this post):
kidehen: .w conflagration
kidehen: .w imbroglio
phenny: conflagration n. 1: A very intense and uncontrolled fire.
phenny: imbroglio n. 1: An intricate and confusing interpersonal or political situation
phenny: imbroglio n. 2: A very embarrassing misunderstanding.
kidehen: .w buzzword
phenny: buzzword n. 1: Stock phrases that have become nonsense through endless repetition.
In a sense, proposing the Semantic Data Web as a solution to open social-networking challenges more often than not results in your own "No Semantic Web here" imbroglio; in a sense, the shortest path to a buzzword-fueled conflagration :-)
]]>New Semantic Data Web related features and enhancements include:
A dynamically generated Web Page comprised of Semantic Data Web style data links (formally typed links) and traditional Document Web links (generic links lacking type specificity).
Linked Data Pages will ultimately enable Facebook users to inject their public data into the Semantic Data Web as RDF based Linked Data. For instance, my Facebook Profile & Photo albums data is now available as RDF, without paying a cent of RDF handcrafting tax, thanks to the Virtuoso Sponger (middleware for producing RDF from non-RDF data sources), which is now equipped with a new RDFizer Cartridge for the Facebook Query Language (FQL) and RESTful Web Service.
Demo Notes:
When you click on a link in DLD pages, you will be presented with a lookup that exposes the different interaction options associated with a given URI. Examples include:
Remember, the Facebook URLs (links to web pages) are being converted, on the fly, into RDF based Structured Data (a graph model database), i.e., Entity Sets that possess formally defined characteristics (attributes) and associations (relationships).
On a different, but related, thread, Mike Bergman recently penned a post titled: What is the Structured Web?. Both of these public contributions shed light on the "Information BUS" essence of the World Wide Web by describing the evolving nature of the payload shuttled by the BUS.
Middleware infrastructure for shuttling "Information" between endpoints using a messaging protocol.
The Web is the dominant Information BUS within the Network Computer we know as the "Internet". It uses HTTP to shuttle information payloads between "Data Sources" and "Information Consumers" - which is what happens when we interact with the Web via User Agents / Clients (e.g., Browsers).
HTTP transported streams of contextualized data. Hence the terms "Information Resource" and "Non Information Resource" when reading material related to http-range-14 and Web Architecture. For example, an (X)HTML document is a specific data context (representation) that enables us to perceive, or comprehend, a data stream originating from a Web Server as a Web Page. On the other hand, if the payload lacks contextualized data, a fundamental Web requirement, then the resource is referred to as a "Non Information" resource. Of course, there is really no such thing as a "Non Information" resource, but with regards to Web Architecture, it's the short way of saying: "the Web transmits Information only". That said, I prefer to refer to these "Non Information" resources as "Data Sources", a term well understood in the world of Data Access Middleware (ODBC, JDBC, OLE DB, ADO.NET, etc.) and Database Management Systems (Relational, Object-Relational, Object, etc.).
Examples of Information Resource and Data Source URIs:
Explanation: The Information Resource is a conduit to the Entity identified by the Data Source URI (an entity in my RDF Data Space that is the Subject or Object of one or more Triple-based Statements. The triples in question can be represented as an RDF resource when transmitted over the Web via an Information Resource that takes the form of a SPARQL REST Service URL or a physical RDF based Information Resource URL).
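An illustrative sketch of the two URI forms side by side (the entity URI assumes ODS's "#this" naming convention and is hypothetical in its exact form):

# Data Source URI -- identifies the entity (me) that the data describes:
http://myopenlink.net/dataspace/person/kidehen#this
# Information Resource URL -- the document through which a description
# of that entity is transmitted:
http://myopenlink.net/dataspace/person/kidehen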
Prior to the emergence of the Semantic Data Web, the payloads shuttled across the Web Information BUS comprised primarily of the following:
The Semantic Data Web simply adds RDF to the payload formats shuttled across the Web Information BUS. RDF addresses formal data structure, which XML doesn't cover since it is semi-structured (distinct data entities aren't formally discernible). In a nutshell, an RDF payload is basically a conceptual model database packaged as an Information Resource. It's comprised of granular data items called "Entities" that expose fine-grained property values, individual and/or group characteristics (attributes), and relationships (associations) with other Entities.
The Web is in the final stages of the 3rd phase of its evolution: a phase characterized by the shuttling of structured data payloads (RDF) alongside less data oriented payloads (HTML, XHTML, XML, etc.). As you can see, Linked Data and Structured Data are both terms used to describe the addition of more data centric payloads to the Web. Thus, you could view the process of creating a Structured Web of Linked Data as follows:
The Semantic Data Web is an evolution of the current Web (an Information Space) that adds structured data payloads (RDF) to current, less data oriented, structured payloads (HTML, XHTML, XML, and others).
The Semantic Data Web is increasingly seen as an inevitability because it's rapidly reaching the point of critical mass (i.e. network effect kick-in). As a result, Data Web emphasis is moving away from: "What is the Semantic Data Web?" To: "How will Semantic Data Web make our globally interconnected village an even better place?", relative to the contributions accrued from the Web thus far. Remember, the initial "Document Web" (Web 1.0) bootstrapped because of the benefits it delivered to blurb-style content publishing (remember the term electronic brochure-ware?). Likewise, in the case of the "Services Web" (Web 2.0), the bootstrap occurred because it delivered platform independence to Web Application Developers - enabling them to expose application logic behind Web Services. It is my expectation that the Data Integration prowess of the Data Web will create a value exchange realm for data architects and other practitioners from the database and data access realms.
Of course, this also enables me to provide yet another Semantic Data Web demo in the form of additional viewing perspectives for the aforementioned FAQ (just click to see):
Lee also embarked on a similar embellishment effort re. the SPARQL Query Language FAQ thereby enabling me to also offer alternative viewing perspectives along similar lines:
]]>Note: the enhanced hyperlink (typed data link) lookup presents options to perform an Explore (all data about the Subject across Domains in the data space, i.e., data links to and from the Subject) or a Dereference (specific data in the Subject's Domain, i.e., data links originating from the Subject).
I built these Linked Data Pages by simply doing the following:
Naturally, this triggered an obvious opportunity to demonstrate the prowess of Linked Data on the Semantic Web. What follows is a quick dump of what I sent to the foaf-dev mailing list:
Here are a variety of FOAF Views built using:
Enabling you to explore the following lines:
Now that broader understanding of the Semantic Data Web is emerging, I would like to revisit the issue of "Data Spaces".
A Data Space is a place where Data resides. It isn't inherently bound to a specific Data Model (Concept Oriented, Relational, Hierarchical, etc.). Neither is it implicitly an access point to Data, Information, or Knowledge (the perception is purely determined through the experiences of the user agents interacting with the Data Space).
A Web Data Space is a Web accessible Data Space.
Real world example:
Today we increasingly perform one or more of the following tasks as part of our professional and personal interactions on the Web:
John Breslin has a nice animation depicting the creation of Web Data Spaces that drives home the point.
Web Data Space Silos

Unfortunately, what isn't as obvious to many netizens is the fact that each of the activities above results in the creation of data that is put into some context by you, the user. Even worse, you eventually realize that the service providers aren't particularly willing to, or capable of, giving you unfettered access to your own data. Of course, this isn't always by design, as the infrastructure behind the service can make this a nightmare from security and/or load-balancing perspectives. Irrespective of cause, we end up creating our own "Data Spaces" all over the Web without a coherent mechanism for accessing and meshing these "Data Spaces".
What are Semantic Web Data Spaces?

Data Spaces on the Web that provide granular access to RDF Data.

What's OpenLink Data Spaces (ODS) About?

Short History
In anticipation of the "Web Data Silo" challenge (an issue that we had tackled within internal enterprise networks for years), we commenced the development (circa 2001) of a distributed collaborative application suite called OpenLink Data Spaces (ODS). The project was never released to the public, since the problems associated with the deliberate or inadvertent creation of Web Data silos hadn't really materialized (silos only emerged in concrete form after the emergence of the Blogosphere and Web 2.0). In addition, there wasn't a clear standard Query Language for the RDF based Web Data Model (i.e., the SPARQL Query Language didn't exist).
Today, ODS is delivered as a packaged solution (in Open Source and Commercial flavors) that alleviates the pain associated with Data Space Silos that exist on the Web and/or behind corporate firewalls. In either scenario, ODS simply allows you to create Open and Secure Data Spaces (via its suite of applications) that expose data via SQL, RDF, and XML oriented data access and data management technologies. Of course, it also enables you to integrate transparently with existing 3rd party data space generators (Blogs, Wikis, Shared Bookmarks, Discussion, etc., services) by supporting industry standards that cover:
Thus, by installing ODS on your Desktop, Workgroup, Enterprise, or public Web Server, you end up with a very powerful solution for creating an Open Data access oriented presence on the "Semantic Data Web" without incurring any of the typically assumed "RDF Tax".
Naturally, ODS is built atop Virtuoso and of course it exploits Virtuoso's feature-set to the max. It's also beginning to exploit functionality offered by the OpenLink Ajax Toolkit (OAT).
]]>(Via Read/Write Web.)
Web 3.0: When Web Sites Become Web Services: "
.....As more and more of the Web is becoming remixable, the entire system is turning into both a platform and the database. Yet, such transformations are never smooth. For one, scalability is a big issue. And of course legal aspects are never simple.'
But it is not a question of if web sites become web services, but when and how. APIs are a more controlled, cleaner and altogether preferred way of becoming a web service. However, when APIs are not avaliable or sufficient, scraping is bound to continue and expand. As always, time will be best judge; but in the meanwhile we turn to you for feedback and stories about how your businesses are preparing for 'web 3.0'.
We are hitting a little problem re. Web 3.0 and Web 2.0, naturally :-) Web 2.0 is one of several (present and future) Dimensions of Web Interaction that turn Web Sites into Web Services Endpoints; a point I've made repeatedly [1] [2] [3] [4] across the blogosphere, in addition to my early, futile attempts to make Wikipedia's Web 2.0 article meaningful (circa 2005), as per the Wikipedia Web 2.0 Talk Page excerpt below:
Web 2.0 is a web of executable endpoints and well formed content. The executable endpoints and well formed content are accessible via URIs. Put differently, Web 2.0 is a web defined by URIs for invoking Web Services and/or consuming or syndicating well formed content.
Hopefully, someone with more time on their hands will expand on this (I am kinda busy).
BTW - Web 2.0 being a platform doesn't distinguish it in any way from Web 1.0. They are both platforms; the difference comes down to platform focus and mode of experience.
Web 3.0 is about Data Spaces: Points of Semantic Web Presence that provide granular access to Data, Information, and Knowledge via Conceptual Data Model oriented Query Languages and/or APIs.
The common denominator across all the current and future Web Interaction Dimensions is HTTP, while their differences are as follows:
Examples of Web 3.0 Infrastructure:
Web 3.0 is not purely about Web Sites becoming Web Services endpoints. It is about the "M" (Data Model) taking its place in the MVC pattern as applied to the Web Platform.
I will repeat myself yet again:
The Devil is in the Details of the Data Model. Data Models make or break everything. You ignore data at your own peril. No amount of money in the bank will protect you from Data Ignorance! A bad Data Model will bring down any venture or enterprise; the only variable is time (where time is directly related to your increasing need to obtain, analyze, and then act on data, over repetitive operational cycles that have ever-decreasing intervals).
This applies to the Real-time enterprise of Information and/or knowledge workers and Real-time Web Users alike.
BTW - Data Makes Shifts Happen (spotter: Sam Sethi).
]]>The examples above combine OAT and Exhibit. OAT handles the binding to SPARQL.
Here is a pure OAT variation of the prior examples that includes an enhanced anchor (hyperlink) feature that enables a variety of traversal behaviors and actions against the same RDF Data:
Note: Use the "dereference option" (retrieve/get data associated with URI) for maximum effect. The "explore" is useful after you've dereferenced a few URIs. Also note that columns are resizable, like those in a spreadsheet, which also implies dynamic sorting capability.
]]>
PREFIX dbpedia: <http://dbpedia.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?name ?birth ?death
FROM <http://dbpedia.org>
WHERE {
  ?person dbpedia:birthplace <http://dbpedia.org/resource/Berlin> .
  ?person dbpedia:birth ?birth .
  ?person foaf:name ?name .
  ?person dbpedia:death ?death
  FILTER (?birth < "1900-01-01"^^xsd:date and bif:contains (?name, 'otto')) .
}
ORDER BY ?name
You can test further using our SPARQL Endpoint for DBpedia or via the DBPedia bound Interactive SPARQL Query Builder or just click *Here* for results courtesy of the SPARQL Protocol (REST based Web Service).
Note: This is in-built functionality as Virtuoso has possessed Full Text Indexing since 1998-99. This capability applies to physical and virtual graphs managed by Virtuoso.
As per usual, there is more to come, as we now have a nice intersection point for SPARQL and XQuery/XPath, since Triple Objects (the Literal variety) can take the form of XML Schema based Complex Types :-) A point I alluded to in my podcast interview with Jon Udell last year (*note: the mechanical turk based transcript is bad*). The point I made went something like this: "...you use SPARQL to traverse the typed links and then use XPath/XQuery for further granular access to the data if well-formed..."
Anyway, the podcast interview led to this InfoWorld article titled: Unified Data Theory.
Linking personal posted content across communities: "
With the help of Kingsley, Uldis and I have been looking at how SIOC can be used to link the content that a single person posts to a number of community sites. The picture below shows an example of stuff that I've created on Flickr, YouTube, etc. through my various user identities on those sites (these match some SIOC types that we want to add to a separate module). We can also say that each Web 2.0 content item is a user-contributed post, with some attached or embedded content (e.g. a file or maybe just some metadata). This is part of a new discussion on the sioc-dev mailing list, and we'd value your contributions.
Edit: The inner layer is a person (semantically described in FOAF), the next layer is their user accounts (described in FOAF, SIOC) and the outer layer is the posted content - text, files, associated metadata - on community sites (again described using SIOC).
No Tags"(Via John Breslin - Cloudlands.)
The point that John is making about the Data Web and Interlinked Data Spaces exposed via URIs (e.g., Personal URIs) crystallizes a number of very important issues about the Data Web that may remain unclear. I am hoping that digesting the post excerpt above, in conjunction with the items below, aids the pursuit of clarity and comprehension about the all-important Data Web (Semantic Web - Layer 1):
Examples of some of these principles in practice:
And of course there is more to come, such as Grandma's Semantic Web Browser, which is coming from Zitgist LLC (pronounced: Zeitgeist), a joint venture of OpenLink Software and Frederick Giasson.
]]>Here is what I was able to knock together using my SPARQL QBE (without writing the SPARQL by hand):
Note: Just select the "Explore" option when the link-lookup window appears in response to you clicking on any of the links. That said, if you are using the Firefox Linkification extension the page will not work properly (as per this discussion about disabling Linkification) :-(
BTW - I have a comments page, so don't be shy about showing me how you could produce this kind of data driven web page much quicker than I have :-)
Warning: IE6 and Safari (use Webkit instead) cannot process these pages due to the use of Ajax.
]]>XMP and microformats revisited: "
Yesterday I exercised poetic license when I suggested that Adobe's Extensible metadata platform (XMP) was not only the spiritual cousin of microformats like hCalendar but also, perhaps, more likely to see widespread use in the near term. My poetic license was revoked, though, in a couple of comments:
Mike Linksvayer: How someone as massively clued-in as Jon Udell could be so misled as to describe XMP as a microformat is beyond me.
Danny Ayers: Like Mike I don't really understand Jon's references to microformats - I first assumed he meant XMP could be replaced with a uF.
Actually, I'm serious about this. If I step back and ask myself what are the essential qualities of a microformat, it's a short list:
Mike notes:
XMP is embedded in a binary file, completely opaque to nearly all users; microformats put a premium on (practically require) colocation of metadata with human-visible HTML.
Yes, I understand. And as someone who is composing this blog entry as XHTML, in emacs, using a semantic CSS tag that will enable me to search for quotes by Mike Linksvayer and find the above fragment, I'm obviously all about metadata coexisting with human-readable HTML. And I've been applying this technique since long before I ever heard the term microformats - my own term was originally microcontent.
(Via Jon Udell.)
I believe Jon is acknowledging the fact that the propagation of metadata in "Binary based" Web data sources is no different from the microformats based propagation that is currently underway in full swing across the "Text based" Web data sources realm. He is reiterating the fact that the Web is self-annotating (exponentially) by way of Metadata Embedding. And yes, what he describes is similar to Microformats in substance and propagation style :-)
Here is what I believe Jon is hoping to see:
My little "Hello Data Web!" meme was about demonstrating a view that Danny has sought for a while: unobtrusive meshing of microformats and RDF via GRDDL and SPARQL binding that simply eliminates the often perceived "RDF Tax". Danny, Jon, myself, and many others have always understood that making the Data Web (Web of RDF Instance Data) more of a Force (Star Wars style) is the key to unravelling the power of the "Web as a Database". Of course, we also tend the describe our nirvana in different ways that sometimes obscures the fundamental commonality of vision that we all share.
Personally, I believe everyone should simply "feel the force" or observe "the bright and dark sides of the force" that is RDF. When this occurs en masse there will be a global epiphany (similar to what happened around the time of the initial unveiling of the Web of Hypertext). Jon's meme brings the often overlooked realm of binary based metadata sources into the general discourse.
Binary Files as bona fide Data Web URIs (i.e., Metadata Sources) are much closer than you think :-) I should have my "Hello Data Web of Binary Data Sources" unveiled very soon!
Once you grasp the concept of entering values into the "Default Data Source URI field", take a look at: http://programmableweb.com and other URIs (hint: scroll through the results grid to the QEDWiki demo item)
]]>What I am demonstrating is how existing Web Content hooks transparently into the "Data Web". Zero RDF Tax :-) Everything is good!
Note: Please look to the bottom of the screen for the "Run Query" Button. Remember, it's not quite Grandma's UI, but it should do for Infonauts etc. A screencast will follow.
]]>OAT: OpenAjax Alliance Compliant Toolkit: "
Ondrej Zara and his team at OpenLink Software have created an OpenLink Software JS Toolkit, known as OAT. It is a full-blown JS framework, suitable for developing rich applications with a special focus on data access.
OAT works standalone, offers vast number of widgets and has some rarely seen features, such as on-demand library loading (which reduces the total amount of downloaded JS code).
OAT is one of the first JS toolkits which show full OpenAjax Alliance conformance: see the appropriate wiki page and conformance test page.
There is a lot to see with this toolkit:
You can see some of the widgets in a Kitchen sink application
Sample data access applications:
OAT is Open Source and GPL'ed over at sourceforge, and the team has recently managed to incorporate our OAT data access layer as a module to dojo datastore.
(Via Ajaxian Blog.)
This is a corrected version of the initial post. Unfortunately, the initial post was inadvertently littered with invalid links :-( Also, since the original post we have released OAT 1.2 that includes integration of our iSPARQL QBE into the OAT Form Designer application.
Re. Data Access, it is important to note that OAT's Ajax Database Connectivity layer supports data binding to the following data source types:
OAT also includes a number of prototype applications that are completely developed using OAT Controls and Libraries:
Note: Pick "Local DSN" from page initialization dialog's drop-down list control when prompted
]]>SPARQL (query language for the Semantic Web) basically enables me to query a collection of typed links (predicates/properties/attributes) in my Data Space (ODS based of course) without breaking my existing local bookmarks database or the one I maintain at del.icio.us.
I am also demonstrating how Web 2.0 concepts such as Tagging mesh nicely with the more formal concepts of Topics in the Semantic Web realm. The key to all of this is the ability to generate RDF Data Model Instance Data based on Shared Ontologies such as SIOC (from DERI's SIOC Project) and SKOS (again showing that Ontologies and Folksonomies are complementary).
This demo also shows that Ajax works well in the Semantic Web realm (or web dimension of interaction 3.0), especially when you have a toolkit with Data Aware controls (for SQL, RDF, and XML) such as OAT (OpenLink Ajax Toolkit). For instance, we've successfully used this to build a Visual Query Building Tool for SPARQL (alpha) that really takes a lot of the pain out of constructing SPARQL Queries (there is much more to come on this front re. handling of DISTINCT, FILTER, ORDER BY, etc.).
For now, take a look at the SPARQL Query dump generated by this SIOC & SKOS SPARQL QBE Canvas Screenshot.
You can cut and paste the queries that follow into the Query Builder or use the screenshot to build your variation of this query sample. Alternatively, you can simply click on *This* SPARQL Protocol URL to see the query results in a basic HTML Table. And one last thing, you can grab the SPARQL Query File saved into my ODS-Briefcase (the WebDAV repository aspect of my Data Space).
Note the following SPARQL Protocol Endpoints:
My beautified Version of the SPARQL Generated by QBE (you can cut and paste into "Advanced Query" section of QBE) is presented below:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX dct: <http://purl.org/dc/elements/1.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?forum_name, ?owner, ?post, ?title, ?link, ?url, ?tag
FROM <http://myopenlink.net/dataspace>
WHERE {
  ?forum a sioc:Forum ;
         sioc:type "bookmark" ;
         sioc:id ?forum_name ;
         sioc:has_member ?owner .
  ?owner sioc:id "kidehen" .
  ?forum sioc:container_of ?post .
  ?post dct:title ?title .
  OPTIONAL { ?post sioc:link ?link }
  OPTIONAL { ?post sioc:links_to ?url }
  OPTIONAL { ?post sioc:topic ?topic . ?topic a skos:Concept ; skos:prefLabel ?tag } .
}
Unmodified dump from the QBE (this will be beautified automatically in due course by the QBE):
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX dct: <http://purl.org/dc/elements/1.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?var8 ?var9 ?var13 ?var14 ?var24 ?var27 ?var29 ?var54 ?var56
WHERE {
  graph ?graph {
    ?var8 rdf:type sioc:Forum .
    ?var8 sioc:container_of ?var9 .
    ?var8 sioc:type "bookmark" .
    ?var8 sioc:id ?var54 .
    ?var8 sioc:has_member ?var56 .
    ?var9 rdf:type sioc:Post .
    OPTIONAL { ?var9 dc:title ?var13 } .
    OPTIONAL { ?var9 sioc:links_to ?var14 } .
    OPTIONAL { ?var9 sioc:link ?var29 } .
    ?var9 sioc:has_creator ?var37 .
    OPTIONAL { ?var9 sioc:topic ?var24 } .
    ?var24 rdf:type skos:Concept .
    OPTIONAL { ?var24 skos:prefLabel ?var27 } .
    ?var56 rdf:type sioc:User .
    ?var56 sioc:id "kidehen" .
  }
}
Current missing items re. Visual QBE for SPARQL are:
Quick Query Builder Tip: You will need to import the following (using the Import Button in the Ontologies & Schemas side-bar):
Browser Support: The SPARQL QBE is SVG based and currently works fine with the following browsers; Firefox 1.5/2.0, Camino (Cocoa variant of Firefox for Mac OS X), Webkit (Safari pre-release / advanced sibling), Opera 9.x. We are evaluating the use of the Adobe SVG plugin re. IE 6/7 support.
Of course this should be a screencast, but I am in the middle of a plethora of things right now :-)
]]>Web Me2.0 -- Exploding the Myth of Web 2.0:"Many people have told me this week that they think 'Web 2.0' has not been very impressive so far and that they really hope for a next-generation of the Web with some more significant innovation under the hood -- regardless of what it's called. A lot of people found the Web 2.0 conference in San Francisco to be underwhelming -- there was a lot of self-congratulation by the top few brands and the companies they have recently bought, but not much else happening. Where was all the innovation? Where was the focus on what's next? It seemed to be a conference mainly about what happened in the last year, not about what will happen in the coming year. But what happened last year is already so 'last year.' And frankly Web 2.0 still leaves a lot to be desired. The reason Tim Berners-Lee proposed the Semantic Web in the first place is that it will finally deliver on the real potential and vision of the Web. Not that today's Web 2.0 sucks completely -- it only sort of sucks. It's definitely useful and there are some nice bells and whistles we didn't have before. But it could still suck so much less!"
Web 2.0 is (not was) a piece of the overall Web puzzle. The Data Web (so-called Web 3.0) is another critical piece of this puzzle, especially as it provides the foundation layer (Layer 1) of the Semantic Web.
Web 2.0 was never about "Open Data Access", "Flexible Data Models", or "Open World" meshing of disparate data sources built atop disparate data schemas (see: Web 2.0's Open Data Access Conundrum). It was simply about "Execution and APIs". I have already written about "Web Interaction Dimensions", but you can also look at the relationship of the currently perceived dimensions through the M-V-C programming pattern:
Another point to note: Social Networking is hot, but nearly every social network that I know (and I know and use most of them) suffers from an impedance mismatch between the service(s) they provide (social networks) and their underlying data models (in many cases Relational as opposed to Graph). Networks are about Relationships (N-ary), and you cannot effectively exploit the deep potential of "Network Effects" (Wisdom of Crowds, Viral Marketing, etc.) without a complementary data model; you simply can't.
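As a SPARQL sketch of why the Graph model fits (the person URI below assumes the ODS "#this" convention and is illustrative), traversing two levels of a social network is just two triple patterns:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?friendOfFriend
WHERE {
  # two hops across the graph; no join tables, no self-joins
  <http://myopenlink.net/dataspace/person/kidehen#this> foaf:knows ?friend .
  ?friend foaf:knows ?friendOfFriend .
}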
Finally, the Data Web is already here, I promised a long time ago (Internet Time) that the manifestation of the Semantic Web would occur unobtrusively, meaning, we will wake up one day and realize we are using critical portions of the Semantic Web (i.e. Data-Web) without even knowing it. Guess what? It's already happening. Simple case in point, you may have started to notice the emergence of SIOC gems in the same way you may have observed those RSS 2.0 gems at the dawn of Web 2.0. What I am implying here is that the real question we should be asking is: Where is the Semantic Web Data? And how easy or difficult will it be to generate? And where are the tools? My answers are presented below:
Next stop: less writing, more demos; these are long overdue! At least from my side of the fence :-) I need to produce some short step-by-step screencasts that demonstrate how Web 2.0 meshes nicely with the Data-Web.
Here are some (not so end-user friendly) examples of how you can use SPARQL (Data-Web's Query Language) to query Web 2.0 Instance Data projected through the SIOC Ontology:
Note: You can use the online SPARQL Query Interface at: http://demo.openlinksw.com/isparql.
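For a flavor of such a query, here is a minimal sketch (the variable names, the OPTIONAL title pattern, and the LIMIT are illustrative; any graph carrying SIOC data should satisfy it):

PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?forum ?post ?title
WHERE {
  graph ?g {
    ?forum a sioc:Forum .
    ?forum sioc:container_of ?post .
    ?post a sioc:Post .
    OPTIONAL { ?post dc:title ?title }
  }
}
LIMIT 20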
Other Data-Web Technology usage demos include:
We have lately been busy with RDF scalability. We work with the 8000 university LUBM data set, a little over a billion triples. We can load it in 23h 46m on a box with 8G RAM. With 16G we probably could get it in 16h.
The resulting database is 75G, 74 bytes per triple which is not bad. It will shrink a little more if explicitly compacted by merging adjacent partly filled pages. See Advances in Virtuoso Triple Storage for an in-depth treatment of the subject.
The real question of RDF scalability is finding a way of having more than one CPU on the same index tree without them hitting the prohibitive penalty of waiting for a mutex. The sure solution is partitioning, which would probably have to be by range of the whole key. But before we go to so much trouble, we'll look at dropping a couple of critical sections from index random access. Also, some kernel parameters may be adjustable, like a spin count before calling the scheduler when trying to get an occupied mutex. Still, we should not waste too much time on platform specifics. We'll see.
We just updated the Virtuoso Open Source cut. The latest RDF refinements are not in, so maybe the cut will have to be refreshed shortly.
We are also now applying the relational to RDF mapping discussed in Declarative SQL Schema to RDF Ontology Mapping to the ODS applications.
There is a form of the mapping in the VOS cut on the net but it is not quite ready yet. We must first finish testing it through mapping all the relational schemas of the ODS apps before we can really recommend it. This is another reason for a VOS update in the near future.
We will be looking at the query side of LUBM after the ISWC 2006 conference. So far, we find queries compile OK for many SIOC use cases with the cost model as it now stands. A more systematic review of the cost model for SPARQL will come when we get to the queries.
We put some ideas about inferencing in the Advances in Triple Storage paper. The question is whether we should forward chain such things as class subsumption and subproperties. If we build these into the SQL engine used for running SPARQL, we probably can do these as unions at run time with good performance and better working set due to not storing trivial entailed triples. Some more thought and experimentation needs to go into this.
A declarative language adapted from SPARQL's graph pattern language (N3/Turtle) for mapping SQL Data to RDF Ontologies. We currently refer to this as a Graph Pattern based RDF VIEW Definition Language.
It provides an effective mechanism for exposing existing SQL Data as virtual RDF Data Sets (Graphs), eliminating the data duplication associated with generating physical RDF Graphs from SQL Data en route to persistence in a dedicated Triple Store.
Enterprise applications (traditional and web based) and most Web Applications (Web 1.0 and Web 2.0) sit atop relational databases, implying that SQL/RDF model and data integration is an essential element of the burgeoning "Data Web" (Semantic Web - Layer 1) comprehension and adoption process.
In a nutshell, this is a quick route for non-disruptive exposure of existing SQL Data to SPARQL-supporting RDF Tools and Development Environments.
CREATE GRAPH IRI("http://myopenlink.net/dataspace")
CREATE IRI CLASS odsWeblog:feed_iri "http://myopenlink.net/dataspace/kidehen/weblog/MyFeeds" ( in memb varchar not null, in inst varchar not null)
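To give a feel for how these pieces combine, here is a sketch of a complete mapping in this style. This is hedged: the odsWeblog namespace, the BLOG.DBA.POSTS table, and its columns are hypothetical, and grammar details may differ from the released implementation.

SPARQL
prefix odsWeblog: <http://myopenlink.net/schemas/weblog#>   # hypothetical namespace
prefix sioc: <http://rdfs.org/sioc/ns#>
prefix dc: <http://purl.org/dc/elements/1.1/>

# Mint post IRIs from relational keys (hypothetical table and columns)
create iri class odsWeblog:post_iri "http://myopenlink.net/dataspace/%U/weblog/%U"
    (in uname varchar not null, in post_id varchar not null) .

# Bind triple patterns to the table so SPARQL queries rewrite to SQL at run time
alter quad storage virtrdf:DefaultQuadStorage
from BLOG.DBA.POSTS as posts
{
  create odsWeblog:posts as graph iri ("http://myopenlink.net/dataspace")
  {
    odsWeblog:post_iri (posts.U_NAME, posts.POST_ID) a sioc:Post ;
        sioc:id posts.POST_ID ;
        dc:title posts.TITLE .
  }
} .

A SPARQL query against the graph above would then read the underlying table directly; no triples are materialized.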
What is Blogosphere 2.0 anyway?
Blog clusters that incorporate the "Open Data Access" dimension into their usage pattern via content exported as RDF Data Sets or Virtual RDF Data Sets (as demonstrated by the OpenLink Data Spaces SIOC Reference). In either scenario, the RDF rendition of blog content is accessible to ad-hoc querying via SPARQL (btw - check out this cool SPARQL FAQ).
The really fascinating thing about "Blogosphere 2.0" is that the transition from "Blogosphere 1.0" is going to be transparent! The "Open Data Access" will actually do the talking etc..
Recent Virtuoso Developments: "
We have been extensively working on virtual database refinements. There are many SQL cost model adjustments to better model distributed queries and we now support direct access to Oracle and Informix statistics system tables. Thus, when you attach a table from one or the other, you automatically get up-to-date statistics. This helps Virtuoso optimize distributed queries. Also, the documentation is updated as concerns these, with a new section on distributed query optimization.
On the applications side, we have been keeping up with the SIOC RDF ontology developments. All ODS applications now make their data available as SIOC graphs for download and SPARQL query access.
What is most exciting, however, is our advance in mapping relational data into RDF. We now have a mapping language that makes arbitrary legacy data in Virtuoso or elsewhere in the relational world RDF queriable. We will put out a white paper on this in a few days.
Also, we have some innovations in mind for optimizing the physical storage of RDF triples. We keep experimenting, now with our sights set on the high end of triple storage, towards billion-triple data sets. We are experimenting with a new, more space-efficient index structure for better working set behavior. Next week will yield the first results.
For additional clarity re. my comments above, you can also look at the SPARQL & SIOC Usecase samples document for our OpenLink Data Spaces platform. Bottom line, the Semantic Web and SPARQL aren't BORING. In fact, quite the contrary, since they are essential ingredients of a more powerful Web than the one we work with today!
Enjoy the rest of John's post:
Creating connections between discussion clouds with SIOC:
(Extract from our forthcoming BlogTalk paper about browsers for SIOC.)
SIOC provides a unified vocabulary for content and interaction description: a semantic layer that can co-exist with existing discussion platforms. Using SIOC, various linkages are created between the aforementioned concepts, which allow new methods of accessing this linked data, including:
- Virtual Forums. These may be a gathering of posts or threads which are distributed across discussion platforms, for example, where a user has found posts from a number of blogs that can be associated with a particular category of interest, or an agent identifies relevant posts across a certain timeframe.
- Distributed Conversations. Trackbacks are commonly used to link blog posts to previous posts on a related topic. By creating links in both directions, not only across blogs but across all types of internet discussions, conversations can be followed regardless of what point or URI fragment a browser enters at.
- Unified Communities. Apart from creating a web page with a number of relevant links to the blogs or forums or people involved in a particular community, there is no standard way to define what makes up an online community (apart from grouping the people who are members of that community using FOAF or OPML). SIOC allows one to simply define what objects are constituent parts of a community, or to say to what community an object belongs (using sioc:has_part / part_of): users, groups, forums, blogs, etc.
- Shared Topics. Technorati (a search engine for blogs) and BoardTracker (for bulletin boards) have been leveraging the free-text tags that people associate with their posts for some time now. SIOC allows the definition of such tags (using the subject property), but also enables hierarchical or non-hierarchical topic definition of posts using sioc:topic when a topic is ambiguous or more information on a topic is required. Combining with other Semantic Web vocabularies, tags and topics can be further described using the SKOS organisation system.
- One Person, Many User Accounts. SIOC also aims to help with the issue of multiple identities by allowing users to define that they hold other accounts or that their accounts belong to a particular personal identity (via foaf:holdsOnlineAccount or sioc:account_of). Therefore, all the posts or comments made by a particular person using their various associated user accounts across platforms could be identified.
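By way of illustration, the last linkage above could be queried along these lines (a sketch; the person IRI is hypothetical, and the properties are those named in the extract):

PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?account ?post
WHERE {
  <http://example.org/people/jane#me> foaf:holdsOnlineAccount ?account .
  ?account a sioc:User .
  ?post sioc:has_creator ?account .
}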
Continuing from our recent Podcast conversation, Jon Udell sheds further insight into the essence of our conversation via a "Strategic Developer" column article titled: Accessing the web of databases.
Below, I present an initial dump of a DataSpace FAQ that hopefully sheds light on the DataSpace vision espoused during my podcast conversation with Jon.
What is a DataSpace?
A moniker for Web-accessible atomic containers that manage and expose Data, Information, Services, Processes, and Knowledge.
What would you typically find in a Data Space? Examples include:
How do Data Spaces and Databases differ?
Data Spaces are fundamentally problem-domain-specific database applications. They offer functionality that you would instinctively expect of a database (e.g. ACID data management) with the additional benefit of being data model and query language agnostic. Data Spaces are for the most part DBMS Engine and Data Access Middleware hybrids in the sense that ownership and control of data is inherently loosely-coupled.
How do Data Spaces and Content Management Systems differ?
Data Spaces are inherently more flexible; they support multiple data models and data representation formats. Content management systems do not possess the same degree of data model and data representation dexterity.
How do Data Spaces and Knowledgebases differ?
A Data Space cannot dictate the perception of its content. For instance, what I may consider as knowledge relative to my Data Space may not be the case to a remote client that interacts with it from a distance. Thus, defining my Data Space purely as a Knowledgebase introduces constraints that reduce its broader effectiveness to third-party clients (applications, services, users, etc.). A Knowledgebase is based on a Graph Data Model, resulting in significant impedance for clients that are built around alternative models. To reiterate, Data Spaces support multiple data models.
What Architectural Components make up a Data Space?
Where can I see a DataSpace along the lines described, in action?
Just look at my blog, and take the journey as follows:
What about other Data Spaces?
There are several, and I will attempt to categorize them along the lines of available query method:
Type 1 (Free Text Search over HTTP):
Google, MSN, Yahoo!, Amazon, eBay, and most Web 2.0 plays.
Type 2 (Free Text Search and XQuery/XPath over HTTP)
A few blogs and Wikis (Jon Udell's and a few others)
What About Data Space aware tools?
I was compelled to go back to the RSS 2.0 imbroglio when I came across Dave Winer's comments re. "the SEC attempting to reinvent RSS 2.0..." in response to Jon Udell's recent XBRL article.
Although I don't believe in complex entry points into complex technology realms, I do subscribe to the approach where developers deal with the complexity associated with a problem domain while hiding said complexity from ambivalent end-users via coherent interfaces -- which does not always imply User Interface.
XBRL is a great piece of work that addresses the complex problem domain of Financial Reporting. The only thing it's missing right now is an Ontology that facilitates RDF Data Model based representation of XBRL Schema and Instance Data, which would ultimately make XBRL data available to RDF query languages such as SPARQL. This line of thought implies, for instance, an XML Schema to OWL Ontology Mapping for Schema Data (as explained in a white paper by the VSIS Group at the University of Hamburg), leaving the Instance Data to be generated in a myriad of ways that includes XML to RDF and/or XML->SQL->RDF.
As I stated in an earlier post: we should not mistake ambivalence for lack of intelligence. Assuming "Simple" is always right at all times is another way of subscribing to this profound misconception. You know, assuming the world was flat (as opposed to a geoid) was quite palatable at some point in the history of mankind. I wonder what would have happened if we held on to that point of view to this day because of its "Simplicity"?
OAT offers a broad Javascript-based, browser-independent widget set
for building data source independent rich internet applications that are usable across a broad range of Ajax-capable web browsers.
OAT supports binding to the following data sources via its Ajax Database Connectivity Layer:
SQL Data via XML for Analysis (XMLA)
Web Data via SPARQL, GData, and OpenSearch Query Services
Web Services specific Data via service specific binding to SOAP and REST style web services
The toolkit includes a collection of powerful rich internet application prototypes, including: SQL Query By Example, Visual Database Modeling, and a Data-bound Web Form Designer.
Project homepage on sourceforge.net:
http://sourceforge.net/projects/oat
Source Code:
http://sourceforge.net/projects/oat/files
Live demonstration:
http://www.openlinksw.com/oat/
Google vs Semantic Web: "Google exec challenges Berners-Lee 'At the end of the keynote, however, things took a different turn. Google Director of Search and AAAI Fellow Peter Norvig was the first to the microphone during the Q&A session, and he took the opportunity to raise a few points.
'What I get a lot is: 'Why are you against the Semantic Web?' I am not against the Semantic Web. But from Google's point of view, there are a few things you need to overcome, incompetence being the first,' Norvig said. Norvig clarified that it was not Berners-Lee or his group that he was referring to as incompetent, but the general user.'
Related: Google Base -- summing up."
(Via More News.)
When will we drop the ill-conceived notion that end-users are incompetent?
Has it ever occurred to software developers and technology vendors that incompetent, dumb, and other contemptuous end-user adjectives simply reflect the inability of most technology products to surmount end-user "Interest Activation Thresholds"?
Interest Activation Threshold (IAT)? What's That?
I have a fundamental personal belief that all human beings are intelligent. Our ability to demonstrate intelligence, or be perceived as intelligent, is directly proportional to our interest level in a given context. In short, we have "Ambivalence Quotients" (AQs) just as we have "Intelligence Quotients" (IQs).
An interested human being is an inherently intelligent entity. The abstract nature of human intelligence also makes locating the IQ and AQ on/off buttons a mercurial quest at the best of times.
Technology end-users exhibit high AQs, most of the time due to the inability of most technology products to truly engage, and ultimately stimulate genuine interest, by surmounting IAT and reducing AQ.
Ironically, when a technology vendor is lagging behind its competitors in the "features arms race", it is commonplace to use the familiar excuse: "our end-users aren't asking for this feature".
Note To Google:
Ambivalence isn't incompetence. If end-users were genuinely incompetent, how is it that they run rings around your page rank algorithms by producing google-friendly content at the expense of valuable context? What about the deteriorating value of Adsense due to click fraud? Likewise, the continued erosion of the value of your once exemplary "keyword based search" service? As we all know, necessity is the mother of invention, so when users develop high AQs because there is nothing better, we end up with a forced breach of "IAT"; which is why the issues that I mention remain long term challenges for you. Ironically, the so called "incompetents" are already outsmarting you, and you don't seem to comprehend this reality or its inevitable consequences.
Finally, how are you going to improve value without integrating the Semantic Web vision into your R&D roadmap? I can tell you categorically that you have little or no wiggle room re. this matter, especially if you want to remain true to your "don't be evil" mantra. My guess is that you will incorporate Semantic Web technologies sooner rather than later (Google Co-op is a big clue). I would even go as far as predicting a Google hosted SPARQL Query Endpoint alongside your GData endpoints during the next 6-12 months (if even that long). I believe that your GData protocol (like the rest of Web 2.0) will ultimately accelerate your appreciation of the data model dexterity that RDF brings to the loosely coupled knowledge networks espoused by the Semantic Web vision.
Google & Semantic Web Paradox
The Semantic Web vision has the RDF graph data model at its core (and for good reason), but even more confusing for me, as I process Google's sentiments about the Semantic Web, is the fact that RDF's actual creator (Ramanathan Guha, a.k.a. Guha) currently works at Google. There's a strange disconnect here IMHO.
If I recall correctly, Google wants to organize the world's data and information, leaving the knowledge organization to someone else, which is absolutely fine. What is increasingly irksome is the current tendency to use corporate stature to generate Fear, Uncertainty, and Doubt when the subject matter is the "Semantic Web".
BTW - I've just read Frederick Giasson's perspective on the Google Semantic Web paradox which ultimately leads to the same conclusions regarding Google's FUD stance when dealing with matters relating to the Semantic Web.
I wonder if anyone is tracking the google hits for "fud google semantic web"?
More Thoughts on ORDBMS Clients, .NET and RDF:
Continuing on from the previous post... If Microsoft opens the right interfaces for independent developers, we see many exciting possibilities for using ADO .NET 3 with Virtuoso.
Microsoft quite explicitly states that their thrust is to decouple the client side representation of data as .NET objects from the relational schema on the database. This is a worthy goal.
But we can also see other possible applications of the technology when we move away from strictly relational back ends. This can go in two directions: towards object-oriented databases and towards making applications for the semantic web.
In the OODBMS direction, we could equate Virtuoso table hierarchies with .NET classes and create a tighter coupling between client and database, going as it were in the other direction from Microsoft's intended decoupling. For example, we could do typical OODBMS tricks such as prefetch of objects based on storage clustering. The simplest case of this is like virtual memory, where the request for one byte brings in the whole page or group of pages. The basic idea is that what is created together probably gets used together, and if all objects are modeled as subclasses of (subtables of) a common superclass, then, regardless of instance type, what is created together (has consecutive ids) will indeed tend to cluster on the same page. These tricks can deliver good results in very navigational applications like GIS or CAD. But these are rather specialized things and we do not see OODBMS making any great comeback.
But what is more interesting and more topical in the present times is making clients for the RDF world. There, the OWL Ontology Language could be used to make the .NET classes, and the DBMS could, when returning URIs serving as subjects of triples, include specified predicates on these subjects, enough to allow instantiating .NET instances as 'proxies' of these RDF objects. Of course, only predicates for which the client has a representation are relevant, thus some client-server handshake is needed at the start. What data could be prefetched is like the intersection of a concise bounded description and what the client has classes for. The rest of the mapping would be very simple, with IRIs becoming pointers, multi-valued predicates becoming lists, and so on. IRIs for which the RDF type were not known or inferable could be left out or represented as a special class with name-value pairs for its attributes; same with blank nodes.
In this way, .NET's considerable UI capabilities could be directly exploited for visualizing RDF data, given only that the data complied reasonably well with a known ontology.
If a SPARQL query returned a result set, IRI-typed columns would be returned as .NET instances and the server would prefetch enough data for filling them in. For a SPARQL CONSTRUCT, a collection object could be returned with the objects materialized inside. If the interfaces allow passing an Entity SQL string, these could possibly be specialized to allow for a SPARQL string instead. LINQ might have to be extended to allow for SPARQL type queries, though.
Many of these questions will be better answerable as we get more details on Microsoft's forthcoming ADO .NET release. We hope that sufficient latitude exists for exploring all these interesting avenues of development.
"I shopped for everything except food on eBay. When working with foreign-language documents, I used translations from Babel Fish. (This worked only so well. After a Babel Fish round-trip through Italian, the preceding sentence reads, 'That one has only worked therefore well.') Why use up space storing files on my own hard drive when, thanks to certain free utilities, I can store them on Gmail's servers? I saved, sorted, and browsed photos I uploaded to Flickr. I used Skype for my phone calls, decided on books using Amazon's recommendations rather than 'expert' reviews, killed time with videos at YouTube, and listened to music through customizable sites like Pandora and Musicmatch. I kept my schedule on Google Calendar, my to-do list on Voo2do, and my outlines on iOutliner. I voyeured my neighborhood's home values via Zillow. I even used an online service for each stage of the production of this article, culminating in my typing right now in Writely rather than Word. (Being only so confident that Writely wouldn't somehow lose my work -- or as Babel Fish might put it, 'only confident therefore' -- I backed it up into Gmail files.)" Interesting article; Tim O'Reilly's response is here.
(Via Valentin Zacharias (Student).)
Tim O'Reilly's response provides the following hierarchy for Web 2.0, based on what he calls "Web 2.0-ness":
Level 3: The application could ONLY exist on the net, and draws its essential power from the network and the connections it makes possible between people or applications. These are applications that harness network effects to get better the more people use them. EBay, craigslist, Wikipedia, del.icio.us, Skype, (and yes, Dodgeball) meet this test. They are fundamentally driven by shared online activity. The web itself has this character, which Google and other search engines have then leveraged. (You can search on the desktop, but without link activity, many of the techniques that make web search work so well are not available to you.) Web crawling is one of the fundamental Web 2.0 activities, and search applications like Adsense for Content also clearly have Web 2.0 at their heart. I had a conversation with Eric Schmidt, the CEO of Google, the other day, and he summed up his philosophy and strategy as "Don't fight the internet." In the hierarchy of web 2.0 applications, the highest level is to embrace the network, to understand what creates network effects, and then to harness them in everything you do.
Level 2: The application could exist offline, but it is uniquely advantaged by being online. Flickr is a great example. You can have a local photo management application (like iPhoto) but the application gains remarkable power by leveraging an online community. In fact, the shared photo database, the online community, and the artifacts it creates (like the tag database) is central to what distinguishes Flickr from its offline counterparts. And its fuller embrace of the internet (for example, that the default state of uploaded photos is "public") is what distinguishes it from its online predecessors.
Level 1: The application can and does exist successfully offline, but it gains additional features by being online. Writely is a great example. If you want to do collaborative editing, its online component is terrific, but if you want to write alone, as Fallows did, it gives you little benefit (other than availability from computers other than your own.)
Level 0: The application has primarily taken hold online, but it would work just as well offline if you had all the data in a local cache. MapQuest, Yahoo! Local, and Google Maps are all in this category (but mashups like housingmaps.com are at Level 3.) To the extent that online mapping applications harness user contributions, they jump to Level 2.
So, in a sense, we have near-conclusive confirmation that Web 2.0 is simply about APIs (typically service specific Data Silos or Walled-gardens) with little concern, understanding, or interest in truly open data access across the burgeoning "Web of Databases". Or the Web of "Databases and Programs" that I prefer to describe as "Data Spaces".
Thus, we can truly begin to conclude that Web 3.0 (Data Web) is the addition of Flexible and Open Data Access to Web 2.0; where the Open Data Access is achieved by leveraging Semantic Web deliverables such as the RDF Data Model and the SPARQL Query Language :-)
Note to Tim:
Is the RDF.net domain deal still on? I know it's past 1st Jan 2006, but do bear in mind that the critical issue of a broadly supported RDF Query Language only took significant shape approximately 13 months ago (in the form of SPARQL), and this is all so critical to the challenge you posed in 2003.
RDF.net could become a point of semantic-web-presence through which the benefits of SPARQL compliant Triple|Quad Stores, Shared Ontologies, and SPARQL Protocol are unveiled in their well intended glory :-).
Hiding Ontology from the Semantic Web Users: "
Ontology is a key foundation of the Semantic Web. Without ontology, it will be difficult for applications to share knowledge and reason over information that is published on the Web. However, it is a serious mistake to think that the Semantic Web is simply a collection of ontologies.
Last week I was invited to be on a panel discussion at the Humans and the Semantic Web Workshop. I talked a bit about the Geospatial Semantic Web and its associated research issues. Overall the workshop went very well. You can read the notes from the workshop here.
Some of my new thoughts after the workshop are as follows.
I was asked the question, "What're user-related issues that Semantic Web developers must pay attention to?" I think building Semantic Web applications is similar to building database applications. There are a few things we can learn from our past experience in building database applications.
When building database-driven applications, we store information in SQL databases, and we use SQL to access, manipulate, and manage this information. When building Semantic Web applications, we express ontologies and information in RDF, and use RDF query languages (e.g. SPARQL) to access and manipulate this information.
When building database-driven applications, we hide complexity from the end-users. For example, we almost never expose raw SQL statements to the end users, or ask users to process the raw result sets returned from an SQL engine. We always provide intuitive interfaces for accessing and representing information.
When building Semantic Web applications, we should also hide complexity from the end-users. Users shouldn't need to see or edit RDF statements. Users shouldn't need to be fluent in SPARQL queries or able to parse graphs that are returned by a SPARQL engine.
Semantic Web developers should spend more time on building functional capabilities that solve real world problems and improve people's productivity. It's important to remember that "the Semantic Web != ontologies".
" ]]>
A quick FYI:
Virtuoso has offered a DBMS-hosted Filesystem via WebDAV for a number of years, but the implications of this functionality have remained unclear for just as long. Thus, we developed (a few years ago) and released (recently) an application layer above Virtuoso's WebDAV storage realm called "The OpenLink Briefcase" (née oDrive). This application allows you to view items uploaded by content type and/or kind (People, Business Cards, Calendars, Business Reports, Office Documents, Photos, Blog Posts, Feed Channels/Subscriptions, Bookmarks, etc.). It also includes automatic metadata extraction (where feasible) and indexing. Naturally, as an integral part of our "OpenLink Data Spaces" (ODS) product offering, it supports GData, URIQA, SPARQL (note: WebDAV metadata is sync'ed with Virtuoso's RDF Triplestore), SQL, and WebDAV itself.
You can explore the power of this product via the following routes:
"We all know that structured data is boring and useless; while unstructured data is sexy and chock full of value. Well, only up to a point, Lord Copper. Genuinely unstructured data can be a real nuisance - imagine extracting the return address from an unstructured letter, without letterhead and any of the formatting usually applied to letters. A letter may be thought of as unstructured data, but most business letters are, in fact, highly-structured." ....Duncan Pauly, founder and chief technology officer of Coppereye add's eloquent insight to the conversation:
"The labels "structured data" and "unstructured data" are often used ambiguously by different interest groups; and often used lazily to cover multiple distinct aspects of the issue. In reality, there are at least three orthogonal aspects to structure:* The structure of the data itself.
* The structure of the container that hosts the data.
* The structure of the access method used to access the data.
These three dimensions are largely independent and one does not need to imply another. For example, it is absolutely feasible and reasonable to store unstructured data in a structured database container and access it by unstructured search mechanisms."
Data understanding and appreciation is dwindling at a time when the reverse should be happening. We are supposed to be in the throes of the "Information Age", but for some reason this appears to have no correlation with data and "data access" in the minds of many -- as reflected in the broadly contradictory positions taken re. unstructured vs. structured data: structured is boring and useless while unstructured is useful and sexy....
The difference between "Structured Containers" and "Structured Data" is clearly misunderstood by most (an unfortunate fact).
For instance, all DBMS products are "Structured Containers" aligned to one or more data models (typically one). These products have been limited by proprietary data access APIs and underlying data model specificity when used in the "Open-world" model that is at the core of the World Wide Web. This confusion also carries over to the misconception that Web 2.0 and the Semantic/Data Web are mutually exclusive.
But things are changing fast, and the concept of multi-model DBMS products is beginning to crystallize. On our part, we have finally released the long-promised "OpenLink Data Spaces" application layer that has been developed using our Virtuoso Universal Server. We have structured unified storage containment exposed to the data web cloud via endpoints for querying or accessing data using a variety of mechanisms that include GData, OpenSearch, SPARQL, XQuery/XPath, SQL, etc.
To be continued....
Apple patent application for cascade feature for creating records in a database:
" On June 22, the US Patent & Trademark Office revealed Apple’s patent application titled ‘Cascade feature for creating records in a database,’ originally filed in December 2004. The present invention relates to databases and, more particularly, to providing a cascade feature for a database program which can serve as an... [ read more ]"
(Via Macsimum News.)
It's one thing to not know, or have any demonstrable interest in, the enterprise corporate market (the land of database technology utilization, etc.), and a completely different matter when the lack of technology advances in this realm amounts to advertising one's ignorance about database matters so publicly.
I would like to assume that this patent is dead on arrival since there should be an army of DBMS vendors Triggered by this attempt to CASCADE DELETE years of existing prior art LOL!!
The attempt to use Model Independence as the patentable variation of "DBMS Cascade Functionality" prior art doesn't wash. CASCADE functionality is old news in the real DBMS world! What next? Patent application for mixing SQL and SPARQL in 2009?
There is a gradual sense that we are now making the Conceptual View of Data Real, across the board, and obviously there would be a clear need to apply CASCADE technology in this context. But the fact that you realize this now (Apple!) simply doesn't make it novel in any shape or form.
Virtuoso extends its SQL3 implementation with syntax for integrating SPARQL into queries and subqueries. Thus, as part of a SQL SELECT query or subquery, one can write the SPARQL keyword and a SPARQL query as part of query text processed by Virtuoso's SQL Query Processor.
Using Virtuoso's command-line or Web-based ISQL utility, type in the following (note: "SQL>" is the command-line prompt for the native ISQL utility):
SQL> sparql select distinct ?p where { graph ?g { ?s ?p ?o } };
Which will return the following:
p
varchar
----------
http://example.org/ns#b
http://example.org/ns#d
http://xmlns.com/foaf/0.1/name
http://xmlns.com/foaf/0.1/mbox
...
SQL> select distinct subseq (p, strchr (p, '#')) as fragment from (sparql select distinct ?p where { graph ?g { ?s ?p ?o } } ) as all_predicates where p like '%#%' ;
fragment
varchar
----------
#query
#data
#name
#comment
...
You can pass parameters to a SPARQL query using a Virtuoso-specific syntax extension. '??' or '$?' indicates a positional parameter, similar to '?' in standard SQL. '??' can be used in graph patterns or anywhere else where a SPARQL variable is accepted. The value of a parameter should be passed in SQL form, i.e. it should be a number or an untyped string. An IRI ID cannot be passed, but an absolute IRI can. Using this notation, a dynamic SQL capable client (ODBC, JDBC, ADO.NET, OLEDB, XMLA, or others) can execute parametrized SPARQL queries using the parameter binding concepts that are commonplace in dynamic SQL. This implies that existing SQL applications and development environments (PHP, Ruby, Python, Perl, VB, C#, Java, etc.) are capable of issuing SPARQL queries via their existing SQL-bound data access channels against RDF Data stored in Virtuoso.
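For example, the statement text prepared by such a client could be the same string that appears inside exec() in the function below, with the two '??' placeholders bound like ordinary SQL parameters (the bound values shown here are illustrative):

sparql select ?s where { graph ?g { ?s ?? ?? } }
-- parameter 1: an absolute predicate IRI, passed as an untyped string
-- parameter 2: an object value, e.g. the number 4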
Note: This is the Virtuoso equivalent of a recently published example using Jena (a Java based RDF Triple Store).
Create a Virtuoso Function by executing the following:
SQL> create function param_passing_demo ()
{
  declare stat, msg varchar;
  declare mdata, rset any;
  exec ('sparql select ?s where { graph ?g { ?s ?? ?? }}',
        stat, msg,
        vector ('http://www.w3.org/2001/sw/DataAccess/tests/data/Sorting/sort-0#int1', 4), -- vector of two parameter values
        10,     -- max. result-set rows
        mdata,  -- variable for handling result-set metadata
        rset    -- variable for handling query result-set
       );
  return rset[0][0];
}

Test the new "param_passing_demo" function by executing the following:
SQL> select param_passing_demo ();
Which returns:
callret
VARCHAR
_______________________________________________________________________________
http://www.w3.org/2001/sw/DataAccess/tests/data/Sorting/sort-0#four

1 Rows. -- 00000 msec.
A SPARQL ASK query can be used as an argument of the SQL EXISTS predicate.
create function sparql_ask_demo () returns varchar
{
  if (exists (sparql ask where { graph ?g { ?s ?p 4 } }))
    return 'YES';
  else
    return 'NO';
};
Test by executing:
SQL> select sparql_ask_demo ();
Which returns:
_________________________
YES
Sometimes, an application will be making a SPARQL query, using the results from a previous query or using some RDF term found through the other Jena APIs.
SQL has prepared statements - they allow an SQL statement to take a number of parameters. The application fills in the parameters and executes the statement.
One way is to resort to doing this in SPARQL by building a complete, new query string, parsing it, and executing it. But it takes a little care to handle all cases, like quoting special characters; you can at least use some of the many utilities in ARQ for producing strings, such as FmtUtils.stringForResource (it's not in the application API but in the util package currently).
Queries in ARQ can be built programmatically but it is tedious, especially when the documentation hasn't been written yet.
Another way is to use query variables and bind them to initial values that apply to all query solutions. Consider the query:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?doc { ?doc dc:title ?title }
It gets documents and their titles.
Executing a query in a program might look like:
import com.hp.hpl.jena.query.* ;
import com.hp.hpl.jena.rdf.model.* ;  // for Model and Resource

Model model = ... ;
// StringUtils here is ARQ's utility class (in its util package)
String queryString = StringUtils.join("\n", new String[]{
    "PREFIX dc: <http://purl.org/dc/elements/1.1/>",
    "SELECT ?doc { ?doc dc:title ?title }" }) ;
Query query = QueryFactory.create(queryString) ;
QueryExecution qexec = QueryExecutionFactory.create(query, model) ;
try {
    ResultSet results = qexec.execSelect() ;
    for ( ; results.hasNext() ; ) {
        QuerySolution soln = results.nextSolution() ;
        Resource doc = soln.getResource("doc") ;  // ?doc binds to a resource, not a literal
    }
} finally {
    qexec.close() ;
}
Suppose the application knows the title it's interested in - can it use this to get the document?
The value of ?title can be made a parameter to the query and fixed by an initial binding. All query solutions will be restricted to pattern matches where ?title is that RDF term.
QuerySolutionMap initialSettings = new QuerySolutionMap() ;
initialSettings.add("title", node) ;
and this is passed to the factory that creates QueryExecution's:
QueryExecution qexec = QueryExecutionFactory.create(query, model, initialSettings) ;
It doesn't matter if the node is a literal, a resource with URI or a blank node. It becomes a fixed value in the query, even a blank node, because it's not part of the SPARQL syntax, it's a fixed part of every solution.
This gives named parameters to queries enabling something like SQL prepared statements except with named parameters not positional ones.
This can make a complex application easier to structure and clearer to read. It's better than bashing strings together, which is error prone, inflexible, and does not lead to clear code.
(Via ARQtick.)
I added the missing piece regarding the "Virtuoso Conductor" (the Web based Admin UI for Virtuoso) to the original post below. I also added a link to our live SPARQL Demo so that anyone interested can start playing around with SPARQL and SPARQL integrated into SQL right away.
Another good thing about this post is the vast amount of valuable links that it contains. To really appreciate this point simply visit my Linkblog (excuse the current layout :-) - a Tab if you come in via the front door of this Data Space (what I used to call My Weblog Home Page).
]]>"Free" Databases: Express vs. Open-Source RDBMSs: "Open-source relational database management systems (RDBMSs) are gaining IT mindshare at a rapid pace. As an example, BusinessWeek's February 6, 2006 ' Taking On the Database Giants ' article asks 'Can open-source upstarts compete with Oracle, IBM, and Microsoft?' and then provides the answer: 'It's an uphill battle, but customers are starting to look at the alternatives.'
There's no shortage of open-source alternatives to look at. The BusinessWeek article concentrates on MySQL, which BW says 'is trying to be the Ikea of the database world: cheap, needs some assembly, but has a sleek, modern design and does the job.' The article also discusses Postgre[SQL] and Ingres, as well as EnterpriseDB, an Oracle clone created from PostgreSQL code*. Sun includes PostgreSQL with Solaris 10 and, as of April 6, 2006, with Solaris Express.**
*Frank Batten, Jr., the investor who originally funded Red Hat, invested a reported $16 million into Great Bridge with the hope of making a business out of providing paid support to PostgreSQL users. Great Bridge stayed in business only 18 months, having missed an opportunity to sell the business to Red Hat and finding that selling $50,000-per-year support packages for an open-source database wasn't easy. As Batten concluded, 'We could not get customers to pay us big dollars for support contracts.' Perhaps EnterpriseDB will be more successful with a choice of $5,000, $3,000, or $1,000 annual support subscriptions.
**Interestingly, Oracle announced in November 2005 that Solaris 10 is 'its preferred development and deployment platform for most x64 architectures, including x64 (x86, 64-bit) AMD Opteron and Intel Xeon processor-based systems and Sun's UltraSPARC(R)-based systems.'
There is a surfeit of reviews of current MySQL, PostgreSQL and, to a lesser extent, Ingres implementations. These three open-source RDBMSs come with their own or third-party management tools. These systems compete against free versions of commercial (proprietary) databases: SQL Server 2005 Express Edition (and its MSDE 2000 and 1.0 predecessors), Oracle Database 10g Express Edition, IBM DB2 Express-C, and Sybase ASE Express Edition for Linux where database size and processor count limitations aren't important. Click here for a summary of recent InfoWorld reviews of the full versions of these four databases plus MySQL, which should be valid for Express editions also. The FTPOnline Special Report article, 'Microsoft SQL Server Turns 17,' that contains the preceding table is here (requires registration).
SQL Server 2005 Express Edition SP-1 Advanced Features
SQL Server 2005 Express Edition with Advanced Features enhances SQL Server 2005 Express Edition (SQL Express or SSX) dramatically, so it deserves special treatment here. SQL Express gains full text indexing and now supports SQL Server Reporting Services (SSRS) on the local SSX instance. The SP-1 with Advanced Features setup package, which Microsoft released on April 18, 2006, installs the release version of SQL Server Management Studio Express (SSMSE) and the full version of Business Intelligence Development Studio (BIDS) for designing and editing SSRS reports. My 'Install SP-1 for SQL Server 2005 and Express' article for FTPOnline's SQL Server Special Report provides detailed, illustrated installation instructions for and related information about the release version of SP-1. SP-1 makes SSX the most capable of all currently available Express editions of commercial RDBMSs for Windows.
OpenLink Software's Virtuoso Open-Source Edition
OpenLink Software announced an open-source version of its Virtuoso Universal Server commercial DBMS on April 11, 2006. On the initial date of this post, May 2, 2006, Virtuoso Open-Source Edition (VOS) was virtually under the radar as an open-source product. According to this press release, the new edition includes:
- SPARQL compliant RDF Triple Store
- SQL-200n Object-Relational Database Engine (SQL, XML, and Free Text)
- Integrated BPEL Server and Enterprise Service Bus
- WebDAV and Native File Server
- Web Application Server that supports PHP, Perl, Python, ASP.NET, JSP, etc.
- Runtime Hosting for Microsoft .NET, Mono, and Java
VOS only lacks the virtual server and replication features that are offered by the commercial edition. VOS includes a Web-based administration tool called the "Virtuoso Conductor". According to Kingsley Idehen's Weblog, 'The Virtuoso build scripts have been successfully tested on Mac OS X (Universal Binary Target), Linux, FreeBSD, and Solaris (AIX, HP-UX, and True64 UNIX will follow soon). A Windows Visual Studio project file is also in the works (ETA some time this week).'
InfoWorld's Jon Udell has tracked Virtuoso's progress since 2002, with an additional article in 2003 and a one-hour podcast with Kingsley Idehen on April 26, 2006. A major talking point for Virtuoso is its support for Atom 0.3 syndication and publication, Atom 1.0 syndication and (forthcoming) publication, and future support for Google's GData protocol, as mentioned in this Idehen post. Yahoo!'s Jeremy Zawodny points out that the 'fingerprints' of Adam Bosworth, Google's VP of Engineering and the primary force behind the development of Microsoft Access, 'are all over GData.' Click here to display a list of all OakLeaf posts that mention Adam Bosworth.
One application for the GData protocol is querying and updating the Google Base database independently of the Google Web client, as mentioned by Jeremy: 'It's not about building an easier onramp to Google Base. ... Well, it is. But, again, that's the small stuff.' Click here for a list of posts about my experiences with Google Base. Watch for a future OakLeaf post on the subject as the GData APIs gain ground.
Open-Source and Free Embedded Database Contenders
Open-source and free embedded SQL databases are gaining importance as the number and types of mobile devices and OSs proliferate. Embedded databases usually consist of Java classes or Windows DLLs that are designed to minimize file size and memory consumption. Embedded databases avoid the installation hassles, heavy resource usage and maintenance cost associated with client/server RDBMSs that run as an operating system service.
Andrew Hudson's December 2005 'Open Source databases rounded up and rodeoed' review for The Enquirer provides brief descriptions of one commercial and eight open source database purveyors/products: Sleepycat, MySQL, PostgreSQL, Ingres, InnoBase, Firebird, IBM Cloudscape (a.k.a. Derby), Genezzo, and Oracle. Oracle Sleepycat* isn't an SQL Database, Oracle InnoDB* is an OEM database engine that's used by MySQL, and Genezzo is a multi-user, multi-server distributed database engine written in Perl. These special-purpose databases are beyond the scope of this post.
* Oracle purchased Sleepycat Software, Inc. in February 2006 and purchased Innobase OY in October 2005. The press release states: 'Oracle intends to continue developing the InnoDB technology and expand our commitment to open source software.'
Derby is an open-source release by the Apache Software Foundation of the Cloudscape Java-based database that IBM acquired when it bought Informix in 2001. IBM offers a commercial release of Derby as IBM Cloudscape 10.1. Derby is a Java class library with a relatively light footprint (2 MB), which makes it suitable for client/server synchronization with the IBM DB2 Everyplace Sync Server in mobile applications. The IBM DB2 Everyplace Express Edition isn't open source or free*, so it doesn't qualify for this post. The same is true for the corresponding Sybase SQL Anywhere components.**
* IBM DB2 Everyplace Express Edition with synchronization costs $379 per server (up to two processors) and $79 per user. DB2 Everyplace Database Edition (without DB2 synchronization) is $49 per user. (Prices are based on those when IBM announced version 8 in November 2003.)
** Sybase's iAnywhere subsidiary calls SQL Anywhere 'the industry's leading mobile database.' A Sybase SQL Anywhere Personal DB seat license with synchronization to SQL Anywhere Server is $119; the cost without synchronization wasn't available from the Sybase Web site. Sybase SQL Anywhere and IBM DB2 Everyplace perform similar replication functions.
Sun's Java DB, another commercial version of Derby, comes with the Solaris Enterprise Edition, which bundles Solaris 10, the Java Enterprise System, developer tools, desktop infrastructure and N1 management software. A recent Between the Lines blog entry by ZDNet's David Berlind waxes enthusiastic over the use of Java DB embedded in a browser to provide offline persistence. RedMonk analyst James Governor and eWeek's Lisa Vaas wrote about the use of Java DB as a local data store when Tim Bray announced Sun's Derby derivative and Francois Orsini demonstrated Java DB embedded in the Firefox browser at the ApacheCon 2005 conference.
Firebird is derived from Borland's InterBase 6.0 code, the first commercial relational database management system (RDBMS) to be released as open source. Firebird has excellent support for SQL-92 and comes in three versions: Classic, SuperServer and Embedded for Windows, Linux, Solaris, HP-UX, FreeBSD and MacOS X. The embedded version has a 1.4-MB footprint. Release Candidate 1 for Firebird 2.0 became available on March 30, 2006 and is a major improvement over earlier versions. Borland continues to promote InterBase, now at version 7.5, as a small-footprint, embedded database with commercial Server and Client licenses.
SQLite is a featherweight C library for an embedded database that implements most SQL-92 entry- and transitional-level requirements (some through the JDBC driver) and supports transactions within a tiny 250-KB code footprint. Wrappers support a multitude of languages and operating systems, including Windows CE, SmartPhone, Windows Mobile, and Win32. SQLite's primary SQL-92 limitations are lack of nested transactions, inability to alter a table design once committed (other than with RENAME TABLE and ADD COLUMN operations), and absence of foreign-key constraints. SQLite provides read-only views, triggers, and 256-bit encryption of database files. A downside is that the entire database file is locked while a transaction is in progress. SQLite uses file access permissions in lieu of GRANT and REVOKE commands. Using SQLite involves no license; its code is entirely in the public domain. The Mozilla Foundation's Unified Storage wiki says this about SQLite: 'SQLite will be the back end for the unified store [for Firefox]. Because it implements a SQL engine, we get querying 'for free', without having to invent our own query language or query execution system. Its code-size footprint is moderate (250k), but it will hopefully simplify much existing code so that the net code-size change should be smaller. It has exceptional performance, and supports concurrent access to the database. Finally, it is released into the public domain, meaning that we will have no licensing issues.'
Vieka Technology, Inc.'s eSQL 2.11 is a port of SQLite to Windows Mobile (Pocket PC and Smartphone) and Win32, and includes development tools for Windows devices and PCs, as well as a .NET native data provider. A conventional ODBC driver also is available. eSQL for Windows (Win32) is free for personal and commercial use; eSQL for Windows Mobile requires a license for commercial (for-profit or business) use.
HSQLDB isn't on most reviewers' radar, which is surprising because it's the default database for OpenOffice.org (OOo) 2.0's Base suite member. HSQLDB 1.8.0.1 is an open-source (BSD license) Java embedded database engine based on Thomas Mueller's original Hypersonic SQL Project. Using OOo's Base feature requires installing the Java 2.0 Runtime Engine (which is not open-source) or the presence of an alternative open-source engine, such as Kaffe. My prior posts about OOo Base and HSQLDB are here, here and here.
The HSQLDB 1.8.0 documentation on SourceForge states the following regarding SQL-92 and later conformance:
'HSQLDB 1.8.0 supports the dialect of SQL defined by SQL standards 92, 99 and 2003. This means where a feature of the standard is supported, e.g. left outer join, the syntax is that specified by the standard text. Many features of SQL92 and 99 up to Advanced Level are supported, and there is support for most of SQL 2003 Foundation and several optional features of this standard. However, certain features of the Standards are not supported, so no claim is made for full support of any level of the standards.'
Other less well-known embedded databases designed for or suited to mobile deployment are Mimer SQL Mobile and VistaDB 2.1. Neither product is open-source, and both require paid licensing; VistaDB requires a small up-front payment by developers but offers royalty-free distribution.
Java DB, Firebird embedded, SQLite and eSQL 2.11 are contenders for lightweight PC and mobile device database projects that aren't Windows-only.
SQL Server 2005 Everywhere
If you're a Windows developer, SQL Server Mobile is the logical embedded database choice for mobile applications for Pocket PCs and Smartphones. Microsoft's April 19, 2006 press release delivered the news that SQL Server 2005 Mobile Edition (SQL Mobile or SSM) would gain a big brother: SQL Server 2005 Everywhere Edition.
Currently, the SSM client is licensed (at no charge) to run in production on devices with Windows CE 5.0, Windows Mobile 2003 for Pocket PC or Windows Mobile 5.0, or on PCs with Windows XP Tablet Edition only. SSM also is licensed for development purposes on PCs running Visual Studio 2005. Smart Device replication with SQL Server 2000 SP3 and later databases has been the most common application so far for SSM.
By the end of 2006, Microsoft will license SSE for use on all PCs running any Win32 version or the preceding device OSs. A version of SQL Server Management Studio Express (SSMSE), updated to support SSE, is expected to release by the end of the year. These features will qualify SSE as the universal embedded database for Windows client and smart-device applications.
For more details on SSE, read John Galloway's April 11, 2006 blog post and my 'SQL Server 2005 Mobile Goes Everywhere' article for the FTPOnline Special Report on SQL Server."
(Via OakLeaf Systems.)
I would like to make an important clarification re. the GData Protocol and what is popularly dubbed as "Adam Bosworth's fingerprints." I do not believe in one solution (a simple one for the sake of simplicity) to a deceptively complex problem. Virtuoso supports Atom 1.0 (syndication only at the current time) and Atom 0.3 (syndication and publication, which have been in place for years).
"In my fourth Friday podcast we hear from Kingsley Idehen, CEO of OpenLink Software. I wrote about OpenLink's universal database and app server, Virtuoso, back in 2002 and 2003. Earlier this month Virtuoso became the first mature SQL/XML hybrid to make the transition to open source. The latest incarnation of the product also adds SPARQL (a semantic web query language) to its repertoire. ..."
(Via Jon's Radio.)
BTW - the GData Protocol and Atom 1.0 publishing support will be delivered in both the Open Source and Commercial Edition updates to Virtuoso next week (very little work due to what's already in place).
I make the clarification above to eliminate the possibility of assuming mutual exclusivity of my perspective/vision and Adam's (Jon also makes this important point when he speaks about our opinions being on either side of a spectrum/continuum). I simply want to broaden the scope of this discussion. I am a profound believer in the Semantic Web / Data Web vision, and I predict that we will be querying the Googlebase via SPARQL in the not too distant future (this doesn't mean that netizens will be forced to master SPARQL, absolutely not! But there will be conduit technologies that deal with the matter).
Side note: I actually last spoke with Adam at the NY Hilton in 2000 (the day I unveiled Virtuoso to the public for the first time, in person). We bumped into each other and I told him about Virtuoso (at the time the big emphasis was SQL to XML and the vocabulary we had chosen re. SQL extension...), and he told me about his departure from Microsoft and the commencement of his new venture (CrossGain, prior to his stint at BEA). What struck me even more was his interest in Linux and Open Source (bearing in mind this was about 3 or so weeks after he departed Microsoft).
If you are encountering Virtuoso for the first time via this post or Jon's, please make time to read the product history article on the Virtuoso Wiki (which is one of many Virtuoso based applications that make up our soon to be released OpenLink DataSpace offering).
That said, I better go listen to the podcast :-)
To assist with the general understanding of Virtuoso's SPARQL implementation, we have released an online version of the RDF DAWG SPARQL Test Suite (hosted by a live Virtuoso Demo & Tutorial Instance).
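If you would like to exercise a SPARQL endpoint by hand, a query can be issued with nothing more than curl. The sketch below assumes an instance exposing the conventional Virtuoso /sparql endpoint path; the host name is illustrative, so substitute your own:

# Issue a SPARQL SELECT over HTTP (SPARQL Protocol).
# The endpoint URL is an assumption; adjust for your instance.
curl --get 'http://demo.openlinksw.com/sparql' \
  --data-urlencode 'query=SELECT DISTINCT ?type WHERE { ?s a ?type } LIMIT 10' \
  --data-urlencode 'format=application/sparql-results+json'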
What is Virtuoso?
A powerful next-generation server product that implements otherwise distinct server functionality within a single server product. Think of Virtuoso as the server-software analog of a dual-core processor, where each core represents a traditional server functionality realm.
The Virtuoso History page tells the whole story.
90% of the aforementioned functionality has been available in Virtuoso since 2000, with the RDF Triple Store being the only 2006 addition.
The Virtuoso build scripts have been successfully tested on Mac OS X (Universal Binary Target), Linux, FreeBSD, and Solaris (AIX, HP-UX, and Tru64 UNIX will follow soon). A Windows Visual Studio project file is also in the works (ETA: some time this week).
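For those who want to try it, a source build follows the conventional autotools flow, sketched below; the script names, configure flag, and install prefix are assumptions, so check the README that ships with the release:

# Typical autotools build from a source checkout (sketch only;
# prerequisites and exact steps vary by platform and release).
./autogen.sh
./configure --prefix=/usr/local/virtuoso-opensource
make
make install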
Why take Virtuoso open source?
Simple: there is no value in a product of this magnitude remaining the "best kept secret". That status works well for our competitors, but absolutely works against the legions of new-generation developers, systems integrators, and knowledge workers that need to be aware of what is actually achievable today with the right server architecture.
Which open source license?
GPL version 2.
What is the licensing model?
Dual licensing.
The Open Source version of Virtuoso includes all of the functionality listed above, except that the Virtual Database (distributed heterogeneous join engine) and Replication Engine (across heterogeneous data sources) functionality will only be available in the commercial version.
Where is the project hosted?
On SourceForge.
Is there a live demonstration?
Of course!
Up until this point, the Virtuoso Product Blog has been a covert live demonstration of some aspects of Virtuoso (Content Management). My Personal Blog and the Virtuoso Product Blog are actual Virtuoso instances, and have been since I started blogging in 2003.
Is there a product Wiki?
Sure! The Virtuoso Product Wiki is also an instance of Virtuoso demonstrating another aspect of the Content Management prowess of Virtuoso.
Is the online documentation also served by Virtuoso?
Yep! Virtuoso Online Documentation is hosted via yet another Virtuoso instance. This particular instance also attempts to demonstrate Free Text search combined with the ability to repurpose well-formed content in a myriad of forms (Atom, RSS, RDF, OPML, and OCS).
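As a sketch of the kind of free-text query involved, the snippet below uses Virtuoso's isql client; the DOCS table and BODY column are hypothetical names, and the CONTAINS predicate presumes a free-text index already exists on the column:

# Hypothetical free-text query via Virtuoso's isql client.
# DOCS/BODY are placeholder names; a text index must first exist, e.g.:
#   CREATE TEXT INDEX ON DOCS (BODY);
# (1111 is the default server port; dba/dba the default credentials.)
isql 1111 dba dba exec="SELECT TITLE FROM DOCS WHERE CONTAINS (BODY, 'SPARQL');"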
The Virtuoso Online Tutorial Site has operated as a live demonstration and tutorial portal for a number of years. During the same timeframe (circa 2001) we also assembled a few screencast-style demos (their look and feel certainly shows their age; updates are in the works).
BTW - we have also updated the Virtuoso FAQ and released a number of missing Virtuoso White Papers (amongst many long-overdue action items).
]]>"Ok, my first attempt at a round-up (in response to Philâs observation of Planetary damage). Thanks to the conference thereâs loads more here than thereâs likely to be subsequent weeks, although itâs still only a fairly random sample and some of the links here are to heaps of other resourcesâ¦
Incidentally, if anyoneâs got a list/links for SemWeb-related blogs that arenât on Planet RDF, Iâd be grateful for a pointer. PS. Ok, I forget⦠are there any blogs that arenât on Daveâs list yet..?
Quote of the week:
In the Semantic Web, it is not the Semantic which is new, it is the Web which is new.
- Chris Welty, IBM (lifted from TimBL's slides)
I just noticed the article from Dan Zambonini, "Is Web 2.0 killing the Semantic Web?". From my perspective the article shows a misconception that people seem to have around the Semantic Web: the Semantic Web effort itself is not about providing applications (as the Web 2.0 meme indicates); rather, it provides standards to interlink applications.
Blog post title of the week:
Also… a new threat to Semantic Web developers has been discovered: typhoid!, and the key to the Web's full potential is… Tetris."
"Hot from the Galway sportsdesk:
1st Prize: CONFOTO, appmosphere web applications, Germany
2nd Prize: FungalWeb, Concordia University, Canada
3rd Prize: Personal Publication Reader, Universität Hannover, Germany
CONFOTO is a browsing and annotation service for conference photos. It combines recent Web trends (tag-based categorization, interactive user interfaces, syndication) with the advantages of Semantic Web platforms (machine-understandable information, an extensible data model, the possibility to mix arbitrary RDF vocabularies).
Congrats bengee!!
(Benjamin had a string of bad luck just prior to the conference; there may still be glitches in the app: 'my sparql store exploded last week')"
(Via Raw.)