Kingsley Uyi Idehen
Lexington, United States
Data Spaces
There is increasing coalescence around the idea that HTTP-based
Linked Data adds a tangible dimension to
the World Wide Web (Web). This
Data
Dimension grants end-users, power-users, integrators, and
developers the ability to experience the Web not solely as an
Information Space or Document Space, but now also as a Data Space.
Here is a simple What and Why guide covering the essence of Data
Spaces.
What is a Data Space?
A Data Space is a point of presence on a network, where every
Data Object (item or entity) is given a Name (e.g., a
URI) by which it may be Referenced or
Identified.
In a Data Space, every Representation of those Data
Objects (i.e., every Object Representation) has an
Address (e.g., a URL) from which it may be Retrieved (or
"gotten").
In a Data Space, every Object Representation is a time-variant
(that is, changing over time), streamable, and format-agnostic
Resource.
An Object Representation is simply a Description of that Object.
It takes the form of a graph, constructed from sets of three
elements named Subject, Predicate, and Object (SPO); or
Entity, Attribute, and Value (EAV).
Each Entity+Attribute+Value or Subject+Predicate+Object set (or
triple) is one datum: one piece of data, one persisted observation
about a given Subject or Entity.
The underlying Schema that defines and constrains the
construction of Object Representations is based on Logic,
specifically First-Order Logic. Each Object Representation
is a collection of persisted observations (Data) about a
given Subject, which aid observers in materializing their
perception (Information), and ultimately comprehension
(Knowledge), of that Subject.
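The SPO/EAV model above can be sketched in a few lines of plain Python (the names and URIs are illustrative assumptions, not tied to any RDF library): each triple is one persisted observation, and the set of triples sharing a Subject is that Subject's Object Representation.

```python
# Each tuple is one observation: (Subject, Predicate, Object).
triples = [
    ("http://example.com/id/alice", "name",  "Alice"),
    ("http://example.com/id/alice", "knows", "http://example.com/id/bob"),
    ("http://example.com/id/bob",   "name",  "Bob"),
]

def describe(subject, triples):
    # An Object Representation: every observation about one Subject.
    return [(p, o) for (s, p, o) in triples if s == subject]

print(describe("http://example.com/id/alice", triples))
# -> [('name', 'Alice'), ('knows', 'http://example.com/id/bob')]
```

Any observer can materialize their perception of "alice" from these two observations alone, without knowing how or where they were produced.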
Why are Data Spaces important?
In the real world -- which is networked by nature -- data is
heterogeneously (or "differently") shaped, and disparately
located.
Data has been increasing at an alarming rate since the advent of
computing; the interWeb simply provides context that makes this reality more
palpable and more exploitable, and in the process virtuously ups
the ante through increasingly exponential growth rates.
We can't stop data heterogeneity; it is endemic to the nature of
its producers -- humans and/or human-directed machines. What we can
do, though, is create a powerful Conceptual-level "bus" or
"interface" for data integration, based on Data Description
oriented Logic rather than Data Representation oriented
Formats. Basically, it's possible for us to use a Common Logic as the basis for
expressing and blending SPO- or EAV-based Object Representations in
a variety of Formats (or "dialects").
The roadmap boils down to:
- Assigning unambiguous Object Names to:
  - Every record (or, in table terms, every row);
  - Every record attribute (or, in table terms, every field or column);
  - Every record relationship (that is, every relationship between one record and another);
  - Every record container (e.g., every table or view in a relational database, every named graph, every spreadsheet, every text file, etc.).
- Making each Object Name resolve to an Address through which Create, Read, Update, and Delete ("CRUD") operations can be performed against (can access) the associated Object Representation graph.
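The roadmap above can be illustrated with a small, purely hypothetical Python sketch: Object Names are minted as URIs, and each Name resolves to an Object Representation against which CRUD operations act. The URI pattern and the in-memory store are assumptions for illustration only.

```python
store = {}  # Object Name (URI) -> Object Representation (a set of SPO triples)

def mint_uri(base, container, key):
    # e.g., row 42 of table "customers" gets an unambiguous Object Name
    return "%s/%s/%s#this" % (base, container, key)

def create(name, triples): store[name] = set(triples)     # C
def read(name):            return store.get(name, set())  # R
def update(name, triple):  store[name].add(triple)        # U
def delete(name):          store.pop(name, None)          # D

uri = mint_uri("http://example.com", "customers", "42")
create(uri, [(uri, "name", "Acme")])
update(uri, (uri, "city", "Lexington"))
print(len(read(uri)))  # two observations now describe this record
```

In a real Data Space the store would be a network-accessible database and the Names would be dereferenceable HTTP URIs, but the Name-to-Representation resolution pattern is the same.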
03/01/2011 18:49 GMT-0500 | Modified: 03/01/2011 17:26 GMT-0500
New Preconfigured Virtuoso AMI for Amazon EC2 Cloud comprised of Linked Data from BBC & DBpedia
What?
Introducing a new preloaded and preconfigured Virtuoso (Cluster Edition) AMI for the
Amazon EC2 Cloud, hosting the combined Linked Datasets from the BBC and DBpedia.
Why?
Predictably instantiate a powerful database with high-quality
data and cross-links within minutes, for personal or
service-specific use.
How?
Simply follow the instructions in our Amazon EC2 guide for the BBC + DBpedia 3.6
Linked Dataset.
Your installation steps are as follows:
- Instantiate a Virtuoso EC2 AMI
- Mount the Amazon Elastic Block Storage (EBS) snapshot that
hosts the preloaded Virtuoso Database.
02/18/2011 20:20 GMT-0500 | Modified: 03/29/2011 09:52 GMT-0500
DBpedia + BBC (combined) Linked Data Space Installation Guide
What?
The DBpedia + BBC
Combo Linked Dataset is a preconfigured Virtuoso Cluster (4 Virtuoso Cluster Nodes,
each comprised of one Virtuoso Instance; initial deployment is to a
single Cluster Host, but the license may be converted for physically
distributed deployment), available via the Amazon EC2 Cloud,
preloaded with the DBpedia and BBC datasets.
Why?
The BBC has been publishing Linked Data from its Web Data Space for a number of years. In line
with best practices for injecting Linked Data into the World Wide Web (Web), the BBC datasets are
interlinked with other datasets such as DBpedia and
MusicBrainz.
Typical follow-your-nose exploration using a Web Browser (or
even via sophisticated SPARQL query crawls) isn't always practical
once you get past the initial euphoria that comes from
comprehending the Linked Data concept. As your queries get more
complex, the overhead of remote sub-queries increases its impact,
until query results take so long to return that you simply give
up.
Thus, maximizing the effects of the BBC's efforts requires
Linked Data that shares locality in a Web-accessible Data Space —
i.e., where all Linked Data sets have been loaded into the same
data store or warehouse. This holds true even when leveraging
SPARQL-FED style virtualization — there's always a need to localize
data as part of any marginally-decent locality-aware
cost-optimization algorithm.
This DBpedia + BBC dataset, exposed via a preloaded and
preconfigured Virtuoso Cluster, delivers a practical point of
presence on the Web for immediate and cost-effective exploitation
of Linked Data at the individual and/or service specific
levels.
How?
To work through this guide, you'll need to start with 90 GB of free
disk space. (Only 41 GB will be consumed after you delete the
installer archives, but starting with 90+ GB ensures enough work
space for the installation.)
Install Virtuoso
- Download Virtuoso installer archive(s). You must deploy the Personal or Enterprise Edition; the Open Source Edition does not support Shared-Nothing Cluster Deployment.
- Obtain a Virtuoso Cluster license.
- Install Virtuoso.
- Set key environment variables and start the OpenLink License Manager, using this command (this may vary depending on your shell and install directory):
  . /opt/virtuoso/virtuoso-enterprise.sh
- Optional: To keep the default single-server configuration file and demo database intact, set the VIRTUOSO_HOME environment variable to a different directory, e.g.:
  export VIRTUOSO_HOME=/opt/virtuoso/cluster-home/
  Note: You will have to adjust this setting every time you shift between this cluster setup and your single-server setup. Either may be made your environment's default through the virtuoso-enterprise.sh and related scripts.
- Set up your cluster by running the mkcluster.sh script. Note that initial deployment of the DBpedia + BBC Combo requires a 4-node cluster, which is the default for this script.
- Start the Virtuoso Cluster with this command:
  virtuoso-start.sh
- Stop the Virtuoso Cluster with this command:
  virtuoso-stop.sh
Using the DBpedia + BBC Combo dataset
- Navigate to your installation directory.
- Download the combo dataset installer script, bbc-dbpedia-install.sh.
- For best results, set the downloaded script to fully executable using this command:
  chmod 755 bbc-dbpedia-install.sh
- Shut down any Virtuoso instances that may be currently running.
- Optional: As above, if you have decided to keep the default single-server configuration file and demo database intact, set the VIRTUOSO_HOME environment variable appropriately, e.g.:
  export VIRTUOSO_HOME=/opt/virtuoso/cluster-home/
- Run the combo dataset installer script with this command:
  sh bbc-dbpedia-install.sh
Verify installation
The combo dataset typically deploys to EC2 virtual machines in
under 90 minutes; your time will vary depending on your network
connection speed, machine speed, and other variables.
Once the script completes, perform the following steps:
- Verify that the Virtuoso Conductor (HTTP-based Admin UI) is in place via:
  http://localhost:[port]/conductor
- Verify that the Virtuoso SPARQL endpoint is in place via:
  http://localhost:[port]/sparql
- Verify that the Precision Search & Find UI is in place via:
  http://localhost:[port]/fct
- Verify that the Virtuoso-hosted PivotViewer is in place via:
  http://localhost:[port]/PivotViewer
02/17/2011 17:15 GMT-0500 | Modified: 03/29/2011 10:09 GMT-0500
SPARQL Guide for the Perl Developer
What?
A simple guide usable by any Perl
developer seeking to exploit SPARQL without hassles.
Why?
SPARQL is a powerful query language, results serialization
format, and an HTTP based data access protocol from
the W3C. It provides a mechanism for accessing and integrating data
across Deductive Database Systems (colloquially
referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems
(or data spaces) that manage proposition oriented records in
3-tuple (triples) or 4-tuple (quads) form.
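The 3-tuple vs. 4-tuple distinction can be shown concretely in a few lines of Python (the Named Graph IRI here is an illustrative assumption): a quad is simply a triple plus the identifier of the graph that contains it.

```python
# One proposition as a 3-tuple (triple: Subject, Predicate, Object) ...
triple = ("http://dbpedia.org/resource/DBpedia",
          "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
          "http://www.w3.org/2002/07/owl#Thing")

# ... and as a 4-tuple (quad): the same proposition scoped to a
# Named Graph (graph IRI chosen purely for illustration).
graph = "http://dbpedia.org"
quad = (*triple, graph)

print(len(triple), len(quad))  # -> 3 4
```

A quad store keeps that fourth component alongside every record, which is what lets SPARQL partition and address data by Named Graph.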
How?
SPARQL queries are actually HTTP payloads (typically). Thus,
using a RESTful client-server interaction pattern, you can dispatch
calls to a SPARQL compliant data server and receive a payload for
local processing.
Steps:
- Determine which SPARQL endpoint you want to access e.g.
DBpedia or a local Virtuoso instance (typically:
http://localhost:8890/sparql).
- If using Virtuoso, and you want to populate its quad store
using SPARQL, assign "SPARQL_SPONGE" privileges to user
"SPARQL" (this is basic control, more sophisticated WebID based
ACLs are available for controlling SPARQL access).
Script:
#
# Demonstrating use of a single query to populate a
# Virtuoso Quad Store via Perl.
#
#
# HTTP URL is constructed accordingly with CSV query results format as the default via mime type.
#
use CGI qw/:standard/;
use LWP::UserAgent;
use Data::Dumper;
use Text::CSV_XS;
sub sparqlQuery {
my $query=shift;
my $baseURL=shift;
my $format=shift;
%params=(
"default-graph" => "", "should-sponge" => "soft", "query" => $query,
"debug" => "on", "timeout" => "", "format" => $format,
"save" => "display", "fname" => ""
);
@fragments=();
foreach $k (keys %params) {
$fragment="$k=".CGI::escape($params{$k});
push(@fragments,$fragment);
}
$query=join("&", @fragments);
$sparqlURL="${baseURL}?$query";
my $ua = LWP::UserAgent->new;
$ua->agent("MyApp/0.1 ");
my $req = HTTP::Request->new(GET => $sparqlURL);
my $res = $ua->request($req);
$str=$res->content;
$csv = Text::CSV_XS->new();
my @rows = ();
foreach $line ( split(/^/, $str) ) {
$csv->parse($line);
@bits=$csv->fields();
push(@rows, [ @bits ] );
}
return \@rows;
}
# Setting Data Source Name (DSN)
$dsn="http://dbpedia.org/resource/DBpedia";
# Virtuoso pragmas for instructing SPARQL engine to perform an HTTP GET using the IRI in
# FROM clause as Data Source URL en route to DBMS
# record Inserts.
# Note: without the Virtuoso-specific pragma on the first line, this would
# be a generic SPARQL query and would not add records to the DBMS.
$query="DEFINE get:soft \"replace\"\nSELECT DISTINCT * FROM <$dsn> WHERE {?s ?p ?o}";
$data=sparqlQuery($query, "http://localhost:8890/sparql/", "text/csv");
print "Retrieved data:\n";
print Dumper($data);
Output
Retrieved data:
$VAR1 = [
[
's',
'p',
'o'
],
[
'http://dbpedia.org/resource/DBpedia',
'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
'http://www.w3.org/2002/07/owl#Thing'
],
[
'http://dbpedia.org/resource/DBpedia',
'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
'http://dbpedia.org/ontology/Work'
],
[
'http://dbpedia.org/resource/DBpedia',
'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
'http://dbpedia.org/class/yago/Software106566077'
],
...
Conclusion
CSV was chosen over XML (as the output format) since this is a
"no-brainer installation and utilization" guide for a Perl
developer who already knows how to use Perl for HTTP-based data
access. SPARQL just provides an added bonus to URL dexterity
(delivered via URI abstraction) with regards to constructing Data
Source Names or Addresses.
01/25/2011 11:05 GMT-0500 | Modified: 01/26/2011 18:11 GMT-0500
Virtuoso + DBpedia 3.6 Installation Guide (Update 1)
DBpedia is a community effort to provide a contemporary
deductive database derived from Wikipedia content. Project
contributions can be partitioned as follows:
- Ontology Construction and Maintenance
- Dataset Generation via Wikipedia Content Extraction &
Transformation
- Live Database Maintenance & Administration -- includes
actual Linked Data loading and publishing,
provision of SPARQL endpoint, and traditional DBA
activity
- Internationalization.
Why is DBpedia important?
Comprising the nucleus of the Linked Open Data effort, DBpedia also
serves as a fulcrum for the burgeoning Web of Linked Data
by delivering a dense and highly-interlinked lookup database. In
its most basic form, DBpedia is a great source of strong and
resolvable identifiers for People, Places, Organizations, Subject
Matter, and many other data items of interest. Naturally, it
provides a fantastic starting point for comprehending the
fundamental concepts underlying TimBL's initial Linked Data meme.
How do I use DBpedia?
Depending on your particular requirements, whether personal or
service-specific, DBpedia offers the following:
- Datasets that can be loaded on your deductive database (also
known as triple or quad stores) platform of choice
- Live browsable HTML+RDFa
based entity description pages
- A wide variety of data formats for importing entity description
data into a broad range of existing applications and services
- A SPARQL endpoint allowing ad-hoc querying over HTTP using the
SPARQL query language, and delivering results serialized in a
variety of formats
- A broad variety of tools covering query by example, faceted
browsing, full text search, entity name lookups,
etc.
What is the DBpedia 3.6 + Virtuoso Cluster Edition Combo?
OpenLink Software has preloaded the
DBpedia 3.6 datasets into a preconfigured Virtuoso Cluster Edition
database, and made the package available for easy installation.
Why is the DBpedia+Virtuoso package important?
The DBpedia+Virtuoso package provides a cost-effective option
for personal or service-specific incarnations of DBpedia.
For instance, you may have a service that isn't best-served by
competing with the rest of the world for ad-hoc query time and
resources on the live instance, which itself operates under various
restrictions which enable this ad-hoc query service to be provided
at Web Scale.
Now you can easily commission your own instance and quickly
exploit DBpedia and Virtuoso's database feature set to the max,
powered by your own hardware and network infrastructure.
How do I use the DBpedia+Virtuoso package?
Pre-requisites are simply:
- A functional Virtuoso Cluster Edition installation.
- A Virtuoso Cluster Edition license.
- 90 GB of free disk space -- you ultimately only need 43 GB, but this is our recommended free space prior to installation completion.
To install the Virtuoso Cluster Edition simply perform the
following steps:
- Download the software.
- Run the installer.
- Set key environment variables and start the OpenLink License Manager, using this command (this may vary depending on your shell):
  . /opt/virtuoso/virtuoso-enterprise.sh
- Run the mkcluster.sh script, which defaults to a 4-node cluster.
- Set the VIRTUOSO_HOME environment variable if you want to keep cluster databases distinct from single-server databases, via a distinct root directory for database files (one that isn't adjacent to single-server database directories).
- Start Virtuoso Cluster Edition instances using the command:
  virtuoso-start.sh
- Stop Virtuoso Cluster Edition instances using the command:
  virtuoso-stop.sh
To install your personal or service-specific edition of DBpedia
simply perform the following steps:
- Navigate to your installation directory.
- Download the installer script, dbpedia-install.sh.
- Set execution mode on the script using the command:
  chmod 755 dbpedia-install.sh
- Shut down any Virtuoso instances that may be currently running.
- Set your VIRTUOSO_HOME environment variable, e.g., to the current directory, via this command (this may vary depending on your shell):
  export VIRTUOSO_HOME=`pwd`
- Run the script using the command:
  sh dbpedia-install.sh
Once the installation completes (approximately 1 hour and 30
minutes from start time), perform the following steps:
- Verify that the Virtuoso Conductor (HTML-based Admin UI) is in place via:
  http://localhost:[port]/conductor
- Verify that the Precision Search & Find UI is in place via:
  http://localhost:[port]/fct
- Verify that DBpedia's Green Entity Description Pages are in place via:
  http://localhost:[port]/resource/DBpedia
01/24/2011 20:08 GMT-0500 | Modified: 01/25/2011 14:46 GMT-0500
SPARQL Guide for the Javascript Developer
What?
A simple guide usable by any Javascript developer seeking to
exploit SPARQL without hassles.
Why?
SPARQL is a powerful query language, results serialization
format, and an HTTP based data access protocol from
the W3C. It provides a mechanism for accessing and integrating data
across Deductive Database Systems (colloquially
referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems
(or data spaces) that manage proposition oriented records in
3-tuple (triples) or 4-tuple (quads) form.
How?
SPARQL queries are actually HTTP payloads (typically). Thus,
using a RESTful client-server interaction pattern, you can dispatch
calls to a SPARQL compliant data server and receive a payload for
local processing.
Steps:
- Determine which SPARQL endpoint you want to access e.g.
DBpedia or a local Virtuoso instance (typically:
http://localhost:8890/sparql).
- If using Virtuoso, and you want to populate its quad store
using SPARQL, assign "SPARQL_SPONGE" privileges to user
"SPARQL" (this is basic control, more sophisticated WebID based
ACLs are available for controlling SPARQL access).
Script:
/*
Demonstrating use of a single query to populate a Virtuoso Quad Store via Javascript.
*/
/*
HTTP URL is constructed accordingly with JSON query results format as the default via mime type.
*/
function sparqlQuery(query, baseURL, format) {
if(!format)
format="application/json";
var params={
"default-graph": "", "should-sponge": "soft", "query": query,
"debug": "on", "timeout": "", "format": format,
"save": "display", "fname": ""
};
var querypart="";
for(var k in params) {
querypart+=k+"="+encodeURIComponent(params[k])+"&";
}
var queryURL=baseURL + '?' + querypart;
var xmlhttp;
if (window.XMLHttpRequest) {
xmlhttp=new XMLHttpRequest();
}
else {
xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
}
xmlhttp.open("GET",queryURL,false);
xmlhttp.send();
return JSON.parse(xmlhttp.responseText);
}
/*
setting Data Source Name (DSN)
*/
var dsn="http://dbpedia.org/resource/DBpedia";
/*
The Virtuoso pragma DEFINE get:soft "replace" instructs the Virtuoso SPARQL
engine to perform an HTTP GET using the IRI in the FROM clause as the Data
Source URL for DBMS record inserts.
*/
var query="DEFINE get:soft \"replace\"\nSELECT DISTINCT * FROM <"+dsn+"> WHERE {?s ?p ?o}";
var data=sparqlQuery(query, "/sparql/");
Output
Place the snippet above into the <script/> section of an
HTML document to see the query result.
Conclusion
JSON was chosen over XML (re. output format) since this is about
a "no-brainer installation and utilization" guide for a Javascript
developer that already knows how to use Javascript for HTTP based
data access within HTML. SPARQL just provides an added bonus to URL
dexterity (delivered via URI abstraction) with regards to
constructing Data Source Names or Addresses.
01/21/2011 14:59 GMT-0500 | Modified: 01/26/2011 18:10 GMT-0500
SPARQL Guide for the PHP Developer
What?
A simple guide usable by any PHP developer seeking to exploit SPARQL without hassles.
Why?
SPARQL is a powerful query language, results serialization
format, and an HTTP based data access protocol from
the W3C. It provides a mechanism for accessing and integrating data
across Deductive Database Systems (colloquially
referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems
(or data spaces) that manage proposition oriented records in
3-tuple (triples) or 4-tuple (quads) form.
How?
SPARQL queries are actually HTTP payloads (typically). Thus,
using a RESTful client-server interaction pattern, you can dispatch
calls to a SPARQL compliant data server and receive a payload for
local processing e.g. local object binding re. PHP.
Steps:
- From your command line execute: aptitude search '^PHP26', to
verify PHP is in place
- Determine which SPARQL endpoint you want to access e.g.
DBpedia or a local Virtuoso instance (typically:
http://localhost:8890/sparql).
- If using Virtuoso, and you want to populate its quad store
using SPARQL, assign "SPARQL_SPONGE" privileges to user
"SPARQL" (this is basic control, more sophisticated WebID based
ACLs are available for controlling SPARQL access).
Script:
#!/usr/bin/env php
<?php
#
# Demonstrating use of a single query to populate a Virtuoso Quad Store via PHP.
#
# HTTP URL is constructed accordingly with JSON query results format in mind.
function sparqlQuery($query, $baseURL, $format="application/json")
{
$params=array(
"default-graph" => "",
"should-sponge" => "soft",
"query" => $query,
"debug" => "on",
"timeout" => "",
"format" => $format,
"save" => "display",
"fname" => ""
);
$querypart="?";
foreach($params as $name => $value)
{
$querypart=$querypart . $name . '=' . urlencode($value) . "&";
}
$sparqlURL=$baseURL . $querypart;
return json_decode(file_get_contents($sparqlURL));
}
# Setting Data Source Name (DSN)
$dsn="http://dbpedia.org/resource/DBpedia";
#Virtuoso pragmas for instructing SPARQL engine to perform an HTTP GET
#using the IRI in FROM clause as Data Source URL
$query="DEFINE get:soft \"replace\"
SELECT DISTINCT * FROM <$dsn> WHERE {?s ?p ?o}";
$data=sparqlQuery($query, "http://localhost:8890/sparql/");
print "Retrieved data:\n" . json_encode($data);
?>
Output
Retrieved data:
{"head":
{"link":[],"vars":["s","p","o"]},
"results":
{"distinct":false,"ordered":true,
"bindings":[
{"s":
{"type":"uri","value":"http:\/\/dbpedia.org\/resource\/DBpedia"},"p":
{"type":"uri","value":"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type"},"o":
{"type":"uri","value":"http:\/\/www.w3.org\/2002\/07\/owl#Thing"}},
{"s":
{"type":"uri","value":"http:\/\/dbpedia.org\/resource\/DBpedia"},"p":
{"type":"uri","value":"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type"},"o":
{"type":"uri","value":"http:\/\/dbpedia.org\/ontology\/Work"}},
{"s":
{"type":"uri","value":"http:\/\/dbpedia.org\/resource\/DBpedia"},"p":
{"type":"uri","value":"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type"},"o":
{"type":"uri","value":"http:\/\/dbpedia.org\/class\/yago\/Software106566077"}},
...
Conclusion
JSON was chosen over XML (re. output format) since this is about
a "no-brainer installation and utilization" guide for a PHP
developer that already knows how to use PHP for HTTP based data
access. SPARQL just provides an added bonus to URL dexterity
(delivered via URI abstraction) with regards to constructing Data
Source Names or Addresses.
01/20/2011 16:25 GMT-0500 | Modified: 01/25/2011 10:36 GMT-0500
SPARQL Guide for the Python Developer
What?
A simple guide usable by any Python developer seeking to exploit
SPARQL without hassles.
Why?
SPARQL is a powerful query language, results serialization
format, and an HTTP based data access protocol from
the W3C. It provides a mechanism for accessing and integrating data
across Deductive Database Systems (colloquially
referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems
(or data spaces) that manage proposition oriented records in
3-tuple (triples) or 4-tuple (quads) form.
How?
SPARQL queries are actually HTTP payloads (typically). Thus,
using a RESTful client-server interaction pattern, you can dispatch
calls to a SPARQL compliant data server and receive a payload for
local processing e.g. local object binding re. Python.
Steps:
- From your command line execute: aptitude search '^python26', to
verify Python is in place
- Determine which SPARQL endpoint you want to access e.g.
DBpedia or a local Virtuoso instance (typically:
http://localhost:8890/sparql).
- If using Virtuoso, and you want to populate its quad store
using SPARQL, assign "SPARQL_SPONGE" privileges to user
"SPARQL" (this is basic control, more sophisticated WebID based
ACLs are available for controlling SPARQL access).
Script:
#!/usr/bin/env python
#
# Demonstrating use of a single query to populate a Virtuoso Quad Store via Python.
#
import urllib, json
# HTTP URL is constructed accordingly with JSON query results format in mind.
def sparqlQuery(query, baseURL, format="application/json"):
params={
"default-graph": "",
"should-sponge": "soft",
"query": query,
"debug": "on",
"timeout": "",
"format": format,
"save": "display",
"fname": ""
}
querypart=urllib.urlencode(params)
response = urllib.urlopen(baseURL,querypart).read()
return json.loads(response)
# Setting Data Source Name (DSN)
dsn="http://dbpedia.org/resource/DBpedia"
# Virtuoso pragmas for instructing SPARQL engine to perform an HTTP GET
# using the IRI in FROM clause as Data Source URL
query="""DEFINE get:soft "replace"
SELECT DISTINCT * FROM <%s> WHERE {?s ?p ?o}""" % dsn
data=sparqlQuery(query, "http://localhost:8890/sparql/")
print "Retrieved data:\n" + json.dumps(data, sort_keys=True, indent=4)
#
# End
Output
Retrieved data:
{
"head": {
"link": [],
"vars": [
"s",
"p",
"o"
]
},
"results": {
"bindings": [
{
"o": {
"type": "uri",
"value": "http://www.w3.org/2002/07/owl#Thing"
},
"p": {
"type": "uri",
"value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
},
"s": {
"type": "uri",
"value": "http://dbpedia.org/resource/DBpedia"
}
},
...
Conclusion
JSON was chosen over XML (re. output format) since this is about
a "no-brainer installation and utilization" guide for a Python
developer that already knows how to use Python for HTTP based data
access. SPARQL just provides an added bonus to URL dexterity
(delivered via URI abstraction) with regards to constructing Data
Source Names or Addresses.
01/19/2011 12:13 GMT-0500 | Modified: 01/25/2011 10:35 GMT-0500
SPARQL for the Ruby Developer
What?
A simple guide usable by any Ruby developer seeking to exploit SPARQL without hassles.
Why?
SPARQL is a powerful query language, results serialization
format, and an HTTP based data access protocol from
the W3C. It provides a mechanism for accessing and integrating data
across Deductive Database Systems (colloquially
referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems
(or data spaces) that manage proposition oriented records in
3-tuple (triples) or 4-tuple (quads) form.
How?
SPARQL queries are actually HTTP payloads (typically). Thus,
using a RESTful client-server interaction pattern, you can dispatch
calls to a SPARQL compliant data server and receive a payload for
local processing e.g. local object binding re. Ruby.
Steps:
- From your command line execute: aptitude search '^ruby', to
verify Ruby is in place
- Determine which SPARQL endpoint you want to access e.g.
DBpedia or a local Virtuoso instance (typically:
http://localhost:8890/sparql).
- If using Virtuoso, and you want to populate its quad store
using SPARQL, assign "SPARQL_SPONGE" privileges to user
"SPARQL" (this is basic control, more sophisticated WebID based
ACLs are available for controlling SPARQL access).
Script:
#!/usr/bin/env ruby
#
# Demonstrating use of a single query to populate a Virtuoso Quad Store.
#
require 'net/http'
require 'cgi'
require 'csv'
#
# We opt for CSV based output since handling this format is straightforward in Ruby, by default.
# HTTP URL is constructed accordingly with CSV as query results format in mind.
def sparqlQuery(query, baseURL, format="text/csv")
params={
"default-graph" => "",
"should-sponge" => "soft",
"query" => query,
"debug" => "on",
"timeout" => "",
"format" => format,
"save" => "display",
"fname" => ""
}
querypart=""
params.each { |k,v|
querypart+="#{k}=#{CGI.escape(v)}&"
}
sparqlURL=baseURL+"?#{querypart}"
response = Net::HTTP.get_response(URI.parse(sparqlURL))
return CSV::parse(response.body)
end
# Setting Data Source Name (DSN)
dsn="http://dbpedia.org/resource/DBpedia"
#Virtuoso pragmas for instructing SPARQL engine to perform an HTTP GET
#using the IRI in FROM clause as Data Source URL
query="DEFINE get:soft \"replace\"
SELECT DISTINCT * FROM <#{dsn}> WHERE {?s ?p ?o} "
#Assume use of local installation of Virtuoso
#otherwise you can change URL to that of a public endpoint
#for example DBpedia: http://dbpedia.org/sparql
data=sparqlQuery(query, "http://localhost:8890/sparql/")
puts "Got data:"
p data
#
# End
Output
Got data:
[["s", "p", "o"],
["http://dbpedia.org/resource/DBpedia",
"http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
"http://www.w3.org/2002/07/owl#Thing"],
["http://dbpedia.org/resource/DBpedia",
"http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
"http://dbpedia.org/ontology/Work"],
["http://dbpedia.org/resource/DBpedia",
"http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
"http://dbpedia.org/class/yago/Software106566077"],
...
Conclusion
CSV was chosen over XML (re. output
format) since this is about a "no-brainer installation and
utilization" guide for a Ruby developer that already knows how to
use Ruby for HTTP based data access. SPARQL just provides an added
bonus to URL dexterity (delivered via URI abstraction) with regards
to constructing Data Source Names or Addresses.
01/18/2011 14:48 GMT-0500 | Modified: 01/25/2011 10:17 GMT-0500
Simple Virtuoso Installation & Utilization Guide for SPARQL Users (Update 5)
What?
SPARQL is a declarative query language from the W3C for querying
structured, propositional data (in the form of 3-tuple [triple] or
4-tuple [quad] records) stored in a deductive database (colloquially
referred to as a triple or quad store in Semantic Web and Linked Data parlance).
SPARQL is inherently platform independent. Like SQL, the query language and the backend
database engine are distinct. Database clients capture SPARQL
queries which are then passed on to compliant backend
databases.
Why is it important?
Like SQL for relational databases, it provides a powerful
mechanism for accessing and joining data across one or more data
partitions (named graphs identified by IRIs). The aforementioned
capability also enables the construction of sophisticated Views,
Reports (HTML or those produced in native form by desktop
productivity tools), and data streams for other services.
Unlike SQL, SPARQL includes result serialization formats and an
HTTP-based wire protocol. Thus, the ubiquity and sophistication of
HTTP is integral to SPARQL; i.e., client-side applications (user
agents) only need to be able to perform an HTTP GET against a
URL en route to exploiting the power of SPARQL.
How do I use it, generally?
- Locate a SPARQL endpoint (DBpedia, LOD Cloud
Cache, Data.Gov, URIBurner, others), or;
- Install a SPARQL compliant database server (quad or triple
store) on your desktop, workgroup server, data center, or cloud
(e.g., Amazon EC2 AMI)
- Start the database server
- Execute SPARQL Queries via the SPARQL
endpoint.
How do I use SPARQL with Virtuoso?
What follows is a very simple guide for using SPARQL against
your own instance of Virtuoso:
- Software Download and Installation
- Data Loading from Data Sources exposed at Network Addresses
(e.g. HTTP URLs) using very simple methods
- Actual SPARQL query execution via SPARQL endpoint.
Installation Steps
- Download Virtuoso Open Source or Virtuoso Commercial Edition
- Run the installer (if using the Commercial Edition or the Windows Open Source Edition; otherwise, follow the build guide)
- Follow the post-installation guide and verify the installation by
typing in the command: virtuoso -? (if this fails, check that you've
followed the installation and setup steps, then verify that environment
variables have been set)
- Start the Virtuoso server using the command:
virtuoso-start.sh
- Verify you have a connection to the Virtuoso Server via the
command: isql localhost (assuming you're using default DB settings),
or the command: isql localhost:1112 (assuming the demo database); or
go to your browser and type in:
http://<virtuoso-server-host-name>:[port]/conductor (e.g.,
http://localhost:8889/conductor for the default DB, or
http://localhost:8890/conductor if using the Demo DB)
- Go to SPARQL endpoint which is typically --
http://<virtuoso-server-host-name>:[port]/sparql
- Run a quick sample query (since the database always has system
data in place): select distinct * where {?s ?p ?o} limit 50
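Assuming a default local installation (host and port may differ on yours), the quick sample query above can be issued over the SPARQL Protocol as follows; the request is only constructed here, since actually opening it requires your Virtuoso server to be up and reachable:

```python
from urllib.parse import urlencode
from urllib.request import Request  # urlopen(req) would execute it

# Default local Virtuoso endpoint; adjust host/port to your installation.
endpoint = "http://localhost:8890/sparql"
query = "SELECT DISTINCT * WHERE {?s ?p ?o} LIMIT 50"

req = Request(endpoint + "?" + urlencode({"query": query}),
              headers={"Accept": "application/sparql-results+json"})
print(req.full_url)
# urllib.request.urlopen(req).read() would return the JSON results
# (only once your Virtuoso server is running).
```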
Troubleshooting
- Ensure environment settings are set and functional. If using
Mac OS X or Windows, you don't have to worry about this: just
start and stop your Virtuoso server using the native OS service
applets
- If using the Open Source Edition, follow the getting started guide -- it covers PATH
settings and the startup directory location for starting and stopping Virtuoso
servers.
- Sponging (HTTP GETs against external Data Sources) within
SPARQL queries is disabled by default. You can enable this feature
by assigning "SPARQL_SPONGE" privileges to the user
"SPARQL". Note: more sophisticated security is available via WebID-based ACLs.
Data Loading Steps
- Identify an RDF-based structured data source of interest: a
file that contains 3-tuples (triples), available at an address on a
public or private HTTP-based network
- Determine the Address (URL) of the RDF data source
- Go to your Virtuoso SPARQL endpoint and type in the following
SPARQL query: DEFINE GET:SOFT "replace" SELECT DISTINCT * FROM
<RDFDataSourceURL> WHERE {?s ?p ?o}
- All the triples in the RDF resource (data source accessed via
URL) will be loaded into the Virtuoso Quad Store (using RDF Data
Source URL as the internal quad store Named Graph IRI) as part of
the SPARQL query processing pipeline.
Note: the data source URL doesn't even have to be RDF-based,
which is where the Virtuoso Sponger middleware comes into play
(download and install the VAD installer package first), since it
delivers the following features to Virtuoso's SPARQL engine:
- Transformation of data from non-RDF data sources (file content,
hypermedia resources, web service
output, etc.) into RDF-based 3-tuples (triples)
- Cache-invalidation scheme construction -- thus, subsequent
queries will not need the define get:soft "replace" pragma,
except when you want to forcefully override the cache.
- If you have very large data sources (like DBpedia from
CKAN), simply use our bulk loader.
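The load step described above is itself just a SPARQL Protocol request. As a sketch, here is how the DEFINE get:soft "replace" query could be packaged into an endpoint URL; the RDF data source URL and the local endpoint address are hypothetical placeholders:

```python
from urllib.parse import urlencode

# Hypothetical RDF data source; "get:soft" is Virtuoso's pragma that
# tells the server to fetch (sponge) the source URL before answering.
source = "http://example.org/data.rdf"
load_query = ('DEFINE get:soft "replace" '
              'SELECT DISTINCT * FROM <%s> WHERE {?s ?p ?o}' % source)

# Default local Virtuoso endpoint; adjust host/port to your installation.
url = "http://localhost:8890/sparql?" + urlencode({"query": load_query})
print(url)
```

Dereferencing that URL both answers the query and leaves the fetched triples in the Virtuoso Quad Store, with the source URL as the Named Graph IRI.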
SPARQL Endpoint Discovery
Public SPARQL endpoints are emerging at an ever-increasing rate.
Thus, we've set up a DNS lookup service that provides access to a
large number of SPARQL endpoints. Of course, this doesn't cover all
existing endpoints, so if your endpoint is missing please ping
me.
Here is a collection of commands for using DNS-SD to discover
SPARQL endpoints:
- dns-sd -B _sparql._tcp sparql.openlinksw.com -- browse for
service instances
- dns-sd -Z _sparql._tcp sparql.openlinksw.com -- output results
in Zone File format
Related
-
Using HTTP from Ruby -- you can simply
construct SPARQL Protocol URLs
-
Using SPARQL Endpoints via Ruby -- Ruby
example using DBpedia endpoint
-
Interactive SPARQL Query By Example (QBE)
tool -- provides a graphical user interface (as is common in the
SQL realm for query building against RDBMS engines) that works with any
SPARQL endpoint
-
Other methods of loading RDF data into
Virtuoso
-
Virtuoso Sponger -- architecture, and how
it turns a wide variety of non-RDF data sources into SPARQL-accessible
data
-
Using OpenLink Data Explorer (ODE) to
populate Virtuoso -- locate a resource of interest; click on a
bookmarklet or use the context menus (if using ODE extensions for
Firefox, Safari, or Chrome); and you'll have SPARQL-accessible data
automatically inserted into your Virtuoso instance.
-
W3C's SPARQLing Data Access Ingenuity --
an older generic SPARQL introduction post
-
Collection of SPARQL Query Examples --
GoodRelations (Product Offers), FOAF (Profiles), SIOC
(Data Spaces -- Blogs, Wikis, Bookmarks, Feed Collections, Photo Galleries, Briefcase/DropBox, AddressBook, Calendars, Discussion Forums)
-
Collection of Live SPARQL Queries against LOD
Cloud Cache -- simple and advanced queries.
01/16/2011 02:06 GMT-0500 | Modified: 01/19/2011 10:43 GMT-0500