SPARQL Guide for the Perl Developer

What?

A simple guide usable by any Perl developer seeking to exploit SPARQL without hassles.

Why?

SPARQL is a powerful query language, results serialization format, and HTTP-based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles), that is, database systems (or data spaces) that manage proposition-oriented records in 3-tuple (triple) or 4-tuple (quad) form.

How?

SPARQL queries are typically delivered as HTTP payloads. Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL-compliant data server and receive a payload for local processing.
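
To make the interaction pattern concrete, here is a minimal sketch of how a query can be turned into a GET URL. It assumes the "query" parameter of the SPARQL protocol plus a "format" parameter for the results MIME type (the script in this guide sends one to Virtuoso; other servers may expect an Accept header instead). The helper name build_sparql_url is hypothetical, and HTTP::Tiny is used here only for its form-encoding method; nothing is sent over the network.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: turn an endpoint, a query, and a results format
# into a GET URL. No request is dispatched here.
sub build_sparql_url {
    my ($endpoint, $query, $format) = @_;

    # www_form_urlencode percent-encodes keys and values and joins the
    # pairs with "&"; for a hash reference the pairs come out sorted.
    my $params = HTTP::Tiny->new->www_form_urlencode({
        query  => $query,
        format => $format,
    });
    return "$endpoint?$params";
}

print build_sparql_url(
    "http://localhost:8890/sparql",
    "SELECT * WHERE {?s ?p ?o} LIMIT 5",
    "text/csv",
), "\n";
```

The resulting URL can be handed to any HTTP client, which is all the full script further down does with LWP::UserAgent.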

Steps:

1. Determine which SPARQL endpoint you want to access, e.g., DBpedia or a local Virtuoso instance (typically http://localhost:8890/sparql).
2. If using Virtuoso, and you want to populate its quad store using SPARQL, assign "SPARQL_SPONGE" privileges to user "SPARQL" (this is basic control; more sophisticated WebID-based ACLs are available for controlling SPARQL access).

Script:

#
# Demonstrating use of a single query to populate a
# Virtuoso Quad Store via Perl.
#

#
# The HTTP URL is constructed with the query results format
# (CSV here) requested via MIME type.
#

use strict;
use warnings;
use CGI qw/:standard/;
use LWP::UserAgent;
use Data::Dumper;
use Text::CSV_XS;

sub sparqlQuery {
    my ($query, $baseURL, $format) = @_;

    my %params = (
        "default-graph" => "", "should-sponge" => "soft", "query" => $query,
        "debug" => "on", "timeout" => "", "format" => $format,
        "save" => "display", "fname" => ""
    );

    # URL-encode each parameter value and join the pairs into a query string.
    my @fragments;
    foreach my $k (sort keys %params) {
        push @fragments, "$k=" . CGI::escape($params{$k});
    }
    my $sparqlURL = "${baseURL}?" . join("&", @fragments);

    my $ua = LWP::UserAgent->new;
    $ua->agent("MyApp/0.1 ");
    my $req = HTTP::Request->new(GET => $sparqlURL);
    my $res = $ua->request($req);
    die "SPARQL request failed: " . $res->status_line . "\n"
        unless $res->is_success;
    my $str = $res->content;

    # binary => 1 lets the parser accept non-ASCII characters in field values.
    my $csv = Text::CSV_XS->new({ binary => 1 });

    # Collect rows in a lexical array so repeated calls do not accumulate results.
    my @rows;
    foreach my $line ( split(/^/, $str) ) {
        $csv->parse($line) or next;    # skip lines that fail to parse
        push @rows, [ $csv->fields() ];
    }
    return \@rows;
}


# Setting Data Source Name (DSN)

my $dsn = "http://dbpedia.org/resource/DBpedia";

# Virtuoso-specific pragma instructing the SPARQL engine to perform an
# HTTP GET using the IRI in the FROM clause as the Data Source URL,
# en route to DBMS record inserts.
#
# The SELECT that follows is generic (non-Virtuoso-specific) SPARQL.
# Note: on its own, it will not add records to the DBMS.

my $query = "DEFINE get:soft \"replace\"\n"
          . "SELECT DISTINCT * FROM <$dsn> WHERE {?s ?p ?o}";

my $data = sparqlQuery($query, "http://localhost:8890/sparql/", "text/csv");

print "Retrieved data:\n";
print Dumper($data);

Output

Retrieved data:
$VAR1 = [
          [
            's',
            'p',
            'o'
          ],
          [
            'http://dbpedia.org/resource/DBpedia',
            'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
            'http://www.w3.org/2002/07/owl#Thing'
          ],
          [
            'http://dbpedia.org/resource/DBpedia',
            'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
            'http://dbpedia.org/ontology/Work'
          ],
          [
            'http://dbpedia.org/resource/DBpedia',
            'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
            'http://dbpedia.org/class/yago/Software106566077'
          ],
...
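
The structure above is an array of rows whose first row holds the query variable names. As a minimal sketch of local post-processing, the rows can be turned into hashes keyed by those names and then filtered by predicate. The helper name rows_to_records is hypothetical, and the sample rows are copied from the output above rather than fetched from an endpoint.

```perl
use strict;
use warnings;

# Hypothetical helper: treat the first row as the header (variable names)
# and turn every following row into a hash keyed by those names.
sub rows_to_records {
    my ($rows) = @_;
    my ($header, @data) = @$rows;
    return [] unless $header;

    my @records;
    for my $row (@data) {
        my %record;
        @record{@$header} = @$row;    # hash slice: pair names with values
        push @records, \%record;
    }
    return \@records;
}

# Sample rows copied from the output shown above.
my $rows = [
    [ 's', 'p', 'o' ],
    [ 'http://dbpedia.org/resource/DBpedia',
      'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
      'http://www.w3.org/2002/07/owl#Thing' ],
    [ 'http://dbpedia.org/resource/DBpedia',
      'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
      'http://dbpedia.org/ontology/Work' ],
];

# Print one line per rdf:type statement.
my $rdf_type = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';
for my $record (@{ rows_to_records($rows) }) {
    print "$record->{s} is a $record->{o}\n" if $record->{p} eq $rdf_type;
}
```

Keying by variable name keeps the processing code independent of the column order in the SELECT clause.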

Conclusion

CSV was chosen over XML as the output format because this is a "no-brainer installation and utilization" guide for a Perl developer who already knows how to use Perl for HTTP-based data access within HTML. SPARQL just provides an added bonus to URL dexterity (delivered via URI abstraction) with regard to constructing Data Source Names or Addresses.
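
Because a Data Source Name is just an IRI placed in the FROM clause, the query text can be generated for any source. The following is a minimal sketch under that assumption; the helper name build_query and its $sponge flag are hypothetical. The "DEFINE get:soft" pragma is Virtuoso-specific, so it is added only on request and left out when the query is meant for another endpoint.

```perl
use strict;
use warnings;

# Hypothetical helper: build the query text for a given data source IRI.
# When $sponge is true, the Virtuoso-specific pragma is prepended;
# otherwise the result is generic SPARQL that adds nothing to the DBMS.
sub build_query {
    my ($dsn, $sponge) = @_;
    my $query = "SELECT DISTINCT * FROM <$dsn> WHERE {?s ?p ?o}";
    $query = "DEFINE get:soft \"replace\"\n$query" if $sponge;
    return $query;
}

print build_query("http://dbpedia.org/resource/DBpedia", 1), "\n";
```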

Comments

Re:SPARQL Guide for the Perl Developer

Hm, that doesn't look like "without hassle" to me :-) There is a lot of RDF code on CPAN already. RDF::Trine and RDF::Query (both by Gregory Todd Williams of SPARQL WG fame) are de facto standard modules providing low-level APIs. To interact with a Virtuoso-based endpoint, one should most likely use RDF::Query::Client, by Toby Inkster.

However, I have long toyed with the idea of making Virtuoso a first-class storage citizen in the Perl world by creating a DBD::ODBC-based driver for RDF::Trine. It makes a lot of sense to do so, but time hasn't permitted so far. Kingsley, if you have any resources to commit to doing such a thing, please let us know!

There is also an active community mailing list that I would like to encourage people to subscribe to.

Posted by Kjetil Kjernsmo on 01/25/2011 12:50 GMT-0500

In reply to Kjetil Kjernsmo:

As for "no hassles", take that as being subjectively targeted at a Perl developer who has stumbled across SPARQL. This individual knows how to request data via an HTTP URL and process the retrieved data using Perl.

Of course, RDF::Query::Client is a good offering too. I'll add links to it in my related section, as I've done for others, as examples emerge.

Yes, regarding resource contribution for DBD::ODBC and RDF::Trine.

Happy New Year! It's been a while.

Posted by Kingsley Idehen on 01/25/2011 13:53 GMT-0500

You say "if using Virtuoso", but I suspect this example would *only* work with Virtuoso because of the "DEFINE get:soft" stuff. That might be worth mentioning in case someone wanted to access an endpoint that wasn't running Virtuoso.

Posted by Gregory Todd Williams on 01/26/2011 16:55 GMT-0500


Kingsley Idehen's Blog Data Space
