Kingsley Uyi Idehen
Lexington, United States

SPARQL Guide for the Perl Developer

What?

A simple guide usable by any Perl developer seeking to exploit SPARQL without hassles.

Why?

SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.

How?

SPARQL queries are actually HTTP payloads (typically). Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL compliant data server and receive a payload for local processing.

Steps:

  1. Determine which SPARQL endpoint you want to access e.g. DBpedia or a local Virtuoso instance (typically: http://localhost:8890/sparql).
  2. If using Virtuoso, and you want to populate its quad store using SPARQL, assign "SPARQL_SPONGE" privileges to user "SPARQL" (this is basic control, more sophisticated WebID based ACLs are available for controlling SPARQL access).

Script:

#
# Demonstrating use of a single query to populate a
# Virtuoso Quad Store via Perl.
#

#
# The HTTP URL is constructed with CSV as the default query results format, selected via MIME type.
#

use strict;
use warnings;
use CGI qw/:standard/;
use LWP::UserAgent;
use Data::Dumper;
use Text::CSV_XS;

sub sparqlQuery {
	my ($query, $baseURL, $format) = @_;
	$format = "text/csv" unless defined $format;

	my %params = (
		"default-graph" => "", "should-sponge" => "soft", "query" => $query,
		"debug" => "on", "timeout" => "", "format" => $format,
		"save" => "display", "fname" => ""
	);

	# URL-encode each parameter and join the pairs into a query string
	my @fragments = ();
	foreach my $k (keys %params) {
		push(@fragments, "$k=" . CGI::escape($params{$k}));
	}
	my $sparqlURL = "${baseURL}?" . join("&", @fragments);

	my $ua = LWP::UserAgent->new;
	$ua->agent("MyApp/0.1 ");
	my $req = HTTP::Request->new(GET => $sparqlURL);
	my $res = $ua->request($req);
	die "SPARQL request failed: " . $res->status_line . "\n" unless $res->is_success;

	# Parse the CSV response line by line into an array of rows
	my $csv = Text::CSV_XS->new({ binary => 1 });
	my @rows;
	foreach my $line ( split(/^/, $res->content) ) {
		$csv->parse($line) or next;
		push(@rows, [ $csv->fields() ]);
	}
	return \@rows;
}


# Setting Data Source Name (DSN)

my $dsn = "http://dbpedia.org/resource/DBpedia";

# The Virtuoso pragma below instructs the SPARQL engine to perform an HTTP GET
# using the IRI in the FROM clause as the Data Source URL en route to DBMS
# record inserts.
#
# Note: the generic (non Virtuoso specific) SELECT on its own will not add
# records to the DBMS.

my $query = "DEFINE get:soft \"replace\"\n" .
            "SELECT DISTINCT * FROM <$dsn> WHERE {?s ?p ?o}";

my $data = sparqlQuery($query, "http://localhost:8890/sparql/", "text/csv");

print "Retrieved data:\n";
print Dumper($data);

Output

Retrieved data:
$VAR1 = [
          [
            's',
            'p',
            'o'
          ],
          [
            'http://dbpedia.org/resource/DBpedia',
            'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
            'http://www.w3.org/2002/07/owl#Thing'
          ],
          [
            'http://dbpedia.org/resource/DBpedia',
            'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
            'http://dbpedia.org/ontology/Work'
          ],
          [
            'http://dbpedia.org/resource/DBpedia',
            'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
            'http://dbpedia.org/class/yago/Software106566077'
          ],
...

Conclusion

CSV was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a Perl developer that already knows how to use Perl for HTTP based data access. SPARQL just provides an added bonus to URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.

01/25/2011 11:05 GMT-0500 Modified: 01/26/2011 18:11 GMT-0500
SPARQL Guide for the Javascript Developer

What?

A simple guide usable by any Javascript developer seeking to exploit SPARQL without hassles.

Why?

SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.

How?

SPARQL queries are actually HTTP payloads (typically). Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL compliant data server and receive a payload for local processing.

Steps:

  1. Determine which SPARQL endpoint you want to access e.g. DBpedia or a local Virtuoso instance (typically: http://localhost:8890/sparql).
  2. If using Virtuoso, and you want to populate its quad store using SPARQL, assign "SPARQL_SPONGE" privileges to user "SPARQL" (this is basic control, more sophisticated WebID based ACLs are available for controlling SPARQL access).

Script:

/*
Demonstrating use of a single query to populate a Virtuoso Quad Store via Javascript.
*/

/*
The HTTP URL is constructed with JSON as the default query results format, selected via MIME type.
*/

function sparqlQuery(query, baseURL, format) {
	if(!format)
		format="application/json";
	var params={
		"default-graph": "", "should-sponge": "soft", "query": query,
		"debug": "on", "timeout": "", "format": format,
		"save": "display", "fname": ""
	};

	// URL-encode each parameter and join the pairs into a query string
	var pairs=[];
	for(var k in params) {
		pairs.push(k+"="+encodeURIComponent(params[k]));
	}
	var queryURL=baseURL + '?' + pairs.join("&");

	// Fall back to ActiveX for legacy Internet Explorer
	var xmlhttp;
	if (window.XMLHttpRequest) {
		xmlhttp=new XMLHttpRequest();
	}
	else {
		xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
	}
	// Passing false makes the request synchronous, so send() blocks until the response arrives
	xmlhttp.open("GET",queryURL,false);
	xmlhttp.send();
	return JSON.parse(xmlhttp.responseText);
}

/*
Setting Data Source Name (DSN)
*/

var dsn="http://dbpedia.org/resource/DBpedia";

/*
The Virtuoso pragma DEFINE get:soft "replace" instructs the Virtuoso SPARQL engine to perform an HTTP GET
using the IRI in the FROM clause as the Data Source URL with regard to DBMS record inserts.
*/

var query="DEFINE get:soft \"replace\"\nSELECT DISTINCT * FROM <"+dsn+"> WHERE {?s ?p ?o}";
var data=sparqlQuery(query, "/sparql/");

Output

Place the snippet above into the <script/> section of an HTML document to see the query result.

Conclusion

JSON was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a Javascript developer that already knows how to use Javascript for HTTP based data access within HTML. SPARQL just provides an added bonus to URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.

01/21/2011 14:59 GMT-0500 Modified: 01/26/2011 18:10 GMT-0500
SPARQL Guide for the PHP Developer

What?

A simple guide usable by any PHP developer seeking to exploit SPARQL without hassles.

Why?

SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.

How?

SPARQL queries are actually HTTP payloads (typically). Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL compliant data server and receive a payload for local processing e.g. local object binding re. PHP.

Steps:

  1. From your command line execute: aptitude search '^php', to verify PHP is in place
  2. Determine which SPARQL endpoint you want to access e.g. DBpedia or a local Virtuoso instance (typically: http://localhost:8890/sparql).
  3. If using Virtuoso, and you want to populate its quad store using SPARQL, assign "SPARQL_SPONGE" privileges to user "SPARQL" (this is basic control, more sophisticated WebID based ACLs are available for controlling SPARQL access).

Script:

#!/usr/bin/env php
<?php
#
# Demonstrating use of a single query to populate a Virtuoso Quad Store via PHP.
#

# HTTP URL is constructed accordingly with JSON query results format in mind.

function sparqlQuery($query, $baseURL, $format="application/json")
{
	$params=array(
		"default-graph" =>  "",
		"should-sponge" =>  "soft",
		"query" =>  $query,
		"debug" =>  "on",
		"timeout" =>  "",
		"format" =>  $format,
		"save" =>  "display",
		"fname" =>  ""
	);

	# URL-encode each parameter value and append it to the query string
	$querypart="?";
	foreach($params as $name => $value)
	{
		$querypart=$querypart . $name . '=' . urlencode($value) . "&";
	}

	$sparqlURL=$baseURL . $querypart;

	# Fetch the response over HTTP and decode the JSON payload into PHP objects
	return json_decode(file_get_contents($sparqlURL));
}

# Setting Data Source Name (DSN)
$dsn="http://dbpedia.org/resource/DBpedia";

# Virtuoso pragmas for instructing SPARQL engine to perform an HTTP GET
# using the IRI in FROM clause as Data Source URL

$query="DEFINE get:soft \"replace\"
SELECT DISTINCT * FROM <$dsn> WHERE {?s ?p ?o}";

$data=sparqlQuery($query, "http://localhost:8890/sparql/");

print "Retrieved data:\n" . json_encode($data);

?>

Output

Retrieved data:
  {"head":
  {"link":[],"vars":["s","p","o"]},
  "results":
		{"distinct":false,"ordered":true,
		"bindings":[
			{"s":
			{"type":"uri","value":"http:\/\/dbpedia.org\/resource\/DBpedia"},"p":
			{"type":"uri","value":"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type"},"o":
			{"type":"uri","value":"http:\/\/www.w3.org\/2002\/07\/owl#Thing"}},
			{"s":
			{"type":"uri","value":"http:\/\/dbpedia.org\/resource\/DBpedia"},"p":
			{"type":"uri","value":"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type"},"o":
			{"type":"uri","value":"http:\/\/dbpedia.org\/ontology\/Work"}},
			{"s":
			{"type":"uri","value":"http:\/\/dbpedia.org\/resource\/DBpedia"},"p":
			{"type":"uri","value":"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type"},"o":
			{"type":"uri","value":"http:\/\/dbpedia.org\/class\/yago\/Software106566077"}},
...

Conclusion

JSON was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a PHP developer that already knows how to use PHP for HTTP based data access. SPARQL just provides an added bonus to URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.

01/20/2011 16:25 GMT-0500 Modified: 01/25/2011 10:36 GMT-0500
SPARQL Guide for the Python Developer

What?

A simple guide usable by any Python developer seeking to exploit SPARQL without hassles.

Why?

SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.

How?

SPARQL queries are actually HTTP payloads (typically). Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL compliant data server and receive a payload for local processing e.g. local object binding re. Python.

Steps:

  1. From your command line execute: aptitude search '^python3', to verify Python is in place
  2. Determine which SPARQL endpoint you want to access e.g. DBpedia or a local Virtuoso instance (typically: http://localhost:8890/sparql).
  3. If using Virtuoso, and you want to populate its quad store using SPARQL, assign "SPARQL_SPONGE" privileges to user "SPARQL" (this is basic control, more sophisticated WebID based ACLs are available for controlling SPARQL access).

Script:

#!/usr/bin/env python3
#
# Demonstrating use of a single query to populate a Virtuoso Quad Store via Python.
#

import json
import urllib.parse
import urllib.request

# HTTP URL is constructed accordingly with JSON query results format in mind.

def sparqlQuery(query, baseURL, format="application/json"):
	params={
		"default-graph": "",
		"should-sponge": "soft",
		"query": query,
		"debug": "on",
		"timeout": "",
		"format": format,
		"save": "display",
		"fname": ""
	}
	# urlopen() needs the form data as bytes; supplying a body makes it an HTTP POST
	querypart=urllib.parse.urlencode(params).encode("utf-8")
	response = urllib.request.urlopen(baseURL,querypart).read()
	return json.loads(response)

# Setting Data Source Name (DSN)
dsn="http://dbpedia.org/resource/DBpedia"

# Virtuoso pragmas for instructing SPARQL engine to perform an HTTP GET
# using the IRI in FROM clause as Data Source URL

query="""DEFINE get:soft "replace"
SELECT DISTINCT * FROM <%s> WHERE {?s ?p ?o}""" % dsn

data=sparqlQuery(query, "http://localhost:8890/sparql/")

print("Retrieved data:\n" + json.dumps(data, sort_keys=True, indent=4))

#
# End

Output

Retrieved data:
{
    "head": {
        "link": [], 
        "vars": [
            "s", 
            "p", 
            "o"
        ]
    }, 
    "results": {
        "bindings": [
            {
                "o": {
                    "type": "uri", 
                    "value": "http://www.w3.org/2002/07/owl#Thing"
                }, 
                "p": {
                    "type": "uri", 
                    "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
                }, 
                "s": {
                    "type": "uri", 
                    "value": "http://dbpedia.org/resource/DBpedia"
                }
            }, 
...
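
The nested JSON above is often easier to work with once it is flattened into plain rows. What follows is a minimal sketch (written for Python 3) of one way to do that; the function name bindings_to_rows is an illustrative choice, not part of any library, and the sample document is just the first binding from the output shown above.

```python
def bindings_to_rows(results):
    """Flatten a SPARQL JSON results document into a list of value tuples."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # A variable may be unbound in a given solution, so use .get()
        rows.append(tuple(binding.get(v, {}).get("value") for v in variables))
    return rows


# Sample document: the first binding from the output above
sample = {
    "head": {"link": [], "vars": ["s", "p", "o"]},
    "results": {"bindings": [{
        "o": {"type": "uri", "value": "http://www.w3.org/2002/07/owl#Thing"},
        "p": {"type": "uri", "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"},
        "s": {"type": "uri", "value": "http://dbpedia.org/resource/DBpedia"},
    }]},
}

for s, p, o in bindings_to_rows(sample):
    print(s, p, o)
```

Each tuple follows the variable order given in "head", so the rows line up with the CSV style output used in the Perl and Ruby guides.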

Conclusion

JSON was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a Python developer that already knows how to use Python for HTTP based data access. SPARQL just provides an added bonus to URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.

01/19/2011 12:13 GMT-0500 Modified: 01/25/2011 10:35 GMT-0500
SPARQL for the Ruby Developer

What?

A simple guide usable by any Ruby developer seeking to exploit SPARQL without hassles.

Why?

SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.

How?

SPARQL queries are actually HTTP payloads (typically). Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL compliant data server and receive a payload for local processing e.g. local object binding re. Ruby.

Steps:

  1. From your command line execute: aptitude search '^ruby', to verify Ruby is in place
  2. Determine which SPARQL endpoint you want to access e.g. DBpedia or a local Virtuoso instance (typically: http://localhost:8890/sparql).
  3. If using Virtuoso, and you want to populate its quad store using SPARQL, assign "SPARQL_SPONGE" privileges to user "SPARQL" (this is basic control, more sophisticated WebID based ACLs are available for controlling SPARQL access).

Script:

#!/usr/bin/env ruby
#
# Demonstrating use of a single query to populate a Virtuoso Quad Store.
#

require 'net/http'
require 'cgi'
require 'csv'

#
# We opt for CSV based output since handling this format is straightforward in Ruby, by default.
# HTTP URL is constructed accordingly with CSV as query results format in mind.

def sparqlQuery(query, baseURL, format="text/csv")
	params={
		"default-graph" => "",
		"should-sponge" => "soft",
		"query" => query,
		"debug" => "on",
		"timeout" => "",
		"format" => format,
		"save" => "display",
		"fname" => ""
	}
	querypart=""
	params.each { |k,v|
		querypart+="#{k}=#{CGI.escape(v)}&"
	}
  
	sparqlURL=baseURL+"?#{querypart}"
	
	response = Net::HTTP.get_response(URI.parse(sparqlURL))

	return CSV::parse(response.body)
	
end

# Setting Data Source Name (DSN)

dsn="http://dbpedia.org/resource/DBpedia"

#Virtuoso pragmas for instructing SPARQL engine to perform an HTTP GET
#using the IRI in FROM clause as Data Source URL

query="DEFINE get:soft \"replace\"
SELECT DISTINCT * FROM <#{dsn}> WHERE {?s ?p ?o} "

#Assume use of local installation of Virtuoso 
#otherwise you can change URL to that of a public endpoint
#for example DBpedia: http://dbpedia.org/sparql

data=sparqlQuery(query, "http://localhost:8890/sparql/")

puts "Got data:"
p data

#
# End

Output

Got data:
[["s", "p", "o"], 
  ["http://dbpedia.org/resource/DBpedia", 
   "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", 
   "http://www.w3.org/2002/07/owl#Thing"], 
  ["http://dbpedia.org/resource/DBpedia", 
   "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", 
   "http://dbpedia.org/ontology/Work"], 
  ["http://dbpedia.org/resource/DBpedia", 
   "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", 
   "http://dbpedia.org/class/yago/Software106566077"],
...

Conclusion

CSV was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a Ruby developer that already knows how to use Ruby for HTTP based data access. SPARQL just provides an added bonus to URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.

01/18/2011 14:48 GMT-0500 Modified: 01/25/2011 10:17 GMT-0500
Simple Virtuoso Installation & Utilization Guide for SPARQL Users (Update 5)

What is SPARQL?

A declarative query language from the W3C for querying structured propositional data (in the form of 3-tuple [triples] or 4-tuple [quads] records) stored in a deductive database (colloquially referred to as triple or quad stores in Semantic Web and Linked Data parlance).

SPARQL is inherently platform independent. Like SQL, the query language and the backend database engine are distinct. Database clients capture SPARQL queries which are then passed on to compliant backend databases.

Why is it important?

Like SQL for relational databases, it provides a powerful mechanism for accessing and joining data across one or more data partitions (named graphs identified by IRIs). The aforementioned capability also enables the construction of sophisticated Views, Reports (HTML or those produced in native form by desktop productivity tools), and data streams for other services.

Unlike SQL, SPARQL includes result serialization formats and an HTTP based wire protocol. Thus, the ubiquity and sophistication of HTTP is integral to SPARQL i.e., client side applications (user agents) only need to be able to perform an HTTP GET against a URL en route to exploiting the power of SPARQL.
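
To make the "HTTP GET against a URL" point concrete, here is a minimal Python 3 sketch that turns a query into such a URL. It assumes only the standard "query" parameter of the SPARQL Protocol; the helper name build_sparql_url and the extra_params argument are illustrative, and the DBpedia endpoint is used purely as an example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_sparql_url(endpoint, query, extra_params=None):
    """Return a GET URL that carries `query` in the standard `query` parameter."""
    params = {"query": query}
    if extra_params:
        # Room for endpoint-specific extras, e.g. a results format parameter
        params.update(extra_params)
    return endpoint + "?" + urlencode(params)


url = build_sparql_url(
    "http://dbpedia.org/sparql",
    "SELECT DISTINCT * WHERE {?s ?p ?o} LIMIT 5",
)
print(url)
```

Any user agent that can fetch the printed URL (a browser, curl, or an HTTP library) can then retrieve the query results.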

How do I use it, generally?

  1. Locate a SPARQL endpoint (DBpedia, LOD Cloud Cache, Data.Gov, URIBurner, others), or;
  2. Install a SPARQL compliant database server (quad or triple store) on your desktop, workgroup server, data center, or cloud (e.g., Amazon EC2 AMI)
  3. Start the database server
  4. Execute SPARQL Queries via the SPARQL endpoint.

How do I use SPARQL with Virtuoso?

What follows is a very simple guide for using SPARQL against your own instance of Virtuoso:

  1. Software Download and Installation
  2. Data Loading from Data Sources exposed at Network Addresses (e.g. HTTP URLs) using very simple methods
  3. Actual SPARQL query execution via SPARQL endpoint.

Installation Steps

  1. Download Virtuoso Open Source or Virtuoso Commercial Editions
  2. Run installer (if using Commercial edition or Windows Open Source Edition, otherwise follow build guide)
  3. Follow post-installation guide and verify installation by typing in the command: virtuoso -? (if this fails check you've followed installation and setup steps, then verify environment variables have been set)
  4. Start the Virtuoso server using the command: virtuoso-start.sh
  5. Verify you have a connection to the Virtuoso Server via the command: isql localhost (assuming you're using default DB settings) or the command: isql localhost:1112 (assuming demo database) or go to your browser and type in: http://<virtuoso-server-host-name>:[port]/conductor (e.g. http://localhost:8889/conductor for default DB or http://localhost:8890/conductor if using Demo DB)
  6. Go to SPARQL endpoint which is typically -- http://<virtuoso-server-host-name>:[port]/sparql
  7. Run a quick sample query (since the database always has system data in place): select distinct * where {?s ?p ?o} limit 50
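
The last step can also be scripted. The following is a rough Python 3 sketch, assuming the endpoint honours the standard application/sparql-results+json media type in the Accept header; the function name run_select and its opener argument (which lets you swap in a stub for testing) are illustrative, and the endpoint URL in the usage note assumes a local instance listening on port 8890.

```python
import json
import urllib.parse
import urllib.request

SAMPLE_QUERY = "select distinct * where {?s ?p ?o} limit 50"


def run_select(endpoint, query, opener=urllib.request.urlopen, timeout=30):
    """Send a SELECT query via HTTP GET and return the decoded JSON document."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    # The response object is closed automatically when the block exits
    with opener(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, a call such as run_select("http://localhost:8890/sparql", SAMPLE_QUERY) should return a dictionary with "head" and "results" keys.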

Troubleshooting

  1. Ensure environment settings are set and functional -- if using Mac OS X or Windows, you don't have to worry about this; just start and stop your Virtuoso server using the native OS services applets
  2. If using the Open Source Edition, follow the getting started guide -- it covers PATH and startup directory location re. starting and stopping Virtuoso servers.
  3. Sponging (HTTP GETs against external Data Sources) within SPARQL queries is disabled by default. You can enable this feature by assigning "SPARQL_SPONGE" privileges to user "SPARQL". Note, more sophisticated security exists via WebID based ACLs.

Data Loading Steps

  1. Identify an RDF based structured data source of interest -- a file that contains 3-tuple / triples available at an address on a public or private HTTP based network
  2. Determine the Address (URL) of the RDF data source
  3. Go to your Virtuoso SPARQL endpoint and type in the following SPARQL query: DEFINE get:soft "replace" SELECT DISTINCT * FROM <RDFDataSourceURL> WHERE {?s ?p ?o}
  4. All the triples in the RDF resource (data source accessed via URL) will be loaded into the Virtuoso Quad Store (using RDF Data Source URL as the internal quad store Named Graph IRI) as part of the SPARQL query processing pipeline.
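
If you load data this way often, the query text is easy to generate. Below is a small Python 3 sketch that builds it for a given data source URL; the helper name sponge_query is illustrative, and the character check simply mirrors what the SPARQL grammar allows inside an IRI reference. The resulting text is Virtuoso specific because of the pragma.

```python
def sponge_query(source_url):
    """Build the Virtuoso-specific query that fetches and loads `source_url`."""
    # Reject characters that cannot appear inside a SPARQL IRI reference
    forbidden = '<>"{}|^`\\'
    if any(ch in forbidden or ch.isspace() for ch in source_url):
        raise ValueError("not usable as a SPARQL IRI: %r" % source_url)
    return (
        'DEFINE get:soft "replace"\n'
        "SELECT DISTINCT * FROM <%s> WHERE {?s ?p ?o}" % source_url
    )


print(sponge_query("http://dbpedia.org/resource/DBpedia"))
```

The printed text can be pasted into the SPARQL endpoint form or sent over HTTP like any other query.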

Note: the data source URL doesn't even have to be RDF based -- which is where the Virtuoso Sponger Middleware comes into play (download and install the VAD installer package first) since it delivers the following features to Virtuoso's SPARQL engine:

  1. Transformation of data from non RDF data sources (file content, hypermedia resources, web services output etc..) into RDF based 3-tuples (triples)
  2. Cache Invalidation Scheme Construction -- thus, subsequent queries will not require the define get:soft "replace" pragma, except when you want to forcefully override the cache.
  3. If you have very large data sources like DBpedia etc. from CKAN, simply use our bulk loader.

SPARQL Endpoint Discovery

Public SPARQL endpoints are emerging at an ever increasing rate. Thus, we've set up a DNS lookup service that provides access to a large number of SPARQL endpoints. Of course, this doesn't cover all existing endpoints, so if your endpoint is missing please ping me.

Here is a collection of commands for using DNS-SD to discover SPARQL endpoints:

  1. dns-sd -B _sparql._tcp sparql.openlinksw.com -- browse for service instances
  2. dns-sd -Z _sparql._tcp sparql.openlinksw.com -- output results in Zone File format

Related

  1. Using HTTP from Ruby -- you can just make SPARQL Protocol URLs re. SPARQL
  2. Using SPARQL Endpoints via Ruby -- Ruby example using DBpedia endpoint
  3. Interactive SPARQL Query By Example (QBE) tool -- provides a graphical user interface (as is common in SQL realm re. query building against RDBMS engines) that works with any SPARQL endpoint
  4. Other methods of loading RDF data into Virtuoso
  5. Virtuoso Sponger -- architecture and how it turns a wide variety of non RDF data sources into SPARQL accessible data
  6. Using OpenLink Data Explorer (ODE) to populate Virtuoso -- locate a resource of interest; click on a bookmarklet or use context menus (if using ODE extensions for Firefox, Safari, or Chrome); and you'll have SPARQL accessible data automatically inserted into your Virtuoso instance.
  7. W3C's SPARQLing Data Access Ingenuity -- an older generic SPARQL introduction post
  8. Collection of SPARQL Query Examples -- GoodRelations (Product Offers), FOAF (Profiles), SIOC (Data Spaces -- Blogs, Wikis, Bookmarks, Feed Collections, Photo Galleries, Briefcase/DropBox, AddressBook, Calendars, Discussion Forums)
  9. Collection of Live SPARQL Queries against LOD Cloud Cache -- simple and advanced queries.
01/16/2011 02:06 GMT-0500 Modified: 01/19/2011 10:43 GMT-0500
Virtuoso Linked Data Deployment 3-Step

Injecting Linked Data into the Web has been a major pain point for those who seek personal, service, or organization-specific variants of DBpedia. Basically, the sequence goes something like this:

  1. You encounter DBpedia or the LOD Cloud Pictorial.
  2. You look around (typically following your nose from link to link).
  3. You attempt to publish your own stuff.
  4. You get stuck.

The problems typically take the following form:

  1. Functionality confusion about the complementary Name and Address functionality of a single URI abstraction
  2. Terminology confusion due to conflation and over-loading of terms such as Resource, URL, Representation, Document, etc.
  3. Inability to find robust tools with which to generate Linked Data from existing data sources such as relational databases, CSV files, XML, Web Services, etc.

To start addressing these problems, here is a simple guide for generating and publishing Linked Data using Virtuoso.

Step 1 - RDF Data Generation

Existing RDF data can be added to the Virtuoso RDF Quad Store via a variety of built-in data loader utilities.

Many options allow you to easily and quickly generate RDF data from other data sources:

  • Install the Sponger Bookmarklet for the URIBurner service. Bind this to your own SPARQL-compliant backend RDF database (in this scenario, your local Virtuoso instance), and then Sponge some HTTP-accessible resources.
  • Convert relational DBMS data to RDF using the Virtuoso RDF Views Wizard.
  • Starting with CSV files, you can
    • Place them at an HTTP-accessible location, and use the Virtuoso Sponger to convert them to RDF or;
    • Use the CSV import feature to import their content into Virtuoso's relational data engine; then use the built-in RDF Views Wizard as with other RDBMS data.
  • Starting from XML files, you can
    • Use Virtuoso's inbuilt XSLT-Processor for manual XML to RDF/XML transformation or;
    • Leverage the Sponger Cartridge for GRDDL, if there is a transformation service associated with your XML data source, or;
    • Let the Sponger analyze the XML data source and make a best-effort transformation to RDF.

Step 2 - Linked Data Deployment

Install the Faceted Browser VAD package (fct_dav.vad) which delivers the following:

  1. Faceted Browser Engine UI
  2. Dynamic Hypermedia Resource Generator
    • delivers descriptor resources for every entity (data object) in the Native or Virtual Quad Stores
    • supports a broad array of output formats, including HTML+RDFa, RDF/XML, N3/Turtle, NTriples, RDF-JSON, OData+Atom, and OData+JSON.

Step 3 - Linked Data Consumption & Exploitation

These simple steps allow you, your enterprise, and your customers to consume and exploit your newly deployed Linked Data --

  1. Load a page like this in your browser: http://<cname>[:<port>]/describe/?uri=<entity-uri>
    • <cname>[:<port>] gets replaced by the host and port of your Virtuoso instance
    • <entity-uri> gets replaced by the URI you want to see described -- for instance, the URI of one of the resources you let the Sponger handle.
  2. Follow the links presented in the descriptor page.
  3. If you ever see a blank page with a hyperlink subject name in the About: section at the top of the page, simply add the parameter "&sp=1" to the URL in the browser's Address box, and hit [ENTER]. This will result in an "on the fly" resource retrieval, transformation, and descriptor page generation.
  4. Use the navigator controls to page up and down the data associated with the "in scope" resource descriptor.
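
The URL pattern above is simple enough to generate programmatically. Here is a minimal Python 3 sketch; the function name describe_url and its arguments are illustrative, and percent-encoding the entity URI is a cautious assumption on my part rather than a documented requirement.

```python
from urllib.parse import quote


def describe_url(host, entity_uri, port=None, sponge=False):
    """Build a /describe/ URL for `entity_uri` on the given Virtuoso host."""
    authority = host if port is None else "%s:%d" % (host, port)
    # Percent-encode the entity URI so it survives as a single parameter value
    url = "http://%s/describe/?uri=%s" % (authority, quote(entity_uri, safe=""))
    if sponge:
        # Ask for "on the fly" retrieval, as in the blank page case above
        url += "&sp=1"
    return url


print(describe_url("localhost", "http://dbpedia.org/resource/DBpedia", port=8890))
```

Passing sponge=True appends the "&sp=1" parameter described in the list above.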

10/29/2010 18:54 GMT-0500 Modified: 11/02/2010 11:57 GMT-0500
Virtuoso Linked Data Deployment In 3 Simple Steps

Injecting Linked Data into the Web has been a major pain point for those who seek personal, service, or organization-specific variants of DBpedia. Basically, the sequence goes something like this:

  1. You encounter DBpedia or the LOD Cloud Pictorial.
  2. You look around (typically following your nose from link to link).
  3. You attempt to publish your own stuff.
  4. You get stuck.

The problems typically take the following form:

  1. Functionality confusion about the complementary Name and Address functionality of a single URI abstraction
  2. Terminology confusion due to conflation and over-loading of terms such as Resource, URL, Representation, Document, etc.
  3. Inability to find robust tools with which to generate Linked Data from existing data sources such as relational databases, CSV files, XML, Web Services, etc.

To start addressing these problems, here is a simple guide for generating and publishing Linked Data using Virtuoso.

Step 1 - RDF Data Generation

Existing RDF data can be added to the Virtuoso RDF Quad Store via a variety of built-in data loader utilities.

Many options allow you to easily and quickly generate RDF data from other data sources:

  • Install the Sponger Bookmarklet for the URIBurner service. Bind this to your own SPARQL-compliant backend RDF database (in this scenario, your local Virtuoso instance), and then Sponge some HTTP-accessible resources.
  • Convert relational DBMS data to RDF using the Virtuoso RDF Views Wizard.
  • Starting with CSV files, you can
    • Place them at an HTTP-accessible location, and use the Virtuoso Sponger to convert them to RDF or;
    • Use the CVS import feature to import their content into Virtuoso's relational data engine; then use the built-in RDF Views Wizard as with other RDBMS data.
  • Starting from XML files, you can
    • Use Virtuoso's inbuilt XSLT-Processor for manual XML to RDF/XML transformation or;
    • Leverage the Sponger Cartridge for GRDDL, if there is a transformation service associated with your XML data source, or;
    • Let the Sponger analyze the XML data source and make a best-effort transformation to RDF.

Step 2 - Linked Data Deployment

Install the Faceted Browser VAD package (fct_dav.vad) which delivers the following:

  1. Faceted Browser Engine UI
  2. Dynamic Hypermedia Resource Generator
    • delivers descriptor resources for every entity (data object) in the Native or Virtual Quad Stores
    • supports a broad array of output formats, including HTML+RDFa, RDF/XML, N3/Turtle, NTriples, RDF-JSON, OData+Atom, and OData+JSON.

Step 3 - Linked Data Consumption & Exploitation

Three simple steps allow you, your enterprise, and your customers to consume and exploit your newly deployed Linked Data --

  1. Load a page like this in your browser: http://<cname>[:<port>]/describe/?uri=<entity-uri>
    • <cname>[:<port>] gets replaced by the host and port of your Virtuoso instance
    • <entity-uri> gets replaced by the URI you want to see described -- for instance, the URI of one of the resources you let the Sponger handle.
  2. Follow the links presented in the descriptor page.
  3. If you ever see a blank page with a hyperlink subject name in the About: section at the top of the page, simply add the parameter "&sp=1" to the URL in the browser's Address box, and hit [ENTER]. This will result in an "on the fly" resource retrieval, transformation, and descriptor page generation.
  4. Use the navigator controls to page up and down the data associated with the "in scope" resource descriptor.

# PermaLink Comments [0]
10/29/2010 18:54 GMT-0500 Modified: 11/02/2010 11:55 GMT-0500
What is Linked Data, really?

Linked Data is simply hypermedia-based structured data.

Linked Data offers everyone a Web-scale, Enterprise-grade mechanism for platform-independent creation, curation, access, and integration of data.

The fundamental steps to creating Linked Data are as follows:

  1. Choose a Name Reference Mechanism — i.e., URIs.

  2. Choose a Data Model with which to Structure your Data -- minimally, you need a model which clearly distinguishes:

    1. Subjects (also known as Entities)
    2. Subject Attributes (also known as Entity Attributes), and
    3. Attribute Values (also known as Subject Attribute Values or Entity Attribute Values).
  3. Choose one or more Data Representation Syntaxes (also called Markup Languages or Data Formats) to use when creating Resources with Content based on your chosen Data Model. Some Syntaxes in common use today are HTML+RDFa, N3, Turtle, RDF/XML, TriX, XRDS, GData, OData, and OpenGraph; there are many others.

  4. Choose a URI Scheme that facilitates binding Referenced Names to the Resources which will carry your Content -- your Structured Data.

  5. Create Structured Data by using your chosen Name Reference Mechanism, your chosen Data Model, and your chosen Data Representation Syntax, as follows:

    1. Identify Subject(s) using Resolvable URI(s).
    2. Identify Subject Attribute(s) using Resolvable URI(s).
    3. Assign Attribute Values to Subject Attributes. These Values may be either Literals (e.g., STRINGs, BLOBs) or Resolvable URIs.
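
As a rough illustration of step 5, the Python sketch below models Subject, Subject Attribute, and Attribute Value as a three-field record and writes each record out as an N-Triples-style line. The Statement type, the to_ntriples function, and the example.org URIs are assumptions made for this example; the FOAF attribute URIs come from a real vocabulary. Literal escaping is deliberately minimal and does not cover every case a full serializer handles.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str          # resolvable URI naming the Subject
    attribute: str        # resolvable URI naming the Subject Attribute
    value: str            # Attribute Value: a literal or a resolvable URI
    is_uri: bool = False  # True when the value is itself a URI


def to_ntriples(statements):
    """Serialize statements as N-Triples-style lines, one per statement."""
    lines = []
    for s in statements:
        if s.is_uri:
            obj = f"<{s.value}>"
        else:
            # Escape backslashes and double quotes inside string literals.
            escaped = s.value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{s.subject}> <{s.attribute}> {obj} .")
    return "\n".join(lines)


if __name__ == "__main__":
    alice = "http://example.org/people/alice#this"  # hypothetical Subject URI
    print(to_ntriples([
        Statement(alice, "http://xmlns.com/foaf/0.1/name", "Alice"),
        Statement(alice, "http://xmlns.com/foaf/0.1/knows",
                  "http://example.org/people/bob#this", is_uri=True),
    ]))
```

The first statement carries a Literal value and the second a URI value, matching the two kinds of Attribute Value named in step 5.3.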

You can create Linked Data (hypermedia-based data representations) Resources from or for many things. Examples include personal profiles, calendars, address books, blogs, and photo albums, among many others.

Related

  1. Linked Data an Introduction -- simple introduction to Linked Data and its virtues
  2. How Data Makes Corporations Dumb -- Jeff Jonas (IBM) interview
  3. Hypermedia Types -- evolving information portal covering different aspects of Hypermedia resource types
  4. URIBurner -- service that generates Linked Data from a plethora of heterogeneous data sources
  5. Linked Data Meme -- TimbL design issues note about Linked Data
  6. Data 3.0 Manifesto -- note about format agnostic Linked Data
  7. DBpedia -- large Linked Data Hub
  8. Linked Open Data Cloud -- collection of Linked Data Spaces
  9. Linked Open Commerce Cloud -- commerce (clicks & mortar and/or clicks & clicks) oriented Linked Data Space
  10. LOD Cloud Cache -- massive Linked Data Space hosting most of the LOD Cloud Datasets
  11. LOD2 Initiative -- EU Co-Funded Project to develop global knowledge space from LOD
# PermaLink Comments [0]
10/14/2010 19:10 GMT-0500 Modified: 11/09/2010 13:53 GMT-0500
The posts on this weblog are my personal views, and not those of OpenLink Software.