Virtuoso RDF: A Getting Started Guide for the Developer

It is a long-standing promise of mine to dispel the false impression that using Virtuoso to work with RDF is complicated.

The purpose of this presentation is to show a programmer how to put RDF into Virtuoso and how to query it. This is done programmatically, with no confusing user interfaces.

You should have a Virtuoso Open Source tree built and installed. We will look at the LUBM benchmark demo that comes with the package. All you need is a Unix shell. Running the shell under Emacs (M-x shell) works best, since it makes cutting and pasting between the shell and files convenient, but the open source isql utility should also have command line editing.

To get started, cd into binsrc/tests/lubm.

To verify that this works, you can do

./test_server.sh virtuoso-t

This runs the LUBM queries against the server and should report 45 tests passed. After this, we will go through the tests step by step.

Loading the Data

The file lubm-load.sql contains the commands for loading the LUBM single university qualification database.

The data files themselves are in the lubm_8000 directory: 15 files in RDF/XML.

There is also a little ontology called inf.nt. This declares the subclass and subproperty relations used in the benchmark.

So now let's go through this procedure.

Start the server:

$ virtuoso-t -f &

The -f option starts the server in foreground mode, and the trailing & puts it in the background of the shell.

Now we connect to it with the isql utility.

$ isql 1111 dba dba

This gives a SQL> prompt. The default username and password are both dba.

When a command is SQL, it is entered directly. If it is SPARQL, it is prefixed with the keyword sparql. This is how all the SQL clients work. Any SQL client, such as any ODBC or JDBC application, can use SPARQL if the SQL string starts with this keyword.
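To make the keyword rule concrete, here is a minimal sketch in Python of how a client program might prepare a SPARQL query for a SQL connection. The helper name sparql_as_sql is made up for this illustration. Actually sending the string needs an ODBC or JDBC driver, which is not shown. The trailing semicolon is stripped on the assumption that it is the isql statement terminator rather than part of the statement.

```python
def sparql_as_sql(sparql_text: str) -> str:
    """Turn SPARQL text into a statement a SQL client can send to Virtuoso.

    Hypothetical helper for illustration; it only builds the string.
    """
    # Drop surrounding whitespace and a trailing isql-style terminator.
    body = sparql_text.strip().rstrip(";").strip()
    # The leading keyword tells the server that SPARQL, not SQL, follows.
    return "sparql " + body


statement = sparql_as_sql("SELECT COUNT(*) FROM <lubm> WHERE { ?x ?y ?z } ;")
print(statement)
```

The resulting string is what would be passed to the execute call of whatever SQL client library is in use.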

The lubm-load.sql file is quite self-explanatory. It begins with defining an SQL procedure that calls the RDF/XML load function, DB..RDF_LOAD_RDFXML, for each file in a directory.

Next it clears the <lubm> and <inf> graphs and calls this procedure, load_lubm, for the lubm_8000 directory under the server's working directory.

sparql 
   CLEAR GRAPH <lubm>;

sparql 
   CLEAR GRAPH <inf>;

load_lubm ( server_root() || '/lubm_8000/' );

Then it verifies that the right number of triples is found in the <lubm> graph.

sparql 
   SELECT COUNT(*) 
     FROM <lubm> 
    WHERE { ?x ?y ?z } ;

The echo commands below this are interpreted by the isql utility, and produce output to show whether the test was passed. They can be ignored for now.

Then it adds some implied subOrganizationOf triples. This is part of setting up the LUBM test database.

sparql 
   PREFIX  ub:  <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
   INSERT 
      INTO GRAPH <lubm> 
      { ?x  ub:subOrganizationOf  ?z } 
   FROM <lubm> 
   WHERE { ?x  ub:subOrganizationOf  ?y  . 
           ?y  ub:subOrganizationOf  ?z  . 
         };
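What this statement adds can be modeled outside the database. The following is a rough sketch in plain Python, not Virtuoso code: it treats the subOrganizationOf triples as (subject, object) pairs and assumes the WHERE pattern is matched against the graph as it stands before the insert. The sample names are made up and are not taken from the LUBM files.

```python
def one_pass(pairs):
    """Return the pairs plus every (x, z) where (x, y) and (y, z) both exist."""
    pairs = set(pairs)
    # Join the relation with itself on the shared middle term ?y.
    implied = {(x, z) for (x, y1) in pairs for (y2, z) in pairs if y1 == y2}
    return pairs | implied


# Hypothetical sample data: a group inside a department inside a university.
sample = {("Group0", "Department0"), ("Department0", "University0")}
result = one_pass(sample)
print(sorted(result))
```

A single pass only finds chains of two steps. In this model, a deeper hierarchy would need the pass repeated until no new pairs appear.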

Then it loads the ontology file, inf.nt, using the Turtle load function, DB.DBA.TTLP. The arguments of the function are the text to load, the base IRI used for resolving relative IRIs, and the URI of the target graph.

DB.DBA.TTLP ( file_to_string ( 'inf.nt' ), 
              'http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl', 
              'inf' 
            ) ;
sparql 
   SELECT COUNT(*) 
     FROM <inf> 
    WHERE { ?x ?y ?z } ;

Then we declare that the triples in the <inf> graph can be used for inference at run time. The declaration by itself changes nothing: it only takes effect in a SPARQL query that declares that it uses the 'inft' rule set.

rdfs_rule_set ('inft', 'inf');
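As a sketch of how a query opts in to the rule set, the Python below builds a statement that starts with Virtuoso's define input:inference pragma. The helper name with_inference and the example query on ub:Professor are illustrative choices, not copied from the benchmark files, so check lubm-inf.sql for the exact form used there.

```python
def with_inference(rule_set: str, sparql_text: str) -> str:
    """Prefix SPARQL text with the sparql keyword and an inference pragma.

    Hypothetical helper for illustration; it only builds the string.
    """
    # The pragma sits between the sparql keyword and the query itself.
    return f"sparql define input:inference '{rule_set}' {sparql_text.strip()}"


UB = "http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#"
# Illustrative query: with inference on, subclasses of the class also match.
query = f"PREFIX ub: <{UB}> SELECT ?x FROM <lubm> WHERE {{ ?x a ub:Professor }}"
statement = with_inference("inft", query)
print(statement)
```

The same query sent without the pragma would only match triples physically present in the graph.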

Finally, the script makes a checkpoint to finalize the work and truncate the transaction log. The server would also eventually do this in its own time.

checkpoint;

Now we are ready for querying.

Querying the Data

The queries are given in three versions. The first file, lubm.sql, has the queries with most inference open-coded as UNIONs. The second file, lubm-inf.sql, has the inference performed at run time using the ontology information in the <inf> graph we just loaded. The last, lubm-phys.sql, relies on having the entailed triples physically present in the <lubm> graph. These entailed triples are inserted by the SPARUL commands in the lubm-cp.sql file.

If you wish to run all the commands in a SQL file, you can type load <filename>; (e.g., load lubm-cp.sql;) at the SQL> prompt. If you wish to try individual statements, you can paste them to the command line.

For example:

SQL> sparql 
   PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
   SELECT * 
     FROM <lubm>
    WHERE { ?x  a                     ub:Publication                                                . 
            ?x  ub:publicationAuthor  <http://www.Department0.University0.edu/AssistantProfessor0> 
          };

VARCHAR
_______________________________________________________________________

http://www.Department0.University0.edu/AssistantProfessor0/Publication0
http://www.Department0.University0.edu/AssistantProfessor0/Publication1
http://www.Department0.University0.edu/AssistantProfessor0/Publication2
http://www.Department0.University0.edu/AssistantProfessor0/Publication3
http://www.Department0.University0.edu/AssistantProfessor0/Publication4
http://www.Department0.University0.edu/AssistantProfessor0/Publication5

6 Rows. -- 4 msec.

To stop the server, simply type shutdown; at the SQL> prompt.

If you wish to use a SPARQL protocol endpoint, just enable the HTTP listener. This is done by adding a stanza like the following to the end of the virtuoso.ini file in the lubm directory:

[HTTPServer]
ServerPort    = 8421
ServerRoot    = .
ServerThreads = 2

Then shut down and restart (type shutdown; at the SQL> prompt and then virtuoso-t -f & at the shell prompt).

Now you can connect to the endpoint with a web browser. The URL is http://localhost:8421/sparql. Without parameters, this shows a human-readable form. With parameters, it executes the SPARQL query passed in the request.
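For the "with parameters" case, here is a small sketch in Python that builds such a URL. It assumes the standard SPARQL protocol parameter name, query, and the port from the stanza above. The function name sparql_url is made up for this example. Over HTTP the query text is plain SPARQL, without the sparql keyword used at the SQL> prompt.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint from the text; the port matches the [HTTPServer] stanza.
ENDPOINT = "http://localhost:8421/sparql"


def sparql_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Build a GET URL carrying the query in the 'query' parameter."""
    # urlencode percent-escapes braces, angle brackets, spaces and so on.
    return endpoint + "?" + urlencode({"query": query})


count_query = "SELECT COUNT(*) FROM <lubm> WHERE { ?x ?y ?z }"
url = sparql_url(count_query)
print(url)
```

The printed URL can be pasted into a browser or fetched with any HTTP client once the server is running.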

We have shown how to load and query RDF with Virtuoso using the most basic SQL tools. Next you can access RDF from, for example, PHP, using the PHP ODBC interface.

To see how to use Jena or Sesame with Virtuoso, look at Native RDF Storage Providers. To see how RDF data types are supported, see Extension datatype for RDF.

To work with large volumes of data, you must add memory to the configuration file and use the row-autocommit mode, i.e., run log_enable (2); before the load command. Otherwise Virtuoso will do the entire load as a single transaction and will run out of rollback space. See the documentation for more.

Orri Erling's Weblog
