Last time I said we had extended SPARQL for sub-queries. As a preview of the new functionality, let us look at a query from TPC-H.

Below is the Virtuoso SPARQL version of Q2.

sparql
define sql:signal-void-variables 1
prefix tpcd: <http://www.openlinksw.com/schemas/tpcd#>
prefix oplsioc: <http://www.openlinksw.com/schemas/oplsioc#>
prefix sioc: <http://rdfs.org/sioc/ns#>
prefix foaf: <http://xmlns.com/foaf/0.1/>
select
  ?supp+>tpcd:acctbal,
  ?supp+>tpcd:name,
  ?supp+>tpcd:has_nation+>tpcd:name as ?nation_name,
  ?part+>tpcd:partkey,
  ?part+>tpcd:mfgr,
  ?supp+>tpcd:address,
  ?supp+>tpcd:phone,
  ?supp+>tpcd:comment
from <http://example.com/tpcd>
where {
  ?ps a tpcd:partsupp ; tpcd:has_supplier ?supp ; tpcd:has_part ?part .
  ?supp+>tpcd:has_nation+>tpcd:has_region tpcd:name 'EUROPE' .
  ?part tpcd:size 15 .
  ?ps tpcd:supplycost ?minsc .
  { select ?part min(?ps+>tpcd:supplycost) as ?minsc
    where {
        ?ps a tpcd:partsupp ; tpcd:has_part ?part ; tpcd:has_supplier ?ms .
        ?ms+>tpcd:has_nation+>tpcd:has_region tpcd:name 'EUROPE' .
      }
  }
    filter (?part+>tpcd:type like '%BRASS') }
order by
  desc (?supp+>tpcd:acctbal)
  ?supp+>tpcd:has_nation+>tpcd:name
  ?supp+>tpcd:name
  ?part+>tpcd:partkey ;

Note the pattern { ?ms+>tpcd:has_nation+>tpcd:has_region tpcd:name 'EUROPE' }, which is a shorthand for { ?ms tpcd:has_nation ?t1 . ?t1 tpcd:has_region ?t2 . ?t2 tpcd:name 'EUROPE' }. Each +> step introduces one intermediate variable.
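
To make the expansion concrete, here is a minimal Python sketch that rewrites a +> chain into plain triple patterns, one fresh variable per step. It only illustrates the shorthand; it is not how Virtuoso parses or compiles the query, and the function name expand_path and the ?t1, ?t2 variable naming are assumptions made for the example.

```python
from itertools import count


def expand_path(path, predicate, obj, prefix="t"):
    """Expand 'subject+>p1+>p2' plus a final (predicate, obj) into triple patterns."""
    # The first element is the starting subject; the rest are the predicates to follow.
    subject, *hops = path.split("+>")
    fresh = count(1)
    patterns = []
    for hop in hops:
        # Each +> step binds a fresh intermediate variable.
        var = f"?{prefix}{next(fresh)}"
        patterns.append(f"{subject} {hop} {var}")
        subject = var
    # The last pattern applies the trailing predicate and object to the end of the chain.
    patterns.append(f"{subject} {predicate} {obj}")
    return " . ".join(patterns)


expanded = expand_path("?ms+>tpcd:has_nation+>tpcd:has_region", "tpcd:name", "'EUROPE'")
print(expanded)
```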

Also note that a sub-query is used to determine the lowest supply cost for each part.

The SQL text of the query, as given in the TPC-H benchmark specification, is reproduced below:

select s_acctbal, s_name, n_name,
        p_partkey, p_mfgr, s_address,
        s_phone, s_comment
from part, supplier, partsupp, nation, region
where
        p_partkey = ps_partkey
        and s_suppkey = ps_suppkey
        and p_size = 15
        and p_type like '%BRASS'
        and s_nationkey = n_nationkey
        and n_regionkey = r_regionkey
        and r_name = 'EUROPE'
        and ps_supplycost = (
                        select min(ps_supplycost)
                        from partsupp, supplier, nation, region
                        where
                                p_partkey = ps_partkey
                                and s_suppkey = ps_suppkey
                                and s_nationkey = n_nationkey
                                and n_regionkey = r_regionkey
                                and r_name = 'EUROPE')
order by
        s_acctbal desc, n_name, s_name, p_partkey;
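
The two formulations differ in shape: the SQL uses a correlated scalar sub-query, while the SPARQL sub-query returns a minimum cost per part and is joined back to the outer pattern, much like a derived table. The sketch below, using Python's built-in sqlite3 module, suggests that the two forms pick the same rows. The four sample rows are made up, and the region filter is left out to keep the example short.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table partsupp (ps_partkey int, ps_suppkey int, ps_supplycost real);
    -- Made-up sample data: two parts, two suppliers each.
    insert into partsupp values (1, 10, 5.0), (1, 11, 3.0), (2, 10, 7.0), (2, 12, 7.5);
""")

# Correlated scalar sub-query, in the style of the TPC-H text.
correlated = con.execute("""
    select ps.ps_partkey, ps.ps_suppkey
    from partsupp ps
    where ps.ps_supplycost = (
        select min(i.ps_supplycost) from partsupp i
        where i.ps_partkey = ps.ps_partkey)
    order by ps.ps_partkey
""").fetchall()

# Derived table with one minimum per part, joined on part and cost,
# in the style of the SPARQL sub-query.
derived = con.execute("""
    select ps.ps_partkey, ps.ps_suppkey
    from partsupp ps
    join (select ps_partkey as p, min(ps_supplycost) as minsc
          from partsupp group by ps_partkey) m
      on m.p = ps.ps_partkey and m.minsc = ps.ps_supplycost
    order by ps.ps_partkey
""").fetchall()

print(correlated, derived)
```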

For brevity we have omitted the declarations for mapping the TPC-H schema to its RDF equivalent. The mapping is straightforward, with each column mapping to a predicate and each table to a class.
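
As a rough illustration of the column-to-predicate and table-to-class idea, the Python sketch below turns one relational row into triples. This is not the Virtuoso mapping declaration syntax, which is omitted here. The subject IRI scheme, the stripping of the column prefix (p_size to tpcd:size), and the function name row_to_triples are all assumptions made for the example. Foreign keys, which appear in the query as predicates such as tpcd:has_nation, are not handled.

```python
TPCD = "http://www.openlinksw.com/schemas/tpcd#"


def row_to_triples(table, key_column, row, base="http://example.com/tpcd/"):
    """Map one row to triples: the table gives the class, each column a predicate."""
    # Hypothetical subject IRI built from the table name and the key value.
    subject = f"<{base}{table}/{row[key_column]}>"
    triples = [(subject, "a", f"<{TPCD}{table}>")]
    for column, value in row.items():
        # Assumed naming rule: drop the table prefix, so 'p_size' becomes 'size'.
        local_name = column.split("_", 1)[1]
        triples.append((subject, f"<{TPCD}{local_name}>", repr(value)))
    return triples


part_triples = row_to_triples("part", "p_partkey", {"p_partkey": 1, "p_size": 15})
for triple in part_triples:
    print(" ".join(triple), ".")
```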

This is now part of the next Virtuoso Open Source cut, due around next week.

As of this writing we are going through TPC-H query by query, testing the mapping against both Virtuoso and Oracle databases.

We have also been busy measuring Virtuoso 6. Even after switching from 32-bit to 64-bit IDs for IRIs and objects, the new databases are about half the size of the same databases in Virtuoso 5.0.2. This does not include any stream compression like gzip for disk pages. Load and query speeds are higher because of the better working set. When everything is in memory, they are about even with 5.0.2. So now on an 8G box, we load 1067 million LUBM triples at 39.7 Kt/s instead of 29 Kt/s with 5.0.2. Right now we are experimenting with clusters at Amazon EC2. We'll write about that in a bit.
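
To put the load rates above in perspective, the following back-of-the-envelope calculation converts them into wall-clock time. It assumes that Kt/s means thousands of triples per second and that each rate is an average over the whole load; neither assumption is spelled out above.

```python
TRIPLES = 1067e6   # 1067 million LUBM triples
RATE_V6 = 39.7e3   # triples per second, Virtuoso 6
RATE_V5 = 29e3     # triples per second, Virtuoso 5.0.2

# Total load time in hours, assuming a constant average rate.
hours_v6 = TRIPLES / RATE_V6 / 3600
hours_v5 = TRIPLES / RATE_V5 / 3600
speedup = RATE_V6 / RATE_V5

print(f"{hours_v6:.1f} h vs {hours_v5:.1f} h, {speedup:.2f}x faster")
```

Under these assumptions the load takes roughly 7.5 hours instead of about 10.2, a speedup of about 1.37.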