We have a few new features, developed for the WWW 2007 conference, that we will shortly be adding to the open source release.

  • Optimization for SQL IN predicate. The IN predicate with a list of values will now use an index if available. This is useful for SPARQL queries with multiple FROM graphs, for example.
  • API for index population estimates. There is an API for getting an approximate count of matches given one or more leading key parts of an index.
  • Row-level autocommit mode. When updating a huge table where the application does not require transaction isolation, the update can run with an automatic commit after each row. This saves the server from keeping rollback information for millions or billions of rows, and from temporarily rolling back the uncommitted data for checkpoints and the like. Either can completely hang a server once there are a few tens of millions of uncommitted inserts, deletes, or updates.
  • 64-bit IDs for IRIs and RDF objects, 64-bit integer data type. With some RDF databases growing to tens of billions of triples, the 32-bit range for IDs of distinct IRIs will eventually be exhausted. To accommodate this before actually running out, we are introducing a longer ID.
  • Some cost model adjustments.
  • SQL extension for producing multiple result set rows from a single table row. This is useful for mapping SPARQL queries like SELECT * FROM graph WHERE {?s ?p ?o} into a UNION of SELECT * queries over multiple tables of different widths. Each term of the UNION simply produces multiple three-column result rows for each actual row, without having to run through the tables multiple times. Together with this, we have also fixed a number of things in the relational-to-RDF mapping. We have been testing this extensively with the Musicbrainz mapping by Fred Giasson.
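
To make the last item more concrete, here is a minimal Python sketch of the idea of fanning one wide table row out into several three-column rows in a single pass. It is only an illustration of the concept, not the actual SQL extension: the function names, the table and column names, the subject prefixes, and the choice to skip NULL columns are all assumptions made for this example, and they do not reflect the real Musicbrainz schema or mapping.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, object]
Table = Tuple[str, str, Iterable[Dict[str, object]]]


def row_to_triples(subject_prefix: str, key_column: str,
                   row: Dict[str, object]) -> Iterator[Triple]:
    """Yield one (s, p, o) row per non-key column of a single table row.

    Skipping NULL (None) values is an assumption of this sketch.
    """
    subject = f"{subject_prefix}{row[key_column]}"
    for column, value in row.items():
        if column == key_column or value is None:
            continue
        yield (subject, column, value)


def scan_as_triples(tables: Iterable[Table]) -> Iterator[Triple]:
    """Emulate the UNION: each table is read once, and each row fans out."""
    for subject_prefix, key_column, rows in tables:
        for row in rows:
            yield from row_to_triples(subject_prefix, key_column, row)


# Hypothetical tables of different widths.
artists: List[Dict[str, object]] = [
    {"id": 1, "name": "Artist A", "country": "FI"},
]
albums: List[Dict[str, object]] = [
    {"id": 10, "title": "Album X", "artist": 1, "year": 2007, "label": None},
]

triples = list(scan_as_triples([
    ("artist/", "id", artists),
    ("album/", "id", albums),
]))

for triple in triples:
    print(triple)
```

Each source row is visited exactly once, yet the result has a uniform three-column shape regardless of how wide the source table is, which is what lets tables of different widths be combined under one UNION.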

These changes are small and to be released shortly.

There are also some larger things in the works, to be released during this summer. The next post gives an overview of these.