There is a lot of scientific material on Virtuoso, but it has never been gathered in one place. So here I compile the best resources, with a paragraph of introduction for each. Some are deliverables from projects under the EU FP7 programme; others are peer-reviewed publications.

In the future, an updated version of this list may be found on the main Virtuoso site.

European Project Deliverables

  • GeoKnow D2.6.1: Graph Analytics in the DBMS (2015-01-05)

    This introduces the idea of unbundling basic cluster DBMS functionality, such as cross-partition joins and partitioned GROUP BY, to form a graph processing framework colocated with the data.

  • GeoKnow D2.4.1: Geospatial Clustering and Characteristic Sets (2015-01-06)

    This presents experimental results of structure-aware RDF applied to geospatial data. The regularly structured part of the data goes into tables; the rest is stored as triples/quads. Furthermore, for the first time in the RDF space, physical storage location is correlated with properties of entities, in this case geolocation, so that geospatially adjacent items are also likely to be adjacent in the physical data representation.

  • LOD2 D2.1.5: 500 billion triple BSBM (2014-08-18)

    This presents experimental results on lookup and BI workloads on a 12-node Virtuoso cluster with a total of 3 TB of RAM and 192 cores. It also discusses bulk load, at up to 6M triples/s, and specifics of query optimization in scale-out settings.

  • LOD2 D2.6: Parallel Programming in SQL (2012-08-12)

    This discusses ways of making SQL procedures partitioning-aware, so that one can, map-reduce style, send parallel chunks of computation to each partition of the data.
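As a rough illustration of the partitioned group-by and map-reduce style execution mentioned in the deliverables above, here is a minimal Python sketch. It is not Virtuoso code and does not use any Virtuoso API: the function names, the CRC32-based placement rule, and the use of threads in place of cluster nodes are all assumptions made for the example. Because rows are partitioned on the grouping key, every key lives in exactly one partition, so the final merge is a plain union of the per-partition results.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition_of(key: str, n_partitions: int) -> int:
    """Hypothetical placement rule: a deterministic hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % n_partitions


def scatter(rows, n_partitions):
    """Route each (key, value) row to the partition that owns its key."""
    partitions = [[] for _ in range(n_partitions)]
    for key, value in rows:
        partitions[partition_of(key, n_partitions)].append((key, value))
    return partitions


def local_group_sum(partition):
    """The 'map' step: aggregate only the rows held by one partition."""
    totals = Counter()
    for key, value in partition:
        totals[key] += value
    return totals


def partitioned_group_sum(rows, n_partitions=4):
    """Run the local aggregation on every partition, then merge."""
    partitions = scatter(rows, n_partitions)
    # Threads stand in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(local_group_sum, partitions))
    merged = {}
    for partial in partials:
        # Keys are disjoint across partitions, so no re-aggregation is needed.
        merged.update(partial)
    return merged


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
result = partitioned_group_sum(rows)
```

If the data were partitioned on something other than the grouping key, the merge step would instead have to add up partial sums for keys that appear in more than one partition.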

Publications

2015

  • Minh-Duc Pham, Linnea Passing, Orri Erling, Peter A. Boncz: Deriving an Emergent Relational Schema from RDF Data. WWW 2015

    This paper shows how RDF is in fact structured and how this structure can be reconstructed. This reconstruction then serves to create a physical schema, reintroducing all the benefits of physical design to the schema-last world. Experiments with Virtuoso show marked gains in query speed and data compactness.
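To make the idea of recovering structure from RDF more concrete, here is a small Python sketch. It is a deliberately simplified illustration and not the algorithm from the paper: it groups subjects by the exact set of predicates they use, treats any predicate set shared by at least `min_subjects` subjects as a candidate table, and leaves the remaining triples as they are. The function names and the `min_subjects` threshold are assumptions made for the example.

```python
from collections import defaultdict


def predicate_sets(triples):
    """Group subjects by the exact set of predicates they appear with."""
    props = defaultdict(set)
    for subject, predicate, _obj in triples:
        props[subject].add(predicate)
    groups = defaultdict(list)
    for subject, predicates in props.items():
        groups[frozenset(predicates)].append(subject)
    return groups


def split_schema(triples, min_subjects=2):
    """Return (candidate tables, leftover triples).

    A predicate set becomes a candidate table when at least
    `min_subjects` subjects share it (a hypothetical threshold).
    """
    groups = predicate_sets(triples)
    tables = {
        predicates: sorted(subjects)
        for predicates, subjects in groups.items()
        if len(subjects) >= min_subjects
    }
    tabled = {s for subjects in tables.values() for s in subjects}
    leftovers = [t for t in triples if t[0] not in tabled]
    return tables, leftovers


triples = [
    ("s1", "name", "Ann"), ("s1", "age", "31"),
    ("s2", "name", "Bob"), ("s2", "age", "42"),
    ("s3", "label", "misc"),
]
tables, leftovers = split_schema(triples)
```

Here `s1` and `s2` share the predicate set {name, age} and would form one two-column table, while the single `s3` triple stays in triple form.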

2014

2013

2012

  • Orri Erling: Virtuoso, a Hybrid RDBMS/Graph Column Store. IEEE Data Eng. Bull. (DEBU) 35(1):3-8 (2012)

    This paper introduces the Virtuoso column store architecture and its design choices. A single design serves random updates and lookups as well as the big scans where column stores traditionally excel. Examples are given from both TPC-H and the schema-less RDF world.

  • Minh-Duc Pham, Peter A. Boncz, Orri Erling: S3G2: A Scalable Structure-Correlated Social Graph Generator. TPCTC 2012:156-172

    This paper presents the basis of the social network benchmarking technology later used in the LDBC benchmarks.

2011

2009

  • Orri Erling, Ivan Mikhailov: Faceted Views over Large-Scale Linked Data. LDOW 2009

    This paper introduces anytime query answering as an enabling technology for open-ended querying of large data sets on public service endpoints. While not every query can be run to completion, partial results can usually be returned within a constrained time window.

  • Orri Erling, Ivan Mikhailov: Virtuoso: RDF Support in a Native RDBMS. Semantic Web Information Management 2009:501-519

    This is a general presentation of how a SQL engine needs to be adapted to serve a runtime-typed, schema-less workload.
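The anytime query answering described for the Faceted Views paper above can be sketched in a few lines of Python. This is only an illustration of the general idea, not how Virtuoso implements it: the function name, the result fields, and the injectable `clock` parameter are assumptions made for the example. The scan checks a deadline before each row and, when the time budget runs out, returns what it has so far together with a flag saying the answer is partial.

```python
import time


def anytime_count(rows, predicate, budget_seconds, clock=time.monotonic):
    """Count matching rows, stopping early once the time budget is spent.

    Returns the partial or complete count, the number of rows scanned,
    and whether the scan ran to completion.
    """
    deadline = clock() + budget_seconds
    matched = 0
    scanned = 0
    complete = True
    for row in rows:
        if clock() >= deadline:
            # Out of time: report what we have instead of failing.
            complete = False
            break
        scanned += 1
        if predicate(row):
            matched += 1
    return {"matched": matched, "scanned": scanned, "complete": complete}


def is_even(n):
    return n % 2 == 0


# A generous budget lets this small scan finish.
full = anytime_count(range(10), is_even, budget_seconds=60)

# A fake clock that advances one tick per call makes the cutoff deterministic.
ticks = iter(range(100))
partial = anytime_count(range(10), is_even, budget_seconds=3,
                        clock=lambda: next(ticks))
```

A caller can use the `scanned` count to judge how much of the data a partial answer covers, or rerun the query with a larger budget.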

2008

2007

  • Orri Erling, Ivan Mikhailov: RDF Support in the Virtuoso DBMS. CSSW 2007:59-68

    This is an initial discussion of RDF support in Virtuoso. Most specifics have since changed, but the paper gives a historical perspective.