Below is a questionnaire I sent to the BSBM participants in order to get tuning instructions for the runs we were planning. I have filled in the answers for Virtuoso, here. This can be a checklist for pretty much any RDF database tuning.

  1. Threading - What settings should be used (e.g., for query parallelization, I/O parallelization [e.g., prefetch, flush of dirty], thread pools [e,.g. web server], any other thread related)? We will run with 8 and 32 cores, so if there are settings controlling number of read/write (R/W) locks or mutexes or such for serializing diverse things, these should be set accordingly to minimize contention.

    The following three settings are all in the [Parameters] section of the virtuoso.ini file.

    • AsyncQueueMaxThreads controls the size of a pool of extra threads that can be used for query parallelization. This should be set to either 1.5 * the number of cores or 1.5 * the number of core threads; see which works better.

    • ThreadsPerQuery is the maximum number of threads a single query will take. This should be set to either the number of cores or the number of core threads; see which works better.

    • IndexTreeMaps is the number of mutexes over which control for buffering an index tree is split. This can generally be left at default (256 in normal operation; valid settings are powers of 2 from 2 to 1024), but setting to 64, 128, or 512 may be beneficial.

      A low number will lead to frequent contention; upwards of 64 will have little contention. We have sometimes seen a multiuser workload go 10% faster when setting this to 64 (down from 256), which seems counter-intuitive. This may be a cache artifact.

    In the [HTTPServer] section of the virtuoso.ini file, the ServerThreads setting is the number of web server threads, i.e., the maximum number of concurrent SPARQL protocol requests. Having a value larger than the number of concurrent clients is OK; for large numbers of concurrent clients a lower value may be better, which will result in requests waiting for a thread to be available.

    Note — The [HTTPServer] ServerThreads are taken from the total pool made available by the [Parameters] ServerThreads. Thus, the [Parameters] ServerThreads should always be at least as large as (and is best set greater than) the [HTTPServer] ServerThreads, and if using the closed-source Commercial Version, [Parameters] ServerThreads cannot exceed the licensed thread count.

  2. File layout - Are there settings for striping over multiple devices? Settings for other file access parallelism? Settings for SSDs (e.g., SSD based cache of hot set of larger db files on disk)? The target config is for 4 independent disks and 4 independent SSDs. If you depend on RAID, are there settings for this? If you need RAID to be set up, please provide the settings/script for doing this with 4 SSDs on Linux (RH and Debian). This will be software RAID, as we find the hardware RAID to be much worse than an independent disk setup on the system in question.

    It is best to stripe database files over all available disks, and to not use RAID. If RAID is desired, then stripe database files across many RAID sets. Use the segment declaration in the virtuoso.ini file. It is very important to give each independently seekable device its own I/O queue thread. See the documentation on the TPC-C sample for examples.

    in the [Parameters] section of the virtuoso.ini file, set FDsPerFile to be (the number of concurrent threads * 1.5) ÷ the number of distinct database files.

    There are no SSD specific settings.

  3. Loading - How many parallel streams work best? We are looking for non-transactional bulk load, with no inference materialization. For partitioned cluster settings, do we divide the load streams over server processes?

    Use one stream per core (not per core thread). In the case of a cluster, divide load streams evenly across all processes. The total number of streams on a cluster can equal the total number of cores; adjust up or down depending on what is observed.

    Use the built-in bulk load facility, i.e.,

    ld_dir ('<source-filename-or-directory>', '<file name pattern>', '<destination graph iri>');

    For example,

    SQL> ld_dir ('/path/to/files', '*.n3', 'http://dbpedia.org');

    Then do a rdf_loader_run () on enough connections. For example, you can use the shell command

    isql rdf_loader_run () &

    to start one in a background isql process. When starting background load commands from the shell, you can use the shell wait command to wait for completion. If starting from isql, use the wait_for_children; command (see isql documentation for details).

    See the BSBM disclosure report for an example load script.

  4. What command should be used after non-transactional bulk load, to ensure a consistent persistent state on disk, like a log checkpoint or similar? Load and checkpoint will be timed separately, load being CPU-bound and checkpoint being I/O-bound. No roll-forward log or similar is required; the load does not have to recover if it fails before the checkpoint.

    Execute

    CHECKPOINT;

    through a SQL client, e.g., isql. This is not a SPARQL statement and cannot be executed over the SPARQL protocol.

  5. What settings should be used for trickle load of small triple sets into a pre-existing graph? This should be as transactional as supported; at least there should be a roll forward log, unlike the case for the bulk load.

    No special settings are needed for load testing; defaults will produce transactional behavior with a roll forward log. Default transaction isolation is REPEATABLE READ, but this may be altered via SQL session settings or at Virtuoso server start-up through the [Parameters] section of the virtuoso.ini file, with

    DefaultIsolation = 4

    Transaction isolation cannot be set over the SPARQL protocol.

    NOTE: When testing full CRUD operations, other isolation settings may be preferable, due to ACID considerations. See answer #12, below, and detailed discussion in part 8 of this series, BSBM Explore and Update.

  6. What settings control allocation of memory for database caching? We will be running mostly from memory, so we need to make sure that there is enough memory configured.

    In the [Parameters] section of the virtuoso.ini file, NumberOfBuffers controls the amount of RAM used by Virtuoso to cache database files. One buffer caches an 8KB database page. In practice, count 10KB of memory per page. If "swappiness" on Linux is low (e.g., 2), two-thirds or more of physical memory can be used for database buffers. If swapping occurs, decrease the setting.

  7. What command gives status on memory allocation (e.g., number of buffers, number of dirty buffers, etc.) so that we can verify that things are indeed in server memory and not, for example, being served from OS disk cache. If the cached format is different from the disk layout (e.g., decompression after disk read), is there a command for space statistics for database cache?

    In an isql session, execute

    STATUS ( ? ? );

    The second result paragraph gives counts of total, used, and dirty buffers. If used buffers is steady and less than total, and if the disk read count on the line below does not increase, the system is running from memory. The cached format is the same as the disk based format.

  8. What command gives information on disk allocation for different things? We are looking for the total size of allocated database pages for quads (including table, indices, anything else associated with quads) and dictionaries for literals, IRI names, etc. If there is a text index on literals, what command gives space stats for this? We count used pages, excluding any preallocated unused pages or other gaps. There is one number for quads and another for the dictionaries or other such structures, optionally a third for text index.

    Execute on an isql session:

    CHECKPOINT;
    SELECT TOP 20 * FROM sys_index_space_stats ORDER BY iss_pages DESC;
    

    The iss_pages column is the total pages for each index, including blob pages. Pages are 8KB. Only used pages are reported, gaps and unused pages are not counted. The rows pertaining to RDF_QUAD are for quads; RDF_IRI, RDF_PREFIX, RO_START, RDF_OBJ are for dictionaries; RDF_OBJ_RO_FLAGS_WORDS and VTLOG_DB_DBA_RDF_OBJ are for text index.

  9. If there is a choice between triples and quads, we will run with quads. How do we ascertain that the run is with quads? How do we find out the index scheme? Should be use an alternate index scheme? Most of the data will be in a single big graph.

    The default scheme uses quads. The default index layout is PSOG, POGS, GS, SP, OP. To see the current index scheme, use an isql session to execute

    STATISTICS DB.DBA.RDF_QUAD;
  10. For partitioned cluster settings, are there partitioning-related settings to control even distribution of data between partitions? For example, is there a way to set partitioning by S or O depending on which is first in key order for each index?

    The default partitioning settings are good, i.e., partitioning is on O or S, whichever is first in key order.

  11. For partitioned clusters, are there settings to control message batching or similar? What are the statistics available for checking interconnect operation, e.g. message counts, latencies, total aggregate throughput of interconnect?

    In the [Cluster] section of the cluster.ini file, ReqBatchSize is the number of query states dispatched between cluster nodes per message round trip. This may be incremented from the default of 10000 to 50000 or so if this is seen to be useful.

    To change this on the fly, the following can be issued through an isql session:

    cl_exec ( ' __dbf_set (''cl_request_batch_size'', 50000) ' );

    The commands below may be executed through an isql session to get a summary of CPU and message traffic for the whole cluster or process-by-process, respectively. The documentation details the fields.

     STATUS ('cluster')      ;; whole cluster 
    STATUS ('cluster_d') ;; process-by-process
  12. Other settings - Are there settings for limiting query planning, when appropriate? For example, the BSBM Explore mix has a large component of unnecessary query optimizer time, since the queries themselves access almost no data. Any other relevant settings?

    • For BSBM, needless query optimization should be capped at Virtuoso server start-up through the [Parameters] section of the virtuoso.ini, with

      StopCompilerWhenXOverRun = 1
    • When testing full CRUD operations (not simply CREATE, i.e., load, as discussed in #5, above), it is essential to make queries run with transaction isolation of READ COMMITTED, to remove most lock contention. Transaction isolation cannot be adjusted via SPARQL. This can be changed through SQL session settings, or at Virtuoso server start-up through the [Parameters] section of the virtuoso.ini file, with

      DefaultIsolation = 2

Benchmarks, Redux Series