In this section, we cover the system configuration for running an analytics workload. If you are running a Virtuoso database of even moderate size, or are interested in reproducing the results presented here, the following will be relevant. If you are only interested in the science of the queries, you can skip to the next installment.

The relevant sections of the virtuoso.ini file are below, with commentary inline. The actual ini file has many more settings, but these do not influence the benchmark.

The test file system layout has two SSD file systems, mounted on /1s1 and /1s2. The database is striped across the two file systems.

[Database]
DatabaseFile    = virtuoso.db
TransactionFile = /1s2/dbs/virtuoso.trx
Striping        = 1

This sets the log to be on the second SSD, and the database to be striped; the files are declared in the [Striping] section further below.

[TempDatabase]
DatabaseFile    = virtuoso.tdb
TransactionFile = virtuoso.ttr

[Parameters]
ServerPort                 = 1209
ServerThreads              = 100
CheckpointInterval         = 0
NumberOfBuffers            = 8000000
MaxDirtyBuffers            = 1000000

The thread count is set to 100. This is not significant, since the test will only have a few concurrent connections, but this should be at least as high as the number of concurrent user connections expected.

The 100 GB TPC-H working set is about 38 GB for the queries. The full database is about 80 GB. Eight million buffers at 8 KB each means that up to 64 GB of database pages will be resident in memory. This should be set higher than the expected working set if possible, but the database process size should also not exceed 80% of physical memory.
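
As a quick sanity check on the sizing arithmetic (assuming Virtuoso's 8 KB, i.e. 8192-byte, page size), the implied cache size can be computed from any isql session:

SELECT 8000000 * 8192.0 AS cache_bytes;
-- 65,536,000,000 bytes, i.e. roughly the 64 GB quoted above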

The max dirty buffers limit is set to a small fraction of the total buffers for a faster bulk load. The bulk load is limited by writing to secondary storage, so we want the writing to start early and continue throughout the load. Otherwise, the checkpoint at the end of the bulk load would take a long time, because of the large number of un-flushed buffers.

The checkpoint interval is set to 0, meaning no automatic checkpoints. There will be one at the end of the bulk load, as required by the rules, but the rules do not require checkpoints for the refresh functions.
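
For reference, that single checkpoint can be issued manually once the load completes, for example from an isql session connected to the server:

checkpoint;
-- flushes dirty pages to the database files and establishes a durable, recoverable state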

ColumnStore                = 1

This sets all tables to be created column-wise. No special DDL directives are needed for column store operation.
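
To illustrate, with this setting an ordinary CREATE TABLE like the following (a hypothetical example table) is stored column-wise; the commented-out line sketches the per-table COLUMN keyword which, to our understanding, requests the same layout explicitly when the global default is row-wise:

CREATE TABLE demo_orders (
  o_orderkey    INTEGER,
  o_totalprice  NUMERIC,
  PRIMARY KEY (o_orderkey)
);
-- explicit per-table form when ColumnStore is not set globally (assumed syntax):
-- CREATE TABLE demo_orders (..., PRIMARY KEY (o_orderkey) COLUMN);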

MaxCheckpointRemap         = 2500000
DefaultIsolation           = 2

The default isolation is set to READ COMMITTED. Running large queries with locking on reads would have a very high overhead.
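
A session can still override the default where needed; for example, assuming Virtuoso's SET ISOLATION syntax, a connection that requires stricter guarantees can request them explicitly:

SET ISOLATION = 'repeatable';
-- per-session override of the DefaultIsolation setting above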

DirsAllowed                = /
TransactionAfterImageLimit = 1500000000

This is set to an arbitrarily high number. The limit is the number of bytes to be written to the transaction log at commit (here, 1.5 GB). If the amount of data to be logged exceeds this, the transaction aborts. The RF1 transaction at 100 GB scale logs about 100 MB.

FDsPerFile                 = 4
MaxMemPoolSize             = 40000000

This is the maximum number of bytes of transient memory to be used for query optimization (40 MB, here). The number is adequate for TPC-H, since the queries only have a few joins each. For RDF workloads, the number should be higher, since there are more joins.

AdjustVectorSize           = 0

The workload will run at the default vector size. Index operations can be accelerated by switching to a larger vector size, trading memory for locality. But since this workload mostly uses hash joins, there is no benefit in changing this.

ThreadsPerQuery            = 24

Each query is divided into up to 24 parallel fragments. 24 is the number of threads on the test system.

AsyncQueueMaxThreads       = 48

Queries are run by a pool of 48 worker threads. Each session additionally has one thread of its own. If a query parallelizes, the first fragment runs on the session's thread, and the remaining fragments run on threads from this pool. Thus the core threads are oversubscribed by a factor of slightly over 2 in the throughput run: 5 sessions plus 48 pool threads makes up to 53 runnable threads at any point in the throughput test.

MaxQueryMem                = 30G

This is a cap on query execution memory. If memory would exceed this, optimizations that would increase space consumption are not used. The memory may still transiently exceed this limit.

HashJoinSpace              = 30G

This is the maximum memory to be used for hash tables during hash joins. If a hash join would exceed this amount, it is run in multiple passes, so as to cap the hash table size. Not all hash joins can be partitioned, and the test should not do multi-pass hash joins, hence the high number here. We will see actual space consumption figures when looking at the queries. This parameter may be increased for analytics performance, especially in multiuser situations.

[Client]
SQL_QUERY_TIMEOUT  = 0
SQL_TXN_TIMEOUT    = 0
SQL_ROWSET_SIZE    = 10
SQL_PREFETCH_BYTES = 120000

Up to 120 KB of results are sent to the client in a single window. This is enough for the relatively short result sets in this benchmark.

[Striping]
Segment1 = 1024, /1s1/dbs/tpch100cp-1.db = q1, /1s2/dbs/tpch100cp-2.db = q2

The database is set to stripe across two files, each on a different SSD. Each file has its own background I/O thread; this is the meaning of the = q1 and = q2 declarations. All files on the same separately-seekable device should share the same q.
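
Once the server is running, buffer usage and per-file disk activity can be checked with the built-in status report, for example from isql (the exact content of the report varies by version):

status ();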

[Flags]
enable_mt_txn      = 1
enable_mt_transact = 1

The first setting enables multithreaded execution of DML statements. The second enables multithreaded execution of COMMIT and ROLLBACK operations. This is important for refresh function performance. A column store COMMIT of a DELETE benefits especially from multithreading, since it may involve re-compression.

hash_join_enable   = 2

This enables hash joins for both SQL and SPARQL (even though SPARQL is not used in this experiment).

dbf_explain_level  = 0

Specifies less verbose query plan formatting for logging of query execution.
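
Verbose plans can still be obtained on demand; the explain() function prints the compiled plan of a statement, for example from isql with a simple query against the TPC-H schema:

explain ('SELECT COUNT(*) FROM lineitem');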

dbf_log_fsync = 1

This specifies that fsync is called after each write to the transaction log. The ACID qualification procedure involves powering the system down in mid-run, hence this setting is required by the test.
