About: In Hoc Signo Vinces (part 3 of 5) -- Benchmark Configuration Settings


An Entity of Type : schema:BlogPosting, within Data Space : www.openlinksw.com associated with source document(s)
http://www.openlinksw.com/describe/?url=http%3A%2F%2Fwww.openlinksw.com%2Fdataspace%2Fvdb%2Fweblog%2Fvdb%2527s%2520BLOG%2520%255B136%255D%2F1750&graph=http%3A%2F%2Fwww.openlinksw.com%2Fdataspace&graph=http%3A%2F%2Fwww.openlinksw.com%2Fdataspace

Attributes	Values
has container	vdb's BLOG [136] description
Date Created	2013-11-13 19:14:59-05:00(dt:dateTime)
maker	Virtuso Data Space Bot
Date Modified	2015-06-10 12:11:17.921466-04:00(dt:dateTime)
link	In Hoc Signo Vinces (part 3 of 5) -- Benchmark Configuration Settings
id	d9a59d9725be63e3f8441b0e8eb2b862
content	In this section, we cover the system configuration for running an analytics workload. If you are running a Virtuoso database with even a moderate data size, or are interested in reproducing the results presented here, the settings below will be relevant. If you are interested in the science of queries, you can skip to the next installment.

The relevant sections of the `virtuoso.ini` file are below, with commentary inline. The actual ini file has many more settings, but these do not influence the benchmark.

The test file system layout has two SSD file systems, mounted on `/1s1` and `/1s2`. The database is striped across the two file systems.

`[Database]`
`DatabaseFile = virtuoso.db`
`TransactionFile = /1s2/dbs/virtuoso.trx`
`Striping = 1`

This places the transaction log on the second SSD and sets the database to be striped; the files are declared in the `[Striping]` section further below.

`[TempDatabase]`
`DatabaseFile = virtuoso.tdb`
`TransactionFile = virtuoso.ttr`

`[Parameters]`
`ServerPort = 1209`
`ServerThreads = 100`
`CheckpointInterval = 0`
`NumberOfBuffers = 8000000`
`MaxDirtyBuffers = 1000000`

The thread count is set to 100. This is not significant, since the test will only have a few concurrent connections, but the value should be at least as high as the number of concurrent user connections expected.

At the 100 GB scale, the TPC-H working set for the queries is about 38 GB. The full database is about 80 GB. Eight million buffers at 8 KB each means that up to 64 GB of database pages will be resident in memory. `NumberOfBuffers` should be set higher than the expected working set if possible, but the database process size should also not exceed 80% of physical memory.

The max dirty buffers limit is set to a small fraction of the total buffers for faster bulk load. The bulk load is limited by writing to secondary storage, so we want the writing to start early and continue through the bulk load. Otherwise the checkpoint at the end of the bulk load would be oversized, because of the large number of unflushed buffers.
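The buffer arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the helper names are invented for illustration, sizes use decimal units (1 GB = 1,000,000 KB) as the text does, and the 80% rule is applied to the buffer pool alone, which understates the real process size because query memory comes on top.

```python
# Back-of-the-envelope check of the buffer settings discussed in the text.
# Helper names are illustrative only; they are not Virtuoso APIs.

PAGE_KB = 8                    # buffer (page) size, from the text
NUMBER_OF_BUFFERS = 8_000_000  # NumberOfBuffers
MAX_DIRTY_BUFFERS = 1_000_000  # MaxDirtyBuffers
WORKING_SET_GB = 38            # query working set at 100 GB scale, from the text


def buffer_pool_gb(buffers: int, page_kb: int = PAGE_KB) -> float:
    """Size of a number of buffers in decimal GB (1 GB = 1,000,000 KB)."""
    return buffers * page_kb / 1_000_000


def min_physical_gb(process_gb: float, max_fraction: float = 0.8) -> float:
    """Smallest physical memory for which process_gb stays within max_fraction.

    Assumption: only the buffer pool is counted; the real process is larger.
    """
    return process_gb / max_fraction


pool_gb = buffer_pool_gb(NUMBER_OF_BUFFERS)
dirty_fraction = MAX_DIRTY_BUFFERS / NUMBER_OF_BUFFERS

print(f"buffer pool: {pool_gb:.0f} GB")
print(f"dirty buffer limit: {dirty_fraction:.1%} of buffers, "
      f"{buffer_pool_gb(MAX_DIRTY_BUFFERS):.0f} GB")
print(f"working set fits in pool: {WORKING_SET_GB < pool_gb}")
print(f"physical memory needed for the pool alone at 80%: "
      f"{min_physical_gb(pool_gb):.0f} GB")
```

The same helpers can be reused with other buffer counts when sizing a machine with a different amount of memory.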
The checkpoint interval is set to 0, meaning no automatic checkpoints. There will be one at the end of the bulk load, as required by the rules, but the rules do not require checkpoints for the refresh functions.

`ColumnStore = 1`

This sets all tables to be created column-wise. No special DDL directives are needed for column store operation.

`MaxCheckpointRemap = 2500000`
`DefaultIsolation = 2`

The default isolation is set to `READ COMMITTED`. Running large queries with locking on reads would have a very high overhead.

`DirsAllowed = /`
`TransactionAfterImageLimit = 1500000000`

`TransactionAfterImageLimit` is set to an arbitrarily high number. It is measured as the count of bytes to be written to the log at commit (1.5 GB here). If the amount of data to be logged exceeds this, the transaction aborts. The RF1 transaction at 100 GB scale will log about 100 MB.

`FDsPerFile = 4`
`MaxMemPoolSize = 40000000`

`MaxMemPoolSize` is the maximum number of bytes of transient memory to be used for query optimization (40 MB here). The number is adequate for TPC-H, since the queries only have a few joins each. For RDF workloads, the number should be higher, since there are more joins.

`AdjustVectorSize = 0`

The workload will run at the default vector size. Index operations can be accelerated by switching to a larger vector size, trading memory for locality. Since this workload is dominated by hash joins, there is no benefit in changing this.

`ThreadsPerQuery = 24`

Each query is divided into up to 24 parallel fragments. 24 is the number of hardware threads on the test system.

`AsyncQueueMaxThreads = 48`

Queries are run by a pool of 48 worker threads. Each session has one thread of its own. If a query parallelizes, the first fragment runs on the session's thread and the remaining fragments run on threads from this pool. Thus the core threads are oversubscribed by a factor of slightly over 2 in the throughput run: 6 sessions plus 48 pool threads makes up to 54 runnable threads at any point in the throughput test.
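The oversubscription figure follows directly from the settings. The sketch below recomputes it; the constant and function names are illustrative rather than Virtuoso APIs, and the pool demand assumes, as a worst case, that every session parallelizes its query to the full `ThreadsPerQuery`.

```python
# Worst-case runnable thread count for the throughput run, from the settings
# in the text. Names are illustrative only; they are not Virtuoso APIs.

HARDWARE_THREADS = 24          # threads on the test system
THREADS_PER_QUERY = 24         # ThreadsPerQuery
ASYNC_QUEUE_MAX_THREADS = 48   # AsyncQueueMaxThreads
SESSIONS = 6                   # concurrent sessions in the throughput run


def fragments_on_pool(threads_per_query: int) -> int:
    """A parallelized query keeps its first fragment on the session thread."""
    return max(threads_per_query - 1, 0)


def max_runnable_threads(sessions: int, pool_threads: int) -> int:
    """Every session thread plus every pool thread may be runnable at once."""
    return sessions + pool_threads


# Assumption: all sessions parallelize fully, so the pool is the limit.
pool_demand = SESSIONS * fragments_on_pool(THREADS_PER_QUERY)
pool_in_use = min(pool_demand, ASYNC_QUEUE_MAX_THREADS)

runnable = max_runnable_threads(SESSIONS, pool_in_use)
factor = runnable / HARDWARE_THREADS

print(f"pool demand {pool_demand}, pool threads in use {pool_in_use}")
print(f"runnable threads: {runnable}, oversubscription: {factor:.2f}x")
```

With these numbers the pool, not the per-query fragment limit, bounds the number of concurrently runnable threads.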
`MaxQueryMem = 30G`

This is a cap on query execution memory. If memory use would exceed this, optimizations that would increase space consumption are not used. Memory use may still transiently exceed this limit.

`HashJoinSpace = 30G`

This is the maximum memory to be used for hash tables during hash joins. If a hash join would cause this amount to be exceeded, it is run in multiple passes, so as to cap the hash table size. Not all hash joins can be partitioned, and the test must not do multi-pass hash joins, hence the high number here. We will see actual space consumption figures when looking at the queries. This parameter may be increased for analytics performance, especially in multiuser situations.

`[Client]`
`SQL_QUERY_TIMEOUT = 0`
`SQL_TXN_TIMEOUT = 0`
`SQL_ROWSET_SIZE = 10`
`SQL_PREFETCH_BYTES = 120000`

Up to 120 KB of results is sent to a client in a single window. This is enough for the relatively short result sets in this benchmark.

`[Striping]`
`Segment1 = 1024, /1s1/dbs/tpch100cp-1.db = q1, /1s2/dbs/tpch100cp-2.db = q2`

The database is set to stripe across two files, each on a different SSD. Each file has its own background I/O thread; this is the meaning of the `= q1` and `= q2` declarations. All files on a given separately seekable device should share the same `q`.

`[Flags]`
`enable_mt_txn = 1`
`enable_mt_transact = 1`

The first setting enables multithreaded execution of DML statements. The second setting enables multithreading of `COMMIT` and `ROLLBACK` operations. This is important for refresh function performance. A column store `COMMIT` of a `DELETE` will especially benefit from multithreading, since this may involve re-compression.

`hash_join_enable = 2`

Hash joins will be used for both SQL and SPARQL (even though SPARQL is not used in this experiment).

`dbf_explain_level = 0`

This specifies less verbose query plan formatting for logging of query execution.

`dbf_log_fsync = 1`

This specifies that `fsync` is to be called after each write to the transaction log.
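To illustrate what calling `fsync` after each log write means in general, here is a minimal Python sketch of the write-then-sync pattern. It is not Virtuoso's implementation: the function `append_durably` and the file name `demo.trx` are hypothetical, and only the standard `os.write` and `os.fsync` calls are used.

```python
import os
import tempfile


def append_durably(path: str, record: bytes) -> int:
    """Append record to the log at path and fsync before returning.

    Illustrative only: this shows the general write-then-fsync pattern,
    not how Virtuoso writes its transaction log.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        written = 0
        while written < len(record):
            # os.write may write fewer bytes than requested, so loop.
            written += os.write(fd, record[written:])
        # Ask the OS to push the data to stable storage before we report
        # the write as done; without this it may sit in the page cache.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        log_path = os.path.join(tmp_dir, "demo.trx")  # hypothetical file name
        append_durably(log_path, b"commit 1\n")
        append_durably(log_path, b"commit 2\n")
        print(os.path.getsize(log_path), "bytes logged")
```

The cost of this pattern is one synchronous flush per write, which is the price paid for the log surviving a sudden power loss.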
The ACID qualification procedure specifies that the system is to be powered down in mid-run, hence the `dbf_log_fsync` setting is required by the test.

In Hoc Signo Vinces Series

- In Hoc Signo Vinces (part 1): Virtuoso meets TPC-H
- In Hoc Signo Vinces (part 2): TPC-H Schema Choices
- In Hoc Signo Vinces (part 3): Benchmark Configuration Settings (this post)
- In Hoc Signo Vinces (part 4): Bulk Load and Refresh
- In Hoc Signo Vinces (part 5): The Return of SQL Federation
- In Hoc Signo Vinces (part 9): TPC-H: TPC-H Q18, Ordered Aggregation, and Top K
- In Hoc Signo Vinces (part 10): TPC-H: TPC-H Q9, Q17, Q20 - Predicate Games
- In Hoc Signo Vinces (part 11): TPC-H Q2, Q10 - Late Projection
- In Hoc Signo Vinces (part 12): TPC-H: Result Preview
- In Hoc Signo Vinces (part 13): Virtuoso TPC-H Kit Now on V7 Fast Track
- In Hoc Signo Vinces (part 14): Virtuoso TPC-H Implementation Analysis
- In Hoc Signo Vinces (part 15): TPC-H and the Science of Hash
- In Hoc Signo Vinces (part 16): Introduction to Scale-Out
- In Hoc Signo Vinces (part 17): 100G and 300G Runs on Dual Xeon E5 2650v2
- In Hoc Signo Vinces (part 18): Cluster Dynamics
- In Hoc Signo Vinces (part 19): Scalability, 1000G, and 3000G
- In Hoc Signo Vinces (part 20): 100G and 1000G With Cluster; When is Cluster Worthwhile; Effects of I/O
- In Hoc Signo Vinces (part 21): Running TPC-H on Virtuoso Cluster on Amazon EC2
Title	In Hoc Signo Vinces (part 3 of n) -- Benchmark Configuration Settings
is described using	http://www.openlinksw.com/dataspace/vdb/weblog/vdb%27s%20BLOG%20%5B136%5D/1750/sioc.rdf
atom:source	vdb's BLOG [136] description
atom:updated	2015-06-10T16:11:17Z
atom:title	In Hoc Signo Vinces (part 3 of n) -- Benchmark Configuration Settings
links to	http://www.openlinksw.com/weblog/oerling/?id=1741 http://www.openlinksw.com/weblog/oerling/?id=1744 http://www.openlinksw.com/weblog/oerling/?id=1747 http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/InHocSignoVincesSuppDocs/virtuoso.ini http://www.openlinksw.com/weblog/oerling/?id=1793 http://www.openlinksw.com/weblog/oerling/?id=1824 http://www.openlinksw.com/weblog/oerling/?id=1816 http://www.openlinksw.com/weblog/oerling/?id=1739 http://www.openlinksw.com/weblog/oerling/?id=1822 http://www.openlinksw.com/weblog/oerling/?id=1798 http://www.openlinksw.com/weblog/oerling/?id=1796 http://www.openlinksw.com/weblog/oerling/?id=1818 http://www.openlinksw.com/weblog/oerling/?id=1819 http://www.openlinksw.com/weblog/oerling/?id=1845 http://www.openlinksw.com/weblog/oerling/?id=1802 http://www.openlinksw.com/weblog/oerling/?id=1800 http://www.openlinksw.com/weblog/oerling/?id=1789 http://www.openlinksw.com/weblog/oerling/?id=1779
atom:author	Virtuso Data Space Bot
label	In Hoc Signo Vinces (part 3 of n) -- Benchmark Configuration Settings
atom:published	2013-11-14T00:14:59Z
http://rdfs.org/si...ices#has_services	http://www.openlinksw.com/dataspace/services/weblog/item
type	Blog Post atom:Entry BlogPosting
is made of	Virtuso Data Space Bot
is link of	In Hoc Signo Vinces (part 3 of 5) -- Benchmark Configuration Settings
is atom:contains of	vdb's BLOG [136] description
is atom:entry of	vdb's BLOG [136] description
is container of	vdb's BLOG [136] description
is http://rdfs.org/si...vices#services_of of	http://www.openlinksw.com/dataspace/services/weblog/item

Data on this page belongs to its respective rights holders.
Virtuoso Faceted Browser Copyright © 2009-2024 OpenLink Software

About: In Hoc Signo Vinces (part 3 of 5) -- Benchmark Configuration Settings Goto Sponge NotDistinct Permalink

In Hoc Signo Vinces Series