This article shows how you can reproduce the TPC-H experiments from the previous posts in this series.

All the code is in the feature/analytics branch of the v7fasttrack git repository on GitHub.

Prerequisites

Start by checking out and compiling Virtuoso Open Source (VOS).

git clone https://github.com/v7fasttrack/virtuoso-opensource
cd virtuoso-opensource 
git checkout feature/analytics
./autogen.sh
export CFLAGS="-msse4.2 -DSSE42"
./configure 
make -j 24
make install

The system should be an x86_64 machine, Intel Core i7 or later, with SSE 4.2 support. (Running without SSE 4.2 is possible, but for a better score you need to define it before running configure, as in the CFLAGS line above.) Any gcc that supports SSE 4.2 will do.
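
To verify SSE 4.2 support before building (assuming a Linux system with /proc/cpuinfo):

grep -m 1 -o sse4_2 /proc/cpuinfo

This prints sse4_2 once if the CPU supports it, and nothing otherwise.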

For a good result, the system should have at least 96 GB of RAM and SSD storage.

To get a good load time, both the database files and the CSV files made by dbgen should be on SSD.

Running TPC-H

Set up

Copy the binsrc/tests/tpc-h directory from the check-out to an SSD, if it is not already on one. Set $HOME to point to the root directory of the check-out. Rename virtuoso.ini-100G in the tpc-h directory to virtuoso.ini, and edit the database file paths in it. Where it says --

Segment1 = 1024, /1s1/dbs/tpch100cp-1.db = q1, /1s2/dbs/tpch100cp-2.db = q2

-- change the file names, and add or remove files as appropriate, giving each file a different = qn, until you have one file per independent device. If the storage is a RAID, one file per distinct device in the RAID usually brings an improvement. Edit the TransactionFile entry, replacing /1s2/dbs/ with a suitable path.
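
For example, a machine with three independent SSDs mounted at /ssd1, /ssd2, and /ssd3 (hypothetical mount points) would use --

Segment1 = 1024, /ssd1/dbs/tpch100cp-1.db = q1, /ssd2/dbs/tpch100cp-2.db = q2, /ssd3/dbs/tpch100cp-3.db = q3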

Edit ThreadsPerQuery to be the number of hardware threads on the machine. For an i7, this is double the number of cores; your environment may vary. Set AsyncQueueMaxThreads to double ThreadsPerQuery.
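
For example, on a hypothetical 12-core machine with hyperthreading (24 hardware threads, as reported by nproc) --

ThreadsPerQuery       = 24
AsyncQueueMaxThreads  = 48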

The memory settings (NumberOfBuffers and MaxDirtyBuffers) are OK for 100G scale. For larger scales, increase the memory settings correspondingly, not to exceed 75% of system memory; count 8.5 KB per buffer. If you have less memory, you can decrease these settings. If so, the first power test will be hit the worst, so the scores will not be as good.
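
As a worked example, on a hypothetical machine with 256 GB of RAM running at a larger scale: 75% of 256 GB is 192 GB, which at 8.5 KB per buffer comes to roughly 23 million buffers, so NumberOfBuffers would be set to about 23000000, with MaxDirtyBuffers somewhat lower (commonly about three quarters of NumberOfBuffers).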

The default BIOS settings are usually OK. Disabling adjacent cache line prefetch does not help, and turning off core threads (hyperthreading) does not help either.

For 100G, the data files for loading will take 100 GB, and the database files will take 88 GB divided among however many files. Be sure there is enough space before you start.
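
A quick way to check free space on the database and data-file devices (using the hypothetical mount points from the example above):

df -h /ssd1 /ssd2 /ssd3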

Generate the data

In the tpc-h directory (copy of binsrc/tests/tpc-h) --

./gen.sh 100 5 2

The first parameter is the scale factor; the second is the number of streams to use in the throughput test; the last is the number of consecutive test runs. The minimum number of streams is 5 at 100G; each successive scale factor adds one more. A larger number of streams is allowed, but will not improve the result in this case. A test always consists of 2 runs; you can specify more, but the extra runs will not influence the score.
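
By that rule, a run at the next scale factor, 300G, would use a minimum of 6 streams --

./gen.sh 300 6 2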

Making the data files takes the longest time. You may run dbgen multithreaded to make the dataset in parallel, but then the load scripts will have to be changed to match.
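
As a sketch of what parallel generation looks like, using the standard dbgen chunking options (-C gives the total number of chunks, -S selects which chunk to produce; the chunk count of 24 is an arbitrary choice):

# generate the 100G dataset in 24 parallel chunks
for i in $(seq 1 24); do
    ./dbgen -s 100 -C 24 -S $i &
done
wait

This produces numbered chunk files (lineitem.tbl.1 through lineitem.tbl.24, and so on) instead of single .tbl files, which is why the load scripts would need matching changes.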

Run the load

Start Virtuoso.
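
A minimal way to start it, assuming virtuoso-t is on the $PATH and the current directory is the tpc-h directory with the edited virtuoso.ini (read from the current directory by default; -f keeps the server in the foreground so the log stays on the terminal):

virtuoso-t -f &

Once the server is up, run the load --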

./load.sh 100

Looking at iostat, you will see a read rate of about 140 MB/s from the source files.
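
For example, with iostat from the sysstat package (-x gives per-device detail, -m reports megabytes per second, 5 is the reporting interval in seconds):

iostat -xm 5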

Run the test

./run.sh 100 5 2 

The parameters have the same meaning as in gen.sh, and the same values must be specified.

Outputs

The run produces two main files, report1.txt and report2.txt, containing the numerical quantities summaries for the first and second runs. Additionally, there is an output file for each query stream. The suppfiles.sh script can be used to collect the supporting-files archive for a TPC-H full disclosure report.

The database is left running. To reuse the loaded data for another experiment, kill the Virtuoso process, delete the transaction log, and restart; this brings the data back to the post-load state. To get a warm cache, use the warm.sql script in the tpc-h directory.
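
A minimal sketch of this reset, assuming Virtuoso's default transaction log name and isql credentials (substitute your own TransactionFile path and password):

pkill virtuoso-t                # stop the running server
rm /1s2/dbs/virtuoso.trx        # hypothetical log name; use your TransactionFile path
virtuoso-t -f &                 # restart; the data is back in the post-load state
isql 1111 dba dba warm.sql      # warm the cache; assumes the default port and credentials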

On the test system used in this series, a 12-core E5 at 2.3 GHz, we expect a score of about 240K for the first run and 250K for the second. With a top-of-the-line E5, we expect around 400K. For an 8-core 2.26 GHz Nehalem, we expect 150K.

If you get scores that are significantly different, something is broken; we would like to know about this.

If you have this audited according to the TPC rules, you will be allowed to call this a TPC-H result. Without such an audit, the result should be labeled Virt-H.

To be continued...

In Hoc Signo Vinces (TPC-H) Series