We are getting ready to run BSBM benchmarks at a 500 billion triple scale over the holidays. This will be done on the Scilens cluster at CWI. The system has 16 nodes, each with 2x Xeon E5-2650, 256 GB RAM, and QDR InfiniBand. We will use 12 of these, for a total of 3 TB RAM. I will blog about the experiences in January, after the experiment is done. This is the final LOD2 (EU FP7 project) benchmarking piece.

Here I will give results for practice runs on my desktop, with 1/50th the data and 1/8th the capacity. This is 10 billion triples on a system of two machines, each with 2x Xeon E5-2630, 192 GB RAM, and QDR InfiniBand.

We start with Explore with 16 clients. The clients are evenly divided over 4 server processes, with 2 processes per machine. The run is started as follows:

% ./bibm/bsbmdriver -seed 1287654 -dg http://bsbm.org -t 300000 \
   -idir /1d4/bsbm_10000/td_data -uqp query -uc bsbm/explore \
   -mt 16 -runs 500  http://madras:8604/sparql \
   http://madras:8605/sparql http://masala-i:8606/sparql \
   http://masala-i:8607/sparql

The QMPH (query mixes per hour) is 12683.046. The run is 1500 query mixes; it takes 425s. The warmup is about 3000 query mixes with a different seed. The details are in 10gc16e.xml. The sample configurations are as in virtuoso.global.ini, cluster.global.ini, and virtuoso.ini.
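As a quick sanity check on these figures, the relation between query mixes, elapsed time, and QMPH can be sketched as below. The function names are mine and not part of the test driver, and the sketch assumes QMPH is simply the number of query mixes divided by the elapsed hours.

```python
def qmph(query_mixes: int, elapsed_seconds: float) -> float:
    """Query mixes per hour, assuming QMPH = mixes / elapsed hours."""
    return query_mixes * 3600.0 / elapsed_seconds


def elapsed_for(query_mixes: int, qmph_value: float) -> float:
    """Invert the relation: seconds needed to run the given number of mixes."""
    return query_mixes * 3600.0 / qmph_value


if __name__ == "__main__":
    # Figures quoted in the text: 1500 query mixes, about 425 s, QMPH 12683.046.
    print(f"QMPH for 1500 mixes in 425 s: {qmph(1500, 425):.1f}")
    print(f"Elapsed time implied by the reported QMPH: {elapsed_for(1500, 12683.046):.1f} s")
```

The two directions agree to within a second of elapsed time, which is the rounding one would expect from quoting whole seconds.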

We note that there is a total of 20M 8 KB buffer pages, and after running 1000 or so different query mixes, about 16M get used. So the working set is about 16M * 8 KB = 128 GB, for 10 Gt ("Gigatriples"). The quads themselves take less space than that, but the benchmark also accesses some literals. At 50x the size and at best 2.5 TB worth of buffers, there may be a problem.
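The sizing argument above is plain arithmetic, and a small sketch makes it easy to redo with other inputs. The helper names are hypothetical, the units are the same loose decimal ones used in the text, and the scale-up to 500 Gt is assumed to be linear in the triple count.

```python
PAGE_KB = 8  # buffer page size given in the text


def working_set_gb(pages_used: float, page_kb: int = PAGE_KB) -> float:
    """Working set in GB for a number of touched pages (pages * KB -> GB)."""
    return pages_used * page_kb / 1e6


def scaled_tb(practice_gb: float, scale_factor: float) -> float:
    """Naive linear scale-up of the practice working set, in TB."""
    return practice_gb * scale_factor / 1e3


if __name__ == "__main__":
    practice = working_set_gb(16e6)       # about 16M pages touched at 10 Gt
    full = scaled_tb(practice, 50)        # 50x the data at 500 Gt
    buffers_tb = 2.5                      # "at best" buffer space on the cluster
    print(f"practice working set: {practice:.0f} GB")
    print(f"scaled working set:   {full:.1f} TB against {buffers_tb} TB of buffers")
    print(f"oversubscription:     {full / buffers_tb:.2f}x")
```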

The total database files are around 800K pages/slice * 8 KB/page * 48 slices = 307 GB. This times 50 is about 15.3 TB. I do not think the system has that much SSD space, but it has about 3 TB per node in 3-way striped RAID 0 disk. There will be some disk access during the explore run. So we will report one number with steady state from disk, and another for a rerun of a set of queries where data is known to come from memory.

We note that there is speculative read, taking whole extents in when not all pages get used. Whether one reads 8 KB or 2 MB (the extent) makes little difference, so one may as well read whole extents. Subtracting the speculatively-read pages that are not in fact accessed, we get a working set of 1.5M pages per box, which would indicate that we will make it into a RAM-based steady state on the 500 Gt Scilens experiment. We shall see.

Loading may present some problems, since last time we had two boxes with significantly worse disk-write throughput than the rest. The Virtuoso I/O system is now different, with more emphasis on writing contiguous sequential ranges of pages, irrespective of the time the page became dirty. But there is nothing that a bad disk will not screw up.

We go on to BI. First comes the single-user (power) run. This is preceded by one single-user BI run with a different seed, for warmup. The power run has 4 consecutive query mixes; the throughput run has the same 4 query mixes concurrently.

Power query mix run time: 228s (arithmetic mean)

Throughput query mix run time: 269s (arithmetic mean)

The test driver output follows; the full result summaries are in 10gc-4pwer.xml and 10gc-4tp.xml.

Power Results

% ./bibm/bsbmdriver  -drill -t 300000  -dg http://bsbm.org -idir \
   /1d4/bsbm_10000/td_data -uqp query -uc bsbm/bi  -mt 1 -runs 4 \
   http://madras:8604/sparql 
% java -Xmx256M com.openlinksw.bibm.bsbm.TestDriver -qrd ./bibm \
   -dg http://bsbm/ -drill -t 300000 -dg http://bsbm.org -idir \
   /1d4/bsbm_10000/td_data -uqp query -uc bsbm/bi -mt 1 -runs 4 \
   http://madras:8604/sparql
Thread 1: query mix: 0  255.074 s, total: 255.195 s
Thread 1: query mix: 1  170.622 s, total: 170.667 s
Thread 1: query mix: 2  295.642 s, total: 295.691 s
Thread 1: query mix: 3  188.885 s, total: 188.935 s
Benchmark run completed in 910.493 s
Query Number  Execute Count  Timeshare        aqet      aqetg    aps     minqet      maxqet  Average Results  Min Results  Max Results  Timeout Count
           1              4      4.079    9.281250   7.848979  0.108   4.141000   18.836000           10.000           10           10              0
           2              4      2.358    5.365000   4.907928  0.186   2.539000    8.182000           10.000           10           10              0
           3              4     25.639   58.343500  24.902196  0.017   2.640000  111.680000           10.000           10           10              0
           4             20     28.908   13.156450   1.725366  0.076   0.130000   73.728000           92.650           55          100              0
           5             20     12.229    5.565350   2.550107  0.180   0.202000   16.251000           30.250           14           58              0
           6              4      0.277    0.631250   0.574319  1.584   0.269000    0.950000           49.250           14           72              0
           7             24      3.875    1.469625   0.268809  0.680   0.056000    7.961000           54.875            0          413              0
           8             20     22.635   10.301600   5.450917  0.097   0.626000   36.908000           10.000           10           10              0
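The columns of the power-run table can be cross-checked against each other. The sketch below assumes that Timeshare is each query's share of total execution time, computed as execute count times aqet over the sum across all queries; that reading is my inference from the numbers, not something the driver output states. Only the first four columns of each row are used.

```python
# First four columns of the power-run table:
# query number, execute count, reported timeshare (%), aqet (s).
POWER_ROWS = """
1 4 4.079 9.281250
2 4 2.358 5.365000
3 4 25.639 58.343500
4 20 28.908 13.156450
5 20 12.229 5.565350
6 4 0.277 0.631250
7 24 3.875 1.469625
8 20 22.635 10.301600
"""


def parse_rows(text):
    """Turn whitespace-separated rows into (query, count, timeshare, aqet) tuples."""
    rows = []
    for line in text.strip().splitlines():
        query, count, timeshare, aqet = line.split()[:4]
        rows.append((int(query), int(count), float(timeshare), float(aqet)))
    return rows


def recomputed_timeshare(rows):
    """Share of total time per query, in percent, under the stated assumption."""
    total = sum(count * aqet for _, count, _, aqet in rows)
    return {query: 100.0 * count * aqet / total for query, count, _, aqet in rows}


if __name__ == "__main__":
    rows = parse_rows(POWER_ROWS)
    shares = recomputed_timeshare(rows)
    for query, _, reported, _ in rows:
        print(f"Q{query}: reported {reported:6.3f}%  recomputed {shares[query]:6.3f}%")
```

Under that assumption the recomputed shares match the reported column to the printed precision, which also shows where the time goes: queries 3, 4, and 8 together account for roughly three quarters of the power run.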

Throughput Results

% ./bibm/bsbmdriver  -drill -t 300000  -dg http://bsbm.org -idir \
   /1d4/bsbm_10000/td_data -uqp query -uc bsbm/bi  -mt 4 -runs 4 \
   http://madras:8604/sparql http://madras:8605/sparql  \
   http://masala-i:8606/sparql  http://masala-i:8607/sparql 
% java -Xmx256M com.openlinksw.bibm.bsbm.TestDriver -qrd ./bibm \
   -dg http://bsbm/ -drill -t 300000 -dg http://bsbm.org -idir \
   /1d4/bsbm_10000/td_data -uqp query -uc bsbm/bi -mt 4 -runs 4 \
   http://madras:8604/sparql http://madras:8605/sparql \
   http://masala-i:8606/sparql http://masala-i:8607/sparql
Thread 2: query mix: 1  474.435 s, total: 474.498 s
Thread 1: query mix: 0  669.552 s, total: 669.663 s
Thread 3: query mix: 3  914.943 s, total: 915.009 s
Thread 4: query mix: 2  1077.138 s, total: 1077.283 s
Benchmark run completed in 1077.285 s

%
Query Number  Execute Count  Timeshare        aqet      aqetg    aps     minqet      maxqet  Average Results  Min Results  Max Results  Timeout Count
           1              4      2.478   19.424500  12.522236  0.150   4.207000   52.115000           10.000           10           10              0
           2              4      1.188    9.312250   7.525659  0.313   2.116000   14.486000           10.000           10           10              0
           3              4     17.822  139.727250  79.815584  0.021  19.953000  268.652000           10.000           10           10              0
           4             20     47.737   74.853750   4.014962  0.039   0.132000  728.240000           92.650           55          100              0
           5             20     10.393   16.296000   6.905212  0.179   0.406000   76.083000           30.250           14           58              0
           6              4      1.557   12.209000   2.304804  0.238   0.364000   45.836000           49.250           14           72              0
           7             24      3.795    4.958458   0.876038  0.587   0.049000   33.648000           54.708            0          413              0
           8             20     15.031   23.568900   9.495984  0.124   0.614000  157.444000           10.000           10           10              0

We notice quite a bit of variability between the different query mixes (from about 171 s to 296 s in the power run). This comes from parameter choices, and runs with different seeds are therefore not comparable, unless they are very long.

What does this promise for the 500 Gt runs? The complexities are n·log(n), with the log pretty constant. There will be some loss of speed from less locality of reference. With 50x more data and about 8x more CPU, I expect run times that are 8x or so higher. The dataset does not scale that linearly throughout, as the product hierarchies may have different depth.
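The estimate can be reproduced with a back-of-the-envelope sketch. It assumes that n is the triple count, that work grows as n·log(n), and that the extra CPU scales perfectly; the function name is hypothetical, and loss of locality is deliberately left out.

```python
import math


def expected_slowdown(data_factor: float, cpu_factor: float, base_triples: float) -> float:
    """Growth of n*log(n) work divided by the growth in CPU capacity."""
    log_growth = math.log(base_triples * data_factor) / math.log(base_triples)
    return data_factor * log_growth / cpu_factor


if __name__ == "__main__":
    # 10 Gt practice run scaled to 500 Gt, with about 8x more CPU.
    factor = expected_slowdown(data_factor=50, cpu_factor=8, base_triples=10e9)
    print(f"expected run time factor before locality effects: {factor:.1f}x")
```

With these inputs the factor comes out at about 7.3, so the log term adds roughly 17% on top of the plain 50/8 ratio, and any loss of locality pushes the figure further toward the 8x quoted above.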

The working set will be OK. On each of the 4 processes, there are 3.1M buffers used, of which 2M are just read-ahead and not really hit: when reading, one reads the whole extent of 256 pages while at it, since it costs the same and may prefetch. That leaves 1.1M buffers per process that are actually hit, or 4.4M in total, which is 34 GB; times 50 is 1.7 TB. This will fit.
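The same calculation, with the read-ahead pages subtracted, can be sketched as follows. The helper names are hypothetical, the units are loose decimal ones (8 KB taken as 8,000 bytes), and the 50x scale-up is assumed to be linear.

```python
PAGE_KB = 8  # buffer page size used throughout the post


def hit_pages(used_per_process: int, read_ahead_per_process: int, processes: int) -> int:
    """Buffers actually hit across all processes, excluding unused read-ahead."""
    return (used_per_process - read_ahead_per_process) * processes


def pages_to_gb(pages: int, page_kb: int = PAGE_KB) -> float:
    """Convert a page count to GB (pages * KB -> GB, decimal units)."""
    return pages * page_kb / 1e6


if __name__ == "__main__":
    pages = hit_pages(3_100_000, 2_000_000, processes=4)
    practice_gb = pages_to_gb(pages)
    print(f"buffers really hit: {pages / 1e6:.1f}M")
    print(f"practice working set: {practice_gb:.1f} GB")
    print(f"scaled 50x: {practice_gb * 50 / 1e3:.2f} TB")
```

The sketch lands within a few percent of the 34 GB and 1.7 TB quoted above; the difference is down to rounding and unit conventions, and the conclusion that the working set fits in memory is unchanged.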

In the interest of advancing standards of disclosure, I am also providing the test driver output for the runs, and an excerpt of the server query log for an interactive query mix and a BI query mix. The query texts and plans are there, with per operator time and cardinality.
