In this post we introduce changes to the BSBM BI queries and metric. These changes are motivated by prevailing benchmark practice and by our experiences in optimizing for the BSBM BI workload.

We will publish results according to the definitions given here and recommend that any interested parties do likewise. The rationales are given in the text.

Query Mix

We have removed Q4 from the mix because its run time is quadratic in the scale factor; the other queries are roughly n * log(n).

Parameter Substitution

All queries that take a product type as parameter are run in flights of several query invocations in which the product type goes from broader to more specific. The initial invocation specifies either the root product type or one of its immediate subtypes, and the last step in the drill-down specifies a leaf type.

The rationale for this is that the choice of product type may make several orders of magnitude of difference in the run time of a query. In order to make consecutive query mixes roughly comparable in execution time, all mixes should have a predictable number of query invocations with product types at each level of the hierarchy.
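The sketch below shows one way such a flight of parameters could be generated. It is only an illustration: the toy product-type hierarchy and the function name drill_down_flight are ours, not part of the BSBM data generator or test driver.

import random

# Hypothetical product-type hierarchy mapping each type to its subtypes.
# The real hierarchy comes from the BSBM data generator; these class names
# only illustrate the drill-down pattern.
HIERARCHY = {
    "ProductType1": ["ProductType2", "ProductType3"],   # root
    "ProductType2": ["ProductType4", "ProductType5"],
    "ProductType3": ["ProductType6"],
    "ProductType4": [],
    "ProductType5": [],
    "ProductType6": [],
}

def drill_down_flight(root="ProductType1"):
    """Pick one path from the root (or an immediate subtype) down to a leaf.

    Each element of the returned list is the product-type parameter for one
    invocation of the query template, going from broader to more specific.
    """
    current = random.choice([root] + HIERARCHY[root])   # root or immediate subtype
    flight = [current]
    while HIERARCHY[current]:                           # descend until a leaf
        current = random.choice(HIERARCHY[current])
        flight.append(current)
    return flight

print(drill_down_flight())   # e.g. ['ProductType2', 'ProductType5']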

Query Order

In the BI mix, when running multiple concurrent clients, each query mix is submitted in a random order. Queries which do drill-downs always have the steps of the drill-down consecutive in the session, but the query templates themselves are permuted. This is done to make it less likely that two concurrent queries access exactly the same data. In this way, scans cannot be trivially shared between queries, but there are still opportunities for reuse of results and for adapting execution to the working set, e.g., starting with what is already in memory.
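A minimal sketch of this scheduling, assuming a mix is represented as a mapping from query template to its list of invocations; the template names and parameters below are placeholders, not the actual BSBM BI templates.

import random

def schedule_mix(templates):
    """Permute the query templates of one mix for a client session.

    Drill-down templates contribute several invocations that must remain
    consecutive; only the order of the templates is randomized.
    """
    order = list(templates)
    random.shuffle(order)                    # random template order per mix
    session = []
    for name in order:
        session.extend(templates[name])      # drill-down steps stay adjacent
    return session

# Illustrative mix: two single-shot templates and one drill-down template.
mix = {
    "Q1": ["Q1(param_a)"],
    "Q3": ["Q3(param_b)"],
    "Q8": ["Q8(broad type)", "Q8(subtype)", "Q8(leaf type)"],
}
print(schedule_mix(mix))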

Metrics

We use a TPC-H-like metric. This metric consists of a single-user part and a multi-user part, called Power and Throughput respectively. The Power metric is based on the geometric mean of the query run times; the Throughput metric is based on the arithmetic mean, i.e., the total run time divided by the number of queries completed. In each case the mean time is converted into queries per hour, and the result is multiplied by the scale factor divided by the scale factor for 100 Mt. In other words, we consider the 100 Mt data set as the unit scale.

The Power is defined as

( scale_factor / 284826 ) * 3600 / ( ( t1 * t2 * ... * tn ) ^ ( 1 / n ) )

The Throughput is defined as

( scale_factor / 284826 ) * 3600 / ( ( t1 + t2 + ... + tn ) / n )

The magic number 284826 is the scale factor that generates approximately 100 million triples (100 Mt). We consider this scale "one". The reason for the multiplication is that scores at different scales should come out as similar numbers; otherwise a 10x larger scale would result in roughly 10x lower throughput with the BI queries.
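To make the arithmetic concrete, here is a minimal Python sketch of the two formulas. The function names, and the assumption that query times are given in seconds, are ours; this is not code from any BSBM driver.

import math

UNIT_SCALE = 284826   # scale factor yielding ~100 Mt, treated as scale "one"

def power(times, scale_factor):
    """Power: queries per hour based on the geometric mean of single-user run times (seconds)."""
    geo_mean = math.exp(sum(math.log(t) for t in times) / len(times))
    return (scale_factor / UNIT_SCALE) * 3600 / geo_mean

def throughput(times, scale_factor):
    """Throughput: queries per hour based on the arithmetic mean of run times (seconds)."""
    arith_mean = sum(times) / len(times)
    return (scale_factor / UNIT_SCALE) * 3600 / arith_mean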

The Composite metric is the geometric mean of the Power and Throughput metrics. A complete report shows both Power and Throughput metrics, as well as individual query times for all queries. The rationale for using a geometric mean is to give equal importance to long and short queries: halving the execution time of either a long query or a short query has the same effect on the metric. This is good for encouraging research into all aspects of query processing. On the other hand, real-life users are more interested in halving the time of queries that take one hour than of queries that take one second; therefore, the Throughput metric, being based on total run time, weights queries by their actual run times.

Taking the geometric mean of the two metrics gives more weight to the lower of the two than an arithmetic mean would; hence the Composite pays more attention to the worse of the two.
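As a quick sanity check of these weighting arguments, the sketch below computes the Composite score and contrasts, on made-up run times, how a geometric and an arithmetic mean respond to halving a short versus a long query. The numbers are purely illustrative, not benchmark results.

import math

def composite(power_qph, throughput_qph):
    """Composite: geometric mean of the Power and Throughput scores."""
    return (power_qph * throughput_qph) ** 0.5

# Illustrative run times in seconds: one short, one medium, one long query.
times       = [1.0, 60.0, 3600.0]
halve_short = [0.5, 60.0, 3600.0]
halve_long  = [1.0, 60.0, 1800.0]

geo   = lambda ts: math.exp(sum(map(math.log, ts)) / len(ts))
arith = lambda ts: sum(ts) / len(ts)

# Geometric mean: halving either query improves the mean by the same factor (~1.26).
print(geo(times) / geo(halve_short), geo(times) / geo(halve_long))
# Arithmetic mean: only halving the long query matters (~1.0 vs ~2.0).
print(arith(times) / arith(halve_short), arith(times) / arith(halve_long))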

Single-user and multi-user metrics are separate because of the relative importance of intra-query parallelization in BI workloads: There may not be large numbers of concurrent users, yet queries are still complex, and it is important to have maximum parallelization. Therefore the metric rewards single-user performance.

In the next post we will look at the use of this metric and the actual content of BSBM BI.

Benchmarks, Redux Series