We have made an Amazon EC2 deployment of Virtuoso 7 Commercial Edition, configured to use the Elastic Cluster Module with TPC-H preconfigured, similar to the recently published OpenLink Virtuoso Benchmark AMI running the Open Source Edition. The details of the new Elastic Cluster AMI and steps to use it will be published in a forthcoming post. Here we will simply look at results of running TPC-H 100G scale on two machines, and 1000G scale on four machines. This shows how Virtuoso provides great performance on a cloud platform. The extremely fast bulk load — 33 minutes for a terabyte! — means that you can get straight to work even with on-demand infrastructure.

In the following, the Amazon instance type is R3.8xlarge, each with dual Xeon E5-2670 v2, 244G RAM, and 2 x 300G SSD. The image is made from the Amazon Linux with built-in network optimization. We first tried a RedHat image without network optimization and had considerable trouble with the interconnect. Using network-optimized Amazon Linux images inside a virtual private cloud has resolved all these problems.

The network optimized 10GE interconnect at Amazon offers throughput close to the QDR InfiniBand running TCP-IP; thus the Amazon platform is suitable for running cluster databases. The execution that we have seen is not seriously network bound.

100G on 2 machines, with a total of 32 cores, 64 threads, 488 GB RAM, 4 x 300 GB SSD

Load time: 3m 52s
Run Power Throughput Composite
1 523,554.3 590,692.6 556,111.2
2 565,353.3 642,503.0 602,694.9

1000G on 4 machines, with a total of 64 cores, 128 threads, 976 GB RAM, 8 x 300 GB SSD

Load time: 32m 47s
Run Power Throughput Composite
1 592,013.9 754,107.6 668,163.3
2 896,564.1 828,265.4 861,738.4
3 883,736.9 829,609.0 856,245.3

For the larger scale we did 3 sets of power + throughput tests to measure consistency of performance. By the TPC-H rules, the worst (first) score should be reported. Even after bulk load, this is markedly less than the next power score due to working set effects. This is seen to a lesser degree with the first throughput score also.

The numerical quantities summaries are available in a report.zip file, or individually --

Subsequent posts will explain how to deploy Virtuoso Elastic Clusters on AWS.

In Hoc Signo Vinces (TPC-H) Series