We have another new Amazon machine image, this time for deploying your own Virtuoso Elastic Cluster on the cloud. The previous post gave a summary of running TPC-H on this image. This post is about what the AMI consists of and how to set it up.

Note: This AMI is running a pre-release build of Virtuoso 7.5, Commercial Edition. Features are subject to change, and this build is not licensed for any use other than the AMI-based benchmarking described herein.

There are two preconfigured cluster setups; one is for two (2) machines/instances and one is for four (4). Generation and loading of TPC-H data, as well as the benchmark run itself, is preconfigured, so you can do it by entering just a few commands. The whole sequence of doing a terabyte (1000G) scale TPC-H takes under two hours, with 30 minutes to generate the data, 35 minutes to load, and 35 minutes to do three benchmark runs. The 100G scale is several times faster still.

To experiment with this AMI, you will need a set of license files, one per machine/instance, which our Sales Team can provide.

Detailed instructions are on the AMI, in /home/ec2-user/cluster_instructions.txt, but the basic steps to get up and running are as follows:

  1. Instantiate machine image ami-811becea) (AMI ID is subject to change; you should be able to find the latest by searching for "OpenLink Virtuoso Benchmarks" in "Community AMIs"; this one is short-named virtuoso-bench-cl) with two or four (2 or 4) R3.8xlarge instances within one virtual private cluster and placement group. Make sure the VPC security is set to allow all connections.

  2. Log in to the first, and fill in the configuration file with the internal IP addresses of all machines instantiated in step 1.

  3. Distribute the license files to the instances, and start the OpenLink License Manager on each machine.

  4. Run 3 shell commands to set up the file systems and the Virtuoso configuration files.

  5. If you do not plan to run one of these benchmarks, you can simply start and work with the Virtuoso cluster now. It is ready for use with an empty database.

  6. Before running one of these benchmark, generate the appropriate dataset with the dbgen.sh command.

  7. Bulk load the data with load.sh.

  8. Run the benchmark with run.sh.

Right now the cluster benchmarks are limited to TPC-H but cluster versions of the LDBC Social Network and Semantic Publishing benchmarks will follow soon.