(Cut & Pasted verbatim from Orri Erling's
Weblog.)
Virtuoso TPC-C
and Multiprocessor Linux and Mac:
We have updated our article on Virtuoso scalability with two new platforms: A
2 x dual core Intel Xeon and a Mac Mini with an Intel Core Duo.
We have more than quadrupled the best previous result.
The best score is now 83K transactions per minute with a 40
warehouse (about 4G) database. This is attributable to the process
running mostly in memory, with 3 out of 4 cores busy on the
database server. But even when doubling the database size and
number of clients, we stay at 49K transactions per minute, now
with a little under 2 cores busy and an average of 20 disk reads
pending at all times, split over 4 SATA disks. The measurement is
the count of completed transactions during a 1h run. With the 80
warehouse database, it took about 18 minutes for the system to
reach steady state, with a warm working set, hence the actual
steady rate is somewhat higher than 49K, as the warm up period was
included in the measurement.
The metric on the Mac Mini was 2.7K with 2G RAM and one disk.
The CPU usage was about one third of one core. Since we have had
rates of over 10K with 2G RAM, we attribute the low result to
running on a single, not particularly fast disk.
We have run tests in 64 and 32 bit modes but have found little
difference as long as actual memory use does not exceed 4G. If
anything, 32 bit binaries should have an advantage in cache hit
rate since most data structures take less space there. After the
process size exceeds the 32 bit limit, there is a notable
difference in favor of 64 bit. Having more than 4G of database
buffers produces a marked advantage over letting the OS use the
space for file system cache. So, 64 bit is worthwhile but only if
there is enough memory. As for x86 having more registers in 64 bit
mode, we have not specifically measured what effect that might
have.
We also note that Linux has improved a great deal with respect
to multiprocessor configurations. We use a very simple test with a
number of threads acquiring and then immediately freeing the same
mutex. On single CPU systems, the real time has pretty much
increased linearly with the number of threads. On multiprocessor
systems, we used to get very non-linear behavior, with 2 threads
competing for the same mutex taking tens of times the real time as
opposed to one thread. At last measurement, with a 64 bit FC 5, we
saw 2 threads take 7x the real time when competing for the same
mutex. This is in the same ballpark as Solaris 10 on a similar
system. Mac OS X 10.4 Tiger on a 2x dual core Xeon Mac Pro did the
worst so far, with two threads taking over 70x the time of one.
With a Mac Mini with a single Core Duo, the factor between one
thread and two was 73.
Also, the proportion of system CPU on Tiger was consistently
higher than on Solaris or Linux when running the same benchmarks.
Of course, for most applications this test is not significant, but it
is relevant for database servers, as there are many very short
critical sections involved in multithreaded processing of indices
and the like.