Details
OpenLink Software
Burlington, United States
Subscribe
Post Categories
Recent Articles
Community Member Blogs
Display Settings
Translate
|
Showing posts in all categories Refresh
Comparing Virtuoso Performance on Different Processors
[
Orri Erling
]
Over the years we have run Virtuoso on different hardware. We will here give a few figures that help identify the best price point for machines running Virtuoso.
Our test is very simple: Load 20 warehouses of TPC-C data, and then run one client per warehouse for 10,000 new orders. The way this is set up, disk I/O does not play a role and lock contention between the clients is minimal.
The test essentially has 20 server and 20 client threads running the same workload in parallel. The load time gives the single thread number; the 20 clients run gives the multi-threaded number. The test uses about 2-3 GB of data, so all is in RAM but is large enough not to be all in processor cache.
All times reported are real times, starting from the start of the first client and ending with the completion of the last client.
Do not confuse these results with official TPC-C. The measurement protocols are entirely incomparable.
| Test |
Platform |
Load (seconds) |
Run (seconds) |
GHz / cores / threads |
| 1 |
Amazon EC2 Extra Large (4 virtual cores) |
340 |
42 |
1.2 GHz? / 4 / 1 |
| 1 |
Amazon EC2 Extra Large (4 virtual cores) |
305 |
43.3 |
1.2 GHz? / 4 / 1 |
| 2 |
1 x dual-core AMD 5900 |
263 |
58.2 |
2.9 GHz / 2 / 1 |
| 3 |
2 x dual-core Xeon 5130 ("Woodcrest") |
245 |
35.7 |
2.0 GHz / 4 / 1 |
| 4 |
2 x quad-core Xeon 5410 ("Harpertown") |
237 |
18.0 |
2.33 GHz / 8 / 1 |
| 5 |
2 x quad-core Xeon 5520 ("Nehalem") |
162 |
18.3 |
2.26 GHz / 8 / 2 |
We tried two different EC2 instances to see if there would be variation. The variation was quite small. The tested EC2 instances costs 20 US cents per hour. The AMD dual-core costs 550 US dollars with 8G. The 3 Xeon configurations are Supermicro boards with 667MHz memory for the Xeon 5130 ("Woodcrest") and Xeon 5410 ("Harpertown"), and 800MHz memory for the Nehalem. The Xeon systems cost between 4000 and 7000 US dollars, with 5000 for a configuration with 2 x Xeon 5520 ("Nehalem"), 72 GB RAM, and 8 x 500 GB SATA disks.
Caveat: Due to slow memory (we could not get faster within available time), the results for the Nehalem do not take full advantage of its principal edge over the previous generation, i.e., memory subsystem. We'll see another time with faster memories.
The operating systems were various 64 bit Linux distributions.
We did some further measurements comparing Harpertown and Nehalem processors. The Nehalem chip was a bit faster for a slightly lower clock but we did not see any of the twofold and greater differences advertised by Intel.
We tried some RDF operations on the two last systems:
| operation |
Harpertown |
Nehalem |
| Build text index for DBpedia |
1080s |
770s |
| Entity Rank iteration |
263s |
251s |
Then we tried to see if the core multithreading of Nehalem could be seen anywhere. To this effect, we ran the Fibonacci function in SQL to serve as an example of an all in-cache integer operation. 16 concurrent operations took exactly twice as long as 8 concurrent ones, as expected.
For something that used memory, we took a count of RDF quads on two different indices, getting the same count. The database was a cluster setup with one process per core, so a count involved one thread per core. The counts in series took 5.02s and in parallel they took 4.27s.
Then we took a more memory intensive piece that read the RDF quads table in the order of one index and for each row checked that there is the equal row on another, differently-partitioned index. This is a cross-partition join. One of the indices is read sequentially and the other at random. The throughput can be reported as random-lookups-per-second. The data was English DBpedia, about 140M triples. One such query takes a couple of minutes with a 650% CPU utilization. Running multiple such queries should show effects of core multithreading since we expect frequent cache misses.
- On the host OS of the Nehalem system —
| n |
cpu% |
rows per second |
| 1 query |
503 |
906,413 |
| 2 queries |
1263 |
1,578,585 |
| 3 queries |
1204 |
1,566,849 |
- In a VM under Xen, on the Nehalem system —
| n |
cpu% |
rows per second |
| 1 query |
652 |
799,293 |
| 2 queries |
1266 |
1,486,710 |
| 3 queries |
1222 |
1,484,093 |
- On the host OS of the Harpertown system —
| n |
cpu% |
rows per second |
| 1 query |
648 |
1,041,448 |
| 2 queries |
708 |
1,124,866 |
The CPU percentages are as reported by the OS: user + system CPU divided by real time.
So, Nehalem is in general somewhat faster, around 20-30%, than Harpertown. The effect of core multithreading can be noticed but is not huge, another 20% or so for situations with more threads than cores. The join where Harpertown did better could be attributed to its larger cache — 12 MB vs 8 MB.
We see that Xen has a measurable but not prohibitive overhead; count a little under 10% for everything, also tasks with no I/O. The VM was set up to have all CPU for the test and the queries did not do disk I/O.
The executables were compiled with gcc with default settings. Specifying -march=nocona (Core 2 target) dropped the cross-partition join time mentioned above from 128s to 122s on Harpertown. We did not try this on Nehalem but presume the effect would be the same, since the out-of-order unit is not much different. We did not do anything about process-to-memory affinity on Nehalem, which is a non-uniform architecture. We would expect this to increase performance since we have many equal size processes with even load.
The mainstay of the Nehalem value proposition is a better memory subsystem. Since the unit we got was at 800 MHz memory, we did not see any great improvement. So if you buy Nehalem, you should make sure it is with 1333 MHz memory, else the best case will not be over 50% over a 667 MHz Core 2-based Xeon.
Nehalem remains a better deal for us because of more memory per board. One Nehalem box with 72 GB costs less than two Harpertown boxes with 32 GB and offers almost the same performance. Having a lot of memory in a small space is key. With faster memory, it might even outperform two Harpertown boxes, but this remains to be seen.
If space were not a constraint, we could make a cluster of 12 small workstations for the price of our largest system and get still more memory and more processor power per unit of memory. The Nehalem box was almost 4x faster than the AMD box but then it has 9x the memory, so the CPU to memory ratio might be better with the smaller boxes.
|
05/28/2009 10:54 GMT
|
Modified:
05/28/2009 11:15 GMT
|
Comparing Virtuoso Performance on Different Processors
[
Virtuso Data Space Bot
]
Over the years we have run Virtuoso on different hardware. We will here give a few figures that help identify the best price point for machines running Virtuoso.
Our test is very simple: Load 20 warehouses of TPC-C data, and then run one client per warehouse for 10,000 new orders. The way this is set up, disk I/O does not play a role and lock contention between the clients is minimal.
The test essentially has 20 server and 20 client threads running the same workload in parallel. The load time gives the single thread number; the 20 clients run gives the multi-threaded number. The test uses about 2-3 GB of data, so all is in RAM but is large enough not to be all in processor cache.
All times reported are real times, starting from the start of the first client and ending with the completion of the last client.
Do not confuse these results with official TPC-C. The measurement protocols are entirely incomparable.
| Test |
Platform |
Load (seconds) |
Run (seconds) |
GHz / cores / threads |
| 1 |
Amazon EC2 Extra Large (4 virtual cores) |
340 |
42 |
1.2 GHz? / 4 / 1 |
| 1 |
Amazon EC2 Extra Large (4 virtual cores) |
305 |
43.3 |
1.2 GHz? / 4 / 1 |
| 2 |
1 x dual-core AMD 5900 |
263 |
58.2 |
2.9 GHz / 2 / 1 |
| 3 |
2 x dual-core Xeon 5130 ("Woodcrest") |
245 |
35.7 |
2.0 GHz / 4 / 1 |
| 4 |
2 x quad-core Xeon 5410 ("Harpertown") |
237 |
18.0 |
2.33 GHz / 8 / 1 |
| 5 |
2 x quad-core Xeon 5520 ("Nehalem") |
162 |
18.3 |
2.26 GHz / 8 / 2 |
We tried two different EC2 instances to see if there would be variation. The variation was quite small. The tested EC2 instances costs 20 US cents per hour. The AMD dual-core costs 550 US dollars with 8G. The 3 Xeon configurations are Supermicro boards with 667MHz memory for the Xeon 5130 ("Woodcrest") and Xeon 5410 ("Harpertown"), and 800MHz memory for the Nehalem. The Xeon systems cost between 4000 and 7000 US dollars, with 5000 for a configuration with 2 x Xeon 5520 ("Nehalem"), 72 GB RAM, and 8 x 500 GB SATA disks.
Caveat: Due to slow memory (we could not get faster within available time), the results for the Nehalem do not take full advantage of its principal edge over the previous generation, i.e., memory subsystem. We'll see another time with faster memories.
The operating systems were various 64 bit Linux distributions.
We did some further measurements comparing Harpertown and Nehalem processors. The Nehalem chip was a bit faster for a slightly lower clock but we did not see any of the twofold and greater differences advertised by Intel.
We tried some RDF operations on the two last systems:
| operation |
Harpertown |
Nehalem |
| Build text index for DBpedia |
1080s |
770s |
| Entity Rank iteration |
263s |
251s |
Then we tried to see if the core multithreading of Nehalem could be seen anywhere. To this effect, we ran the Fibonacci function in SQL to serve as an example of an all in-cache integer operation. 16 concurrent operations took exactly twice as long as 8 concurrent ones, as expected.
For something that used memory, we took a count of RDF quads on two different indices, getting the same count. The database was a cluster setup with one process per core, so a count involved one thread per core. The counts in series took 5.02s and in parallel they took 4.27s.
Then we took a more memory intensive piece that read the RDF quads table in the order of one index and for each row checked that there is the equal row on another, differently-partitioned index. This is a cross-partition join. One of the indices is read sequentially and the other at random. The throughput can be reported as random-lookups-per-second. The data was English DBpedia, about 140M triples. One such query takes a couple of minutes with a 650% CPU utilization. Running multiple such queries should show effects of core multithreading since we expect frequent cache misses.
- On the host OS of the Nehalem system —
| n |
cpu% |
rows per second |
| 1 query |
503 |
906,413 |
| 2 queries |
1263 |
1,578,585 |
| 3 queries |
1204 |
1,566,849 |
- In a VM under Xen, on the Nehalem system —
| n |
cpu% |
rows per second |
| 1 query |
652 |
799,293 |
| 2 queries |
1266 |
1,486,710 |
| 3 queries |
1222 |
1,484,093 |
- On the host OS of the Harpertown system —
| n |
cpu% |
rows per second |
| 1 query |
648 |
1,041,448 |
| 2 queries |
708 |
1,124,866 |
The CPU percentages are as reported by the OS: user + system CPU divided by real time.
So, Nehalem is in general somewhat faster, around 20-30%, than Harpertown. The effect of core multithreading can be noticed but is not huge, another 20% or so for situations with more threads than cores. The join where Harpertown did better could be attributed to its larger cache — 12 MB vs 8 MB.
We see that Xen has a measurable but not prohibitive overhead; count a little under 10% for everything, also tasks with no I/O. The VM was set up to have all CPU for the test and the queries did not do disk I/O.
The executables were compiled with gcc with default settings. Specifying -march=nocona (Core 2 target) dropped the cross-partition join time mentioned above from 128s to 122s on Harpertown. We did not try this on Nehalem but presume the effect would be the same, since the out-of-order unit is not much different. We did not do anything about process-to-memory affinity on Nehalem, which is a non-uniform architecture. We would expect this to increase performance since we have many equal size processes with even load.
The mainstay of the Nehalem value proposition is a better memory subsystem. Since the unit we got was at 800 MHz memory, we did not see any great improvement. So if you buy Nehalem, you should make sure it is with 1333 MHz memory, else the best case will not be over 50% over a 667 MHz Core 2-based Xeon.
Nehalem remains a better deal for us because of more memory per board. One Nehalem box with 72 GB costs less than two Harpertown boxes with 32 GB and offers almost the same performance. Having a lot of memory in a small space is key. With faster memory, it might even outperform two Harpertown boxes, but this remains to be seen.
If space were not a constraint, we could make a cluster of 12 small workstations for the price of our largest system and get still more memory and more processor power per unit of memory. The Nehalem box was almost 4x faster than the AMD box but then it has 9x the memory, so the CPU to memory ratio might be better with the smaller boxes.
|
05/28/2009 10:54 GMT
|
Modified:
05/28/2009 11:15 GMT
|
Live Virtuoso instance hosting Linked Open Data (LOD) Cloud
[
Kingsley Uyi Idehen
]
We have reached a beachead re. the Virtuoso instance hosting the Linked Open Data (LOD) Cloud; meaning, we are not going to be performing any major updates and deletions short-term, bar incorporation of fresh data sets from the Freebase and Bio2RDF projects (both communities a prepping new RDF data sets). At the current time we have loaded 100% of all the very large data sets from the LOD Cloud. As result, we can start the process of exposing Linked Data virtues in a manner that's palatable to users, developers, and database professionals across the Web 1.0, 2.0, and 3.0 spectrums. What does this mean? You can use the "Search & Find" or"URI Lookup" or SPARQL endpoint associated with the LOD cloud hosting instance to perform the following tasks: - Find entities associated with full text search patterns -- Google Style, but with Entity & Text proximity Rank instead of Page Rank, since we are dealing with Entities rather than documents about entities
- Find and Lookup entities by Identifier (URI) -- which is helpful when locating URIs to use for identify entities in your own linked data spaces on the Web
- View entity descriptions via a variety of representation formats (HTML, RDFa, RDF/XML, N3, Turtle etc.)
- Determine uses of entity identifiers across the LOD cloud -- which helps you select preferred URIs based on usage statistics.
What does it offer Web 1.0 and 2.0 developers? If you don't want to use the SPARQL based Web Service, or other Linked Data Web oriented APIs for interacting with the LOD cloud programmatically, you can simply use the powerful REST style Web Service that provides URL parameters for performing full text oriented "Search", entity oriented "Find" queries, and faceted navigation over the huge data corpus with results data returned in JSON and XML formats. Next Steps: Amazon have agreed to add all the LOD Cloud data sets to their existing public data sets collective. Thus, the data sets we are loading will be available in "raw data" (RDF) format on the public data sets page via Named Elastic Block Storage (EBS) Snapshots); meaning, you can make an EC2 AMI (e.g. a Linux, Windows, Solaris) and install an RDF quad or triple store of choice into your AMI, then simply load data from the LOD cloud based on your needs. In addition to the above, we are also going to offer a Virtuoso 6.0 Cluster Edition based LOD Cloud AMI (as we've already done with DBpedia, MusicBrainz, NeuroCommons, and Bio2Rdf) that will enable you to simply instantiate a personal and service specific edition of Virtuoso with all the LOD data in place and fully tuned for performance and scalability; basically, you will simply press "Instantiate AMI" and a LOD cloud data space, in true Linked Data from, will be at your disposal within minutes (i.e. the time it takes the DB to start). Work on the migration of the LOD data to EC2 starts this week. Thus, if you are interested in contributing an RDF based data set to the LOD cloud now is the time to get your archive links in place on the (see: ESW Wiki page for LOD Data Sets).
|
03/30/2009 11:27 GMT
|
Modified:
04/01/2009 14:26 GMT
|
Live Virtuoso instance hosting Linked Open Data (LOD) Cloud
[
Kingsley Uyi Idehen
]
We have reached a beachead re. the Virtuoso instance hosting the Linked Open Data (LOD) Cloud; meaning, we are not going to be performing any major updates and deletions short-term, bar incorporation of fresh data sets from the Freebase and Bio2RDF projects (both communities a prepping new RDF data sets). At the current time we have loaded 100% of all the very large data sets from the LOD Cloud. As result, we can start the process of exposing Linked Data virtues in a manner that's palatable to users, developers, and database professionals across the Web 1.0, 2.0, and 3.0 spectrums. What does this mean? You can use the "Search & Find" or"URI Lookup" or SPARQL endpoint associated with the LOD cloud hosting instance to perform the following tasks: - Find entities associated with full text search patterns -- Google Style, but with Entity & Text proximity Rank instead of Page Rank, since we are dealing with Entities rather than documents about entities
- Find and Lookup entities by Identifier (URI) -- which is helpful when locating URIs to use for identify entities in your own linked data spaces on the Web
- View entity descriptions via a variety of representation formats (HTML, RDFa, RDF/XML, N3, Turtle etc.)
- Determine uses of entity identifiers across the LOD cloud -- which helps you select preferred URIs based on usage statistics.
What does it offer Web 1.0 and 2.0 developers? If you don't want to use the SPARQL based Web Service, or other Linked Data Web oriented APIs for interacting with the LOD cloud programmatically, you can simply use the powerful REST style Web Service that provides URL parameters for performing full text oriented "Search", entity oriented "Find" queries, and faceted navigation over the huge data corpus with results data returned in JSON and XML formats. Next Steps: Amazon have agreed to add all the LOD Cloud data sets to their existing public data sets collective. Thus, the data sets we are loading will be available in "raw data" (RDF) format on the public data sets page via Named Elastic Block Storage (EBS) Snapshots); meaning, you can make an EC2 AMI (e.g. a Linux, Windows, Solaris) and install an RDF quad or triple store of choice into your AMI, then simply load data from the LOD cloud based on your needs. In addition to the above, we are also going to offer a Virtuoso 6.0 Cluster Edition based LOD Cloud AMI (as we've already done with DBpedia, MusicBrainz, NeuroCommons, and Bio2Rdf) that will enable you to simply instantiate a personal and service specific edition of Virtuoso with all the LOD data in place and fully tuned for performance and scalability; basically, you will simply press "Instantiate AMI" and a LOD cloud data space, in true Linked Data from, will be at your disposal within minutes (i.e. the time it takes the DB to start). Work on the migration of the LOD data to EC2 starts this week. Thus, if you are interested in contributing an RDF based data set to the LOD cloud now is the time to get your archive links in place on the (see: ESW Wiki page for LOD Data Sets).
|
03/30/2009 11:27 GMT
|
Modified:
04/01/2009 14:26 GMT
|
Live Virtuoso instance hosting Linked Open Data (LOD) Cloud
[
Kingsley Uyi Idehen
]
We have reached a beachead re. the Virtuoso instance hosting the Linked Open Data (LOD) Cloud; meaning, we are not going to be performing any major updates and deletions short-term, bar incorporation of fresh data sets from the Freebase and Bio2RDF projects (both communities a prepping new RDF data sets). At the current time we have loaded 100% of all the very large data sets from the LOD Cloud. As result, we can start the process of exposing Linked Data virtues in a manner that's palatable to users, developers, and database professionals across the Web 1.0, 2.0, and 3.0 spectrums. What does this mean? You can use the "Search & Find" or"URI Lookup" or SPARQL endpoint associated with the LOD cloud hosting instance to perform the following tasks: - Find entities associated with full text search patterns -- Google Style, but with Entity & Text proximity Rank instead of Page Rank, since we are dealing with Entities rather than documents about entities
- Find and Lookup entities by Identifier (URI) -- which is helpful when locating URIs to use for identify entities in your own linked data spaces on the Web
- View entity descriptions via a variety of representation formats (HTML, RDFa, RDF/XML, N3, Turtle etc.)
- Determine uses of entity identifiers across the LOD cloud -- which helps you select preferred URIs based on usage statistics.
What does it offer Web 1.0 and 2.0 developers? If you don't want to use the SPARQL based Web Service, or other Linked Data Web oriented APIs for interacting with the LOD cloud programmatically, you can simply use the powerful REST style Web Service that provides URL parameters for performing full text oriented "Search", entity oriented "Find" queries, and faceted navigation over the huge data corpus with results data returned in JSON and XML formats. Next Steps: Amazon have agreed to add all the LOD Cloud data sets to their existing public data sets collective. Thus, the data sets we are loading will be available in "raw data" (RDF) format on the public data sets page via Named Elastic Block Storage (EBS) Snapshots); meaning, you can make an EC2 AMI (e.g. a Linux, Windows, Solaris) and install an RDF quad or triple store of choice into your AMI, then simply load data from the LOD cloud based on your needs. In addition to the above, we are also going to offer a Virtuoso 6.0 Cluster Edition based LOD Cloud AMI (as we've already done with DBpedia, MusicBrainz, NeuroCommons, and Bio2Rdf) that will enable you to simply instantiate a personal and service specific edition of Virtuoso with all the LOD data in place and fully tuned for performance and scalability; basically, you will simply press "Instantiate AMI" and a LOD cloud data space, in true Linked Data from, will be at your disposal within minutes (i.e. the time it takes the DB to start). Work on the migration of the LOD data to EC2 starts this week. Thus, if you are interested in contributing an RDF based data set to the LOD cloud now is the time to get your archive links in place on the (see: ESW Wiki page for LOD Data Sets).
|
03/30/2009 11:27 GMT
|
Modified:
04/01/2009 14:26 GMT
|
See the Lite: Embeddable/Background Virtuoso starts at 25MB
[
Orri Erling
]
We have received many requests for an embeddable-scale Virtuoso. In response to this, we have added a Lite mode, where the initial size of a server process is a tiny fraction of what the initial size would be with default settings. With 2MB of disk cache buffers (ini file setting, NumberOfBuffers = 256), the process size stays under 30MB on 32-bit Linux.
The value of this is that one can now have RDF and full text indexing on the desktop without running a Java VM or any other memory-intensive software. And of course, all of SQL (transactions, stored procedures, etc.) is in the same embeddably-sized container.
The Lite executable is a full Virtuoso executable; the Lite mode is controlled by a switch in the configuration file. The executable size is about 10MB for 32-bit Linux. A database created in the Lite mode will be converted into a fully-featured database (tables and indexes are added, among other things) if the server is started with the Lite setting "off"; functionality can be reverted to Lite mode, though it will now consume somewhat more memory, etc.
Lite mode offers full SQL and SPARQL/SPARUL (via SPASQL), but disables all HTTP-based services (WebDAV, application hosting, etc.). Clients can still use all typical database access mechanisms (i.e., ODBC, JDBC, OLE-DB, ADO.NET, and XMLA) to connect, including the Jena and Sesame frameworks for RDF. ODBC now offers full support of RDF data types for C-based clients. A Redland-compatible API also exists, for use with Redland v1.0.8 and later.
Especially for embedded use, we now allow restricting the listener to be a Unix socket, which allows client connections only from the localhost.
Shipping an embedded Virtuoso is easy. It just takes one executable and one configuration file. Performance is generally comparable to "normal" mode, except that Lite will be somewhat less scalable on multicore systems.
The Lite mode will be included in the next Virtuoso 5 Open Source release.
|
12/17/2008 09:34 GMT
|
Modified:
12/17/2008 12:03 GMT
|
See the Lite: Embeddable/Background Virtuoso starts at 25MB
[
Virtuso Data Space Bot
]
We have received many requests for an embeddable-scale Virtuoso. In response to this, we have added a Lite mode, where the initial size of a server process is a tiny fraction of what the initial size would be with default settings. With 2MB of disk cache buffers (ini file setting, NumberOfBuffers = 256), the process size stays under 30MB on 32-bit Linux.
The value of this is that one can now have RDF and full text indexing on the desktop without running a Java VM or any other memory-intensive software. And of course, all of SQL (transactions, stored procedures, etc.) is in the same embeddably-sized container.
The Lite executable is a full Virtuoso executable; the Lite mode is controlled by a switch in the configuration file. The executable size is about 10MB for 32-bit Linux. A database created in the Lite mode will be converted into a fully-featured database (tables and indexes are added, among other things) if the server is started with the Lite setting "off"; functionality can be reverted to Lite mode, though it will now consume somewhat more memory, etc.
Lite mode offers full SQL and SPARQL/SPARUL (via SPASQL), but disables all HTTP-based services (WebDAV, application hosting, etc.). Clients can still use all typical database access mechanisms (i.e., ODBC, JDBC, OLE-DB, ADO.NET, and XMLA) to connect, including the Jena and Sesame frameworks for RDF. ODBC now offers full support of RDF data types for C-based clients. A Redland-compatible API also exists, for use with Redland v1.0.8 and later.
Especially for embedded use, we now allow restricting the listener to be a Unix socket, which allows client connections only from the localhost.
Shipping an embedded Virtuoso is easy. It just takes one executable and one configuration file. Performance is generally comparable to "normal" mode, except that Lite will be somewhat less scalable on multicore systems.
The Lite mode will be included in the next Virtuoso 5 Open Source release.
|
12/17/2008 09:34 GMT
|
Modified:
12/17/2008 12:03 GMT
|
Virtuoso Vs. MySQL: Setting the Berlin Record Straight (update 2)
[
Orri Erling
]
In the context of the Berlin SPARQL Benchmark, I have repeatedly written about measurement procedures and steady state. The point is that the numbers at larger scales are unreliable due to cache behavior if one is not careful about measurement and does not have adequate warmup. Thus it came to pass that one cut of the BSBM paper had 3 seconds for MySQL and 100 for Virtuoso, basically through ignoring cache effects.
So we decided to do it ourselves.
The score is (updated with revised innodb_buffer_pool_size setting, based on advice noted down below):
| n-clients |
Virtuoso |
MySQL (with increased buffer pool size) |
MySQL (with default buffer poll size) |
| 1 |
41,161.33 |
27,023.11 |
12,171.41 |
| 4 |
127,918.30 |
(pending) |
37,566.82 |
| 8 |
218,162.29 |
105,524.23 |
51,104.39 |
| 16 |
214,763.58 |
98,852.42 |
47,589.18 |
The metric is the query mixes per hour from the BSBM test driver output. For the interested, the complete output is here.
The benchmark is pure SQL, nothing to do with SPARQL or RDF.
The hardware is 2 x Xeon 5345 (2 x quad core, 2.33 GHz), 16 G RAM. The OS is 64-bit Debian Linux.
The benchmark was run at a scale of 200,000. Each run had 2000 warm-up query mixes and 500 measured query mixes, which gives steady state, eliminating any effects of OS disk cache and the like. Both databases were configured to use 8G for disk cache. The test effectively runs from memory. We ran an analyze table on each MySQL table but noticed that this had no effect. Virtuoso does the stats sampling on the go; possibly MySQL also since the explicit stats did not make any difference. The MySQL tables were served by the InnoDB engine. MySQL appears to cache results of queries in some cases. This was not apparent in the tests.
The versions are 5.09 for Virtuoso and 5.1.29 for MySQL. You can download and examine --
MySQL ought to do better. We suspect that here, just as in the TPC-D experiment we made way back, the query plans are not quite right. Also we rarely saw over 300% CPU utilization for MySQL. It is possible there is a config parameter that affects this. The public is invited to tell us about such.
Update:
Andreas Schultz of the BSBM team advised us to increase the innodb_buffer_pool_size setting in the MySQL config. We did and it produced some improvement. Indeed, this is more like it, as we now see CPU utilization around 700% instead of the 300% in the previously published run, which rendered it suspect. Also, our experiments with TPC-D led us to expect better. We ran these things a few times so as to have warm cache.
On the first run, we noticed that the Innodb warm up time was somewhere well in excess of 2000 query mixes. Another time, we should make a graph of throughput as a function of time for both MySQL and Virtuoso. We recently made a greedy prefetch hack that should give us some mileage there. For the next BSBM, all we can advise is to run larger scale system for half an hour first and then measure and then measure again. If the second measurement is the same as the first then it is good.
As always, since MySQL is not our specialty, we confidently invite the public to tell us how to make it run faster. So, unless something more turns up, our next trial is a revisit of TPC-H.
|
11/20/2008 11:06 GMT
|
Modified:
11/24/2008 10:15 GMT
|
Virtuoso Vs. MySQL: Setting the Berlin Record Straight (update 2)
[
Virtuso Data Space Bot
]
In the context of the Berlin SPARQL Benchmark, I have repeatedly written about measurement procedures and steady state. The point is that the numbers at larger scales are unreliable due to cache behavior if one is not careful about measurement and does not have adequate warmup. Thus it came to pass that one cut of the BSBM paper had 3 seconds for MySQL and 100 for Virtuoso, basically through ignoring cache effects.
So we decided to do it ourselves.
The score is (updated with revised innodb_buffer_pool_size setting, based on advice noted down below):
| n-clients |
Virtuoso |
MySQL (with increased buffer pool size) |
MySQL (with default buffer poll size) |
| 1 |
41,161.33 |
27,023.11 |
12,171.41 |
| 4 |
127,918.30 |
(pending) |
37,566.82 |
| 8 |
218,162.29 |
105,524.23 |
51,104.39 |
| 16 |
214,763.58 |
98,852.42 |
47,589.18 |
The metric is the query mixes per hour from the BSBM test driver output. For the interested, the complete output is here.
The benchmark is pure SQL, nothing to do with SPARQL or RDF.
The hardware is 2 x Xeon 5345 (2 x quad core, 2.33 GHz), 16 G RAM. The OS is 64-bit Debian Linux.
The benchmark was run at a scale of 200,000. Each run had 2000 warm-up query mixes and 500 measured query mixes, which gives steady state, eliminating any effects of OS disk cache and the like. Both databases were configured to use 8G for disk cache. The test effectively runs from memory. We ran an analyze table on each MySQL table but noticed that this had no effect. Virtuoso does the stats sampling on the go; possibly MySQL also since the explicit stats did not make any difference. The MySQL tables were served by the InnoDB engine. MySQL appears to cache results of queries in some cases. This was not apparent in the tests.
The versions are 5.09 for Virtuoso and 5.1.29 for MySQL. You can download and examine --
MySQL ought to do better. We suspect that here, just as in the TPC-D experiment we made way back, the query plans are not quite right. Also we rarely saw over 300% CPU utilization for MySQL. It is possible there is a config parameter that affects this. The public is invited to tell us about such.
Update:
Andreas Schultz of the BSBM team advised us to increase the innodb_buffer_pool_size setting in the MySQL config. We did and it produced some improvement. Indeed, this is more like it, as we now see CPU utilization around 700% instead of the 300% in the previously published run, which rendered it suspect. Also, our experiments with TPC-D led us to expect better. We ran these things a few times so as to have warm cache.
On the first run, we noticed that the Innodb warm up time was somewhere well in excess of 2000 query mixes. Another time, we should make a graph of throughput as a function of time for both MySQL and Virtuoso. We recently made a greedy prefetch hack that should give us some mileage there. For the next BSBM, all we can advise is to run larger scale system for half an hour first and then measure and then measure again. If the second measurement is the same as the first then it is good.
As always, since MySQL is not our specialty, we confidently invite the public to tell us how to make it run faster. So, unless something more turns up, our next trial is a revisit of TPC-H.
|
11/20/2008 11:06 GMT
|
Modified:
11/24/2008 10:15 GMT
|
Welcoming Freebase to the Linked Data Web
[
Kingsley Uyi Idehen
]
Finally! That's all I can say re. Freebase :-) They've now plugged their database and their community driven data curation efforts into the burgeoning Linked Data Web.
Here are some examples of how we distill Entities (People, Places, Music, and other things) from Freebase (X)HTML pages (meaning: we don't have to start from RDF information resources as data sources for the eventual RDF Linked Data we generate):
Tip: Install our OpenLink Data Explorer extension for Firefox. Once installed, simply browse through Freebase, and whenever you encounter a page about something of interest, simply use the following sequences to distill (via the Page Description feature) the entities from the page you are reading:
-
CTRL-Click (Mac OS X)
-
Right+Click (Windows & Linux)
Related
|
10/31/2008 15:02 GMT
|
Modified:
10/31/2008 11:23 GMT
|
|
|