Benchmarks, Redux (part 15): BSBM Test Driver Enhancements [ Orri Erling ]

This article covers the changes we have made to the BSBM test driver during our series of experiments.

  • Drill-down mode - For queries that take a product type as a parameter, the test driver will invoke the query multiple times, each time with a random subtype of the product type used in the previous invocation. The starting point of the drill-down is a random type from a settable level in the hierarchy. The rationale for the drill-down mode is that, depending on the parameter choice, there can be 1000x differences in query run time. Thus, run times of consecutive query mixes will be incomparable unless we guarantee that each mix has a predictable number of queries with a product type from each level of the hierarchy.

  • Permutation of query mix - In the BI workload, the queries are run in a random order on each thread in multiuser mode. Doing exactly the same thing on many threads is not realistic for large queries. The data access patterns must be spread out in order to evaluate how bulk IO is organized with differing concurrent demands. The permutations are deterministic on consecutive runs and do not depend on the non-deterministic timing of concurrent activities. For queries with a drill-down, the individual executions that make up the drill-down are still consecutive.
  • New metrics - The BI Power is the geometric mean of query run times, scaled to queries per hour (QPH) and multiplied by the scale factor, where 100 Mt is considered the unit scale. The BI Throughput is the arithmetic mean of the run times, scaled to QPH and adjusted to scale as with the Power metric. These are analogous to the TPC-H Power and Throughput metrics. (A short computational sketch of both metrics follows this list.)

    The Power is defined as

    (scale_factor / 284826) * 3600 / ((t1 * t2 * ... * tn) ^ (1/n))

    The Throughput is defined as

    (scale_factor / 284826) * 3600 / ((t1 + t2 + ... + tn) / n)

    The magic number 284826 is the scale factor that generates approximately 100 million triples (100 Mt); we consider this "scale one." The reason for the scale multiplier is that scores obtained at different scales should land in a comparable range; otherwise, a 10x larger scale would result in roughly 10x lower throughput with the BI queries.

    We also show the percentage each query represents of the total time the test driver spends waiting for responses.

  • Deadlock retry - When running update mixes, it is possible for a transaction to be aborted by a deadlock. We have added retry logic for this.

  • Cluster mode - Cluster databases may have multiple interchangeable HTTP listeners. With this mode, one can specify multiple endpoints so that a multi-user workload divides itself evenly across them.

  • Identifying matter - A version number was added to the test driver output; use of the new switches is also indicated there.

  • SUT CPU - In comparing results, it is crucial to differentiate between in-memory runs and I/O-bound runs. To make this easier, we have added an option to report server CPU times over the timed portion of the run (excluding warm-ups). A pluggable script determines the CPU times for the system, so clusters can be handled too. The time is given both as the total CPU time accumulated by the server processes during the run and as a percentage of wall-clock time.
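
To make the two formulas concrete, here is a minimal computational sketch. This is illustrative Python, not part of the test driver itself; the run times and scale factor in the example are made-up values.

    import math

    UNIT_SCALE = 284826  # BSBM scale factor that generates roughly 100 million triples (100 Mt)

    def bi_power(run_times_seconds, scale_factor):
        """Geometric mean of run times, scaled to QPH and to the 100 Mt unit scale."""
        n = len(run_times_seconds)
        geo_mean = math.exp(sum(math.log(t) for t in run_times_seconds) / n)
        return (scale_factor / UNIT_SCALE) * 3600 / geo_mean

    def bi_throughput(run_times_seconds, scale_factor):
        """Arithmetic mean of run times, scaled the same way as the Power metric."""
        arith_mean = sum(run_times_seconds) / len(run_times_seconds)
        return (scale_factor / UNIT_SCALE) * 3600 / arith_mean

    # Example: five query run times (in seconds) from a run at the 100 Mt scale
    times = [2.5, 4.0, 1.2, 30.0, 8.0]
    print(bi_power(times, 284826), bi_throughput(times, 284826))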

These changes will soon be available as a diff and as a source tree. This version is labeled BSBM Test Driver 1.1-opl; the -opl signifies OpenLink additions.

We invite FU Berlin to include these enhancements in their SourceForge repository of the BSBM test driver. More precise documentation of these options can be found in the README file in the above distribution.

The next planned upgrade of the test driver concerns adding support for "RDF-H", the RDF adaptation of the industry standard TPC-H decision support benchmark for RDBMS.

Benchmarks, Redux Series

# PermaLink Comments [0]
03/22/2011 18:32 GMT Modified: 03/22/2011 17:04 GMT
Benchmarks, Redux (part 10): LOD2 and the Benchmark Process [ Orri Erling ]

I have in the previous posts generally argued for and demonstrated the usefulness of benchmarks.

Here I will talk about how this could be organized in a way that is tractable and takes vendor and end-user interests into account. These are my views on the subject; they do not represent a consensus of the LOD2 members, but they have been discussed in the consortium.

My colleague Ivan Mikhailov once proposed that the only way to get benchmarks run right is to package them as a single script that does everything, like instant noodles -- just add water! But even instant noodles can be abused: Cook too long, add too much water, maybe forget to light the stove, and complain that the result is unsatisfyingly hard and brittle, lacking the suppleness one has grown to expect from this delicacy. No, the answer lies at the other end of the culinary spectrum, in gourmet cooking. Let the best cooks show what they can do, and let them work at it; let those who in fact have the capacity and motivation for creating le chef d'oeuvre culinaire ("the culinary masterpiece") create it. Even so, there are many value points along the dimensions of preparation time, cost, and esthetic layout, not to forget taste and nutritional values. Indeed, an intimate knowledge de la vie secrete du canard ("the secret life of duck") is required in order to liberate the aroma that it might take flight and soar. In the previous posts, I have shed some light on how we prepare le canard, and if le canard be such then la dinde (turkey) might in some ways be analogous; who is to say?

In other words, as a vendor, we want to have complete control over the benchmarking process, and have it take place in our environment at a time of our choice. In exchange for this, we are ready to document and observe possibly complicated rules, document how the runs are made, and let others monitor and repeat them on the equipment on which the results are obtained. This is the TPC (Transaction Processing Performance Council) model.

Another culture of doing benchmarks is the periodic challenge model used in TREC, the Billion Triples Challenge, the Semantic Search Challenge and others. In this model, vendors prepare the benchmark submission and agree to joint publication.

A third party performing benchmarks on its own is uncommon in the database world. Licenses often explicitly prohibit this, for understandable reasons.

The LOD2 project has an outreach activity called Publink, where we offer to help owners of data publish it as Linked Data. Similarly, since FP7 projects are supposed to offer a visible service to their communities, I proposed that LOD2 serve a role in disseminating and auditing RDF store benchmarks.

One representative of an RDF store vendor I talked to, in relation to setting up a benchmark configuration of their product, told me that we could do this and that they would give some advice but that such an exercise was by its nature fundamentally flawed and could not possibly produce worthwhile results. The reason for this was that OpenLink engineers could not possibly learn enough about the other products nor unlearn enough of their own to make this a meaningful comparison.

Isn't this the very truth? Let the chefs mix their own spices.

This does not mean that there would not be comparability of results. If the benchmarks and processes are well defined, documented, and checked by a third party, these can be considered legitimate and not just one-off best-case results without further import.

In order to stretch the envelope, which is very much a LOD2 goal, this benchmarking should be done on a variety of equipment -- whatever works best at the scale in question. Increasing the scale remains a stated objective. LOD2 even promised to run things with a trillion triples in another 3 years.

Imagine that the unimpeachably impartial Berliners made house calls. Would this debase Justice to be a servant of mere show-off? Or would this on the contrary combine strict Justice with edifying Charity? Who indeed is in greater need of the light of objective evaluation than the vendor whose very nature makes a being of bias and prejudice?

Even better, CWI, with its stellar database pedigree, agreed in principle to audit RDF benchmarks in LOD2.

In this way one could get a stamp of approval for one's results regardless of when they were produced, and be free of the arbitrary schedule of third party benchmarking runs. On the relational side this is a process of some cost and complexity, but since the RDF side is still young and more on mutually friendly terms, the process can be somewhat lighter here. I did promise to draft some extra descriptions of process and result disclosure so that we could see how this goes.

We could even do this unilaterally -- just publish Virtuoso results according to a predefined reporting and verification format. If others wished to publish by the same rules, LOD2 could use some of the benchmarking funds for auditing the proceedings. This could all take place over the net, so we are not talking about any huge cost or prohibitive amount of trouble. It would be in the FP7 spirit that LOD2 provide this service for free, naturally within reason.

Then there is the matter of the BSBM Business Intelligence (BI) mix. At present, it seems everybody has chosen to defer the matter to another round of BSBM runs in the summer. This seems to fit the pattern of a public challenge with a few months given for contenders to prepare their submissions. Here we certainly should look at bigger scales and more diverse hardware than in the Berlin runs published this time around. The BI workload is in fact fairly cluster friendly, with big joins and aggregations that parallelize well. There it would definitely make sense to reserve an actual cluster, and have all contenders set up their gear on it. If all have access to the run environment and to monitoring tools, we can be reasonably sure that things will be done in a transparent manner.

(I will talk about the BI mix in more detail in part 13 and part 14 of this series.)

Once the BI mix has settled and there are a few interoperable implementations, likely in the summer, we could pass from the challenge model to a situation where vendors may publish results as they become available, with LOD2 offering its services for audit.

Of course, this could be done even before then, but the content of the mix might not be settled. We likely need to check it on a few implementations first.

For equipment, people can use their own, or LOD2 partners might on a case-by-case basis make some equipment available for running on the same hardware on which say the Virtuoso results were obtained. For example, FU Berlin could give people a login to get their recently published results fixed. Now this might or might not happen, so I will not hold my breath waiting for this but instead close with a proposal.

As a unilateral diplomatic overture I put forth the following: If other vendors are interested in 1:1 comparison of their results with our publications, we can offer them a login to the same equipment. They can set up and tune their systems, and perform the runs. We will just watch. As an extra quid pro quo, they can try Virtuoso as configured for the results we have published, with the same data. Like this, both parties get to see the others' technology with proper tuning and installation. What, if anything, is reported about this activity is up to the owner of the technology being tested. We will publish a set of benchmark rules that can serve as a guideline for mutually comparable reporting, but we cannot force anybody to use these. This all will function as a catalyst for technological advance, all to the ultimate benefit of the end user. If you wish to take advantage of this offer, you may contact Hugh Williams at OpenLink Software, and we will see how this can be arranged in practice.

The next post will talk about the actual content of benchmarks. The milestone after this will be when we publish the measurement and reporting protocols.

Benchmarks, Redux Series

# PermaLink Comments [0]
03/10/2011 18:29 GMT Modified: 03/14/2011 19:36 GMT
Benchmarks, Redux (part 1): On RDF Benchmarks [ Orri Erling ]

This post introduces a series on RDF benchmarking. In these posts I will cover the following:

  • Correct misleading information about us in the recent Berlin report: the load rate is off the wall and the update mix is missing. We supply the right numbers and explain how to load the data so that one gets decent performance.

  • Discuss configuration options for Virtuoso.

  • Tell a story about multithreading and its perils and how vectoring and scale-out can save us.

  • Analyze the run time behavior of Virtuoso 6 Single, 6 Cluster, and 7 Single.

  • Look at the benefits of SSDs (solid-state drives) over HDDs (hard disk drives with spinning platters), and I/O matters in general.

  • Talk in general about modalities of benchmark running, and how to reconcile vendors doing what they know best with the air of legitimacy a third party provides. Should things be done à la TPC or à la TREC? We will hopefully try a bit of both; at least, so I have proposed to our partners in LOD2, the EU FP7 project that also funded the recent Berlin report.

  • Outline the desiderata for an RDF benchmark that is not just an RDF-ized relational workload, the Social Intelligence Benchmark.

  • Talk about BSBM specifically: what does it measure?

  • Discuss some experiments with the BI use case of BSBM.

  • Document how the results mentioned here were obtained and suggest practices for benchmark running and disclosure.

The background is that the LOD2 FP7 project is supposed to deliver a report about the state of the art and benchmark laboratory by March 1. The Berlin report is a part thereof. In the project proposal we talk about an ongoing benchmarking activity and about having up-to-date installations of the relevant RDF stores and RDBMS.

Since this is taxpayer money spent, supposedly, for the common good, I see no reason why such a useful thing should be restricted to the project participants. On the other hand, running a display window of stuff for benchmarking, when at least in some cases licenses prohibit unauthorized publishing of benchmark results, might be seen to conflict with the spirit of the license if not its letter. We will see.

For now, my take is that we want to run benchmarks of all interesting software, inviting the vendors to tell us how to do that if they will, and maybe even letting them perform those runs themselves. Then we promise not to disclose results without the vendor's permission. Access to the installations is limited to whoever operates the equipment. Configuration files and detailed hardware specs and such on the other hand will be made public. If a run is published, it will be with permission and in a format that includes full information for replicating the experiment.

In the LOD2 proposal we also say in so many words that we will stretch the limits of the state of the art. This stretching is surely not limited to the project's own products but should also include the general benchmarking aspect. I will say with confidence that running single-server benchmarks at a maximum of 200 Mtriples of data is not stretching anything.

So, to ameliorate this situation, I thought to run the same at 10x the scale on a couple of large boxes we have access to. 1 and 2 billion triples are still comfortably single-server scales. Then we could go, for example, to Giovanni's cluster at DERI and do 10 and 20 billion triples; this should fly reasonably well on 8 or 16 nodes of the DERI gear. Or we might talk to SEALS, who by now should have their own lab. Even Amazon EC2 might be an option, although not the preferred one.

So I asked everybody for configuration instructions, which produced a certain amount of dismay, as I might be said to be biased and to be skirting the edges of conflict of interest. The inquiry was not altogether negative, though, since Ontotext and Garlik provided some information. We will look into these this week and next. We will not publish any information without asking first.

In this series of posts I will only talk about OpenLink Software.

Benchmarks, Redux Series

# PermaLink Comments [0]
02/28/2011 15:20 GMT Modified: 03/14/2011 17:15 GMT
DBpedia + BBC (combined) Linked Data Space Installation Guide [ Kingsley Uyi Idehen ]

What?

The DBpedia + BBC Combo Linked Dataset is a preconfigured Virtuoso Cluster (4 Virtuoso Cluster Nodes, each comprising one Virtuoso Instance; initial deployment is to a single Cluster Host, but the license may be converted for physically distributed deployment), available via the Amazon EC2 Cloud, preloaded with the DBpedia and BBC Linked Data sets.

Why?

The BBC has been publishing Linked Data from its Web Data Space for a number of years. In line with best practices for injecting Linked Data into the World Wide Web (Web), the BBC datasets are interlinked with other datasets such as DBpedia and MusicBrainz.

Typical follow-your-nose exploration using a Web Browser (or even via sophisticated SPARQL query crawls) isn't always practical once you get past the initial euphoria that comes from comprehending the Linked Data concept. As your queries get more complex, the overhead of remote sub-queries increases its impact, until query results take so long to return that you simply give up.

Thus, maximizing the effects of the BBC's efforts requires Linked Data that shares locality in a Web-accessible Data Space — i.e., where all Linked Data sets have been loaded into the same data store or warehouse. This holds true even when leveraging SPARQL-FED style virtualization — there's always a need to localize data as part of any marginally-decent locality-aware cost-optimization algorithm.

This DBpedia + BBC dataset, exposed via a preloaded and preconfigured Virtuoso Cluster, delivers a practical point of presence on the Web for immediate and cost-effective exploitation of Linked Data at the individual and/or service specific levels.

How?

To work through this guide, you'll need to start with 90 GB of free disk space. (Only 41 GB will be consumed after you delete the installer archives, but starting with 90+ GB ensures enough work space for the installation.)

Install Virtuoso

  1. Download Virtuoso installer archive(s). You must deploy the Personal or Enterprise Edition; the Open Source Edition does not support Shared-Nothing Cluster Deployment.

  2. Obtain a Virtuoso Cluster license.

  3. Install Virtuoso.

  4. Set key environment variables and start the OpenLink License Manager using the following command (this may vary depending on your shell and install directory):

    . /opt/virtuoso/virtuoso-enterprise.sh
  5. Optional: To keep the default single-server configuration file and demo database intact, set the VIRTUOSO_HOME environment variable to a different directory, e.g.,

    export VIRTUOSO_HOME=/opt/virtuoso/cluster-home/

    Note: You will have to adjust this setting every time you shift between this cluster setup and your single-server setup. Either may be made your environment's default through the virtuoso-enterprise.sh and related scripts.

  6. Set up your cluster by running the mkcluster.sh script. Note that initial deployment of the DBpedia + BBC Combo requires a 4 node cluster, which is the default for this script.

  7. Start the Virtuoso Cluster with this command:

    virtuoso-start.sh
  8. Stop the Virtuoso Cluster with this command:

    virtuoso-stop.sh

Using the DBpedia + BBC Combo dataset

  1. Navigate to your installation directory.

  2. Download the combo dataset installer script — bbc-dbpedia-install.sh.

  3. For best results, set the downloaded script to fully executable using this command:

    chmod 755 bbc-dbpedia-install.sh
  4. Shut down any Virtuoso instances that may be currently running.

  5. Optional: As above, if you have decided to keep the default single-server configuration file and demo database intact, set the VIRTUOSO_HOME environment variable appropriately, e.g.,

    export VIRTUOSO_HOME=/opt/virtuoso/cluster-home/
  6. Run the combo dataset installer script with this command:

    sh bbc-dbpedia-install.sh

Verify installation

The combo dataset typically deploys to EC2 virtual machines in under 90 minutes; your time will vary depending on your network connection speed, machine speed, and other variables.

Once the script completes, perform the following steps; a sample programmatic check of the SPARQL endpoint follows the list:

  1. Verify that the Virtuoso Conductor (HTTP-based Admin UI) is in place via:

    http://localhost:[port]/conductor
  2. Verify that the Virtuoso SPARQL endpoint is in place via:

    http://localhost:[port]/sparql
  3. Verify that the Precision Search & Find UI is in place via:

    http://localhost:[port]/fct
  4. Verify that the Virtuoso hosted PivotViewer is in place via:

    http://localhost:[port]/PivotViewer
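
As an additional end-to-end check, the SPARQL endpoint can be queried programmatically. The snippet below is a minimal sketch only: port 8890 is a common Virtuoso default and the query is a trivial placeholder, so adjust both to your installation.

    import json
    import urllib.parse
    import urllib.request

    # Adjust the port to match your cluster's HTTP listener
    ENDPOINT = "http://localhost:8890/sparql"

    query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"
    request = urllib.request.Request(
        ENDPOINT + "?" + urllib.parse.urlencode({"query": query}),
        headers={"Accept": "application/sparql-results+json"},
    )

    with urllib.request.urlopen(request) as response:
        results = json.load(response)

    # Print the subjects returned by the query
    for binding in results["results"]["bindings"]:
        print(binding["s"]["value"])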

Related

# PermaLink Comments [0]
02/17/2011 17:15 GMT Modified: 03/29/2011 10:09 GMT
Virtuoso + DBpedia 3.6 Installation Guide (Update 1) [ Kingsley Uyi Idehen ]

What is DBpedia?

DBpedia is a community effort to provide a contemporary deductive database derived from Wikipedia content. Project contributions can be partitioned as follows:

  1. Ontology Construction and Maintenance
  2. Dataset Generation via Wikipedia Content Extraction & Transformation
  3. Live Database Maintenance & Administration -- includes actual Linked Data loading and publishing, provision of SPARQL endpoint, and traditional DBA activity
  4. Internationalization.

Why is DBpedia important?

Comprising the nucleus of the Linked Open Data effort, DBpedia also serves as a fulcrum for the burgeoning Web of Linked Data by delivering a dense and highly-interlinked lookup database. In its most basic form, DBpedia is a great source of strong and resolvable identifiers for People, Places, Organizations, Subject Matter, and many other data items of interest. Naturally, it provides a fantastic starting point for comprehending the fundamental concepts underlying TimBL's initial Linked Data meme.

How do I use DBpedia?

Depending on your particular requirements, whether personal or service-specific, DBpedia offers the following:

  • Datasets that can be loaded on your deductive database (also known as triple or quad stores) platform of choice
  • Live browsable HTML+RDFa based entity description pages
  • A wide variety of data formats for importing entity description data into a broad range of existing applications and services
  • A SPARQL endpoint allowing ad-hoc querying over HTTP using the SPARQL query language, and delivering results serialized in a variety of formats
  • A broad variety of tools covering query by example, faceted browsing, full text search, entity name lookups, etc.

What is the DBpedia 3.6 + Virtuoso Cluster Edition Combo?

OpenLink Software has preloaded the DBpedia 3.6 datasets into a preconfigured Virtuoso Cluster Edition database, and made the package available for easy installation.

Why is the DBpedia+Virtuoso package important?

The DBpedia+Virtuoso package provides a cost-effective option for personal or service-specific incarnations of DBpedia.

For instance, you may have a service that isn't best-served by competing with the rest of the world for ad-hoc query time and resources on the live instance, which itself operates under various restrictions which enable this ad-hoc query service to be provided at Web Scale.

Now you can easily commission your own instance and quickly exploit DBpedia and Virtuoso's database feature set to the max, powered by your own hardware and network infrastructure.

How do I use the DBpedia+Virtuoso package?

Pre-requisites are simply:

  1. Functional Virtuoso Cluster Edition installation.
  2. Virtuoso Cluster Edition License.
  3. 90 GB of free disk space -- only 43 GB is ultimately needed, but this is our recommended free space before the installation completes.

To install the Virtuoso Cluster Edition simply perform the following steps:

  1. Download the software.
  2. Run the installer.
  3. Set key environment variables and start the OpenLink License Manager using the following command (this may vary depending on your shell):

    . /opt/virtuoso/virtuoso-enterprise.sh
  4. Run the mkcluster.sh script, which defaults to a 4-node cluster.
  5. Set the VIRTUOSO_HOME environment variable if you want the cluster databases kept in a root directory distinct from any single-server databases (one that isn't adjacent to the single-server database directories).
  6. Start Virtuoso Cluster Edition instances using command:
    virtuoso-start.sh
  7. Stop Virtuoso Cluster Edition instances using command:
    virtuoso-stop.sh

To install your personal or service specific edition of DBpedia simply perform the following steps:

  1. Navigate to your installation directory.
  2. Download the installer script (dbpedia-install.sh).
  3. Set the execution mode on the script using command:
    chmod 755 dbpedia-install.sh
  4. Shut down any Virtuoso instances that may be currently running.
  5. Set your VIRTUOSO_HOME environment variable, e.g., to the current directory, via command (this may vary depending on your shell):
    export VIRTUOSO_HOME=`pwd`
  6. Run script using command:
    sh dbpedia-install.sh

Once the installation completes (approximately 1 hour and 30 minutes from start time), perform the following steps; a sample content-negotiation check follows the list:

  1. Verify that the Virtuoso Conductor (HTML based Admin UI) is in place via:
    http://localhost:[port]/conductor
  2. Verify that the Precision Search & Find UI is in place via:
    http://localhost:[port]/fct
  3. Verify that DBpedia's Green Entity Description Pages are in place via:
    http://localhost:[port]/resource/DBpedia
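
Beyond checking the pages in a browser, you can also confirm that Linked Data content negotiation works against your local instance. The snippet below is a minimal sketch; the port is an assumption you will need to adjust to your installation. It requests the Turtle description of the DBpedia resource and prints its first few lines.

    import urllib.request

    # Adjust the port to match your Virtuoso instance's HTTP listener
    url = "http://localhost:8890/resource/DBpedia"

    request = urllib.request.Request(url, headers={"Accept": "text/turtle"})
    with urllib.request.urlopen(request) as response:
        body = response.read().decode("utf-8")

    # Print the first few lines of the returned Turtle description
    print("\n".join(body.splitlines()[:10]))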

Related

# PermaLink Comments [0]
01/24/2011 20:08 GMT Modified: 01/25/2011 14:46 GMT