Virtuoso 7 Release
[Virtuoso Data Space Bot]
The quest of OpenLink Software is to bring flexibility, efficiency, and expressive power to people working with data. For the past several years, this has been focused on making graph data models viable for the enterprise. Flexibility in schema evolution is a central aspect of this, as is the ability to share identifiers across different information systems, i.e., giving things URIs instead of synthetic keys that are not interpretable outside of a particular application.
With Virtuoso 7, we dramatically improve the efficiency of all this. With databases in the billions of relations (also known as triples, or 3-tuples), we can fit about 3x as many relations in the same space (disk and RAM) as with Virtuoso 6. Single-threaded query speed is up to 3x better, plus there is intra-query parallelization even in single-server configurations. Graph data workloads are all about random lookups. With these, having data in RAM is all-important. With 3x space efficiency, you can run with 3x more data in the same space before starting to go to disk. In some benchmarks, this can make a 20x gain.
The Virtuoso scale-out support has also been fundamentally reworked, with much more parallelism and better deployment flexibility.
So, for graph data, Virtuoso 7 is a major step in the coming of age of the technology. Data keeps growing and time is getting scarcer, so we need more flexibility and more performance at the same time.
So, let’s talk about how we accomplish this. Column stores have been the trend in relational data warehousing for over a decade. With column stores comes vectored execution, i.e., running any operation on a large number of values at one time. Instead of running one operation on one value, then the next operation on the result, and so forth, you run the first operation on thousands or hundreds-of-thousands of values, then the next one on the results of this, and so on.
Column-wise storage brings space efficiency, since values in one column of a table tend to be alike -- whether repeating, sorted, within a specific range, or picked from a particular set of possible values. With graph data, where there are no columns as such, the situation is exactly the same -- just substitute the word predicate for column. Space efficiency brings speed -- first by keeping more of the data in memory; secondly by having less data travel between CPU and memory. Vectoring makes sure that data that are closely located get accessed in close temporal proximity, hence improving cache utilization. When there is no locality, there are a lot of operations pending at the same time, as things always get done on a set of values instead of on a single value. This is the crux of the science of columns and vectoring.
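To make the vectoring idea concrete, here is a minimal sketch in C (illustrative only, not Virtuoso code): the same filter-then-sum work is written once in the scalar, value-at-a-time style and once in the vectored style, where each operator runs over a whole batch and hands a vector of qualifying positions to the next operator.

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

#define VECTOR_SIZE 4096                 /* operate on thousands of values at a time */

/* Scalar style: run each operation on one value before moving to the next. */
static int64_t scalar_sum_over(const int64_t *col, size_t n, int64_t limit)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        if (col[i] > limit)              /* operation 1 on one value  */
            sum += col[i];               /* operation 2 on its result */
    return sum;
}

/* Vectored style: operation 1 runs over a whole batch and produces a vector of
   qualifying positions; operation 2 then runs over that vector.  Each operator
   is a tight loop over contiguous data, which keeps caches and the pipeline busy. */
static size_t filter_gt(const int64_t *col, size_t n, int64_t limit, uint32_t *hits)
{
    size_t n_hits = 0;
    for (size_t i = 0; i < n; i++)
        if (col[i] > limit)
            hits[n_hits++] = (uint32_t) i;
    return n_hits;
}

static int64_t sum_at(const int64_t *col, const uint32_t *hits, size_t n_hits)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n_hits; i++)
        sum += col[hits[i]];
    return sum;
}

static int64_t vectored_sum_over(const int64_t *col, size_t n, int64_t limit)
{
    uint32_t hits[VECTOR_SIZE];
    int64_t sum = 0;
    for (size_t start = 0; start < n; start += VECTOR_SIZE) {
        size_t batch = (n - start) < VECTOR_SIZE ? (n - start) : VECTOR_SIZE;
        size_t n_hits = filter_gt(col + start, batch, limit, hits);
        sum += sum_at(col + start, hits, n_hits);
    }
    return sum;
}

int main(void)
{
    enum { N = 100000 };
    static int64_t col[N];
    for (size_t i = 0; i < N; i++)
        col[i] = (int64_t) (i % 1000);
    printf("scalar %lld  vectored %lld\n",
           (long long) scalar_sum_over(col, N, 500),
           (long long) vectored_sum_over(col, N, 500));
    return 0;
}

The two functions return the same result; the difference is that the vectored form presents the CPU (and, in a real engine, the storage layer) with long runs of homogeneous work at a time.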
Of the prior work in column stores, Virtuoso may most resemble Vertica, well described in Daniel Abadi’s famous PhD thesis. Virtuoso itself is described in the IEEE Data Engineering Bulletin, March 2012 (PDF). The first experiments in column store technology with Virtuoso were in 2009, published at the SemData workshop at VLDB 2010 in Singapore. We tried storing TPC-H as graph data and in relational tables, each with both rows and columns, and found that we could get 6 bytes per quad space utilization with the RDF-ization of TPC-H, as opposed to 27 bytes with the row-wise compressed RDF storage model. The row-wise compression itself is 3x more compact than a row-wise representation with no compression.
Memory is the key to speed, and space efficiency is the key to memory. Performance comes from two factors: locality and parallelism. Both are addressed by column store technology. This made me a convert.
At this time, we also started the EU FP7 project LOD2, most specifically working with Peter Boncz of CWI, the king of the column store, famous for MonetDB and VectorWise. This cooperation goes on within LOD2 and has extended to LDBC, an FP7 project for designing benchmarks for graph and RDF databases. Peter has given us a world of valuable insight and experience in all aspects of avant-garde database technology, from adaptive techniques to query optimization and beyond. One recently published result is the Virtuoso cluster running analytics on 150 billion relations on CWI’s SciLens cluster.
SQL relational table-oriented databases and property graph-oriented databases (graph databases for short) are both rooted in relational database science. Graph management simply introduces extra challenges with regard to scalability. Hence, at OpenLink Software, having a good grounding in the best practices of relational columnar (or column-wise) database management technology is vital.
Virtuoso is more prominently known for high-performance RDF-based graph database technology, but the entirety of its SQL relational data management functionality (which is the foundation for its graph store) is vectored, and even allows users to choose between row-wise and column-wise physical layouts, index by index.
It has been asked: is this a new NoSQL engine? Well, there isn’t really such a thing. There are of course database engines that do not have SQL support and it has become trendy to call them "NoSQL." So, in this space, Virtuoso is an engine that does support SQL, plus SPARQL, and is designed to do big joins and aggregation (i.e., analytics) and fast bulk load, as well as ACID transactions on small updates, all with column store space efficiency. It is not only for big scans, as people tend to think about column stores, since it can also be used in compact embedded form.
Virtuoso also delivers great parallelism and throughput in a scale-out setting, with no restrictions on transactions and no limits on joining. The base is in relational database science, but all the adaptations that RDF and graph workloads need are built-in, with core level support for run-time data-typing, URIs as native Reference types, user-defined custom data types, etc.
Now that the major milestone of releasing Virtuoso 7 (open source and commercial editions) has been reached, the next steps include enabling our current and future customers to attain increased agility from big (linked) open data exploits. Technically, it will also include continued participation in DBMS industry benchmarks, such as those from the TPC, and others under development via the Linked Data Benchmark Council (LDBC), plus other social-media-oriented challenges that arise in this exciting data access, integration, and management innovation continuum. Thus, continue to expect new optimization tricks to be introduced at frequent intervals through the open source development branch at GitHub, between major commercial releases.
05/13/2013 18:06 GMT | Modified: 05/13/2013 18:53 GMT
LDBC: A Socio-technical Perspective
[Virtuoso Data Space Bot]
(Originally posted to the LDBC blog.)
In recent days, cyberspace has seen some discussion concerning the relationship of the EU FP7 project LDBC (Linked Data Benchmark Council) and sociotechnical considerations. It has been suggested that LDBC, to its own and the community’s detriment, ignores sociotechnical aspects.
LDBC, as research projects go, actually has an unusually large and, as of this early date, successful and thriving sociotechnical aspect, i.e., involvement of users and vendors alike. I will here discuss why, insofar as the technical output of the project goes, sociotechnical metrics are in fact out of scope. Even so, the degree to which the benefits of LDBC outcomes are actually realized does depend strongly on community building, which is a social process.
One criticism of big data projects we sometimes encounter is that data without context is not useful. Further, one cannot just assume that one can throw several data sets together and get meaning from this, as there may be different semantics for similar-looking things; just think of seven different definitions of blood pressure.
In its initial user community meeting, LDBC was, according to its charter, focusing mostly on cases where the data is already in existence and of sufficient quality for the application at hand.
Michael Brodie, Chief Scientist at Verizon, is a well-known advocate of focusing on the meaning of data, not only on processing performance. There is a piece on this matter by him, Peter Boncz, Chris Bizer, and myself in the SIGMOD Record: "The Meaningful Use of Big Data: Four Perspectives – Four Challenges".
I had a conversation with Michael at a DERI meeting a couple of years ago about measuring the total cost of technology adoption, including socio-technical aspects such as acceptance by users, learning curves of various stakeholders, and whether one could in fact demonstrate an overall gain in productivity from semantic technologies. [in my words, paraphrased]
"Can one measure the effectiveness of different approaches to data integration?" asked I.
"Of course one can," answered Michael, "this only involves carrying out the same task with two different technologies, two different teams and then doing a double blind test with users. However, this never happens. Nobody does this because doing the task even once in a large organization is enormously costly and nobody will even seriously consider doubling the expense."
LDBC does in fact intend to address technical aspects of data integration, i.e., schema conversion, entity resolution, and the like. Addressing the sociotechnical aspects of this (whether one should integrate in the first place, whether the integration result adds value, whether it violates privacy or security concerns, whether users will understand the result, what the learning curves are, etc.) is simply too diverse and too domain-dependent for a general-purpose metric to be developed, at least within the time and budget constraints of the project. Further, adding a large human element to the experimental setting (e.g., how skilled the developers are, how well the stakeholders can explain their needs, how often these needs change, etc.) would lead to experiments that are so expensive to carry out, and whose results have so many unquantifiable factors, that they would constitute an insuperable barrier to adoption.
Experience demonstrates that even agreeing on the relative importance of quantifiable metrics of database performance is hard enough. Overreaching would compromise the project's ability to deliver its core value. Let us next talk about this.
It is only a natural part of the political landscape that the EC's research funding choices are criticized by some members of the public. Some criticism is about the emphasis on big data. Big data is a fact on the ground, and research and industry need to deal with it. Of course, there have been and will be critics of technology in general on moral or philosophical grounds. Instead of opening this topic, I will refer you to an article by Michael Brodie. In a world where big data is a given, lowering the entry threshold for big data applications, thus making them available not only to government agencies and the largest businesses, seems ethical to me, as per Brodie's checklist. LDBC will contribute to this by driving greater availability, better performance, and lower cost for these technologies.
Once we accept that big data is there and is important, we arrive at the issue of deriving actionable meaning from it. A prerequisite of deriving actionable meaning from big data is the ability to flexibly process this data. LDBC is about creating metrics for this. The prerequisites for flexibly working with data are fairly independent of the specific use case, while the criteria of meaning, let alone actionable analysis, are very domain specific. Therefore, in order to provide the greatest service to the broadest constituency, LDBC focuses on measuring that which is most generic, yet will underlie any decision support or other data processing deployment that involves RDF or graph data.
I would say that LDBC is an exceptionally effective use of taxpayer money. LDBC will produce metrics that will drive technological innovation for years to come. The total money spent towards pursuing goals set forth by LDBC is likely to vastly exceed the budget of LDBC. Only think of the person-centuries or even millennia that have gone into optimizing for TPC-C and TPC-H. The vast majority of the money spent for these pursuits is paid by industry, not by research funding. It is spent worldwide, not in Europe alone.
Thus, if LDBC is successful, a limited amount of EC research money will influence how much greater product development budgets are spent in the future. This multiplier effect applies of course to highly successful research outcomes in general but is especially clear with LDBC.
European research funding has played a significant role in creating the foundations of the RDF/Linked Data scene. LDBC is a continuation of this policy; however, the focus has now shifted to reflect the greater maturity of the technology. LDBC is now about making the RDF and graph database sectors into mature industries whose products can predictably tackle the challenges out there.
LDBC - the Linked Data Benchmark Council
[Virtuoso Data Space Bot]
(This posting was inadvertently delayed from the time of its writing, 2012-11-21.)
The Linked Data Benchmark Council (LDBC) project is officially starting now.
This represents a serious effort towards making relevant and well thought out metrics for RDF and graph databases and defining protocols for measurement and publishing of well documented and reproducible results. This also entails the creation of a TPC-analog for the graph and RDF domains.
The project brings together leading vendors, with OpenLink and Ontotext representing the RDF side and Neo Technology and Sparsity Technologies representing the graph database side. Peter Boncz, of MonetDB and Vectorwise fame, is the technical director, with participation from the Technical University of Munich through Thomas Neumann, known for RDF-3X and HyPer. Universitat Politècnica de Catalunya coordinates the project and brings strong academic expertise in graph databases, also representing its spinoff, Sparsity Technologies. FORTH (Foundation for Research and Technology - Hellas) of Crete contributes expertise in data integration and provenance. STI Innsbruck participates in community building and outreach.
The consortium has second-to-none understanding of benchmarking and has sufficient time allotted to the task for producing world class work, comparable to the TPC benchmarks. This has to date never been realized in the RDF or graph space.
History demonstrates that whenever something that is sufficiently important starts getting systematically measured, there is an improvement in the metric. The early days of the TPC saw a 40-fold increase in transaction processing speed. TPC-H continues to be, after 18 years, well used as a basis of quantifying advances in analytics databases.
A serious initiative for well-thought-out benchmarks for guiding the emerging RDF and graph database markets is nothing short of a necessary precondition for the emergence of a serious market with several vendors offering mutually comparable products.
Benchmarks are only as good as their credibility and adoption. For this reason, LDBC has been in touch with all the graph and RDF vendors we could find, and has received a positive statement of intent from most, indicating that they would participate in an LDBC organization and contribute to shaping benchmarks.
There is further a Technical User Community, with its initial meeting this week, where present-day end users of RDF and graph databases will voice their wishes for benchmark development. Thus benchmarks will be grounded in use cases contributed by real users.
With these elements in place we have every reason to expect relevant benchmarks with broad adoption, with all the benefits this entails.
LDBC Technical User Community Meeting
[Virtuoso Data Space Bot]
The LDBC Technical User Community (TUC) had its initial meeting in Barcelona last week.
First we wish to thank the many end user organizations that were present. This clearly validates the project's mission and demonstrates that there is acute awareness of the need for better metrics in the field. In the following, I will summarize the requirements that were brought forth.
-
Scale out - There was near unanimity among users that even if present workloads could be handled on single servers, a scale-out growth path was highly desirable. On the other hand, some applications were scale-out based from the get-go. Even when not actually used, a scale-out capability is felt to be insurance against future need.
-
Making limits explicit - How far can this technology go? Benchmarks need to demonstrate at what scales the products being considered work best, and where they will grind to a halt. Also, the impact of scale-out on performance needs to be made clear. The cost of solutions at different scales must be made explicit.
Many of these requirements will be met by simply following TPC practices. Now, vendors cannot be expected to publish numbers for cases where their products fail, but they do have incentives for publishing numbers on large data, and at least giving a price/performance point that exceeds most user needs.
-
Fault tolerance and operational characteristics - Present day benchmarks (e.g., the TPC ones) hardly address operational aspects that most enterprise deployments will encounter. This was already stated by Michael Stonebraker at the first TPC performance evaluation workshop some years back at VLDB in Lyon. Users want to know the price/performance impact of making fault-tolerant systems and wish to have metrics for things like backup and bulk load under online conditions. A need to operate across multiple geographies was present in more than one use case, thus requiring a degree of asynchronous replication such as log shipping.
-
Update-intensive workloads - Contrary to what one might think, RDF uses are not primarily load-once-then-lookup. Freshness of data creates value, and databases, even if they are warehouses in character, need to be kept up to date much better than by periodic reload alone. Online updates may be small, as when refreshing news feeds or web crawls, where the unit of update is small but updates are many, but may also involve replacing reference data sets of hundreds of millions of triples. The latter requirement exceeds what is practical in a single transaction. ACID was generally desired, with some interest also in eventual consistency. We did not get use cases with much repeatable read (e.g., updating account balances), but rather atomic and durable replacement of sets of statements.
-
Inference - Class and property hierarchies were common, followed by use of transitivity. owl:sameAs was not in much use, being too dangerous: a single statement may have a huge effect and produce unpredictable sets of properties for instances, for which applications are not prepared. Beyond these, the wishes for inference, with use cases ranging from medicine to forensics, were outside the OWL domain. These typically involved probability scores adding up the joint occurrence of complex criteria with some numeric computation (e.g., time intervals, geography, etc.).
As materialization of forward closure is the prevalent mode of implementing inference in RDF, users wished to have a measure of its cost in space and time, especially under online-update loads.
-
Text, XML, and Geospatial - There is no online application that does not have text search. In publishing, this is hardly ever provided by an RDF store, even if there is one in the mix. Even so, there is an understandable desire to consolidate systems, i.e., to not have an XML database for content and a separate RDF database for metadata. Also, many applications have a geospatial element. One wish was to combine XPath/XQuery with SPARQL, and it was implied that query optimization should create good plans under these conditions.
There was extensive discussion especially on benchmarking full-text. Such a benchmark would need to address the quality of relevance ranking. Doing new work in this space is clearly out of scope for LDBC, but an IR benchmark could be reused as an add-on to provide a quality score. The performance score would come from the LDBC side of the benchmark. Now, many of the applications of text (e.g., news) might not even sort on text match score, but rather by time. Also if the text search is applied to metadata like labels or URI strings, the quality of a match is a non-issue, as there is no document context.
-
Data integration - Almost all applications had some element of data integration. Indeed, if one uses RDF in the first place, the motivation usually has to do with schema flexibility. Having a relational schema for everything is often seen to be too hard to maintain and to lead to too much development time before an initial version of an application or answer of a business question. Data integration is everywhere but stays elusive for benchmarking. Every time it is different and most vendors present do not offer products for this specific need. Many ideas were presented, including using SPARQL for entity resolution, and for checking consistency of an integration result.
A central issue of benchmark design is having an understandable metric. People cannot make sense of more than a few figures. The TPC practice of throughput at scale and price per unit of throughput at scale is a successful example. However, it may be difficult to agree on relative weights of components if a metric is an aggregate of too many things. Also, if a benchmark has too many optional parts, metrics easily become too complicated. On the other hand, requiring too many features (e.g. XML, full text, geospatial) restricts the number of possible participants.
To stimulate innovation, a benchmark needs to be difficult but restricted to a specific domain. TPC-H is a good example, favoring specialized systems built for analytics alone. To be a predictor of total cost and performance in a complex application, a benchmark must include much more functionality, and will favor general purpose systems that do many things but are not necessarily outstanding in any single aspect.
After 1-1/2 days with users, the project team met to discuss actual benchmark task forces to be started. The conclusion was that work would initially proceed around two use cases: publishing, and social networks. The present use of RDF by the BBC and the Press Association provides the background scenario for the publishing benchmark, and the work carried out around the Social Intelligence Benchmark (SIB) in LOD2 will provide a starting point for the social network benchmark. Additionally, user scenarios from the DEX graph database user base will help shape the SN workload.
A data integration task force needs more clarification, but work in this direction is in progress.
In practice, driving progress needs well-focused benchmarks with special trick questions intended to stress specific aspects of a database engine. Providing an overall perspective on cost and online operations needs a broad mix of features to be covered.
These needs will be reconciled by having many metrics inside a single use case, i.e., a social network data set can be used for transactional updates, for lookup queries, for graph analytics, and for TPC-H style business intelligence questions, especially if integrated with another more-relational dataset. Thus there will be a mix of metrics, from transactions to analytics, with single and multiuser workloads. Whether these are packaged as separate benchmarks, or as optional sections of one, remains to be seen.
Developer Recruitment Exercise
[Virtuoso Data Space Bot]
The specification of the exercise referred to in the previous post may be found below.
Questions on the exercise can be sent to the email specified in the previous post. I may schedule a phone call to answer questions based on the initial email contact.
We seek to have all applicants complete the exercise before October 1.
General
The exercise consists of implementing a part of the TPC-C workload in memory, in C or C++. TPC-C is the long-time industry standard benchmark for transaction processing performance. We use this as a starting point for an exercise for assessing developer skill level in writing heavily multithreaded, performance-critical code.
The application performs a series of transactions against an in-memory database, encountering lock contention and occasional deadlocks. The application needs to provide atomicity, consistency, and isolation for transactions. The task consists of writing the low-level data structures for storing the memory-resident database and for managing concurrency, including lock queueing, deadlock detection, and commit/rollback. The solutions are evaluated based on their actual measured multithreaded performance on commodity servers, e.g., 8- or 12-core Intel Xeon machines.
OpenLink provides the code for data generation and driving the test. This is part of the TPC-C kit in Virtuoso Open Source. The task is to replace the SQL API calls with equivalent in-process function calls against the in-memory database developed as part of the exercise.
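For orientation only, the in-process calls might look roughly like the sketch below. The exact interface is up to the applicant; db_t, the return codes, and the function name are hypothetical, and the parameter list simply mirrors the inputs of the TPC-C new order transaction.

#include <stdint.h>

typedef struct db_s db_t;                /* the in-memory database built for the exercise */

typedef enum {
    TXN_COMMITTED,                       /* transaction committed */
    TXN_DEADLOCK_RETRY                   /* rolled back after a deadlock; the driver retries */
} txn_rc_t;

/* One new order: customer c_id at warehouse w_id / district d_id orders n_items
   items, each given by an item id, a supplying warehouse, and a quantity. */
txn_rc_t new_order(db_t *db,
                   int32_t w_id, int32_t d_id, int32_t c_id,
                   int32_t n_items,
                   const int32_t *item_ids,
                   const int32_t *supply_w_ids,
                   const int32_t *quantities);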
Rules
We are aware that the best solution to the problem may be running transactions single-threaded against in-memory hash tables without any concurrency control. The application data may be partitioned so that a single transaction can be in most cases assigned to a partition, which it will get for itself for the few microseconds it takes to do its job. For this exercise, this solution is explicitly ruled out. The application must demonstrate shared access to data, with a transaction holding multiple concurrent locks and being liable to deadlock.
TPC-C can be written so as to avoid deadlocks by always locking in a certain order. This is also expressly prohibited; specifically, the stock rows of a new order transaction must be locked in the order they are specified in the invocation. In application terms this makes no sense, but for the purposes of the exercise it will serve as a natural source of deadlocks.
Parameters
The application needs to offer an interactive or scripted interface (command line is OK) which provides the following operations:
-
Clear and initialize a database of n warehouses.
-
Run n threads, each doing m new order transactions. Each thread has a home warehouse and occasionally accesses other warehouses' data. This reports the real time elapsed and the number of retries arising from deadlocks.
-
Check the consistency between the stock, orders, and order_line data structures.
-
Report system status such as clocks spent waiting for specific mutexes. This is supplied as part of the OpenLink library used by the data generator.
Data Structures
The transactions are written as C functions. The data is represented as C structs, and tree indices or hash tables are used for value-based access to the structures by key. The application has no persistent storage. The structures reference each other by key values, as in the database, so there are no direct pointers; key values are translated into pointers with a hash table or other index-like structure.
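As an illustration of the style (names and layouts here are hypothetical, not those of the supplied kit), an ORDER_LINE row refers to its parent ORDERS row by the same composite key as in the database, and a hash index translates that key into a row pointer:

#include <stdint.h>
#include <stdlib.h>

/* Rows refer to each other by key values, exactly as in the schema, never by raw pointers. */
typedef struct order_s {
    int32_t o_w_id, o_d_id, o_id;            /* composite key */
    int32_t o_c_id;
    int32_t o_ol_cnt;
} order_t;

typedef struct order_line_s {
    int32_t ol_w_id, ol_d_id, ol_o_id;       /* key of the parent ORDERS row */
    int32_t ol_number;
    int32_t ol_i_id;
    int64_t ol_amount;                       /* money as an integer, e.g. cents */
} order_line_t;

/* A chained hash index translating an ORDERS key into a pointer.  A real
   submission must make this thread-safe and partition its mutexes (see Points to Note). */
#define ORDER_HASH_SIZE 65536

typedef struct order_bucket_s {
    order_t *row;
    struct order_bucket_s *next;
} order_bucket_t;

static order_bucket_t *order_hash[ORDER_HASH_SIZE];

static size_t order_hash_no(int32_t w, int32_t d, int32_t o)
{
    uint64_t h = (uint64_t)(uint32_t) w * 1000003u + (uint64_t)(uint32_t) d * 10007u + (uint32_t) o;
    return (size_t) (h % ORDER_HASH_SIZE);
}

static void order_hash_insert(order_t *row)
{
    size_t h = order_hash_no(row->o_w_id, row->o_d_id, row->o_id);
    order_bucket_t *b = malloc(sizeof *b);
    b->row = row;
    b->next = order_hash[h];
    order_hash[h] = b;
}

static order_t *order_hash_find(int32_t w, int32_t d, int32_t o)
{
    for (order_bucket_t *b = order_hash[order_hash_no(w, d, o)]; b; b = b->next)
        if (b->row->o_w_id == w && b->row->o_d_id == d && b->row->o_id == o)
            return b->row;
    return NULL;
}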
The application must be thread-safe, and transactions must be able to roll back. Transactions will sometimes wait for each other in updating shared resources such as stock or district or warehouse balances. The application must be written so as to implement fine-grained locking, and each transaction must be able to hold multiple locks. The application must be able to detect deadlocks. For deadlock recovery, it is acceptable to abort the transaction that detects the deadlock.
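The following is a rough sketch of one possible shape for an exclusive row lock with explicit waiting and a simple waits-for check; it is not prescriptive, and every name is made up. Note that the fast path (acquiring a free lock) touches only the lock's own mutex; the waits-for edges are consulted only when a wait is actually needed.

#include <pthread.h>
#include <stdbool.h>

#define MAX_TXNS 256

/* One waits-for edge per transaction: the id of the transaction it is
   currently waiting on, or -1.  Touched only on the wait path. */
static int wait_for[MAX_TXNS];
static pthread_mutex_t wait_graph_mtx = PTHREAD_MUTEX_INITIALIZER;

static void wait_graph_init(void)            /* call once at startup */
{
    for (int i = 0; i < MAX_TXNS; i++)
        wait_for[i] = -1;
}

/* Exclusive row lock with an owner and an explicit waiter count. */
typedef struct row_lock_s {
    pthread_mutex_t mtx;                     /* protects this lock structure only */
    pthread_cond_t  cv;                      /* waiters sleep here */
    int             owner;                   /* owning transaction id, -1 if free */
    int             waiters;
} row_lock_t;

typedef enum { LOCK_OK, LOCK_DEADLOCK } lock_rc_t;

/* Would txn waiting on blocker close a cycle in the waits-for graph? */
static bool would_deadlock(int txn, int blocker)
{
    for (int t = blocker, hops = 0; t != -1 && hops < MAX_TXNS; t = wait_for[t], hops++)
        if (t == txn)
            return true;
    return false;
}

/* Acquire the lock for txn, or report a deadlock so the caller can roll back
   (aborting the transaction that detects the deadlock is acceptable). */
static lock_rc_t row_lock_enter(row_lock_t *l, int txn)
{
    pthread_mutex_lock(&l->mtx);
    while (l->owner != -1 && l->owner != txn) {
        pthread_mutex_lock(&wait_graph_mtx);
        if (would_deadlock(txn, l->owner)) {
            pthread_mutex_unlock(&wait_graph_mtx);
            pthread_mutex_unlock(&l->mtx);
            return LOCK_DEADLOCK;
        }
        wait_for[txn] = l->owner;
        pthread_mutex_unlock(&wait_graph_mtx);

        l->waiters++;
        pthread_cond_wait(&l->cv, &l->mtx);  /* releases l->mtx while sleeping */
        l->waiters--;

        pthread_mutex_lock(&wait_graph_mtx);
        wait_for[txn] = -1;
        pthread_mutex_unlock(&wait_graph_mtx);
    }
    l->owner = txn;
    pthread_mutex_unlock(&l->mtx);
    return LOCK_OK;
}

/* Release on commit or rollback. */
static void row_lock_leave(row_lock_t *l, int txn)
{
    pthread_mutex_lock(&l->mtx);
    if (l->owner == txn) {
        l->owner = -1;
        if (l->waiters)
            pthread_cond_broadcast(&l->cv);
    }
    pthread_mutex_unlock(&l->mtx);
}

A production-quality submission would queue lock requests explicitly per lock and track which rows each transaction holds for rollback, but the basic ingredients of owner, waiter queue, and waits-for edges are the same.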
C++ template libraries may be used but one must pay attention to their efficiency.
The new order transaction is the only required transaction.
All numbers can be represented as integers. This holds equally for key columns as for monetary amounts.
All index structures (e.g., hash tables) in the application must be thread-safe, so that an insert is safe under concurrent access or concurrent inserts. This holds also for index structures for tables which do not get inserts during the test (e.g., item, customer, stock, etc.).
A sequence object must not be used for assigning new values to the O_ID column of ORDERS. These values must come from the D_NEXT_O_ID column of the DISTRICT table. If a new order transaction rolls back, its update of D_NEXT_O_ID is also rolled back. This causes O_ID values to always be consecutive within a district.
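As a sketch of the intended semantics (the struct and function names below are hypothetical): the order id comes from the district row, which the transaction must hold exclusively locked until commit or rollback, and the undo action of a rollback simply restores the previous counter value.

#include <stdint.h>

/* Hypothetical in-memory district row; the fields mirror the TPC-C schema
   (D_W_ID, D_ID, D_NEXT_O_ID). */
typedef struct district_s {
    int32_t d_w_id, d_id;
    int32_t d_next_o_id;
} district_t;

/* Draw the O_ID for a new order.  The caller must hold the exclusive lock on
   this district row until commit or rollback, and must register the returned
   old value as an undo record.  No separate sequence object is used, so
   committed O_IDs stay consecutive within the district. */
static int32_t district_next_o_id(district_t *d, int32_t *undo_old_value)
{
    *undo_old_value = d->d_next_o_id;        /* saved for a possible rollback */
    return d->d_next_o_id++;
}

/* Undo action applied if the transaction rolls back (e.g. after a deadlock). */
static void district_next_o_id_undo(district_t *d, int32_t old_value)
{
    d->d_next_o_id = old_value;
}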
TPC-C Functionality
The application must implement the TPC-C new order transaction in full. It must not avoid deadlocks by ordering the locking of stock rows; see the Rules section.
The transaction must have the semantics specified in TPC-C, except for durability.
Supporting Files
The test driver calling the transaction procedures is in tpccodbc.c. This can be reused so as to call the transaction procedure in process instead of the ODBC exec.
The user interface may be a command line menu with run options for different numbers of transactions with different thread counts and an option for integrity check.
The integrity check consists of verifying s_cnt_order against the orders and checking that max (O_ID) and D_NEXT_O_ID match within each district.
Running the application should report statistics such as CPU%, cumulative time spent waiting for locks, etc. The rdtsc instruction can be used for getting clock counts for timing.
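A minimal sketch of such metering (hypothetical names, not the actual OpenLink wrapper described under "Code and Libraries To Be Used"): __rdtsc() is the GCC/Clang intrinsic for the rdtsc instruction on x86, and only the contended path is timed.

#include <stdint.h>
#include <stdio.h>
#include <pthread.h>
#include <x86intrin.h>                       /* __rdtsc(), GCC/Clang on x86 */

/* A mutex with cumulative wait statistics, in the spirit of MTX_METER. */
typedef struct meter_mtx_s {
    pthread_mutex_t mtx;
    uint64_t        wait_count;
    uint64_t        wait_clocks;
} meter_mtx_t;

/* The uncontended path costs one trylock; only actual waits are timed. */
static void meter_mtx_enter(meter_mtx_t *m)
{
    if (pthread_mutex_trylock(&m->mtx) == 0)
        return;
    uint64_t start = __rdtsc();
    pthread_mutex_lock(&m->mtx);
    m->wait_clocks += __rdtsc() - start;     /* safe to update: we now own the mutex */
    m->wait_count++;
}

static void meter_mtx_report(const meter_mtx_t *m, const char *name)
{
    printf("%s: %llu waits, %llu clocks spent waiting\n", name,
           (unsigned long long) m->wait_count,
           (unsigned long long) m->wait_clocks);
}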
Points to Note
This section summarizes some of the design patterns and coding tricks we expect to see in a solution to the exercise. These may seem self-evident to some, but experience indicates that this is not universally so.
-
The TPC-C transaction profile for new order specifies a semantics for the operation. The order of locking is left to the implementation as long as the semantics are in effect. The application will be tested with many clients on the same warehouse, running as fast as they can. So lock contention is expected. Therefore, the transaction should be written so as to acquire the locks with the greatest contention as late as possible. No locks need be acquired for the item table since none of the transactions will update it.
-
For implementing locks, using a mutex to serialize access to application resources is not enough. Many locks will be acquired by each transaction, in an unpredictable order. Unless explicit queueing for locks is implemented with deadlock detection, the application will not work.
-
If waiting for a mutex causes the operating system to stop a thread, even when there are cores free, the latency is multiple microseconds, even if the mutex is released by its owner on the next cycle after the waiting thread is suspended. This will destroy any benefit from parallelism unless one is very careful. Programmers do not seem to instinctively know this.
Therefore any structure to which access must be serialized (e.g., hash tables, locks, etc.) needs to be protected by a mutex, but must be partitioned so that there are tens or hundreds of mutexes, the one used depending on which section of the structure is being accessed (a sketch of this, together with the prefetch pattern described below, follows this list).
Submissions that protect a hash table or other index-like structure for a whole application table with a single mutex or rw lock will be discarded off the bat.
Even while using many mutexes, one must hold them for a minimum of time. When accessing a hash table, do the invariant parts first; acquire the mutex after that. For example, if you calculate the hash number after acquiring the mutex for the hash table, the submission will be rejected.
The TPC-C application has some local and some scattered access. Orders are local, and stock and item lines are scattered. When doing scattered memory accesses, the program should be written so that the CPU will, from a single thread, have multiple concurrent cache misses in flight at all times. So, when accessing 10 stock lines, calculate the hash numbers first; then access the memory, deferring any branches based on the accessed values. In this way, out of order execution will miss the CPU cache for many independent addresses in parallel. One can use the gcc __builtin_prefetch primitive, or simply write the program so as to have mutually data-independent memory accesses in close proximity.
For detecting deadlocks, a global transaction wait graph may have to be maintained. This will need to be maintained in a serialized manner. If many threads access it, the accesses must be serialized on a global mutex. This may be very bad if the deadlock detection takes a long time. Alternately, the wait graph may be maintained on another thread, which receives notices of lock waits and transaction completions from worker threads with some delay. Having spotted a cycle, it may kill one or the other party. This will require some inter-thread communication. The submission may address this matter in any number of ways.
However, just acquiring a lock without wait must not involve getting a global mutex. Going to wait will have to do so, were it only for queueing a notice to a monitor thread. Using a socket-to-self might appear to circumvent this, but the communication stack will have mutexes inside so this is no better.
Evaluation Criteria
The exercise will be evaluated based on the run time performance, especially multicore scalability of the result.
Extra points are not given for implementing interfaces or for being object oriented. Interfaces, templates, and objects are not forbidden as such, but their cost must not exceed the difference between getting an address from a virtual table and calling a function directly.
The locking implementation must be correct. It can be limited to exclusive locks and need not support isolation other than repeatable read. Running the application must demonstrate deadlocks and working recovery from these.
Code and Libraries To Be Used
The TPC-C data generator and test driver are in the Virtuoso Open Source distribution, in the files binsrc/tests/tpcc*.c and files included from these. You can make the exercise in the same directory and just alter the files or make script. The application is standalone and has no other relation to the Virtuoso code. The libsrc/Thread threading wrappers may be used. If not using these, make a wrapper similar to mutex_enter when MTX_METER is defined so that it counts the waits and clocks spent during wait. Also have a report like that in mutex_stat() for the mutex wait frequency and duration.
Developer Recruitment Exercise
[
Orri Erling
]
The specification of the exercise referred to in the previous post may be found below.
Questions on the exercise can be sent to the email specified in the previous post. I may schedule a phone call to answer questions based on the initial email contact.
We seek to have all applicants complete the exercise before October 1.
General
The exercise consists of implementing a part of the TPC-C workload in memory, in C or C++. TPC-C is the long-standing industry-standard benchmark for transaction processing performance. We use it as the starting point of an exercise that assesses a developer's skill in writing heavily multithreaded, performance-critical code.
The application performs a series of transactions against an in-memory database, encountering lock contention and occasional deadlocks. The application needs to provide atomicity, consistency, and isolation for transactions. The task consists of writing the low-level data structures for storing the memory-resident database and for managing concurrency, including lock queueing, deadlock detection, and commit/rollback. The solutions are evaluated based on their actual measured multithreaded performance on commodity servers, e.g., 8- or 12-core Intel Xeon machines.
OpenLink provides the code for data generation and driving the test. This is part of the TPC-C kit in Virtuoso Open Source. The task is to replace the SQL API calls with equivalent in-process function calls against the in-memory database developed as part of the exercise.
Rules
We are aware that the best solution to the problem may be running transactions single-threaded against in-memory hash tables without any concurrency control. The application data may be partitioned so that a single transaction can in most cases be assigned to a partition, which it gets for itself for the few microseconds it takes to do its job. For this exercise, this solution is explicitly ruled out. The application must demonstrate shared access to data, with a transaction holding multiple concurrent locks and being liable to deadlock.
TPC-C can be written so as to avoid deadlocks by always locking in a certain order. This is also expressly prohibited; specifically, the stock rows of a new order transaction must be locked in the order in which they are specified in the invocation. In application terms this makes no sense, but for the purposes of the exercise it serves as a natural source of deadlocks.
Parameters
The application needs to offer an interactive or scripted interface (a command line is OK) which provides the following operations (a minimal command-loop sketch is given after the list):
-
Clear and initialize a database of n warehouses.
-
Run n threads, each doing m new order transactions. Each thread has a home warehouse and occasionally accesses other warehouses' data. The run reports the elapsed real time and the number of retries caused by deadlocks.
-
Check the consistency between the stock, orders, and order_line data structures.
-
Report system status such as clocks spent waiting for specific mutexes. This is supplied as part of the OpenLink library used by the data generator.
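As an illustration only, the command loop below is one possible shape for such an interface. The operation names (db_init, run_new_orders, check_consistency, report_status) are placeholders, not part of the OpenLink kit, and their bodies here are stubs standing in for the real in-memory database.

#include <stdio.h>
#include <string.h>

/* Placeholder stubs; a real submission would call into its own database code. */
static void db_init (int n_wh)                    { printf ("init %d warehouses\n", n_wh); }
static void run_new_orders (int n_thr, int n_txn) { printf ("run %d threads x %d txns\n", n_thr, n_txn); }
static void check_consistency (void)              { printf ("consistency check\n"); }
static void report_status (void)                  { printf ("mutex wait statistics\n"); }

int
main (void)
{
  char line[256];
  int a, b;
  while (fgets (line, sizeof line, stdin))
    {
      if (1 == sscanf (line, "init %d", &a))
        db_init (a);
      else if (2 == sscanf (line, "run %d %d", &a, &b))
        run_new_orders (a, b);
      else if (0 == strncmp (line, "check", 5))
        check_consistency ();
      else if (0 == strncmp (line, "status", 6))
        report_status ();
      else if (0 == strncmp (line, "quit", 4))
        break;
      else
        printf ("commands: init N | run THREADS TXNS | check | status | quit\n");
    }
  return 0;
}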
Data Structures
The transactions are written as C functions. The data is represented as C structs, and tree indices or hash tables are used for value-based access to the structures by key. The application has no persistent storage. The structures reference each other by key values, as in the database, not by direct pointers; key values are translated into pointers with a hash table or other index-like structure.
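For example, such key-based referencing could be sketched as below. The struct and function names are hypothetical and the hash function is just one reasonable choice; thread safety is left out here and is covered further down.

#include <stdint.h>

typedef struct order_s
{
  int32_t o_w_id;    /* warehouse key, not a pointer */
  int32_t o_d_id;    /* district key */
  int32_t o_id;
  int32_t o_c_id;    /* customer key */
  int32_t o_ol_cnt;
} order_t;

#define N_BUCKETS 4096

typedef struct ord_bucket_s
{
  int64_t              key;
  order_t             *row;
  struct ord_bucket_s *next;
} ord_bucket_t;

static ord_bucket_t *ord_hash[N_BUCKETS];

static int64_t
order_key (int w_id, int d_id, int o_id)
{
  return ((int64_t) w_id << 40) | ((int64_t) d_id << 32) | (uint32_t) o_id;
}

static uint32_t
key_hash (int64_t key)
{
  return (uint32_t) (((uint64_t) key * 0x9E3779B97F4A7C15ULL) >> 32) % N_BUCKETS;
}

/* Translate a key into a row pointer by walking the bucket chain. */
order_t *
order_find (int w_id, int d_id, int o_id)
{
  int64_t key = order_key (w_id, d_id, o_id);
  ord_bucket_t *b;
  for (b = ord_hash[key_hash (key)]; b; b = b->next)
    if (b->key == key)
      return b->row;
  return NULL;
}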
The application must be thread-safe, and transactions must be able to roll back. Transactions will sometimes wait for each other in updating shared resources such as stock or district or warehouse balances. The application must be written so as to implement fine-grained locking, and each transaction must be able to hold multiple locks. The application must be able to detect deadlocks. For deadlock recovery, it is acceptable to abort the transaction that detects the deadlock.
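The lock bookkeeping this implies might be laid out as in the sketch below; the names are hypothetical. Each row carries its own exclusive lock with a waiter queue, and each transaction records the locks it holds so that commit and rollback can release them all.

#include <pthread.h>

typedef struct row_lock_s
{
  struct txn_s       *owner;     /* NULL when the row is unlocked */
  struct lock_wait_s *waiters;   /* FIFO of transactions blocked on this row */
} row_lock_t;

typedef struct lock_wait_s
{
  struct txn_s       *txn;
  pthread_cond_t      granted;   /* signalled when ownership passes to txn */
  struct lock_wait_s *next;
} lock_wait_t;

typedef struct held_lock_s
{
  row_lock_t         *lock;
  struct held_lock_s *next;
} held_lock_t;

typedef struct txn_s
{
  held_lock_t *locks_held;       /* walked and released at commit or rollback */
  /* undo records, state, deadlock bookkeeping, etc. */
} txn_t;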
C++ template libraries may be used, but one must pay attention to their efficiency.
The new order transaction is the only required transaction.
All numbers can be represented as integers. This holds for key columns as well as for monetary amounts.
All index structures (e.g., hash tables) in the application must be thread-safe, so that an insert is safe against concurrent reads or concurrent inserts. This also holds for index structures of tables that do not get inserts in the test (e.g., item, customer, and stock).
A sequence object must not be used for assigning new values to the O_ID column of ORDERS. These values must come from the D_NEXT_O_ID column of the DISTRICT table. If a new order transaction rolls back, its update of D_NEXT_O_ID is also rolled back. This causes O_ID values to always be consecutive within a district.
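A minimal sketch of this, with hypothetical names, reads the counter with the district row already locked, remembers the old value for undo, and restores it on rollback:

typedef struct district_s
{
  int d_w_id;
  int d_id;
  int d_next_o_id;
} district_t;

/* Called with the district row already exclusively locked by this
   transaction; the old counter value is kept for the undo log. */
int
district_take_o_id (district_t *d, int *old_next_o_id)
{
  *old_next_o_id = d->d_next_o_id;
  return d->d_next_o_id++;        /* the new order's O_ID */
}

/* Applied from the undo log when the new order transaction rolls back,
   so O_ID values stay consecutive within the district. */
void
district_undo_o_id (district_t *d, int old_next_o_id)
{
  d->d_next_o_id = old_next_o_id;
}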
TPC-C Functionality
The application must implement the TPC-C new order transaction in full. The implementation must not avoid deadlocks by ordering the locking of stock rows; see the Rules section.
The transaction must have the semantics specified in TPC-C, except for durability.
Supporting Files
The test driver calling the transaction procedures is in tpccodbc.c. It can be reused so as to call the transaction procedure in-process instead of through the ODBC exec.
The user interface may be a command line menu with run options for different numbers of transactions and thread counts, plus an option for the integrity check.
The integrity check consists of verifying s_cnt_order against the orders and checking that max (O_ID) and D_NEXT_O_ID match within each district.
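For the O_ID part of the check, the usual TPC-C consistency condition is that D_NEXT_O_ID is one greater than the largest O_ID in ORDERS for the district. A sketch, assuming hypothetical accessor functions that depend on the submission's own structures, is:

/* Hypothetical helpers assumed to exist elsewhere in the submission. */
extern int orders_max_o_id (int w_id, int d_id);          /* largest O_ID in the district */
extern int district_next_o_id_value (int w_id, int d_id); /* current D_NEXT_O_ID */

/* Returns 1 when the district passes the O_ID check. */
int
check_district_o_ids (int w_id, int d_id)
{
  return orders_max_o_id (w_id, d_id) + 1 == district_next_o_id_value (w_id, d_id);
}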
Running the application should report statistics such as CPU%, cumulative time spent waiting for locks, and so forth. The rdtsc instruction can be used to get clock counts for timing.
Points to Note
This section summarizes some of the design patterns and coding tricks we expect to see in a solution to the exercise. These may seem self-evident to some, but experience indicates that this is not universally so.
-
The TPC-C transaction profile for new order specifies a semantics for the operation. The order of locking is left to the implementation as long as the semantics are preserved. The application will be tested with many clients on the same warehouse, running as fast as they can, so lock contention is expected. Therefore, the transaction should be written so as to acquire the locks with the greatest contention as late as possible. No locks need be acquired for the item table, since none of the transactions updates it.
-
For implementing locks, using a mutex to serialize access to application resources is not enough. Many locks will be acquired by each transaction, in an unpredictable order. Unless explicit queueing for locks is implemented with deadlock detection, the application will not work.
-
If waiting for a mutex causes the operating system to stop a thread, even when there are cores free, the latency is multiple microseconds, even if the mutex is released by its owner on the next cycle after the waiting thread is suspended. This will destroy any benefit from parallelism unless one is very careful. Programmers do not seem to instinctively know this.
Therefore, any structure to which access must be serialized (e.g., hash tables or lock tables) needs to be protected by a mutex, but must be partitioned so that there are tens or hundreds of mutexes, the one to take depending on which section of the structure is being accessed.
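One way to do this, sketched below with hypothetical names, is to stripe the buckets of a hash table over an array of mutexes; the hash number is computed before any mutex is taken, which also illustrates the point about invariant work made a little further down.

#include <pthread.h>
#include <stdint.h>

#define N_STRIPES 64      /* tens to hundreds of mutexes */
#define N_BUCKETS 65536

typedef struct bucket_s
{
  int64_t          key;
  void            *row;
  struct bucket_s *next;
} bucket_t;

typedef struct striped_hash_s
{
  pthread_mutex_t mtx[N_STRIPES];   /* mtx[i] guards the buckets whose index is i modulo N_STRIPES */
  bucket_t       *bucket[N_BUCKETS];
} striped_hash_t;

static uint32_t
hash_no (int64_t key)
{
  return (uint32_t) (((uint64_t) key * 0x9E3779B97F4A7C15ULL) >> 32);
}

void *
striped_get (striped_hash_t *ht, int64_t key)
{
  uint32_t h = hash_no (key);                    /* invariant work before the mutex */
  pthread_mutex_t *m = &ht->mtx[h % N_STRIPES];
  bucket_t *b;
  void *row = NULL;

  pthread_mutex_lock (m);
  for (b = ht->bucket[h % N_BUCKETS]; b; b = b->next)
    if (b->key == key)
      {
        row = b->row;
        break;
      }
  pthread_mutex_unlock (m);
  return row;
}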
Submissions that protect a hash table or other index-like structure for a whole application table with a single mutex or read-write lock will be rejected out of hand.
Even with many mutexes, each must be held for the minimum possible time. When accessing a hash table, do the invariant parts first and acquire the mutex only after that. For example, if you calculate the hash number after acquiring the mutex for the hash table, the submission will be rejected.
The TPC-C application has some local and some scattered access. Orders are local; stock and item lines are scattered. When doing scattered memory accesses, the program should be written so that the CPU will, from a single thread, have multiple concurrent cache misses in flight at all times. So, when accessing 10 stock lines, calculate the hash numbers first; then access the memory, deferring any branches based on the accessed values. In this way, out-of-order execution will miss the CPU cache for many independent addresses in parallel. One can use the gcc __builtin_prefetch primitive, or simply write the program so as to have mutually data-independent memory accesses in close proximity.
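A sketch of such a batched lookup follows; the accessor functions are hypothetical stand-ins for whatever stock index the submission uses, and the arrays are sized for the at most 15 order lines of a new order.

#include <stdint.h>

/* Hypothetical accessors, assumed to exist in the submission's stock index. */
extern int64_t  stock_key (int w_id, int i_id);
extern uint32_t stock_hash_no (int64_t key);
extern void    *stock_bucket_head (uint32_t hash_no);        /* first bucket of the chain */
extern void    *stock_chase (uint32_t hash_no, int64_t key); /* walk the chain, compare keys */

/* Looks up n (<= 15) stock rows.  All hash numbers are computed first,
   the chain heads are prefetched next, and branches on the fetched data
   come only in the last pass, so several independent cache misses are
   outstanding at the same time. */
void
stock_lookup_batch (int w_id, const int *i_ids, void **rows, int n)
{
  int64_t  keys[15];
  uint32_t hash_nos[15];
  int i;

  for (i = 0; i < n; i++)          /* pass 1: invariant arithmetic only */
    {
      keys[i] = stock_key (w_id, i_ids[i]);
      hash_nos[i] = stock_hash_no (keys[i]);
    }
  for (i = 0; i < n; i++)          /* pass 2: start independent cache misses */
    __builtin_prefetch (stock_bucket_head (hash_nos[i]));
  for (i = 0; i < n; i++)          /* pass 3: compare keys, follow chains */
    rows[i] = stock_chase (hash_nos[i], keys[i]);
}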
For detecting deadlocks, a global transaction wait graph may have to be maintained, and it will need to be maintained in a serialized manner. If many threads access it, the accesses must be serialized on a global mutex; this may be very bad if the deadlock detection takes a long time. Alternatively, the wait graph may be maintained on another thread. That thread will get notices of lock waits and transaction completions from worker threads, with some delay. Having spotted a cycle, it may kill one or the other party. This will require some inter-thread communication. The submission may address this matter in any number of ways.
However, just acquiring a lock without waiting must not involve taking a global mutex. Going to wait will have to, even if only to queue a notice to a monitor thread. Using a socket-to-self might appear to circumvent this, but the communication stack has mutexes inside, so this is no better.
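As a sketch of the cycle check itself, with hypothetical names and leaving open whether it runs under a global mutex or on a monitor thread: with exclusive locks only, a waiting transaction waits for exactly one owner, so a cycle is found by walking the chain of owners.

typedef struct txn_s
{
  struct txn_s *waits_for;   /* owner of the lock this transaction waits on, or NULL */
} txn_t;

/* Called before 'waiter' starts waiting for 'owner'.  Returns 1 if the
   new edge would close a cycle, in which case the waiter rolls back
   instead of waiting.  Serializing access to the graph is left out of
   this sketch. */
int
would_deadlock (txn_t *waiter, txn_t *owner)
{
  txn_t *t;
  for (t = owner; t; t = t->waits_for)
    if (t == waiter)
      return 1;              /* following the owners leads back to the waiter */
  waiter->waits_for = owner; /* record the wait edge */
  return 0;
}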
Evaluation Criteria
The exercise will be evaluated based on run-time performance, especially the multicore scalability of the result.
Extra points are not given for implementing interfaces or for being object oriented. Interfaces, templates, and objects are not forbidden as such, but their cost must not exceed the difference between getting an address from a virtual table and calling a function directly.
The locking implementation must be correct. It can be limited to exclusive locks and need not support isolation other than repeatable read. Running the application must demonstrate deadlocks and working recovery from these.
Code and Libraries To Be Used
The TPC-C data generator and test driver are in the Virtuoso Open Source distribution, in the files binsrc/tests/tpcc*.c and the files included from these. You can do the exercise in the same directory and simply alter the files or the make script. The application is standalone and has no other relation to the Virtuoso code. The libsrc/Thread threading wrappers may be used. If not using these, make a wrapper similar to mutex_enter with MTX_METER defined, so that it counts the waits and the clocks spent waiting. Also provide a report, like the one from mutex_stat(), of mutex wait frequency and duration.