LDBC: A Socio-technical Perspective [ Orri Erling ]

(Originally posted to the LDBC blog.)

In recent days, cyberspace has seen some discussion concerning the relationship between the EU FP7 project LDBC (Linked Data Benchmark Council) and socio-technical considerations. It has been suggested that LDBC, to its own and the community's detriment, ignores socio-technical aspects.

LDBC, as research projects go, actually has an unusually large and, as of this early date, successful and thriving socio-technical aspect, i.e., involvement of users and vendors alike. I will discuss here why, insofar as the technical output of the project goes, socio-technical metrics are in fact out of scope. Then again, the degree to which the benefits potentially obtained from LDBC outcomes are in fact realized does depend strongly on community building, a social process.

One criticism of big data projects we sometimes encounter is that data without context is not useful. Further, one cannot simply assume that throwing several data sets together will yield meaning, as there may be different semantics behind similar-looking things; just think of seven different definitions of blood pressure.

In its initial user community meeting, LDBC was, according to its charter, focusing mostly on cases where the data is already in existence and of sufficient quality for the application at hand.

Michael Brodie, Chief Scientist at Verizon, is a well-known advocate of focusing on the meaning of data, not only on processing performance. There is a piece on this matter by him, Peter Boncz, Chris Bizer, and myself in the SIGMOD Record: "The Meaningful Use of Big Data: Four Perspectives – Four Challenges".

I had a conversation with Michael at a DERI meeting a couple of years ago about measuring the total cost of technology adoption, including socio-technical aspects such as acceptance by users, learning curves of various stakeholders, and whether one could in fact demonstrate an overall gain in productivity arising from semantic technologies. [The exchange below is paraphrased in my own words.]

"Can one measure the effectiveness of different approaches to data integration?" asked I.

"Of course one can," answered Michael, "this only involves carrying out the same task with two different technologies, two different teams and then doing a double blind test with users. However, this never happens. Nobody does this because doing the task even once in a large organization is enormously costly and nobody will even seriously consider doubling the expense."

LDBC does in fact intend to address technical aspects of data integration, e.g., schema conversion, entity resolution, and the like. Addressing the socio-technical aspects of this (whether one should integrate in the first place, whether the integration result adds value, whether it violates privacy or security concerns, whether users will understand the result, what the learning curves are, etc.) is simply too diverse and too domain-dependent for a general-purpose metric to be developed, at least within the time and budget constraints of the project. Further, adding a large human element to the experimental setting (e.g., how skilled the developers are, how well the stakeholders can explain their needs, how often these needs change, etc.) would lead to experiments that are so expensive to carry out, and whose results carry so many unquantifiable factors, that they would constitute an insuperable barrier to adoption.

Experience demonstrates that even agreeing on the relative importance of quantifiable metrics of database performance is hard enough. Overreaching would compromise the project's ability to deliver its core value. Let us talk about that next.

It is only a natural part of the political landscape that the EC's research funding choices are criticized by some members of the public. Some criticism is about the emphasis on big data. Big data is a fact on the ground, and research and industry need to deal with it. Of course, there have been and will be critics of technology in general on moral or philosophical grounds. Instead of opening this topic, I will refer you to an article by Michael Brodie. In a world where big data is a given, lowering the entry threshold for big data applications, thus making them available not only to government agencies and the largest businesses, seems ethical to me, as per Brodie's checklist. LDBC will contribute to this by driving greater availability, better performance, and lower cost for these technologies.

Once we accept that big data is there and is important, we arrive at the issue of deriving actionable meaning from it. A prerequisite of deriving actionable meaning from big data is the ability to flexibly process this data. LDBC is about creating metrics for this. The prerequisites for flexibly working with data are fairly independent of the specific use case, while the criteria of meaning, let alone actionable analysis, are very domain specific. Therefore, in order to provide the greatest service to the broadest constituency, LDBC focuses on measuring that which is most generic, yet will underlie any decision support or other data processing deployment that involves RDF or graph data.

I would say that LDBC is an exceptionally effective use of taxpayer money. LDBC will produce metrics that will drive technological innovation for years to come. The total money spent pursuing the goals set forth by LDBC is likely to vastly exceed the budget of LDBC itself. Just think of the person-centuries, or even person-millennia, that have gone into optimizing for TPC-C and TPC-H. The vast majority of the money spent on these pursuits is paid by industry, not by research funding, and it is spent worldwide, not in Europe alone.

Thus, if LDBC is successful, a limited amount of EC research money will influence how much greater product development budgets are spent in the future. This multiplier effect applies of course to highly successful research outcomes in general but is especially clear with LDBC.

European research funding has played a significant role in creating the foundations of the RDF/Linked Data scene. LDBC is a continuation of this policy; however, the focus has now shifted to reflect the greater maturity of the technology. LDBC is about making the RDF and graph database sectors into mature industries whose products can predictably tackle the challenges out there.

12/03/2012 16:23 GMT
LDBC - the Linked Data Benchmark Council [ Orri Erling ]

(This posting was inadvertently delayed from the time of its writing, 2012-11-21.)

The Linked Data Benchmark Council (LDBC) project is officially starting now.

This represents a serious effort towards making relevant and well-thought-out metrics for RDF and graph databases, and defining protocols for the measurement and publication of well-documented and reproducible results. This also entails the creation of a TPC analog for the graph and RDF domains.

The project brings together leading vendors, with OpenLink and Ontotext representing the RDF side, and Neo Technology and Sparsity Technologies representing the graph database side. Peter Boncz, of MonetDB and VectorWise fame, is the technical director, with participation from the Technical University of Munich through Thomas Neumann, known for RDF-3X and HyPer. The Universitat Politècnica de Catalunya coordinates the project and brings strong academic expertise in graph databases, also representing its Sparsity Technologies spinoff. FORTH (Foundation for Research and Technology - Hellas) of Crete contributes expertise in data integration and provenance. STI Innsbruck participates in community building and outreach.

The consortium has a second-to-none understanding of benchmarking and has sufficient time allotted to the task to produce world-class work, comparable to the TPC benchmarks. This has never before been realized in the RDF or graph space.

History demonstrates that whenever something sufficiently important starts getting systematically measured, the metric improves. The early days of the TPC saw a 40-fold increase in transaction processing speed. TPC-H, after 18 years, continues to be widely used as a basis for quantifying advances in analytics databases.

A serious initiative for well-thought-out benchmarks for guiding the emerging RDF and graph database markets is nothing short of a necessary precondition for the emergence of a serious market with several vendors offering mutually comparable products.

Benchmarks are only as good as their credibility and adoption. For this reason, LDBC has been in touch with all the graph and RDF vendors we could find, and has received a positive statement of intent from most, indicating that they would participate in an LDBC organization and contribute to shaping benchmarks.

There is further a Technical User Community, holding its initial meeting this week, where present-day end users of RDF and graph databases will voice their wishes for benchmark development. Benchmarks will thus be grounded in use cases contributed by real users.

With these elements in place we have every reason to expect relevant benchmarks with broad adoption, with all the benefits this entails.

11/28/2012 18:06 GMT
LDBC Technical User Community Meeting [ Orri Erling ]

The LDBC Technical User Community (TUC) had its initial meeting in Barcelona last week.

First we wish to thank the many end user organizations that were present. This clearly validates the project's mission and demonstrates that there is acute awareness of the need for better metrics in the field. In the following, I will summarize the requirements that were brought forth.

  • Scale out - There was near unanimity among users that even if present workloads could be handled on single servers, a scale-out growth path was highly desirable. On the other hand, some applications were scale-out based from the get-go. Even when not actually used, a scale-out capability is felt to be insurance against future need.

  • Making limits explicit - How far can this technology go? Benchmarks need to demonstrate at what scales the products being considered work best, and where they will grind to a halt. Also, the impact of scale-out on performance needs to be made clear. The cost of solutions at different scales must be made explicit.

    Many of these requirements will be met by simply following TPC practices. Now, vendors cannot be expected to publish numbers for cases where their products fail, but they do have incentives for publishing numbers on large data, and at least giving a price/performance point that exceeds most user needs.

  • Fault tolerance and operational characteristics - Present-day benchmarks (e.g., the TPC ones) hardly address operational aspects that most enterprise deployments will encounter. Michael Stonebraker already made this point at the first TPC performance evaluation workshop some years back at VLDB in Lyon. Users want to know the price/performance impact of making systems fault-tolerant, and wish to have metrics for things like backup and bulk load under online conditions. A need to operate across multiple geographies was present in more than one use case, thus requiring a degree of asynchronous replication, such as log shipping.

  • Update-intensive workloads - Contrary to what one might think, RDF uses are not primarily load-once-then-lookup. Freshness of data creates value, and databases, even if they are warehouse-like in character, need to be kept up to date far better than by periodic reload alone. Online updates may be small, as when refreshing news feeds or web crawls, where the unit of update is small but updates are many; but they may also involve replacing reference data sets of hundreds of millions of triples, which exceeds what is practical in a single transaction. ACID was generally desired, with some interest also in eventual consistency. We did not get use cases with much repeatable read (e.g., updating account balances), but rather atomic and durable replacement of sets of statements.

  • Inference - Class and property hierarchies were common, followed by use of transitivity. owl:sameAs was not in much use, being considered too dangerous: a single statement may have a huge effect and produce unpredictable sets of properties for instances, for which applications are not prepared. Beyond these, the wishes for inference, with use cases ranging from medicine to forensics, were outside the OWL domain. These typically involved probability scores adding up the joint occurrence of complex criteria with some numeric computation (e.g., time intervals, geography, etc.).

    As materialization of forward closure is the prevalent mode of implementing inference in RDF, users wished to have a measure of its cost in space and time, especially under online-update loads.

  • Text, XML, and Geospatial - There is no online application that does not have text search. In publishing, this is hardly ever provided by an RDF store, even if there is one in the mix. Even so, there is an understandable desire to consolidate systems, i.e., not to have an XML database for content and a separate RDF database for metadata. Also, many applications have a geospatial element. One wish was to combine XPath/XQuery with SPARQL, with the implication that query optimization should produce good plans under these conditions.

    There was extensive discussion especially on benchmarking full-text search. Such a benchmark would need to address the quality of relevance ranking. Doing new work in this space is clearly out of scope for LDBC, but an IR benchmark could be reused as an add-on to provide a quality score; the performance score would come from the LDBC side of the benchmark. Now, many of the applications of text (e.g., news) might not even sort on text match score, but rather by time. Also, if text search is applied to metadata like labels or URI strings, the quality of a match is a non-issue, as there is no document context.

  • Data integration - Almost all applications had some element of data integration. Indeed, if one uses RDF in the first place, the motivation usually has to do with schema flexibility. Having a relational schema for everything is often seen as too hard to maintain, and as leading to too much development time before an initial version of an application or the answer to a business question. Data integration is everywhere but stays elusive for benchmarking: every time it is different, and most vendors present do not offer products for this specific need. Many ideas were presented, including using SPARQL for entity resolution and for checking the consistency of an integration result.

A central issue of benchmark design is having an understandable metric. People cannot make sense of more than a few figures. The TPC practice of throughput at scale and price per unit of throughput at scale is a successful example. However, it may be difficult to agree on relative weights of components if a metric is an aggregate of too many things. Also, if a benchmark has too many optional parts, metrics easily become too complicated. On the other hand, requiring too many features (e.g. XML, full text, geospatial) restricts the number of possible participants.

To stimulate innovation, a benchmark needs to be difficult but restricted to a specific domain. TPC-H is a good example, favoring specialized systems built for analytics alone. To be a predictor of total cost and performance in a complex application, a benchmark must include much more functionality, and will favor general purpose systems that do many things but are not necessarily outstanding in any single aspect.

After one and a half days with users, the project team met to discuss the actual benchmark task forces to be started. The conclusion was that work would initially proceed around two use cases: publishing, and social networks. The present use of RDF by the BBC and the Press Association provides the background scenario for the publishing benchmark, and the work carried out around the Social Intelligence Benchmark (SIB) in LOD2 will provide a starting point for the social network benchmark. Additionally, user scenarios from the DEX graph database user base will help shape the social network workload.

A data integration task force needs more clarification, but work in this direction is in progress.

In practice, driving progress needs well-focused benchmarks with special trick questions intended to stress specific aspects of a database engine. Providing an overall perspective on cost and online operations needs a broad mix of features to be covered.

These needs will be reconciled by having many metrics inside a single use case; i.e., a social network data set can be used for transactional updates, for lookup queries, for graph analytics, and for TPC-H-style business intelligence questions, especially if integrated with another, more relational dataset. Thus there will be a mix of metrics, from transactions to analytics, with single- and multi-user workloads. Whether these are packaged as separate benchmarks, or as optional sections of one, remains to be seen.
11/27/2012 23:17 GMT
Developer Recruitment Exercise [ Orri Erling ]

The specification of the exercise referred to in the previous post may be found below.

Questions on the exercise can be sent to the email specified in the previous post. I may schedule a phone call to answer questions based on the initial email contact.

We seek to have all applicants complete the exercise before October 1.

General

The exercise consists of implementing a part of the TPC-C workload in memory, in C or C++. TPC-C is the long-time industry standard benchmark for transaction processing performance. We use this as a starting point for an exercise for assessing developer skill level in writing heavily multithreaded, performance-critical code.

The application performs a series of transactions against an in-memory database, encountering lock contention and occasional deadlocks. The application needs to provide atomicity, consistency, and isolation for transactions. The task consists of writing the low-level data structures for storing the memory-resident database and for managing concurrency, including lock queueing, deadlock detection, and commit/rollback. Solutions are evaluated based on their actual measured multithreaded performance on commodity servers, e.g., 8- or 12-core Intel Xeon machines.

OpenLink provides the code for data generation and driving the test. This is part of the TPC-C kit in Virtuoso Open Source. The task is to replace the SQL API calls with equivalent in-process function calls against the in-memory database developed as part of the exercise.

Rules

We are aware that the best solution to the problem may be running transactions single-threaded against in-memory hash tables without any concurrency control: the application data may be partitioned so that a single transaction can in most cases be assigned to one partition, which it gets to itself for the few microseconds it takes to do its job. For this exercise, that solution is explicitly ruled out. The application must demonstrate shared access to data, with a transaction holding multiple concurrent locks and being liable to deadlock.

TPC-C can be written so as to avoid deadlocks by always locking in a fixed order. This is also expressly prohibited; specifically, the stock rows of a new order transaction must be locked in the order they are specified in the invocation. In application terms this makes no sense, but for the purposes of the exercise it serves as a natural source of deadlocks.
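To make the rule concrete, here is a minimal sketch of how the stock locking loop might look. Every name in it (stock_find, row_lock_acquire, the return codes) is invented for the illustration and would come from the submission's own lock manager and index code; this is not part of the OpenLink kit.

```c
/* Illustrative sketch only; every name below is invented for the example and
   would be supplied by the submission's own lock manager and index structures. */
typedef struct row_lock row_lock_t;
typedef struct stock    { row_lock_t *lock; /* ... stock columns ... */ } stock_t;

extern stock_t *stock_find (int w_id, int item_id);            /* key -> row pointer   */
extern int      row_lock_acquire (row_lock_t *l, int txn_id);  /* may wait or deadlock */
enum { LOCK_OK, LOCK_DEADLOCK, TXN_OK, TXN_ROLLBACK };

/* Stock rows are locked in the order the items appear in the invocation.
   Sorting item_ids first (which would prevent deadlocks) is expressly disallowed. */
int
lock_stock_rows (int txn_id, int w_id, const int *item_ids, int n_items)
{
  for (int i = 0; i < n_items; i++)                  /* order as given, NOT sorted */
    {
      stock_t *s = stock_find (w_id, item_ids[i]);
      if (LOCK_DEADLOCK == row_lock_acquire (s->lock, txn_id))
        return TXN_ROLLBACK;                         /* undo and retry the new order */
    }
  return TXN_OK;
}
```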

Parameters

The application needs to offer an interactive or scripted interface (command line is OK) which provides the following operations:

  • Clear and initialize a database of n warehouses.

  • Run n threads, each doing m new order transactions. Each thread has a home warehouse and occasionally accesses other warehouses' data. This reports the real time elapsed and the number of retries arising from deadlocks.

  • Check the consistency between the stock, orders, and order_line data structures.

  • Report system status such as clocks spent waiting for specific mutexes. This is supplied as part of the OpenLink library used by the data generator.

Data Structures

The transactions are written as C functions. The data is represented as C structs, and tree indices or hash tables are used for value-based access to the structures by key. The application has no persistent storage. The structures reference each other by key values, as in the database, with no direct pointers; the key values are to be translated into pointers via a hash table or other index-like structure.
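As a rough illustration of this layout, a district row and its key-to-pointer lookup could look as follows. The struct and index below are made up for the example (they are not taken from the OpenLink kit), and thread safety is deliberately omitted here; it is treated separately below.

```c
#include <stddef.h>
#include <stdint.h>

/* One table row is a plain C struct; all amounts and keys are integers. */
typedef struct district
{
  int32_t d_w_id;        /* warehouse key */
  int32_t d_id;          /* district key within the warehouse */
  int64_t d_next_o_id;   /* next order id to assign */
  int64_t d_ytd;         /* year-to-date total, kept as an integer */
} district_t;

/* Rows reference each other only by key values, never by direct pointers,
   so every cross-reference goes through an index like this one. */
#define DIST_BUCKETS 4096

typedef struct dist_bucket
{
  district_t        *row;
  struct dist_bucket *next;
} dist_bucket_t;

static dist_bucket_t *dist_hash[DIST_BUCKETS];

static inline uint32_t
district_hash (int32_t w_id, int32_t d_id)
{
  uint64_t k = ((uint64_t) (uint32_t) w_id << 32) | (uint32_t) d_id;
  k ^= k >> 33;  k *= 0xff51afd7ed558ccdULL;  k ^= k >> 33;   /* simple mixer */
  return (uint32_t) (k % DIST_BUCKETS);
}

district_t *
district_find (int32_t w_id, int32_t d_id)   /* translate key values into a pointer */
{
  for (dist_bucket_t *b = dist_hash[district_hash (w_id, d_id)]; b; b = b->next)
    if (b->row->d_w_id == w_id && b->row->d_id == d_id)
      return b->row;
  return NULL;
}
```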

The application must be thread-safe, and transactions must be able to roll back. Transactions will sometimes wait for each other in updating shared resources such as stock or district or warehouse balances. The application must be written so as to implement fine-grained locking, and each transaction must be able to hold multiple locks. The application must be able to detect deadlocks. For deadlock recovery, it is acceptable to abort the transaction that detects the deadlock.

C++ template libraries may be used, but one must pay attention to their efficiency.

The new order transaction is the only required transaction.

All numbers can be represented as integers. This holds for key columns as well as for monetary amounts.

All index structures (e.g., hash tables) in the application must be thread-safe, so that an insert is safe under concurrent access or concurrent inserts. This also holds for index structures of tables which do not receive inserts during the test (e.g., item, customer, stock, etc.).

A sequence object must not be used for assigning new values to the O_ID column of ORDERS. These values must come from the D_NEXT_O_ID column of the DISTRICT table. If a new order transaction rolls back, its update of D_NEXT_O_ID is also rolled back. This causes O_ID values to always be consecutive within a district.
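Continuing the illustrative district_t sketch above, the order id handling might look like this. The only point is that the id comes from the district row, held under its lock for the duration of the transaction, and that a rollback undoes the increment, which is what keeps O_ID values consecutive within a district.

```c
/* Caller holds the row lock on the district for the whole transaction. */
int64_t
new_order_take_o_id (district_t *d)
{
  return d->d_next_o_id++;        /* the new order's O_ID; never from a sequence */
}

void
new_order_undo_o_id (district_t *d)
{
  d->d_next_o_id--;               /* rollback: the next order reuses this id */
}
```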

TPC-C Functionality

The application must implement the TPC-C new order transaction in full. It must not avoid deadlocks by ordering the locking of stock rows; see the Rules section.

The transaction must have the semantics specified in TPC-C, except for durability.

Supporting Files

The test driver calling the transaction procedures is in tpccodbc.c. This can be reused so as to call the transaction procedure in-process instead of through the ODBC exec.

The user interface may be a command line menu with run options for different numbers of transactions with different thread counts and an option for integrity check.

The integrity check consists of verifying s_cnt_order against the orders and checking that max (O_ID) and D_NEXT_O_ID match within each district.

Running the application should report various statistics, such as CPU%, cumulative time spent waiting for locks, etc. The rdtsc instruction can be used for getting clock counts for timing.

Points to Note

This section summarizes some of the design patterns and coding tricks we expect to see in a solution to the exercise. These may seem self-evident to some, but experience indicates that this is not universally so.

  • The TPC-C transaction profile for new order specifies the semantics of the operation. The order of locking is left to the implementation as long as the semantics are preserved. The application will be tested with many clients on the same warehouse, running as fast as they can, so lock contention is expected. Therefore, the transaction should be written so as to acquire the most contended locks as late as possible. No locks need be acquired for the item table, since none of the transactions update it.

  • For implementing locks, using a mutex to serialize access to application resources is not enough. Many locks will be acquired by each transaction, in an unpredictable order. Unless explicit queueing for locks is implemented with deadlock detection, the application will not work.

  • If waiting for a mutex causes the operating system to suspend a thread, the latency is multiple microseconds even when there are cores free, and even if the mutex is released by its owner on the very next cycle after the waiting thread is suspended. This will destroy any benefit from parallelism unless one is very careful. Programmers do not seem to know this instinctively.

Therefore, any structure to which access must be serialized (e.g., hash tables, locks, etc.) needs to be protected by a mutex, but must be partitioned so that there are tens or hundreds of mutexes, with the mutex chosen according to which section of the structure is being accessed.

Submissions that protect a hash table or other index-like structure for a whole application table with a single mutex or rw lock will be discarded off the bat.

Even while using many mutexes, one must hold them for a minimum of time. When accessing a hash table, do the invariant parts first; acquire the mutex after that. For example, if you calculate the hash number after acquiring the mutex for the hash table, the submission will be rejected.
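A minimal sketch of the partitioning and of doing the invariant work before taking the mutex follows; the names and the stripe count are made up for the example, and the bucket probe itself is elided.

```c
#include <pthread.h>
#include <stdint.h>

#define N_STRIPES 128            /* tens to hundreds of mutexes, not one per table */

typedef struct stripe
{
  pthread_mutex_t mtx;
  /* bucket chains belonging to this stripe of the hash table would live here */
} stripe_t;

static stripe_t stripes[N_STRIPES];

void
index_init (void)                /* call once, before any worker threads start */
{
  for (int i = 0; i < N_STRIPES; i++)
    pthread_mutex_init (&stripes[i].mtx, NULL);
}

void *
index_get (uint64_t key)
{
  uint64_t h = key * 0x9e3779b97f4a7c15ULL;    /* invariant work BEFORE any mutex */
  stripe_t *s = &stripes[(h >> 32) % N_STRIPES];

  pthread_mutex_lock (&s->mtx);                /* serializes one stripe only */
  void *row = NULL;                            /* ... probe s's buckets here ... */
  pthread_mutex_unlock (&s->mtx);              /* held for a minimum of time */
  return row;
}
```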

The TPC-C application has some local and some scattered access. Orders are local; stock and item lines are scattered. When doing scattered memory accesses, the program should be written so that the CPU will, from a single thread, have multiple concurrent cache misses in flight at all times. So, when accessing 10 stock lines, calculate the hash numbers first, then access the memory, deferring any branches based on the accessed values. In this way, out-of-order execution will miss the CPU cache for many independent addresses in parallel. One can use the gcc __builtin_prefetch primitive, or simply write the program so as to have mutually data-independent memory accesses in close proximity.
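One way to get several misses in flight from a single thread is sketched below with gcc's __builtin_prefetch. The stock hash function and bucket lookup are stand-in names assumed to exist in the submission; the shape of the loop is the point.

```c
#include <stdint.h>

#define MAX_OL 15                       /* a TPC-C new order has at most 15 lines */

typedef struct stock_bucket stock_bucket_t;                  /* application-defined */
extern stock_bucket_t *stock_bucket_of (uint32_t hash_no);   /* stand-in lookup */
extern uint32_t        stock_hash (int w_id, int item_id);   /* stand-in hash   */

/* Hash all stock keys first, start all the loads, and only then probe and
   branch, so the out-of-order core overlaps the cache misses. */
void
prefetch_stock_lines (int w_id, const int *item_ids, int n, stock_bucket_t **buckets)
{
  uint32_t h[MAX_OL];
  if (n > MAX_OL)
    n = MAX_OL;
  for (int i = 0; i < n; i++)
    h[i] = stock_hash (w_id, item_ids[i]);      /* invariant work, no memory probes */
  for (int i = 0; i < n; i++)
    {
      buckets[i] = stock_bucket_of (h[i]);
      __builtin_prefetch (buckets[i], 0, 3);    /* read access, keep in cache */
    }
  /* the caller now walks buckets[], branching only on already-fetched lines */
}
```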

For detecting deadlocks, a global transaction wait graph may have to be maintained, and it will need to be maintained in a serialized manner. If many threads access it, the accesses must be serialized on a global mutex; this may be very bad if deadlock detection takes a long time. Alternatively, the wait graph may be maintained by a separate thread, which gets notices of waits and transaction completions from worker threads with some delay. Having spotted a cycle, it may kill one or the other party. This requires some inter-thread communication. The submission may address this matter in any number of ways.

However, just acquiring a lock without waiting must not involve taking a global mutex. Going to wait will have to do so, even if only for queueing a notice to a monitor thread. Using a socket-to-self might appear to circumvent this, but the communication stack has mutexes inside, so this is no better.
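To illustrate the distinction, below is a sketch of a row lock whose no-wait path touches only a local stripe mutex; only the path that actually goes to wait posts to the globally serialized wait graph. All names are invented, and wait_for_lock() is assumed to re-check ownership under the stripe mutex before sleeping, so that a release between the unlock and the wait is not missed.

```c
#include <pthread.h>

typedef struct row_lock
{
  int owner_txn;                          /* -1 when the lock is free */
} row_lock_t;

extern pthread_mutex_t *stripe_mutex_of (row_lock_t *l);      /* striped, as above  */
extern void post_wait_edge (int waiter_txn, int owner_txn);   /* global wait graph  */
extern int  wait_for_lock (row_lock_t *l, int txn);           /* LOCK_OK or LOCK_DEADLOCK */
enum { LOCK_OK, LOCK_DEADLOCK };

int
row_lock_acquire (row_lock_t *l, int txn)
{
  pthread_mutex_t *m = stripe_mutex_of (l);
  pthread_mutex_lock (m);                 /* short, local critical section */
  if (l->owner_txn == -1 || l->owner_txn == txn)
    {
      l->owner_txn = txn;                 /* fast path: no global state touched */
      pthread_mutex_unlock (m);
      return LOCK_OK;
    }
  int owner = l->owner_txn;
  pthread_mutex_unlock (m);
  post_wait_edge (txn, owner);            /* only the waiter pays for the global mutex */
  return wait_for_lock (l, txn);          /* the monitor thread may report a deadlock */
}
```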

Evaluation Criteria

The exercise will be evaluated based on the run-time performance, and especially the multicore scalability, of the result.

Extra points are not given for implementing interfaces or for being object oriented. Interfaces, templates, and objects are not forbidden as such, but their cost must not exceed the difference between getting an address from a virtual table and calling a function directly.

The locking implementation must be correct. It can be limited to exclusive locks and need not support isolation other than repeatable read. Running the application must demonstrate deadlocks and working recovery from these.

Code and Libraries To Be Used

The TPC-C data generator and test driver are in the Virtuoso Open Source distribution, in the files binsrc/tests/tpcc*.c and the files included from these. You can do the exercise in the same directory and just alter the files or the make script. The application is standalone and has no other relation to the Virtuoso code. The libsrc/Thread threading wrappers may be used. If not using these, make a wrapper similar to mutex_enter when MTX_METER is defined, so that it counts the waits and the clock cycles spent waiting. Also provide a report, like that of mutex_stat(), for mutex wait frequency and duration.

08/16/2012 15:26 GMT
Developer Opportunities at OpenLink Software [ Virtuoso Data Space Bot ]

If it is advanced database technology, you will get to do it with us.

We are looking for exceptional talent to implement some of the hardest stuff in the industry. This ranges from new approaches to query optimization, to parallel execution (both scale-up and scale-out), to elastic cloud deployments and self-managing, self-tuning, fault-tolerant databases. Our name is most familiar to the RDF world, but we also have full SQL support, and the present work will serve both use cases equally.

We are best known in the realms of high-performance database connectivity middleware and massively-scalable Linked-Data-oriented graph-model DBMS technology.

We have the basics -- SQL and SPARQL, column store, vectored execution, cost based optimization, parallel execution (local and cluster), and so forth. In short, we have everything you would expect from a DBMS. We do transactions as well as analytics, but the greater challenges at present are on the analytics side.

You will be working with my team covering:

  • Adaptive query optimization -- interleaving execution and optimization, so as to always make the correct plan choices based on actual data characteristics

  • Self-managing cloud deployments for elastic big data -- clusters that can grow themselves and redistribute load, recover from failures, etc.

  • Developing and analyzing new benchmarks for RDF and graph databases

  • Embedding complex geospatial reasoning inside the database engine. We have the basic R-tree and the OGC geometry data types; now we need to go beyond this

  • Every type of SQL optimizer and execution engine trick that serves to optimize for TPC-H and TPC-DS.

What do I mean by exceptional? It boils down to being a smart and fast programmer. We have over the years talked to people, including many who have worked on DBMS programming, and found that they actually knew next to nothing of database science. For example, they might not know what a hash join is. Or they might not know that interprocess latency is in the tens of microseconds even within one box, and that in that time one can do tens of index lookups. Or they might not know that blocking on a mutex kills.

If you do core database work, we want you to know how many CPU cache misses you will have in flight at any point of the algorithm, and how many clock cycles will be spent waiting for them at which points. The same goes for distributed execution: the only way a cluster can perform is to have the maximum number of messages, each carrying the maximum payload, in flight at all times.

These are things that can be learned. So I do not necessarily expect that you have in-depth experience of these, especially since most developer jobs are concerned with something else. You may have to unlearn the bad habit of putting interfaces where they do not belong, for example. Or to learn that if there is an interface, then it must pass as much data as possible in one go.

Talent is the key. You need to be a self-starter with a passion for technology and have competitive drive. These can be found in many guises, so we place very few limits on the rest. If you show you can learn and code fast, we don't necessarily care about academic or career histories. You can be located anywhere in the world, and you can work from home. There may be some travel but not very much.

In the context of EU FP7 projects, we are working with some of the best minds in databases, including Peter Boncz of CWI and VU Amsterdam (MonetDB, VectorWise) and Thomas Neumann of the Technical University of Munich (RDF-3X, HyPer). This is an extra guarantee that you will be working on the most relevant problems in databases, informed by the results of the very best work to date.

For more background, please see the IEEE Computer Society Bulletin of the Technical Committee on Data Engineering, Special Issue on Column Store Systems.

All articles and references therein are relevant for the job. Be sure to read the CWI work on run time optimization (ROX), cracking, and recycling. Do not miss the many papers on architecture-conscious, cache-optimized algorithms; see the VectorWise and MonetDB articles in the bulletin for extensive references.

If you are interested in an opportunity with us, we will ask you to do a little exercise in multithreaded, performance-critical coding, to be detailed in a blog post in a few days. If you have done similar work in research or industry, we can substitute the exercise with a suitable sample of this, but only if this is core database code.

There is a dual message: The challenges will be the toughest a very tough race can offer. On the other hand, I do not want to scare you away prematurely. Nobody knows this stuff, except for the handful of people who actually do core database work. So we are not limiting this call to this small crowd and will teach you on the job if you just come with an aptitude to think in algorithms and code fast. Experience has pros and cons so we do not put formal bounds on this. "Just out of high school" may be good enough, if you are otherwise exceptional. Prior work in RDF or semantic web is not a factor. Sponsorship of your M.Sc. or Ph.D. thesis, if the topic is in our line of work and implementation can be done in our environment, is a further possibility. Seasoned pros are also welcome and will know the nature of the gig from the reading list.

We are aiming to fill the position(s) between now and October.

Resumes and inquiries can be sent to Hugh Williams, hwilliams@openlinksw.com. We will contact applicants for interviews.

# PermaLink Comments [0]
08/07/2012 13:21 GMT