Description
| - LOD2's database contributions are, on one hand, Virtuoso Column Store and Elastic Cluster, and on the other, the demonstration and proof from CWI that all of the relational innovations for which CWI is well known apply to graph/RDF data as well. The value is unquestionable, both to Virtuoso users in the short term and to the state of the science and all RDF users and vendors in the mid term. The LOD2 claim of "linking the universe" (my words) will be tested soon enough, after we first put the universe in a bucket. By this I mean a real-time quad store of Sindice crawls, plus a warehouse of the LOD data sets. This effort raises a few questions that I will treat in a number of posts to follow, such as: How do you size a real-time copy of LOD/web data? What does it cost to operate a properly provisioned warehouse of all RDF web crawls? What exists now is under-provisioned and not kept up to date. We are talking about all the RDF on the web, in near real time, with arbitrary queries. This is very far from the "billion triples" data sets or vertical portals, both of which are easy by comparison.
|