It is time for an update on Virtuoso developments.
We continue enhancing our hosting of the Linked Open Data (LOD) cloud at http://lod.openlinksw.com.
We have now added result ranking for both text and URIs. Text hit scores are based on word frequency and proximity; URI scores are based on link density.
We calculate each URI's rank by adding up the references to it, weighting each by the score of the referrer, much as in web search. Each iteration of the ranking joins every referenced URI to each of its referrers. We do about 1.2 million such joins per second, across partitions, over 2.2 billion triples and 400M distinct subjects, without any great optimization, just using SQL stored procedures and partitioned function calls. This is a sort of SQL map-reduce. It would run more than twice as fast if it were all in C, but this is adequate for now. The more interesting bit will be tuning the scoring based on the type of link. This is something the web search engines cannot do as well, since document links are untyped.
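To make the iteration concrete, here is a minimal in-memory sketch in Python. It is not the SQL stored procedure code we actually run. The function name rank_uris, the damping factor, the division by each referrer's total outgoing weight, and the optional per-predicate type_weight table are all assumptions borrowed from the usual PageRank formulation, used here only for illustration.

```python
from collections import defaultdict

def rank_uris(triples, type_weight=None, iterations=50, damping=0.85):
    """Score URIs from (referrer, predicate, referred) triples.

    Hypothetical sketch: damping and out-weight normalization are
    PageRank-style assumptions, not taken from the Virtuoso code.
    """
    type_weight = type_weight or {}
    # Each link carries a weight chosen by its predicate (default 1.0).
    edges = [(s, o, type_weight.get(p, 1.0)) for s, p, o in triples]
    nodes = {n for s, o, _ in edges for n in (s, o)}
    if not nodes:
        return {}
    out_weight = defaultdict(float)
    for s, _, w in edges:
        out_weight[s] += w
    base = (1.0 - damping) / len(nodes)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        incoming = defaultdict(float)
        # One pass: join every referred URI to each of its referrers.
        for s, o, w in edges:
            if out_weight[s] > 0:
                incoming[o] += score[s] * w / out_weight[s]
        score = {n: base + damping * incoming[n] for n in nodes}
    return score
```

Passing something like type_weight={"owl:sameAs": 2.0} would make one kind of link count for more than another, which is the sort of tuning that typed links allow; the value 2.0 is purely illustrative.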
We are moving toward a decent user interface for the LOD hosting, including offering ready-made domain-specific queries, e.g., biomedical.
Things like "URI finding with autocomplete" are done and just have to be put online.
With linked data, there is the whole question of identifier choice. We will have a special page just for this. There we show reference statistics, synonyms declared by owl:sameAs, synonyms determined by shared property values, etc. In this way we become a terminology lookup service.
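As an illustration of the owl:sameAs part, declared synonyms can be grouped by treating sameAs as symmetric and transitive and merging the pairs with a union-find structure. This is a small in-memory sketch, not how the lookup page is implemented; the function name sameas_groups and the sample URIs in the usage note are hypothetical.

```python
def sameas_groups(pairs):
    """Group URIs into synonym sets from (uri, uri) owl:sameAs pairs."""
    parent = {}

    def find(x):
        # Return the representative of x's set, halving paths as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two synonym sets

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())
```

Given the pairs ("x:a", "y:a") and ("y:a", "z:a"), all three URIs end up in one set, even though x:a and z:a were never declared the same directly.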
Copies of the LOD cluster system are available for evaluators on a case-by-case basis. We will also make this publicly available on EC2 before too long.
Otherwise, we continue working on productization, primarily things like reliability and recovery. One exercise is running TPC-C with intentionally stupid partitioning, so that almost all joins and deadlocks are distributed. Then we simulate a cluster interconnect that drops messages now and then, sometimes kill server processes, and still keep full ACID properties. Cloud capable, also in bad weather.
The open source release of Virtuoso 6 (no cluster) is basically ready to go; mostly this is a question of logistics.
I will talk about these things in greater individual detail next week.