Faceted Search: Unlimited Data in Interactive Time

Why not see the whole world of data as facets? Well, we'd like to, but there is the feeling that this is not practical.

The old problem has been that it is not practical to pre-compute counts for every possible combination of search conditions with every way of counting, grouping, and sorting, and that computing the actual matches at query time is slow.

Well, neither pre-computation nor exact counts are in fact necessary. When large numbers of items match the conditions, counting them can take time, but this is the beginning of the search, and the user is not likely to look very closely at the counts. It is enough to see that there are many of one and few of another. If the user already knows the precise predicate or class to look for, then the top-level faceted view is not even needed. Faceted views for guiding search and precise analytics are two different problems.

There are client-side faceted views like Exhibit or our own ODE. The problem with these is that the actual database is a few orders of magnitude larger than what fits on the user agent. This is compounded by the fact that one does not know what to cache on the user agent, because of the open nature of the data web. If this were about a fixed workflow, then a good guess would be possible, but we are talking about the data web, the very soul of serendipity and unexpected discovery.

So we made a web service that does faceted search on arbitrary RDF. If it does not get complete results within a timeout, it returns what it has counted so far, using Virtuoso's Anytime feature. Looking only for subjects with some specific combination of properties is a bit limited, however, so the service also does JOINs. Many features are one or two JOINs away; take geographical locations or social networks, for example.
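The idea of returning whatever has been counted when the timeout expires can be illustrated in a few lines of Python. This is only a sketch of the general technique, not Virtuoso's Anytime implementation; the function name, its parameters, and the sample items are all invented for illustration.

```python
import time
from collections import Counter


def anytime_facet_counts(matches, facet_of, timeout_s, clock=time.monotonic):
    """Count facet values until the input ends or the time budget is spent.

    Returns (counts, complete). `complete` is False when the deadline
    cut the scan short, so the counts are a lower bound.
    All names here are hypothetical, chosen for this sketch.
    """
    deadline = clock() + timeout_s
    counts = Counter()
    for item in matches:
        if clock() >= deadline:
            return counts, False  # partial answer: what was counted so far
        counts[facet_of(item)] += 1
    return counts, True


# Made-up sample data: three matching items with a "type" facet.
items = [{"type": "Person"}, {"type": "Person"}, {"type": "Place"}]
counts, complete = anytime_facet_counts(items, lambda i: i["type"], timeout_s=5.0)
```

A caller would show the counts either way and use the `complete` flag to tell the user whether the numbers are exact or only what was reached within the time limit.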

Yet a faceted search should be point-and-click and should not involve constructing a full query. Our compromise is to start with full text, a property, or a class, then navigate down properties or classes to arbitrary depth, tree-wise. At each step, one can see the matching instances or their classes or properties, all with counts, faceted-style.

This is good enough for queries like 'what do Harry Potter fans also like' or 'who are the authors of articles tagged semantic web and machine learning and published in 2008'. For complex grouping, sub-queries, arithmetic or such, one must write the actual query.

But one can begin with facets and then continue refining the query by hand, since the service also returns SPARQL text. We made a small web interface on top of the service, with all logic on the server side. This demonstrates that the web service is usable and that an interface with no AJAX, and no browser interoperability problems, is possible and easy. Also, the problem of syncing between a user-agent-based store and a database is entirely gone.
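To make the tree-wise navigation and the generated SPARQL text more concrete, here is a minimal Python sketch of how a tree of facet selections might be turned into a query. It does not reproduce the service's actual query generator; the `FacetNode` class, the `to_sparql` function, and the example.org IRIs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FacetNode:
    """One step in the navigation tree (hypothetical structure)."""
    class_iri: Optional[str] = None   # restrict this node to a class
    value: Optional[str] = None       # or pin it to a literal value
    children: List[Tuple[str, "FacetNode"]] = field(default_factory=list)


def _patterns(node, var, counter):
    """Emit one triple pattern per condition, recursing down the tree."""
    lines = []
    if node.class_iri:
        lines.append(f"?{var} a <{node.class_iri}> .")
    for prop_iri, child in node.children:
        if child.value is not None:
            literal = child.value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'?{var} <{prop_iri}> "{literal}" .')
            continue
        counter[0] += 1               # each property step is one more JOIN
        child_var = f"s{counter[0]}"
        lines.append(f"?{var} <{prop_iri}> ?{child_var} .")
        lines.extend(_patterns(child, child_var, counter))
    return lines


def to_sparql(root, limit=20):
    body = "\n  ".join(_patterns(root, "s1", [1]))
    return f"SELECT DISTINCT ?s1 WHERE {{\n  {body}\n}} LIMIT {limit}"


# Articles tagged "semantic web" that have an author (illustrative IRIs).
root = FacetNode(class_iri="http://example.org/Article", children=[
    ("http://example.org/tag", FacetNode(value="semantic web")),
    ("http://example.org/author", FacetNode()),
])
print(to_sparql(root))
```

Each click in the interface would add a child to the tree, and the printed text is what a user could take over and refine by hand.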

If we are working with a known data structure, the user interface should choose the display by the data type and offer links to related reports. This is all easy to build as web pages or AJAX. We show how the generic interface is done in Virtuoso PL, and you can adapt that or rewrite it in PHP, Java, JavaScript, or anything else, to accommodate use-case-specific navigation needs such as data formats.

The web service takes an XML representation of the search, which is more restricted and easier to process by machine than the SPARQL syntax. The web service returns the results, the SPARQL query it generated, whether the results are complete or not, and some resource use statistics.
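As a rough illustration of why a restricted XML form is easier to process by machine than SPARQL, the following Python sketch reads a made-up search description and defines a container for the four parts of the reply. The element names, attribute names, and field names are assumptions for this example only; the service's real schema is defined by its own documentation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical request document; NOT the service's actual schema.
REQUEST = """
<query>
  <text>semantic web</text>
  <class iri="http://example.org/Article"/>
  <property iri="http://example.org/author"/>
</query>
"""


def parse_conditions(xml_text):
    """Turn the XML search description into a flat list of conditions."""
    root = ET.fromstring(xml_text.strip())
    conditions = []
    for el in root:
        if el.tag == "text":
            conditions.append(("text", (el.text or "").strip()))
        elif el.tag in ("class", "property"):
            conditions.append((el.tag, el.attrib["iri"]))
        else:
            raise ValueError(f"unsupported element: {el.tag}")
    return conditions


@dataclass
class FacetResponse:
    """The four things the post says the service returns (names assumed)."""
    rows: list                                  # the results themselves
    sparql: str                                 # the generated query text
    complete: bool                              # False if the timeout hit
    stats: dict = field(default_factory=dict)   # resource use statistics


conditions = parse_conditions(REQUEST)
```

Because the vocabulary is small and fixed, a handful of tag checks is enough to validate a request, which would not be true of a full SPARQL parser.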

The source of the PL functions, Web Service and Virtuoso Server Page (HTML UI) will be available as part of Virtuoso 6.0 and higher. A Programmer's Guide will be available as part of the standard Virtuoso Documentation collection, including the Virtuoso Open Source Edition Website.

Comments

Re:Faceted Search: Unlimited Data in Interactive Time

Hey Kingsley,

First and foremost, great work! Your demo access to "Unlimited Data in Interactive Time" is really fast!
However, adding new properties to the query for further filtering could probably be more intuitive.
Which leads me to the question: Is there public access (e.g. a SPARQL endpoint) to your server? This would be great, because we are working on intuitive graphical frontends for hierarchical faceted filtering and are looking for fast backends to process our queries!
If you are interested, have a look at our demo on www.gFacet.org
I look forward to your comment.

Greetings
Philipp

Posted by Philipp Heim on 01/12/2009 04:44 GMT

Comments URL for this entry: http://www.openlinksw.com/mt-tb/Http/comments?id=1515

Orri Erling's Weblog
