As I cannot post directly to Glenn's blog post titled "This is Not the Near Future (Either)", I have to respond to him here, in blog post form :-(

What is our "Search" and "Find" demonstration about? It is about how you use the "Description" of "Things" to unambiguously locate them in a database at Web scale.

To our perpetual chagrin, we are trying to demonstrate an engine -- not UI prowess -- but the immediate response is to jump to the UI aesthetics.

Google, Yahoo, etc. offer a simple input form for full-text search patterns, and they have a processing window for completing full-text searches across the Web content indexed on their servers. Once the search patterns are processed, you get a page-ranked result set (basically a collection of Web pages, plus a claim of the form: we found N pages out of a corpus of about M indexed pages).

Note: the "estimate" aspect of traditional search results is like advertising small print. The user lives with the illusion that all possible documents on the Web (or even the Internet) have been searched, whereas in reality even 25% of the possible total is a major stretch, since the Web and the Internet are fractal, scale-free networks, inherently growing at exponential rates ad infinitum, across boundless dimensions of human comprehension.

The power of Linked Data ultimately comes down to the fact that the user constructs the path to what they seek via the properties of the "Things" in question. The routes are not hardwired, since URI de-referencing (the "follow your nose" pattern) is available to Linked Data aware query engines and crawlers.

We are simply trying to demonstrate how you can combine the best of full-text search with the best of structured querying, while reusing familiar interaction patterns from Google/Yahoo. Thus, you start with a full-text search, get all the entities associated with the pattern, then use the entity types or entity properties to find what you seek.
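In toy Python terms (a sketch of the interaction pattern, not our actual engine -- the in-memory entity records and function names below are invented for illustration), the two steps look like this:

```python
# Step 1: full-text match. Step 2: narrow by an entity type or property.
# The tiny "database" here is a hypothetical stand-in for a real triple store.

ENTITIES = [
    {"label": "Glenn McDonald", "type": "Person",  "interest": "Semantic Web"},
    {"label": "Glenn McDonald", "type": "Person",  "interest": "Music"},
    {"label": "McDonald's",     "type": "Company", "interest": None},
]

def full_text_search(pattern):
    """Step 1: find every entity whose label matches the text pattern."""
    pattern = pattern.lower()
    return [e for e in ENTITIES if pattern in e["label"].lower()]

def facet_by(entities, prop, value):
    """Step 2: narrow the matches using an entity property as a facet."""
    return [e for e in entities if e.get(prop) == value]

matches = full_text_search("mcdonald")          # 3 hits: text alone is ambiguous
people  = facet_by(matches, "type", "Person")   # 2 hits: the type facet narrows it
```

The point is the shape of the interaction: text gets you into the data, and the entity descriptions carry you the rest of the way.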

You state in your post:

"To state the obvious caveat, the claim OpenLink is making about this demo is not that it delivers better search-term relevance, therefore the ranking of searching results is not the main criteria on which it is intended to be assessed."

And:

"On the other hand, one of the things they are bragging about is that their server will automatically cut off long-running queries. So how do you like your first page of results?".

Not exactly correct. We are performing aggregates using a configurable interactive time factor. Example: tell me how many entities of type Person, with interest "Semantic Web", exist in this database, within 2 seconds. Also understand that you could retry the same query and get different numbers within the same interactive time factor. It isn't your basic "query cut-off".
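Here's a rough Python sketch of the idea, with a simple scan standing in for our actual query engine (the function, data, and numbers below are illustrative only, not our implementation):

```python
import time

def anytime_count(items, predicate, budget_seconds):
    """Return (count, complete): a possibly partial count within the time budget."""
    deadline = time.monotonic() + budget_seconds
    count = 0
    for item in items:
        if time.monotonic() > deadline:
            return count, False   # budget exhausted: report the partial total
        if predicate(item):
            count += 1
    return count, True            # scanned everything within the budget

# With a generous budget the count is exact; shrink the budget and two runs of
# the same query may legitimately report different (partial) numbers.
count, complete = anytime_count(range(100_000), lambda n: n % 7 == 0, 2.0)
```

That is why this isn't a "cut-off": the engine answers within the interactive window with the best total it has so far, rather than failing the query.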

"And on the other other hand, the big claim OpenLink is making about this demo is that the aggregate experience of using it is better than the aggregate experience of using "traditional" search. So go ahead, use it. If you can."

Yes, "Microsoft" was a poor example for sure; the example could have been the pattern "glenn mcdonald", which would demonstrate the fundamental utility of what we are trying to show, i.e., entity disambiguation courtesy of entity properties and/or entity-type filtering.

Compare Google's results for "Glenn McDonald" with those from our demo (which disambiguates "Glenn McDonald" via associated properties and/or types), assuming we both agree that your Web site or blog home page isn't the center of your entity graph or personal data space (i.e., data about you); getting your home page to the top of Google's page rank thus offers limited value, in reality.
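To make the disambiguation point concrete, here is a toy sketch (the records below are invented for illustration, not pulled from the demo's data): when several entities share a label, it is the properties whose values differ that serve as the facets which tell them apart.

```python
# Two distinct entities that share the same label; page rank can't separate
# them, but their associated properties can.
GLENNS = [
    {"name": "Glenn McDonald", "occupation": "software designer", "based_in": "Boston"},
    {"name": "Glenn McDonald", "occupation": "journalist",        "based_in": "London"},
]

def distinguishing_properties(entities):
    """Return the properties whose values differ across same-named entities."""
    keys = set().union(*(e.keys() for e in entities))
    return sorted(k for k in keys
                  if len({e.get(k) for e in entities}) > 1)

# Filtering on any one of these properties resolves the ambiguity.
facets = distinguishing_properties(GLENNS)
```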

What are we bragging about? A little more than what you attempt to explain. Yes, we are showing that we can find stuff within a processing window, but understand the following:

  • Processing Time Window (or interactive time) is configurable
  • Data Corpus is a Billion+ Triples (from Billion Triples Challenge Data Set)
  • SPARQL doesn't have aggregation capabilities by default (we have implemented SPARQL-BI to deliver aggregates for analytics against large data sets; we even handle the TPC-H industry-standard benchmark with SPARQL-BI)
  • Paging isn't possible without aggregates, and doing aggregates over a billion+ triples as part of a query-processing cycle isn't trivial stuff (otherwise it would be everywhere, due to inherent and obvious necessity).
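On the last point, here is a Python sketch of why paging needs an aggregate step at all (trivial over an in-memory list, as below; decidedly non-trivial over a billion+ triples inside the query-processing cycle):

```python
def page(results, page_number, page_size):
    """Return one page of results plus the totals a pager UI needs."""
    total = len(results)                          # the aggregate step: a full count
    pages = (total + page_size - 1) // page_size  # ceiling division for page count
    start = (page_number - 1) * page_size
    return results[start:start + page_size], total, pages

# To render "page 3 of 10", the total match count must exist first.
window, total, pages = page(list(range(95)), page_number=3, page_size=10)
```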

I hope I've clarified what's going on with our demo. If not, pose your challenge via examples and I will respond with solutions, or simply cry out loud: "no mas!".

As for your "Mac OS X Leopard" comments, I can only say this: I emphasized that this is a demo, the data is pretty old, and the input data has issues (i.e., some of the input data is bad, as your example shows). The purpose of this demo is not about the text per se; it's about the size of the data corpus and faceted querying. We are going to have the entire LOD Cloud loaded into the real thing, and in addition our Sponger Middleware will be enabled; then you can take issue with data quality, as per your reference to "Cyndi Lauper" (btw -- it takes one property filter to find information about her quickly, using "dbpprop:name" after filtering for properties with text values).

Of all things, this demo had nothing to do with UI and information-presentation aesthetics. It was all about combining full-text search and structured queries (SPARQL behind the scenes) against a huge data corpus, en route to solving challenges associated with faceted browsing over large data sets. We have built a service that resides inside Virtuoso. The service is naturally of the "Web Service" variety and can be used from any consumer/client environment that speaks HTTP (directly or indirectly).
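For instance, any HTTP-capable consumer can compose a SPARQL-protocol request against such a service. A minimal sketch, using a hypothetical endpoint URL rather than the demo's actual address:

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"   # hypothetical placeholder endpoint

def sparql_request_url(query, fmt="application/sparql-results+json"):
    """Compose a GET request URL per the SPARQL protocol's query parameter."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": fmt})

url = sparql_request_url(
    'SELECT ?s WHERE { ?s ?p "Glenn McDonald" } LIMIT 10'
)
# Any HTTP client (urllib, curl, a browser) can now fetch this URL directly.
```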

To be continued ...