Since I cannot post directly to Glenn's blog post titled "This is Not the Near Future (Either)", I have to respond to him here, in blog post form :-(

What is our "Search" and "Find" demonstration about? It is about how you use the "Description" of "Things" to unambiguously locate things in a database at Web Scale.

To our perpetual chagrin, we are trying to demonstrate an engine -- not UI prowess -- but the immediate response is to jump to the UI aesthetics.

Google, Yahoo, etc. offer a simple input form for full-text search patterns, and a processing window within which full-text searches are completed across the Web content indexed on their servers. Once a search pattern is processed, you get a page-ranked result set (basically a collection of Web pages, plus the claim: we found N pages out of a document corpus of about M indexed pages).

Note: the estimate aspect of traditional search results is like "advertising small print". The user lives with the illusion that all possible documents on the Web (or even the Internet) have been searched, whereas in reality even 25% of the possible total is a major stretch, since the Web and Internet are fractal, scale-free networks, inherently growing at exponential rates "ad infinitum" across boundless dimensions of human comprehension.

The power of Linked Data ultimately comes down to the fact that the user constructs the path to what they seek via the properties of the "Things" in question. The routes are not hardwired, since URI de-referencing (the "follow your nose" pattern) is available to Linked Data aware query engines and crawlers.
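
To make the "follow your nose" point concrete, here is a minimal sketch (the URI below is illustrative, not something taken from the demo): once an engine or crawler holds an entity URI, it can de-reference it over HTTP or ask a SPARQL endpoint to describe it, and the returned properties become the next hops in the path.

    # Ask for the description of a "Thing" by its URI.
    # The URI is illustrative; any entity URI surfaced by a query
    # can be de-referenced or described the same way.
    DESCRIBE <http://dbpedia.org/resource/Cyndi_Lauper>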

We are simply trying to demonstrate how you can combine the best of full-text search with the best of structured querying while reusing familiar interaction patterns from Google/Yahoo. Thus, you start with a full-text search, get all the entities associated with the pattern, and then use entity types or entity properties to find what you seek.
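
Roughly, that interaction maps to something like the following sketch (the property names, the foaf:Person type filter, and the bif:contains full-text predicate are illustrative Virtuoso-isms, not the demo's literal query):

    # Step 1: full-text match to collect candidate entities.
    # Step 2: narrow the candidates by entity type (here, foaf:Person).
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    SELECT DISTINCT ?entity ?label
    WHERE
      {
        ?entity rdfs:label ?label .
        ?label  bif:contains "'glenn mcdonald'" .   # full-text pattern
        ?entity a foaf:Person .                     # entity type filter
      }
    LIMIT 50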

You state in your post:

"To state the obvious caveat, the claim OpenLink is making about this demo is not that it delivers better search-term relevance, therefore the ranking of searching results is not the main criteria on which it is intended to be assessed."

Correct.

"On the other hand, one of the things they are bragging about is that their server will automatically cut off long-running queries. So how do you like your first page of results?".

Not exactly correct. We are performing aggregates using a configurable interactive time factor. Example: tell me, within 2 seconds, how many entities of type Person with interest "Semantic Web" exist in this database. Also understand that you could retry the same query and get different numbers within the same interactive time factor. It isn't your basic "query cut-off".
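
As a rough sketch of that example (the COUNT aggregate leans on our SPARQL-BI extensions, the property names are illustrative, and the 2-second budget is a setting on the engine/endpoint rather than part of the query text):

    # "How many entities of type Person, with interest 'Semantic Web',
    #  exist in this database?" -- answered within the interactive time
    #  window, so repeated runs may legitimately return different counts.
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT COUNT(DISTINCT ?person) AS ?people
    WHERE
      {
        ?person a foaf:Person .
        ?person foaf:interest ?interest .
        ?interest rdfs:label ?topic .
        ?topic bif:contains "'Semantic Web'" .
      }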

"And on the other other hand, the big claim OpenLink is making about this demo is that the aggregate experience of using it is better than the aggregate experience of using "traditional" search. So go ahead, use it. If you can."

Yes, "Microsoft" was a poor example for sure, the example could have been pattern: "glenn mcdonald", which should demonstrate the fundamental utility of what we are trying to demonstrate i.e., entity disambiguation courtesy of entity properties and/or entity type filtering.

Compare Google's results for "Glenn McDonald" with those from our demo (which disambiguates "Glenn McDonald" via associated properties and/or types), assuming we both agree that your Web site or blog home page isn't the center of your entity graph or personal data space (i.e., data about you); getting your home page to the top of Google's page rank offers limited value, in reality.

What are we bragging about? A little more than what you attempt to explain. Yes, we are showing that we can find stuff within a processing window, but understand the following:

  • Processing Time Window (or interactive time) is configurable
  • Data Corpus is a Billion+ Triples (from the Billion Triples Challenge Data Set)
  • SPARQL doesn't have Aggregation capabilities by default (we have implemented SPARQL-BI to deliver aggregates for analytics against large data sets; we even handle the TPC-H industry-standard benchmark with SPARQL-BI)
  • Paging isn't possible without aggregates, and doing aggregates over a billion+ triples as part of a query processing cycle isn't trivial stuff (otherwise it would be everywhere, given the inherent and obvious necessity) -- see the sketch after this list.
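
Here is the kind of facet-count aggregate the last two points refer to, sketched with illustrative names (GROUP BY / COUNT rely on SPARQL-BI style extensions, and bif:contains is a Virtuoso full-text idiom):

    # Count matching entities per type so the UI can render facets and
    # page through them; OFFSET/LIMIT paging only makes sense once the
    # engine can produce these counts over a billion+ triples quickly.
    SELECT ?type COUNT(?s) AS ?members
    WHERE
      {
        ?s ?p ?text .
        ?text bif:contains "'glenn mcdonald'" .
        ?s a ?type .
      }
    GROUP BY ?type
    ORDER BY DESC(?members)
    LIMIT 20 OFFSET 0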

I hope I've clarified what's going on with our demo. If not, pose your challenges via examples and I will respond with solutions, or simply cry out loud: "no mas!".

As for your "Mac OX X Leopard" comments, I can only say this: I emphasized that this is a demo, the data is pretty old, and the input data has issues (i.e. some of the input data is bad as your example shows). The purpose of this demo is not about the text per se., it's about the size of the data corpus and faceted querying. We are going to have the entire LOD Cloud loaded into the real thing, and in addition to that our Sponger Middleware will be enabled, and then you can take issue with data quality as per your reference to "Cyndi Lauper" (btw - it takes one property filter to find information about her quickly using "dbpprop:name" after filtering for properties with text values).

Of all things, this demo had nothing to do with UI and information presentation aesthetics. It was all about combining full-text search and structured queries (SPARQL behind the scenes) against a huge data corpus, en route to solving the challenges associated with faceted browsing over large data sets. We have built a service that resides inside Virtuoso. The service is naturally of the "Web Service" variety and can be used from any consumer / client environment that speaks HTTP (directly or indirectly).

To be continued ...