<?xml version="1.0" encoding="UTF-8" ?>
<!--ATOM based XML document generated By OpenLink Virtuoso-->
<atom:feed xmlns:atom="http://www.w3.org/2005/Atom">
<atom:id>http://www.openlinksw.com/weblog/dav/dav-blog-1/</atom:id>
<atom:title>OpenLink Community Blog</atom:title>
<atom:link href="http://www.openlinksw.com/weblog/dav/dav-blog-1/" type="text/html" rel="alternate" />
<atom:link href="http://www.openlinksw.com/GData/dav-blog-1" type="application/atom+xml" rel="self" />
<atom:subtitle>A Collection of blogs by OpenLink Staff</atom:subtitle>
 <atom:author>
  <atom:name>OpenLink Software</atom:name>
  <atom:email>kidehen@openlinksw.com</atom:email>
  </atom:author>
<atom:updated>2008-05-17T11:36:24Z</atom:updated>
<atom:generator>Virtuoso Universal Server 05.00.3031</atom:generator>
<atom:logo>http://www.openlinksw.com/weblog/public/images/vbloglogo.gif</atom:logo>
 <atom:entry>
  <atom:title>Commercializing the Semantic Web</atom:title>
  <atom:id>http://www.openlinksw.com/blog/~kidehen/?id=1363</atom:id>
  <atom:link href="http://www.openlinksw.com/blog/~kidehen/?id=1363" type="text/html" rel="alternate" />
  <atom:link href="http://www.openlinksw.com/GData/dav-blog-1/1363/1" rel="edit" />
  <atom:published>2008-05-16T22:04:01Z</atom:published>
  <atom:content type="html">&lt;p&gt;Unfortunately, I could only spend 4 days at the recent &lt;a href=&quot;http://www2008.org/&quot; id=&quot;link-id196acf60&quot;&gt;WWW2008&lt;/a&gt; event in &lt;a href=&quot;http://dbpedia.org/resource/Beijing&quot; id=&quot;link-id1974fe28&quot;&gt;Beijing&lt;/a&gt; (I departed the morning following the &lt;a href=&quot;http://events.linkeddata.org/ldow2008/&quot; id=&quot;link-id1863f858&quot;&gt;Linked Data Workshop&lt;/a&gt;), so I couldn&amp;#39;t take my slot on the &amp;quot;Commercializing the &lt;a href=&quot;http://dbpedia.org/resource/Semantic_Web&quot; id=&quot;link-id18990f90&quot;&gt;Semantic Web&lt;/a&gt; panel&amp;quot; etc.. Anyway, thanks to the &lt;a href=&quot;http://dbpedia.org/resource/World_Wide_Web&quot; id=&quot;link-id0x18f29310&quot;&gt;Web&lt;/a&gt; I can still inject my points of view in the broad &lt;a href=&quot;http://dbpedia.org/resource/World_Wide_Web&quot;&gt;Web&lt;/a&gt; based discourse. Well so I hoped, when I attempted to post a comment to Paul Miller&amp;#39;s ZDNet domain hosted &lt;a href=&quot;http://dbpedia.org/resource/Blog&quot; id=&quot;link-id180d6750&quot;&gt;blog&lt;/a&gt; thread titled: &lt;a href=&quot;http://blogs.zdnet.com/semantic-web/?p=132&quot; id=&quot;link-id12d206c0&quot;&gt;Commercialising the Semantic Web&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;Unfortunately, the cost of completing ZDNet&amp;#39;s unwieldy signup process simply exceeded the benefits of dropping my comments in their particular space :-( Thus, I&amp;#39;ll settle for a trackback ping instead.&lt;/p&gt; &lt;p&gt;What follows is the cut and paste of my intended comment contributions to Paul&amp;#39;s post.&lt;/p&gt; &lt;p&gt;Paul,&lt;/p&gt; &lt;p&gt; As discussed earlier this week during &lt;a href=&quot;http://blogs.talis.com/nodalities/2008/05/kingsley-idehen-talks-about-openlink-software-linked-data-and-the-semantic-web.php&quot; id=&quot;link-id1332fb48&quot;&gt;our podcast session&lt;/a&gt;, 
commercialization of &lt;a href=&quot;http://dbpedia.org/resource/Semantic_Web&quot; id=&quot;link-id17382338&quot;&gt;Semantic Web&lt;/a&gt; technology shouldn&amp;#39;t be a mercurial matter at this stage in the game :-) It&amp;#39;s all about looking at how it provides value :-)&lt;/p&gt; &lt;p&gt;From the &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id10d4f4a8&quot;&gt;Linked Data&lt;/a&gt; angle, the ability to produce, dispatch, and exploit &amp;quot;&lt;a href=&quot;http://dbpedia.org/resource/Context_%28language_use%29&quot; id=&quot;link-id13bed160&quot;&gt;Context&lt;/a&gt;&amp;quot; across an array of &amp;quot;Perspectives&amp;quot; from a plethora of disparate &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id1731e5f0&quot;&gt;data&lt;/a&gt; sources on the Web and/or behind corporate firewalls, offers immense commercial value.&lt;/p&gt; &lt;p&gt; &lt;a href=&quot;http://developer.yahoo.com/searchmonkey/&quot; id=&quot;link-id1975d248&quot;&gt;Yahoo&amp;#39;s Searchmonkey&lt;/a&gt; effort will certainly bring clarity to some of the points I made during the podcast re. the role of URIs as &amp;quot;value consumption tickets&amp;quot; (&lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id173eb7b0&quot;&gt;Data&lt;/a&gt; Services are exposed via URIs). There has to be a trigger (in user space) that compels Web users to seek broader, or simply varied, perspectives as a response to &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id0x1c7e7f60&quot;&gt;data&lt;/a&gt; encountered on the Web. Yahoo! is about to put this light on in a big way (imho).&lt;/p&gt; &lt;p&gt;The &amp;quot;self annotating&amp;quot; nature of the Web is what ultimately drives the manifestation of the long awaited &lt;a href=&quot;http://dbpedia.org/resource/Semantic_Web&quot; id=&quot;link-id0xa18a83e8&quot;&gt;Semantic Web&lt;/a&gt;. 
I believe I postulated about &lt;a href=&quot;http://www.openlinksw.com/weblog/public/search.vspx?blogid=127&amp;q=self%20annotation&amp;type=text&amp;output=html&quot; id=&quot;link-id173d7458&quot;&gt;&amp;quot;Self Annotation &amp;amp; the Semantic Web&amp;quot; in a number of prior posts&lt;/a&gt; which, by the way, should be &lt;a href=&quot;http://www.openlinksw.com/weblog/public/search.vspx?blogid=127&amp;type=text&amp;kwds=self%20annotation&amp;amp;OpenSearch&quot; id=&quot;link-id10b12208&quot;&gt;DataRSS compatible right now&lt;/a&gt; due to Yahoo&amp;#39;s support of OpenSearch &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id0x1b8412e8&quot;&gt;Data&lt;/a&gt; Providers (which this &lt;a href=&quot;http://dbpedia.org/resource/Blog&quot; id=&quot;link-id170b8df8&quot;&gt;Blog&lt;/a&gt; Space has been for eons).&lt;/p&gt; &lt;p&gt;Today, we have many communities adding structure to the Web (via their respective tools of preference) without explicitly realizing what they are contributing. Every RSS/Atom feed, &lt;a href=&quot;http://dbpedia.org/resource/Tag&quot; id=&quot;link-id183d5178&quot;&gt;Tag&lt;/a&gt;, Weblog, Shared Bookmark, &lt;a href=&quot;http://dbpedia.org/resource/WikiWord&quot; id=&quot;link-id10c5e758&quot;&gt;Wikiword&lt;/a&gt;, Microformat, Microformat++ (&lt;a href=&quot;http://dbpedia.org/resource/Embedded_RDF&quot; id=&quot;link-id16d8ee40&quot;&gt;eRDF&lt;/a&gt; or &lt;a href=&quot;http://dbpedia.org/resource/RDFa&quot; id=&quot;link-id1059a688&quot;&gt;RDFa&lt;/a&gt;), &lt;a href=&quot;http://dbpedia.org/resource/GRDDL&quot; id=&quot;link-id1090ae10&quot;&gt;GRDDL&lt;/a&gt; stylesheet, RDFizer, etc. is a piece of structured &lt;a href=&quot;http://dbpedia.org/resource/Data&quot;&gt;data&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;Finally, the different communities are all finding ways to work together (thank heavens!) 
and the results are going to be cataclysmic when it all plays out :-)&lt;/p&gt; &lt;p&gt;Data, Structure, and Extraction are the keys to the Semantic Life! First you get the Data in a container (&lt;a href=&quot;http://dbpedia.org/resource/Information&quot; id=&quot;link-id180e5648&quot;&gt;information&lt;/a&gt; resource), and then you add Structure to the &lt;a href=&quot;http://dbpedia.org/resource/Information&quot; id=&quot;link-id103801e0&quot;&gt;information&lt;/a&gt; resource (RSS, Atom, &lt;a href=&quot;http://dbpedia.org/resource/Microformats&quot; id=&quot;link-id17825e40&quot;&gt;microformats&lt;/a&gt;, &lt;a href=&quot;http://dbpedia.org/resource/RDFa&quot; id=&quot;link-id189a8738&quot;&gt;RDFa&lt;/a&gt;, &lt;a href=&quot;http://dbpedia.org/resource/Embedded_RDF&quot; id=&quot;link-id1933d5c0&quot;&gt;eRDF&lt;/a&gt;, SIOC, FOAF, etc.). Once you have Structure, RDFization (i.e. transformation to &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id19744878&quot;&gt;Linked Data&lt;/a&gt;) is a cinch thanks to &lt;a href=&quot;http://dbpedia.org/resource/Resource_Description_Framework&quot; id=&quot;link-id180dde30&quot;&gt;RDF&lt;/a&gt; Middleware (as per &lt;a href=&quot;http://www.openlinksw.com/weblog/public/search.vspx?blogid=127&amp;type=text&amp;kwds=self%20annotation&amp;amp;OpenSearch&quot; id=&quot;link-id16dc3130&quot;&gt;earlier RDF middleware posts&lt;/a&gt;).&lt;/p&gt;</atom:content>
  <atom:author>
    <atom:name>Kingsley Uyi Idehen</atom:name>
    <atom:email>kidehen@openlinksw.com</atom:email>
   </atom:author>
  <atom:updated>2008-05-16T18:04:03.000-04:00</atom:updated>
 </atom:entry>
 <atom:entry>
  <atom:title>Commercializing the Semantic Web</atom:title>
  <atom:id>http://www.openlinksw.com/blog/~kidehen/?id=1362</atom:id>
  <atom:link href="http://www.openlinksw.com/blog/~kidehen/?id=1362" type="text/html" rel="alternate" />
  <atom:link href="http://www.openlinksw.com/GData/dav-blog-1/1362/1" rel="edit" />
  <atom:published>2008-05-16T20:02:45Z</atom:published>
  <atom:content type="html">&lt;p&gt;Unfortunately, I could only spend 4 days at the recent &lt;a href=&quot;http://www2008.org/&quot; id=&quot;link-id196acf60&quot;&gt;WWW2008&lt;/a&gt; event in &lt;a href=&quot;http://dbpedia.org/resource/Beijing&quot; id=&quot;link-id1974fe28&quot;&gt;Beijing&lt;/a&gt; (I departed the morning following the &lt;a href=&quot;http://events.linkeddata.org/ldow2008/&quot; id=&quot;link-id1863f858&quot;&gt;Linked Data Workshop&lt;/a&gt;), so I couldn&amp;#39;t take my slot on the &amp;quot;Commercializing the &lt;a href=&quot;http://dbpedia.org/resource/Semantic_Web&quot; id=&quot;link-id18990f90&quot;&gt;Semantic Web&lt;/a&gt; panel&amp;quot; etc.. Anyway, thanks to the &lt;a href=&quot;http://dbpedia.org/resource/World_Wide_Web&quot;&gt;Web&lt;/a&gt; I can still inject my points of view in the broad Web based discourse. Well so I hoped, when I attempted to post a comment to Paul Miller&amp;#39;s ZDNet domain hosted &lt;a href=&quot;http://dbpedia.org/resource/Blog&quot; id=&quot;link-id180d6750&quot;&gt;blog&lt;/a&gt; thread titled: &lt;a href=&quot;http://blogs.zdnet.com/semantic-web/?p=132&quot; id=&quot;link-id12d206c0&quot;&gt;Commercialising the Semantic Web&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;Unfortunately, the cost of completing ZDNet&amp;#39;s unwieldy signup process simply exceeded the benefits of dropping my comments in their particular space :-( Thus, I&amp;#39;ll settle for a trackback ping instead.&lt;/p&gt; &lt;p&gt;What follows is the cut and paste of my intended comment contributions to Paul&amp;#39;s post.&lt;/p&gt; &lt;p&gt;Paul,&lt;/p&gt; &lt;p&gt; As discussed earlier this week during &lt;a href=&quot;http://blogs.talis.com/nodalities/2008/05/kingsley-idehen-talks-about-openlink-software-linked-data-and-the-semantic-web.php&quot; id=&quot;link-id1332fb48&quot;&gt;our podcast session&lt;/a&gt;, commercialization of &lt;a href=&quot;http://dbpedia.org/resource/Semantic_Web&quot; id=&quot;link-id17382338&quot;&gt;Semantic 
Web&lt;/a&gt; technology shouldn&amp;#39;t be a mercurial matter at this stage in the game :-) It&amp;#39;s all about looking at how it provides value :-)&lt;/p&gt; &lt;p&gt;From the &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id10d4f4a8&quot;&gt;Linked Data&lt;/a&gt; angle, the ability to produce, dispatch, and exploit &amp;quot;&lt;a href=&quot;http://dbpedia.org/resource/Context_%28language_use%29&quot; id=&quot;link-id13bed160&quot;&gt;Context&lt;/a&gt;&amp;quot; across an array of &amp;quot;Perspectives&amp;quot; from a plethora of disparate &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id1731e5f0&quot;&gt;data&lt;/a&gt; sources on the Web and/or behind corporate firewalls, offers immense commercial value.&lt;/p&gt; &lt;p&gt; &lt;a href=&quot;http://developer.yahoo.com/searchmonkey/&quot; id=&quot;link-id1975d248&quot;&gt;Yahoo&amp;#39;s Searchmonkey&lt;/a&gt; effort will certainly bring clarity to some of the points I made during the podcast re. the role of URIs as &amp;quot;value consumption tickets&amp;quot; (&lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id173eb7b0&quot;&gt;Data&lt;/a&gt; Services are exposed via URIs). There has to be a trigger (in user space) that compels Web users to seek broader, or simply varied, perspectives as a response to &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id0x1c7e7f60&quot;&gt;data&lt;/a&gt; encountered on the Web. Yahoo! is about to put this light on in a big way (imho).&lt;/p&gt; &lt;p&gt;The &amp;quot;self annotating&amp;quot; nature of the Web is what ultimately drives the manifestation of the long awaited &lt;a href=&quot;http://dbpedia.org/resource/Semantic_Web&quot; id=&quot;link-id0xa18a83e8&quot;&gt;Semantic Web&lt;/a&gt;. 
I believe I postulated about &lt;a href=&quot;http://www.openlinksw.com/weblog/public/search.vspx?blogid=127&amp;q=self%20annotation&amp;type=text&amp;output=html&quot; id=&quot;link-id173d7458&quot;&gt;&amp;quot;Self Annotation &amp;amp; the Semantic Web&amp;quot; in a number of prior posts&lt;/a&gt; which, by the way, should be &lt;a href=&quot;http://www.openlinksw.com/weblog/public/search.vspx?blogid=127&amp;type=text&amp;kwds=self%20annotation&amp;amp;OpenSearch&quot; id=&quot;link-id10b12208&quot;&gt;DataRSS compatible right now&lt;/a&gt; due to Yahoo&amp;#39;s support of OpenSearch &lt;a href=&quot;http://dbpedia.org/resource/Data&quot;&gt;Data&lt;/a&gt; Providers (which this &lt;a href=&quot;http://dbpedia.org/resource/Blog&quot; id=&quot;link-id170b8df8&quot;&gt;Blog&lt;/a&gt; Space has been for eons).&lt;/p&gt; &lt;p&gt;Today, we have many communities adding structure to the Web (via their respective tools of preference) without explicitly realizing what they are contributing. Every RSS/Atom feed, &lt;a href=&quot;http://dbpedia.org/resource/Tag&quot; id=&quot;link-id183d5178&quot;&gt;Tag&lt;/a&gt;, Weblog, Shared Bookmark, &lt;a href=&quot;http://dbpedia.org/resource/WikiWord&quot; id=&quot;link-id10c5e758&quot;&gt;Wikiword&lt;/a&gt;, Microformat, Microformat++ (&lt;a href=&quot;http://dbpedia.org/resource/Embedded_RDF&quot; id=&quot;link-id16d8ee40&quot;&gt;eRDF&lt;/a&gt; or &lt;a href=&quot;http://dbpedia.org/resource/RDFa&quot; id=&quot;link-id1059a688&quot;&gt;RDFa&lt;/a&gt;), &lt;a href=&quot;http://dbpedia.org/resource/GRDDL&quot; id=&quot;link-id1090ae10&quot;&gt;GRDDL&lt;/a&gt; stylesheet, RDFizer, etc. is a piece of structured data.&lt;/p&gt; &lt;p&gt;Finally, the different communities are all finding ways to work together (thank heavens!) and the results are going to be cataclysmic when it all plays out :-)&lt;/p&gt; &lt;p&gt;Data, Structure, and Extraction are the keys to the Semantic Life! 
First you get the Data in a container (&lt;a href=&quot;http://dbpedia.org/resource/Information&quot; id=&quot;link-id180e5648&quot;&gt;information&lt;/a&gt; resource), and then you add Structure to the &lt;a href=&quot;http://dbpedia.org/resource/Information&quot; id=&quot;link-id103801e0&quot;&gt;information&lt;/a&gt; resource (RSS, Atom, &lt;a href=&quot;http://dbpedia.org/resource/Microformats&quot; id=&quot;link-id17825e40&quot;&gt;microformats&lt;/a&gt;, &lt;a href=&quot;http://dbpedia.org/resource/RDFa&quot; id=&quot;link-id189a8738&quot;&gt;RDFa&lt;/a&gt;, &lt;a href=&quot;http://dbpedia.org/resource/Embedded_RDF&quot; id=&quot;link-id1933d5c0&quot;&gt;eRDF&lt;/a&gt;, SIOC, FOAF, etc.). Once you have Structure, RDFization (i.e. transformation to &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id19744878&quot;&gt;Linked Data&lt;/a&gt;) is a cinch thanks to &lt;a href=&quot;http://dbpedia.org/resource/Resource_Description_Framework&quot; id=&quot;link-id180dde30&quot;&gt;RDF&lt;/a&gt; Middleware (as per &lt;a href=&quot;http://www.openlinksw.com/weblog/public/search.vspx?blogid=127&amp;type=text&amp;kwds=self%20annotation&amp;amp;OpenSearch&quot; id=&quot;link-id16dc3130&quot;&gt;earlier RDF middleware posts&lt;/a&gt;).&lt;/p&gt;</atom:content>
  <atom:author>
    <atom:name>Kingsley Uyi Idehen</atom:name>
    <atom:email>kidehen@openlinksw.com</atom:email>
   </atom:author>
  <atom:category term="atom" />
  <atom:category term="rdf" />
  <atom:category term="rss" />
  <atom:category term="semanticweb" />
  <atom:category term="lnkeddata" />
  <atom:category term="foaf" />
  <atom:category term="sioc" />
  <atom:category term="socialnetworking" />
  <atom:category term="openlink" />
  <atom:updated>2008-05-16T16:15:29.1000-04:00</atom:updated>
 </atom:entry>
 <atom:entry>
  <atom:title>My Talis Podcast re. Semantic Web, Linked Data, and OpenLink Software</atom:title>
  <atom:id>http://www.openlinksw.com/blog/~kidehen/?id=1361</atom:id>
  <atom:link href="http://www.openlinksw.com/blog/~kidehen/?id=1361" type="text/html" rel="alternate" />
  <atom:link href="http://www.openlinksw.com/GData/dav-blog-1/1361/1" rel="edit" />
  <atom:published>2008-05-16T00:10:23Z</atom:published>
  <atom:content type="html">&lt;p&gt; &lt;a href=&quot;http://blogs.talis.com/nodalities/2008/05/kingsley-idehen-talks-about-openlink-software-linked-data-and-the-semantic-web.php&quot; id=&quot;link-id1036b118&quot;&gt;My podcast interview&lt;/a&gt; with &lt;a href=&quot;http://www.linkedin.com/in/pau1mi11er&quot; id=&quot;link-id1026ed10&quot;&gt;Paul Miller&lt;/a&gt; of &lt;a href=&quot;http://www.talis.com&quot; id=&quot;link-id12d210d8&quot;&gt;Talis&lt;/a&gt; is out. As I listened to the podcast (a naturally awkward affair) I got a first hand sense of Paul&amp;#39;s mastery of the art of interviewing, even when dealing with a fast talking &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id180e1208&quot;&gt;data&lt;/a&gt; blitzer like me. Personally, I think I still talk a little too fast (the Nigerian in me), especially when the subject matter homes right in on the epicenter of my professional passions: Open &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id1737a258&quot;&gt;Data&lt;/a&gt; Access and Heterogeneous &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id180f0668&quot;&gt;Data&lt;/a&gt; Integration (aka. &lt;a href=&quot;http://dbpedia.org/resource/Virtual_Database&quot; id=&quot;link-id10c62348&quot;&gt;Virtual Database&lt;/a&gt; Technology) -- so you may need to rewind every now and then during the interview :-)&lt;/p&gt; &lt;p&gt;During this particular podcast interview, I deliberately wanted to have a conversation about the practical value of &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id180c9f88&quot;&gt;Linked Data&lt;/a&gt;, rather than the technical innards. 
The fundamental utility of &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id17387618&quot;&gt;Linked Data&lt;/a&gt; remains somewhat mercurial, and I am certainly hoping to do my bit at the upcoming &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id183ec288&quot;&gt;Linked Data&lt;/a&gt; Planet conference re. demonstrating and articulating &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id1401f250&quot;&gt;linked data&lt;/a&gt; value across the blurring realms of &amp;quot;the individual&amp;quot; and &amp;quot;the enterprise&amp;quot;.&lt;/p&gt; &lt;p&gt; &lt;strong&gt;Note to my old schoolmates on Facebook&lt;/strong&gt;: when you listen to this podcast you will at least reconcile &amp;quot;Uyi Idehen&amp;quot; with &amp;quot;&lt;a href=&quot;http://myopenlink.net/dataspace/person/kidehen#this&quot; id=&quot;link-id180a7060&quot;&gt;Kingsley Idehen&lt;/a&gt;&amp;quot;. Unfortunately, Facebook refuses to let me Identify myself in the manner I choose. Ideally, I would like to have the name: &amp;quot;Kingsley (Uyi) Idehen&amp;quot; associated with my Facebook ID since this is the Identifier known to my personal network of friends, family, and old schoolmates. This Identity predicament is a long running Identity case study in the making.&lt;/p&gt;</atom:content>
  <atom:author>
    <atom:name>Kingsley Uyi Idehen</atom:name>
    <atom:email>kidehen@openlinksw.com</atom:email>
   </atom:author>
  <atom:category term="semanticweb" />
  <atom:category term="openlink" />
  <atom:category term="virtual_database" />
  <atom:category term="DataSpace" />
  <atom:updated>2008-05-16T12:53:49.2000-04:00</atom:updated>
 </atom:entry>
 <atom:entry>
  <atom:title>On &quot;Semantic&quot;, &quot;Semantic Web&quot;, and &quot;Linked Data Web&quot;</atom:title>
  <atom:id>http://www.openlinksw.com/blog/~kidehen/?id=1360</atom:id>
  <atom:link href="http://www.openlinksw.com/blog/~kidehen/?id=1360" type="text/html" rel="alternate" />
  <atom:link href="http://www.openlinksw.com/GData/dav-blog-1/1360/1" rel="edit" />
  <atom:published>2008-05-15T14:11:13Z</atom:published>
  <atom:content type="html">&lt;p&gt; &lt;a href=&quot;http://novaspivack.typepad.com/&quot; id=&quot;link-id102f4e00&quot;&gt;Nova Spivack&lt;/a&gt; has just penned a post titled: &lt;a href=&quot;http://novaspivack.typepad.com/nova_spivacks_weblog/2008/05/on-the-differen.html&quot; id=&quot;link-id101a2300&quot;&gt;On the Difference Between &amp;quot;Semantic&amp;quot; and &amp;quot;Semantic Web&lt;/a&gt;&amp;quot;, where he covers the fundamental difference between &amp;quot;Semantic&amp;quot; (what I call &amp;quot;Semantics Inside&amp;quot;) and &amp;quot;&lt;a href=&quot;http://dbpedia.org/resource/Semantic_Web&quot; id=&quot;link-id11dd0578&quot;&gt;Semantic Web&lt;/a&gt;&amp;quot; applications. I would like to extend the distinctions further by adding the &amp;quot;&lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id10b54ca0&quot;&gt;Linked Data&lt;/a&gt; &lt;a href=&quot;http://dbpedia.org/resource/Giant_Global_Graph&quot; id=&quot;link-id106f73d0&quot;&gt;Web&lt;/a&gt;&amp;quot; distinctions to the developing discourse. &lt;/p&gt; &lt;p&gt;The &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id1089ff48&quot;&gt;Linked Data Web&lt;/a&gt; (aka. 
&lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id10653828&quot;&gt;Linked Data&lt;/a&gt;) describes &lt;a href=&quot;http://dbpedia.org/resource/Resource_Description_Framework&quot; id=&quot;link-id134abfb0&quot;&gt;RDF&lt;/a&gt; &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id140283a8&quot;&gt;data&lt;/a&gt; injected into the Web, where the &lt;a href=&quot;http://dbpedia.org/resource/Identity_(object-oriented_programming)&quot; id=&quot;link-id1029ebf0&quot;&gt;Data Object Identifiers&lt;/a&gt; (URIs) in an &lt;a href=&quot;http://dbpedia.org/resource/Resource_Description_Framework&quot; id=&quot;link-id1011b180&quot;&gt;RDF&lt;/a&gt; graph (collection of &lt;a href=&quot;http://dbpedia.org/resource/Resource_Description_Framework&quot; id=&quot;link-id103a4960&quot;&gt;RDF&lt;/a&gt; triples) are endowed with &lt;a href=&quot;http://dbpedia.org/resource/Hypertext_Transfer_Protocol&quot; id=&quot;link-id104362d8&quot;&gt;HTTP&lt;/a&gt; based URIs. 
The net effect of this approach to &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id107963a0&quot;&gt;Data&lt;/a&gt; Object Identity is that it facilitates &amp;quot;Open &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id1331f640&quot;&gt;Data&lt;/a&gt; Access by Reference&amp;quot; on the Web (aka &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id10a3c608&quot;&gt;data&lt;/a&gt; dereferencing).&lt;/p&gt; &lt;p&gt;If you recall pre Web ubiquity, in the enterprise realm for instance, Open Database Connectivity (&lt;a href=&quot;http://dbpedia.org/resource/Open_Database_Connectivity&quot; id=&quot;link-id12c6dd40&quot;&gt;ODBC&lt;/a&gt;) emerged as a mechanism for separating &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id13d6a5b0&quot;&gt;Data&lt;/a&gt; Access and &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id10b29488&quot;&gt;Data&lt;/a&gt; Management in the database oriented Client-Server model. Although &lt;a href=&quot;http://dbpedia.org/resource/Open_Database_Connectivity&quot; id=&quot;link-id106a8bd8&quot;&gt;ODBC&lt;/a&gt; gave you access to &lt;a href=&quot;http://dbpedia.org/resource/Data&quot;&gt;data&lt;/a&gt;, the data access entry point took the form of a data access specific naming mechanism called a &amp;quot;Data Source Name&amp;quot; (DSN). &lt;a href=&quot;http://dbpedia.org/resource/Open_Database_Connectivity&quot; id=&quot;link-id106eef18&quot;&gt;ODBC&lt;/a&gt; DSNs typically exposed Tables or Views. 
The same thing applies to &lt;a href=&quot;http://dbpedia.org/resource/Java_Database_Connectivity&quot; id=&quot;link-id12c6dfe8&quot;&gt;JDBC&lt;/a&gt; where a non &lt;a href=&quot;http://dbpedia.org/resource/Hypertext_Transfer_Protocol&quot; id=&quot;link-id104cb620&quot;&gt;HTTP&lt;/a&gt; based URN scheme applies.&lt;/p&gt; &lt;p&gt;Zip forward to where we are today on the Web; the Web is evolving from a Document centric Database to a Distributed &lt;a href=&quot;http://dbpedia.org/resource/Object_database&quot; id=&quot;link-id12d15268&quot;&gt;Object Database&lt;/a&gt;, and you should see that in &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id10716bb8&quot;&gt;Linked Data&lt;/a&gt; we are now truly looking at the best of all worlds: Web Open Database Connectivity (WODBC) with the following advantages:&lt;/p&gt; &lt;ul&gt; &lt;li&gt;direct Access to a single Record (an &lt;a href=&quot;http://dbpedia.org/resource/Entity&quot; id=&quot;link-id1037d530&quot;&gt;Entity&lt;/a&gt;) or Record Sets (&lt;a href=&quot;http://dbpedia.org/resource/Resource_Description_Framework&quot; id=&quot;link-id10d48e98&quot;&gt;RDF&lt;/a&gt; based &lt;a href=&quot;http://dbpedia.org/resource/Entity&quot; id=&quot;link-id1402c8f0&quot;&gt;Entity&lt;/a&gt; Sets) by reference over &lt;a href=&quot;http://dbpedia.org/resource/Hypertext_Transfer_Protocol&quot; id=&quot;link-id10bae7a8&quot;&gt;HTTP&lt;/a&gt; across disparate Data Spaces on the Web&lt;/li&gt; &lt;li&gt;the ability to mesh disparate data sources without being impeded by back-end DBMS engine model, vendor, host development frameworks, or host operating system specificity&lt;/li&gt; &lt;li&gt;an opportunity to learn from the enterprise DBMS market and Client-Server markets of yore with regards to the shape and form of next generation &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id10fe4558&quot;&gt;Linked Data&lt;/a&gt; &lt;a href=&quot;http://dbpedia.org/resource/Giant_Global_Graph&quot; id=&quot;link-id10153c98&quot;&gt;Web&lt;/a&gt; oriented solutions.&lt;/li&gt; &lt;/ul&gt; 
&lt;p&gt;To conclude, we now have &amp;quot;Semantics Inside&amp;quot; (&lt;a href=&quot;http://dbpedia.org/resource/Resource_Description_Framework&quot; id=&quot;link-id109d1280&quot;&gt;RDF&lt;/a&gt; or non &lt;a href=&quot;http://dbpedia.org/resource/Resource_Description_Framework&quot;&gt;RDF&lt;/a&gt;), &amp;quot;&lt;a href=&quot;http://dbpedia.org/resource/Semantic_Web&quot; id=&quot;link-id106741a8&quot;&gt;Semantic Web&lt;/a&gt;&amp;quot; (RDF graphs with Object Identifiers that may or may not be &lt;a href=&quot;http://dbpedia.org/resource/Hypertext_Transfer_Protocol&quot; id=&quot;link-id1011cc28&quot;&gt;HTTP&lt;/a&gt; based), and &amp;quot;&lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id10793f70&quot;&gt;Linked Data&lt;/a&gt; &lt;a href=&quot;http://dbpedia.org/resource/Giant_Global_Graph&quot; id=&quot;link-id149ecc10&quot;&gt;Web&lt;/a&gt;&amp;quot; (RDF graphs with Object Identifiers that must be &lt;a href=&quot;http://dbpedia.org/resource/Hypertext_Transfer_Protocol&quot; id=&quot;link-id10a3b860&quot;&gt;HTTP&lt;/a&gt; based and dereferenceable) oriented applications, in the emerging landscape associated with the &amp;quot;Semantics&amp;quot; moniker.&lt;/p&gt; &lt;p&gt;As per usual, this post is a record in my &lt;a href=&quot;http://dbpedia.org/resource/Blog&quot; id=&quot;link-id1020e240&quot;&gt;Blog&lt;/a&gt; oriented &lt;a href=&quot;http://en.wikipedia.org/wiki/Data_Spaces&quot; id=&quot;link-id105cbf90&quot;&gt;Data Space&lt;/a&gt; on the Web. 
The permalink of this post is a &lt;a href=&quot;http://dbpedia.org/resource/Uniform_Resource_Identifier&quot; id=&quot;link-id10ce53a8&quot;&gt;URI&lt;/a&gt; constructed with &lt;a href=&quot;http://dbpedia.org/resource/Giant_Global_Graph&quot; id=&quot;link-id1082f0f8&quot;&gt;Giant Global Graph&lt;/a&gt; enrichment in mind :-)&lt;/p&gt;</atom:content>
  <atom:author>
    <atom:name>Kingsley Uyi Idehen</atom:name>
    <atom:email>kidehen@openlinksw.com</atom:email>
   </atom:author>
  <atom:category term="rdf" />
  <atom:category term="jdbc" />
  <atom:category term="sql" />
  <atom:category term="odbc" />
  <atom:category term="semanticweb" />
  <atom:category term="web30" />
  <atom:category term="DataSpace" />
  <atom:updated>2008-05-15T14:31:38.4000-04:00</atom:updated>
 </atom:entry>
 <atom:entry>
  <atom:title>DBpedia Benchmark Revisited</atom:title>
  <atom:id>http://www.openlinksw.com/blog/vdb/blog/?id=1359</atom:id>
  <atom:link href="http://www.openlinksw.com/blog/vdb/blog/?id=1359" type="text/html" rel="alternate" />
  <atom:link href="http://www.openlinksw.com/GData/dav-blog-1/1359/2" rel="edit" />
  <atom:published>2008-05-09T19:33:42Z</atom:published>
  <atom:content type="html">&lt;div&gt; &lt;div style=&quot;display:none;&quot;&gt;DBpedia Benchmark Revisited&lt;/div&gt; &lt;p&gt;We ran the &lt;a href=&quot;http://dbpedia.org/resource/DBpedia&quot; id=&quot;link-id0x1cd6d0c8&quot;&gt;DBpedia&lt;/a&gt; benchmark queries again with different configurations of &lt;a href=&quot;http://virtuoso.openlinksw.com&quot; id=&quot;link-id0x1bf01048&quot;&gt;Virtuoso&lt;/a&gt;. I had not studied the details of the matter previously but now did have a closer look at the queries.&lt;/p&gt; &lt;p&gt;Comparing numbers given by different parties is a constant problem. In the case reported here, we loaded the full DBpedia 3, all languages, with about 198M triples, onto Virtuoso v5 and Virtuoso Cluster v6, all on the same 4 core 2GHz Xeon with 8G RAM. All databases were striped on 6 disks. The Cluster configuration was with 4 processes in the same box.&lt;/p&gt; &lt;p&gt;We ran the queries in two variants:&lt;/p&gt; &lt;ul&gt; &lt;li&gt;With graph specified in the &lt;a href=&quot;http://dbpedia.org/resource/SPARQL&quot; id=&quot;link-id0x1b9d3ca0&quot;&gt;SPARQL&lt;/a&gt; &lt;code&gt;FROM&lt;/code&gt; clause, using the default indices.&lt;/li&gt; &lt;li&gt;With no graph specified anywhere, using an alternate indexing scheme.&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;The times below are for the sequence of 5 queries; individual query times are not reported. I did not do a line-by-line review of the execution plans since they seem to run well enough. We could get some extra mileage from cost model tweaks, especially for the numeric range conditions, but we will do this when somebody comes up with better times.&lt;/p&gt; &lt;p&gt;First, about Virtuoso v5: Because there is a query in the set that specifies no condition on S or O and only P, this simply cannot be done with the default indices. 
With Virtuoso Cluster v6 it sort-of can, because v6 is more space efficient.&lt;/p&gt; &lt;p&gt;So we added the index:&lt;/p&gt; &lt;blockquote&gt; &lt;code&gt; create bitmap index rdf_quad_pogs on rdf_quad (p, o, g, s); &lt;/code&gt; &lt;/blockquote&gt; &lt;table&gt; &lt;tr&gt; &lt;td&gt; &lt;/td&gt; &lt;td align=&quot;center&quot;&gt;&lt;b&gt;Virtuoso v5 with&lt;br /&gt; gspo, ogps, pogs&lt;/b&gt; &lt;/td&gt; &lt;td align=&quot;center&quot;&gt;&lt;b&gt;Virtuoso Cluster v6 with &lt;br /&gt;gspo, ogps&lt;/b&gt; &lt;/td&gt; &lt;td align=&quot;center&quot;&gt;&lt;b&gt;Virtuoso Cluster v6 with &lt;br /&gt;gspo, ogps, pogs&lt;/b&gt; &lt;/td&gt; &lt;/tr&gt; &lt;tr&gt; &lt;td&gt;&lt;b&gt;cold&lt;/b&gt; &lt;/td&gt; &lt;td align=&quot;center&quot;&gt;210 s&lt;/td&gt; &lt;td align=&quot;center&quot;&gt;136 s&lt;/td&gt; &lt;td align=&quot;center&quot;&gt;33.4 s&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt; &lt;td&gt;&lt;b&gt;warm&lt;/b&gt; &lt;/td&gt; &lt;td align=&quot;center&quot;&gt;0.600 s&lt;/td&gt; &lt;td align=&quot;center&quot;&gt;4.01 s&lt;/td&gt; &lt;td align=&quot;center&quot;&gt;0.628 s&lt;/td&gt; &lt;/tr&gt; &lt;/table&gt; &lt;p&gt;OK, so now let us do it without a graph being specified. 
For all platforms, we drop any existing indices, and --&lt;/p&gt; &lt;blockquote&gt; &lt;code&gt; create table r2 (g iri_id_8, s iri_id_8, p iri_id_8, o any, primary key (s, p, o, g)); &lt;br /&gt; alter index R2 on R2 partition (s int (0hexffff00)); &lt;br /&gt; &lt;br /&gt; log_enable (2); &lt;br /&gt; insert into r2 (g, s, p, o) select g, s, p, o from rdf_quad; &lt;br /&gt; &lt;br /&gt; drop table rdf_quad; &lt;br /&gt; alter table r2 rename RDF_QUAD; &lt;br /&gt; create bitmap index rdf_quad_opgs on rdf_quad (o, p, g, s) partition (o varchar (-1, 0hexffff)); &lt;br /&gt; create bitmap index rdf_quad_pogs on rdf_quad (p, o, g, s) partition (o varchar (-1, 0hexffff)); &lt;br /&gt; create bitmap index rdf_quad_gpos on rdf_quad (g, p, o, s) partition (o varchar (-1, 0hexffff)); &lt;/code&gt; &lt;/blockquote&gt; &lt;p&gt;The code is identical for v5 and v6, except that with v5 we use &lt;code&gt;iri_id (32 bit)&lt;/code&gt; for the type, not &lt;code&gt;iri_id_8 (64 bit)&lt;/code&gt;. We note that we run out of IDs with v5 around a few billion triples, so with v6 we have double the ID length and still manage to be vastly more space efficient.&lt;/p&gt; &lt;p&gt;With the above 4 indices (the spog primary key plus opgs, pogs, and gpos), we can query the &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id0x1bae4cd8&quot;&gt;data&lt;/a&gt; pretty much in any combination without hitting a full scan of any index. We note that all indices that do not begin with s end with s as a bitmap. 
This takes about 60% of the space of a non-bitmap index for data such as DBpedia.&lt;/p&gt; &lt;p&gt;If you intend to do completely arbitrary RDF queries in Virtuoso, then chances are you are best off with the above index scheme.&lt;/p&gt; &lt;table&gt; &lt;tr&gt; &lt;td&gt; &lt;/td&gt; &lt;td align=&quot;center&quot;&gt;&lt;b&gt; Virtuoso v5 with&lt;br /&gt; gspo, ogps, pogs&lt;/b&gt; &lt;/td&gt; &lt;td align=&quot;center&quot;&gt;&lt;b&gt; Virtuoso Cluster v6 with &lt;br /&gt; spog, pogs, opgs, gpos &lt;/b&gt; &lt;/td&gt; &lt;/tr&gt; &lt;tr&gt; &lt;td&gt;&lt;b&gt;warm&lt;/b&gt; &lt;/td&gt; &lt;td align=&quot;center&quot;&gt;0.595 s&lt;/td&gt; &lt;td align=&quot;center&quot;&gt;0.617 s&lt;/td&gt; &lt;/tr&gt; &lt;/table&gt; &lt;p&gt;The cold times were about the same as above, so not reproduced.&lt;/p&gt; &lt;h3&gt;Graph or No Graph?&lt;/h3&gt; &lt;p&gt;It is in the SPARQL spirit to specify a graph and for pretty much any application, there are entirely sensible ways of keeping the data in graphs and specifying which ones are concerned by queries. This is why Virtuoso is set up for this by default.&lt;/p&gt; &lt;p&gt;On the other hand, for the open web scenario, dealing with an unknown large number of graphs, enumerating graphs is not possible and questions like which graph of which source asserts x become relevant. We have two distinct use cases which warrant different setups of the database, simple as that.&lt;/p&gt; &lt;p&gt;The latter use case is not really within the SPARQL spec, so implementations may or may not support this. For example &lt;a href=&quot;http://dbpedia.org/resource/Oracle_Database&quot; id=&quot;link-id0x1cd2db78&quot;&gt;Oracle&lt;/a&gt; or Vertica would not do this well since they partition data according to graph or predicate, respectively. 
On the other hand, stores that work with one quad table, which is most of the ones out there, should do it maybe with some configuring, as shown above.&lt;/p&gt; &lt;p&gt;Frameworks like Jena are not to my &lt;a href=&quot;http://dbpedia.org/resource/Knowledge&quot; id=&quot;link-id0x1b300390&quot;&gt;knowledge&lt;/a&gt; geared towards having a wildcard for graph, although I would suppose this can be arranged by adding some &amp;quot;super-graph&amp;quot; object, a graph of all graphs. I don&amp;#39;t think this is directly supported and besides most apps would not need it.&lt;/p&gt; &lt;p&gt;Once the indices are right, there is no difference between specifying a graph and not specifying a graph with the queries considered. With more complex queries, specifying a graph or set of graphs does allow some optimizations that cannot be done with no graph specified. For example, bitmap intersections are possible only when all leading key parts are given.&lt;/p&gt; &lt;h3&gt;Conclusions&lt;/h3&gt; &lt;p&gt;The best warm cache time is with v5; the five queries run under 600 ms after the first go. This is noted to show that all-in-memory with a single thread of execution is hard to beat.&lt;/p&gt; &lt;p&gt;Cluster v6 performs the same queries in 623 ms. What is gained in parallelism is lost in latency if all operations complete in microseconds. On the other hand, Cluster v6 leaves v5 in the dust in any situation that has less than 100% hit rate. This is due to actual benefit from parallelism if operations take longer than a few microseconds, such as in the case of disk reads. 
Cluster v6 has substantially better data layout on disk, as well as fewer pages to load for the same content.&lt;/p&gt; &lt;p&gt;This makes it possible to run the queries without the pogs index on Cluster v6 even when v5 takes prohibitively long.&lt;/p&gt; &lt;p&gt;The moral of the story is to have a lot of RAM and a space-efficient data representation.&lt;/p&gt; &lt;p&gt;The DBpedia benchmark does not specify any random access pattern that would give a measure of sustained throughput under load, so we are left with the extremes of cold and warm cache, neither of which is quite realistic.&lt;/p&gt; &lt;p&gt;Chris Bizer and I have talked on and off about benchmarks and I have made suggestions that we will see incorporated into the Berlin SPARQL benchmark, which will, I believe, be much more informative.&lt;/p&gt; &lt;h3&gt;Appendix: Query Text&lt;/h3&gt; &lt;p&gt;For reference, the query texts specifying the graph are below. To run without specifying the graph, just drop the &lt;code&gt;FROM &amp;lt;&lt;a href=&quot;http://dbpedia.org/resource/Hypertext_Transfer_Protocol&quot; id=&quot;link-id0x1c371db0&quot;&gt;http&lt;/a&gt;://dbpedia.org&amp;gt;&lt;/code&gt; from each query. The returned row counts are indicated below each query&amp;#39;s text.&lt;/p&gt; &lt;blockquote&gt; &lt;code&gt;&lt;pre&gt; sparql SELECT ?p ?o FROM &amp;lt;http://dbpedia.org&amp;gt; WHERE { &amp;lt;http://dbpedia.org/resource/Metropolitan_Museum_of_Art&amp;gt; ?p ?o }; -- 1337 rows sparql PREFIX p: &amp;lt;http://dbpedia.org/property/&amp;gt; SELECT ?film1 ?actor1 ?film2 ?actor2 FROM &amp;lt;http://dbpedia.org&amp;gt; WHERE { ?film1 p:starring &amp;lt;http://dbpedia.org/resource/Kevin_Bacon&amp;gt; . ?film1 p:starring ?actor1 . ?film2 p:starring ?actor1 . ?film2 p:starring ?actor2 . }; -- 23910 rows sparql PREFIX p: &amp;lt;http://dbpedia.org/property/&amp;gt; SELECT ?artist ?artwork ?museum ?director FROM &amp;lt;http://dbpedia.org&amp;gt; WHERE { ?artwork p:artist ?artist . 
?artwork p:museum ?museum . ?museum p:director ?director }; -- 303 rows sparql PREFIX geo: &amp;lt;http://www.w3.org/2003/01/geo/wgs84_pos#&amp;gt; PREFIX foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt; PREFIX xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; SELECT ?s ?homepage FROM &amp;lt;http://dbpedia.org&amp;gt; WHERE { &amp;lt;http://dbpedia.org/resource/Berlin&amp;gt; geo:lat ?berlinLat . &amp;lt;http://dbpedia.org/resource/Berlin&amp;gt; geo:long ?berlinLong . ?s geo:lat ?lat . ?s geo:long ?long . ?s foaf:homepage ?homepage . FILTER ( ?lat &amp;lt;= ?berlinLat + 0.03190235436 &amp;amp;&amp;amp; ?long &amp;gt;= ?berlinLong - 0.08679199218 &amp;amp;&amp;amp; ?lat &amp;gt;= ?berlinLat - 0.03190235436 &amp;amp;&amp;amp; ?long &amp;lt;= ?berlinLong + 0.08679199218) }; -- 56 rows sparql PREFIX geo: &amp;lt;http://www.w3.org/2003/01/geo/wgs84_pos#&amp;gt; PREFIX foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt; PREFIX xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; PREFIX p: &amp;lt;http://dbpedia.org/property/&amp;gt; SELECT ?s ?a ?homepage FROM &amp;lt;http://dbpedia.org&amp;gt; WHERE { &amp;lt;http://dbpedia.org/resource/New_York_City&amp;gt; geo:lat ?nyLat . &amp;lt;http://dbpedia.org/resource/New_York_City&amp;gt; geo:long ?nyLong . ?s geo:lat ?lat . ?s geo:long ?long . ?s p:architect ?a . ?a foaf:homepage ?homepage . FILTER ( ?lat &amp;lt;= ?nyLat + 0.3190235436 &amp;amp;&amp;amp; ?long &amp;gt;= ?nyLong - 0.8679199218 &amp;amp;&amp;amp; ?lat &amp;gt;= ?nyLat - 0.3190235436 &amp;amp;&amp;amp; ?long &amp;lt;= ?nyLong + 0.8679199218) }; -- 13 rows &lt;/pre&gt; &lt;/code&gt; &lt;/blockquote&gt; &lt;/div&gt;</atom:content>
  <atom:author>
    <atom:name>Virtuoso Data Space Bot</atom:name>
    <atom:email>kidehen@openlinksw.com</atom:email>
   </atom:author>
  <atom:category term="database" />
  <atom:category term="databases" />
  <atom:category term="benchmarking" />
  <atom:category term="scalability" />
  <atom:category term="rdf" />
  <atom:category term="oracle" />
  <atom:category term="foaf" />
  <atom:category term="semanticweb" />
  <atom:category term="sparql" />
  <atom:category term="socialnetworking" />
  <atom:category term="virtuoso" />
  <atom:updated>2008-05-12T11:24:43.000-04:00</atom:updated>
 </atom:entry>
 <atom:entry>
  <atom:title>DBpedia Benchmark Revisited</atom:title>
  <atom:id>http://www.openlinksw.com/weblog/oerling/?id=1358</atom:id>
  <atom:link href="http://www.openlinksw.com/weblog/oerling/?id=1358" type="text/html" rel="alternate" />
  <atom:link href="http://www.openlinksw.com/GData/dav-blog-1/1358/2" rel="edit" />
  <atom:published>2008-05-09T19:27:00Z</atom:published>
  <atom:content type="html">&lt;p&gt;We ran the &lt;a href=&quot;http://dbpedia.org/resource/DBpedia&quot; id=&quot;link-id0x1b7f9688&quot;&gt;DBpedia&lt;/a&gt; benchmark queries again with different configurations of &lt;a href=&quot;http://virtuoso.openlinksw.com&quot; id=&quot;link-id0x1cca2e00&quot;&gt;Virtuoso&lt;/a&gt;. I had not studied the details of the matter previously but now did have a closer look at the queries.&lt;/p&gt; &lt;p&gt;Comparing numbers given by different parties is a constant problem. In the case reported here, we loaded the full DBpedia 3, all languages, with about 198M triples, onto Virtuoso v5 and Virtuoso Cluster v6, all on the same 4 core 2GHz Xeon with 8G RAM. All databases were striped on 6 disks. The Cluster configuration was with 4 processes in the same box.&lt;/p&gt; &lt;p&gt;We ran the queries in two variants:&lt;/p&gt; &lt;ul&gt; &lt;li&gt;With graph specified in the &lt;a href=&quot;http://dbpedia.org/resource/SPARQL&quot; id=&quot;link-id0x1b77f758&quot;&gt;SPARQL&lt;/a&gt; &lt;code&gt;FROM&lt;/code&gt; clause, using the default indices.&lt;/li&gt; &lt;li&gt;With no graph specified anywhere, using an alternate indexing scheme.&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;The times below are for the sequence of 5 queries; individual query times are not reported. I did not do a line-by-line review of the execution plans since they seem to run well enough. We could get some extra mileage from cost model tweaks, especially for the numeric range conditions, but we will do this when somebody comes up with better times.&lt;/p&gt; &lt;p&gt;First, about Virtuoso v5: Because there is a query in the set that specifies no condition on S or O and only P, this simply cannot be done with the default indices. 
With Virtuoso Cluster v6 it sort-of can, because v6 is more space efficient.&lt;/p&gt; &lt;p&gt;So we added the index:&lt;/p&gt; &lt;blockquote&gt; &lt;code&gt; create bitmap index &lt;a href=&quot;http://dbpedia.org/resource/Resource_Description_Framework&quot; id=&quot;link-id0x1cb0b180&quot;&gt;rdf&lt;/a&gt;_quad_pogs on rdf_quad (p, o, g, s); &lt;/code&gt; &lt;/blockquote&gt; &lt;table&gt; &lt;tr&gt; &lt;td&gt; &lt;/td&gt; &lt;td align=&quot;center&quot;&gt;&lt;b&gt;Virtuoso v5 with&lt;br /&gt; gspo, ogps, pogs&lt;/b&gt; &lt;/td&gt; &lt;td align=&quot;center&quot;&gt;&lt;b&gt;Virtuoso Cluster v6 with &lt;br /&gt;gspo, ogps&lt;/b&gt; &lt;/td&gt; &lt;td align=&quot;center&quot;&gt;&lt;b&gt;Virtuoso Cluster v6 with &lt;br /&gt;gspo, ogps, pogs&lt;/b&gt; &lt;/td&gt; &lt;/tr&gt; &lt;tr&gt; &lt;td&gt;&lt;b&gt;cold&lt;/b&gt; &lt;/td&gt; &lt;td align=&quot;center&quot;&gt;210 s&lt;/td&gt; &lt;td align=&quot;center&quot;&gt;136 s&lt;/td&gt; &lt;td align=&quot;center&quot;&gt;33.4 s&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt; &lt;td&gt;&lt;b&gt;warm&lt;/b&gt; &lt;/td&gt; &lt;td align=&quot;center&quot;&gt;0.600 s&lt;/td&gt; &lt;td align=&quot;center&quot;&gt;4.01 s&lt;/td&gt; &lt;td align=&quot;center&quot;&gt;0.628 s&lt;/td&gt; &lt;/tr&gt; &lt;/table&gt; &lt;p&gt;OK, so now let us do it without a graph being specified. 
For all platforms, we drop any existing indices, and --&lt;/p&gt; &lt;blockquote&gt; &lt;code&gt; create table r2 (g iri_id_8, s iri_id_8, p iri_id_8, o any, primary key (s, p, o, g)); &lt;br /&gt; alter index R2 on R2 partition (s int (0hexffff00)); &lt;br /&gt; &lt;br /&gt; log_enable (2); &lt;br /&gt; insert into r2 (g, s, p, o) select g, s, p, o from rdf_quad; &lt;br /&gt; &lt;br /&gt; drop table rdf_quad; &lt;br /&gt; alter table r2 rename RDF_QUAD; &lt;br /&gt; create bitmap index rdf_quad_opgs on rdf_quad (o, p, g, s) partition (o varchar (-1, 0hexffff)); &lt;br /&gt; create bitmap index rdf_quad_pogs on rdf_quad (p, o, g, s) partition (o varchar (-1, 0hexffff)); &lt;br /&gt; create bitmap index rdf_quad_gpos on rdf_quad (g, p, o, s) partition (o varchar (-1, 0hexffff)); &lt;/code&gt; &lt;/blockquote&gt; &lt;p&gt;The code is identical for v5 and v6, except that with v5 we use &lt;code&gt;iri_id (32 bit)&lt;/code&gt; for the type, not &lt;code&gt;iri_id_8 (64 bit)&lt;/code&gt;. We note that we run out of IDs with v5 around a few billion triples, so with v6 we have double the ID length and still manage to be vastly more space efficient.&lt;/p&gt; &lt;p&gt;With the above 4 indices (the spog primary key plus opgs, pogs, and gpos), we can query the &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id0x6339b80&quot;&gt;data&lt;/a&gt; pretty much in any combination without hitting a full scan of any index. We note that all indices that do not begin with s end with s as a bitmap. 
This takes about 60% of the space of a non-bitmap index for data such as DBpedia.&lt;/p&gt; &lt;p&gt;If you intend to do completely arbitrary RDF queries in Virtuoso, then chances are you are best off with the above index scheme.&lt;/p&gt; &lt;table&gt; &lt;tr&gt; &lt;td&gt; &lt;/td&gt; &lt;td align=&quot;center&quot;&gt;&lt;b&gt; Virtuoso v5 with&lt;br /&gt; gspo, ogps, pogs&lt;/b&gt; &lt;/td&gt; &lt;td align=&quot;center&quot;&gt;&lt;b&gt; Virtuoso Cluster v6 with &lt;br /&gt; spog, pogs, opgs, gpos &lt;/b&gt; &lt;/td&gt; &lt;/tr&gt; &lt;tr&gt; &lt;td&gt;&lt;b&gt;warm&lt;/b&gt; &lt;/td&gt; &lt;td align=&quot;center&quot;&gt;0.595 s&lt;/td&gt; &lt;td align=&quot;center&quot;&gt;0.617 s&lt;/td&gt; &lt;/tr&gt; &lt;/table&gt; &lt;p&gt;The cold times were about the same as above, so not reproduced.&lt;/p&gt; &lt;h3&gt;Graph or No Graph?&lt;/h3&gt; &lt;p&gt;It is in the SPARQL spirit to specify a graph and for pretty much any application, there are entirely sensible ways of keeping the data in graphs and specifying which ones are concerned by queries. This is why Virtuoso is set up for this by default.&lt;/p&gt; &lt;p&gt;On the other hand, for the open web scenario, dealing with an unknown large number of graphs, enumerating graphs is not possible and questions like which graph of which source asserts x become relevant. We have two distinct use cases which warrant different setups of the database, simple as that.&lt;/p&gt; &lt;p&gt;The latter use case is not really within the SPARQL spec, so implementations may or may not support this. For example &lt;a href=&quot;http://dbpedia.org/resource/Oracle_Database&quot; id=&quot;link-id0x11ed7028&quot;&gt;Oracle&lt;/a&gt; or Vertica would not do this well since they partition data according to graph or predicate, respectively. 
On the other hand, stores that work with one quad table, which is most of the ones out there, should do it maybe with some configuring, as shown above.&lt;/p&gt; &lt;p&gt;Frameworks like Jena are not to my &lt;a href=&quot;http://dbpedia.org/resource/Knowledge&quot; id=&quot;link-id0x1a49ded0&quot;&gt;knowledge&lt;/a&gt; geared towards having a wildcard for graph, although I would suppose this can be arranged by adding some &amp;quot;super-graph&amp;quot; object, a graph of all graphs. I don&amp;#39;t think this is directly supported and besides most apps would not need it.&lt;/p&gt; &lt;p&gt;Once the indices are right, there is no difference between specifying a graph and not specifying a graph with the queries considered. With more complex queries, specifying a graph or set of graphs does allow some optimizations that cannot be done with no graph specified. For example, bitmap intersections are possible only when all leading key parts are given.&lt;/p&gt; &lt;h3&gt;Conclusions&lt;/h3&gt; &lt;p&gt;The best warm cache time is with v5; the five queries run under 600 ms after the first go. This is noted to show that all-in-memory with a single thread of execution is hard to beat.&lt;/p&gt; &lt;p&gt;Cluster v6 performs the same queries in 623 ms. What is gained in parallelism is lost in latency if all operations complete in microseconds. On the other hand, Cluster v6 leaves v5 in the dust in any situation that has less than 100% hit rate. This is due to actual benefit from parallelism if operations take longer than a few microseconds, such as in the case of disk reads. 
Cluster v6 has substantially better data layout on disk, as well as fewer pages to load for the same content.&lt;/p&gt; &lt;p&gt;This makes it possible to run the queries without the pogs index on Cluster v6 even when v5 takes prohibitively long.&lt;/p&gt; &lt;p&gt;The moral of the story is to have a lot of RAM and a space-efficient data representation.&lt;/p&gt; &lt;p&gt;The DBpedia benchmark does not specify any random access pattern that would give a measure of sustained throughput under load, so we are left with the extremes of cold and warm cache, neither of which is quite realistic.&lt;/p&gt; &lt;p&gt;Chris Bizer and I have talked on and off about benchmarks and I have made suggestions that we will see incorporated into the Berlin SPARQL benchmark, which will, I believe, be much more informative.&lt;/p&gt; &lt;h3&gt;Appendix: Query Text&lt;/h3&gt; &lt;p&gt;For reference, the query texts specifying the graph are below. To run without specifying the graph, just drop the &lt;code&gt;FROM &amp;lt;&lt;a href=&quot;http://dbpedia.org/resource/Hypertext_Transfer_Protocol&quot; id=&quot;link-id0x1905bfd0&quot;&gt;http&lt;/a&gt;://dbpedia.org&amp;gt;&lt;/code&gt; from each query. The returned row counts are indicated below each query&amp;#39;s text.&lt;/p&gt; &lt;blockquote&gt; &lt;code&gt;&lt;pre&gt; sparql SELECT ?p ?o FROM &amp;lt;http://dbpedia.org&amp;gt; WHERE { &amp;lt;http://dbpedia.org/resource/Metropolitan_Museum_of_Art&amp;gt; ?p ?o }; -- 1337 rows sparql PREFIX p: &amp;lt;http://dbpedia.org/property/&amp;gt; SELECT ?film1 ?actor1 ?film2 ?actor2 FROM &amp;lt;http://dbpedia.org&amp;gt; WHERE { ?film1 p:starring &amp;lt;http://dbpedia.org/resource/Kevin_Bacon&amp;gt; . ?film1 p:starring ?actor1 . ?film2 p:starring ?actor1 . ?film2 p:starring ?actor2 . }; -- 23910 rows sparql PREFIX p: &amp;lt;http://dbpedia.org/property/&amp;gt; SELECT ?artist ?artwork ?museum ?director FROM &amp;lt;http://dbpedia.org&amp;gt; WHERE { ?artwork p:artist ?artist . 
?artwork p:museum ?museum . ?museum p:director ?director }; -- 303 rows sparql PREFIX geo: &amp;lt;http://www.w3.org/2003/01/geo/wgs84_pos#&amp;gt; PREFIX foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt; PREFIX xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; SELECT ?s ?homepage FROM &amp;lt;http://dbpedia.org&amp;gt; WHERE { &amp;lt;http://dbpedia.org/resource/Berlin&amp;gt; geo:lat ?berlinLat . &amp;lt;http://dbpedia.org/resource/Berlin&amp;gt; geo:long ?berlinLong . ?s geo:lat ?lat . ?s geo:long ?long . ?s foaf:homepage ?homepage . FILTER ( ?lat &amp;lt;= ?berlinLat + 0.03190235436 &amp;amp;&amp;amp; ?long &amp;gt;= ?berlinLong - 0.08679199218 &amp;amp;&amp;amp; ?lat &amp;gt;= ?berlinLat - 0.03190235436 &amp;amp;&amp;amp; ?long &amp;lt;= ?berlinLong + 0.08679199218) }; -- 56 rows sparql PREFIX geo: &amp;lt;http://www.w3.org/2003/01/geo/wgs84_pos#&amp;gt; PREFIX foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt; PREFIX xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; PREFIX p: &amp;lt;http://dbpedia.org/property/&amp;gt; SELECT ?s ?a ?homepage FROM &amp;lt;http://dbpedia.org&amp;gt; WHERE { &amp;lt;http://dbpedia.org/resource/New_York_City&amp;gt; geo:lat ?nyLat . &amp;lt;http://dbpedia.org/resource/New_York_City&amp;gt; geo:long ?nyLong . ?s geo:lat ?lat . ?s geo:long ?long . ?s p:architect ?a . ?a foaf:homepage ?homepage . FILTER ( ?lat &amp;lt;= ?nyLat + 0.3190235436 &amp;amp;&amp;amp; ?long &amp;gt;= ?nyLong - 0.8679199218 &amp;amp;&amp;amp; ?lat &amp;gt;= ?nyLat - 0.3190235436 &amp;amp;&amp;amp; ?long &amp;lt;= ?nyLong + 0.8679199218) }; -- 13 rows &lt;/pre&gt; &lt;/code&gt; &lt;/blockquote&gt;</atom:content>
  <atom:author>
    <atom:name>Orri Erling</atom:name>
    <atom:email>oerling@openlinksw.com</atom:email>
   </atom:author>
  <atom:category term="database" />
  <atom:category term="databases" />
  <atom:category term="benchmarking" />
  <atom:category term="scalability" />
  <atom:category term="rdf" />
  <atom:category term="oracle" />
  <atom:category term="foaf" />
  <atom:category term="semanticweb" />
  <atom:category term="sparql" />
  <atom:category term="socialnetworking" />
  <atom:category term="virtuoso" />
  <atom:updated>2008-05-12T11:24:36.000-04:00</atom:updated>
 </atom:entry>
 <atom:entry>
  <atom:title>Comments about recent Semantic Gang Podcast</atom:title>
  <atom:id>http://www.openlinksw.com/blog/~kidehen/?id=1357</atom:id>
  <atom:link href="http://www.openlinksw.com/blog/~kidehen/?id=1357" type="text/html" rel="alternate" />
  <atom:link href="http://www.openlinksw.com/GData/dav-blog-1/1357/1" rel="edit" />
  <atom:published>2008-05-02T21:44:31Z</atom:published>
  <atom:content type="html">&lt;p&gt;After listening to the &lt;a href=&quot;http://semanticgang.talis.com/2008/05/02/april-2008-the-semantic-web-gang-discuss-a-wikipedia-for-data/&quot; id=&quot;link-id1089e218&quot;&gt;latest Semantic Web Gang podcast&lt;/a&gt;, I found myself agreeing with some of the points made by &lt;a href=&quot;http://www.linkedin.com/in/iskold&quot; id=&quot;link-id10b91e58&quot;&gt;Alex Iskold&lt;/a&gt;, specifically: &lt;/p&gt; &lt;ul&gt;-- &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id106e24e0&quot;&gt;Linked Data&lt;/a&gt; does not imply making all your &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id17ab3d48&quot;&gt;data&lt;/a&gt; public&lt;/ul&gt; &lt;ul&gt;-- &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id11fdcef0&quot;&gt;Linked Data&lt;/a&gt; principles benefit &lt;a href=&quot;http://dbpedia.org/resource/Intranet&quot; id=&quot;link-id109756e8&quot;&gt;Intranet&lt;/a&gt; and &lt;a href=&quot;http://dbpedia.org/resource/Extranet&quot; id=&quot;link-id1099cfd8&quot;&gt;Extranet&lt;/a&gt; style &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id10cd25b0&quot;&gt;data&lt;/a&gt; integration (trumps alternative &lt;a href=&quot;http://dbpedia.org/resource/federated_database_system&quot; id=&quot;link-id14f29940&quot;&gt;distributed database&lt;/a&gt; integration approaches any day)&lt;/ul&gt; &lt;ul&gt;-- Business exploitation of &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id0xca51940&quot;&gt;Linked Data&lt;/a&gt; on the &lt;a href=&quot;http://dbpedia.org/resource/World_Wide_Web&quot;&gt;Web&lt;/a&gt; will certainly be driven by the correlation of opportunity costs (which is more than likely what Alex meant by &amp;quot;use cases&amp;quot;) associated with the lack of URIs originating from the domain of a given business (Tom Heath also effectively alluded to this via his &lt;a 
href=&quot;http://dbpedia.org/resource/BBC&quot; id=&quot;link-id16f33348&quot;&gt;BBC&lt;/a&gt; and &lt;a href=&quot;http://dbpedia.org/resource/Uniform_Resource_Identifier&quot; id=&quot;link-id10decf38&quot;&gt;URI&lt;/a&gt; land grab anecdotes; the same applies to Georgi&amp;#39;s examples)&lt;/ul&gt; &lt;ul&gt;-- History is a great tutor; answers to many of today&amp;#39;s problems always lie somewhere in plain sight of the past.&lt;/ul&gt; &lt;p&gt;Of course, I also believe that &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot;&gt;Linked Data&lt;/a&gt; serves Web &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id0x1afebd58&quot;&gt;Data&lt;/a&gt; Integration across the &lt;a href=&quot;http://dbpedia.org/resource/Internet&quot; id=&quot;link-id10aa5668&quot;&gt;Internet&lt;/a&gt; very well too, and that it will benefit businesses in a big way. No individual or organization is an island; I think the &lt;a href=&quot;http://dbpedia.org/resource/Internet&quot; id=&quot;link-id0xb25fbd0&quot;&gt;Internet&lt;/a&gt; and Web have done a good job of demonstrating that thus far :-) We&amp;#39;re all &lt;a href=&quot;http://dbpedia.org/resource/Data&quot;&gt;data&lt;/a&gt; nodes in a &lt;a href=&quot;http://dbpedia.org/resource/Giant_Global_Graph&quot; id=&quot;link-id5d8a3a8&quot;&gt;Giant Global Graph&lt;/a&gt;.&lt;/p&gt; &lt;p&gt; &lt;a href=&quot;http://myopenlink.net/dataspace/person/danieljohnlewis#this&quot; id=&quot;link-id17cac8a0&quot;&gt;Daniel Lewis&lt;/a&gt; did shed light on the read-write aspects of the Linked Data &lt;a href=&quot;http://dbpedia.org/resource/Giant_Global_Graph&quot; id=&quot;link-id10be8590&quot;&gt;Web&lt;/a&gt;, which is actually very close to the callout for a Wikipedia for Data. 
&lt;a href=&quot;http://www.w3.org/People/Berners-Lee/card#i&quot; id=&quot;link-id10a810c0&quot;&gt;TimBL&lt;/a&gt; has been working on this via &lt;a href=&quot;http://dig.csail.mit.edu/2005/ajar/release/tabulator/0.8/tab.html&quot; id=&quot;link-id184b7108&quot;&gt;Tabulator&lt;/a&gt; (see &lt;a href=&quot;http://dig.csail.mit.edu/2007/tab/tutorial/editing.mov&quot; id=&quot;link-id1416f1e8&quot;&gt;Tabulator Editing Screencast&lt;/a&gt;); &lt;a href=&quot;http://bnode.org/about&quot; id=&quot;link-id17e33750&quot;&gt;Benjamin Nowack&lt;/a&gt; also added &lt;a href=&quot;http://arc.semsol.org/download/plugins/data_wiki&quot; id=&quot;link-id1688cc40&quot;&gt;similar functionality to ARC&lt;/a&gt;, and of course we support the same &lt;a href=&quot;http://dbpedia.org/resource/SPARQL&quot; id=&quot;link-id10bff7c8&quot;&gt;SPARQL&lt;/a&gt; UPDATE into an &lt;a href=&quot;http://dbpedia.org/resource/Resource_Description_Framework&quot; id=&quot;link-id168ace08&quot;&gt;RDF&lt;/a&gt; &lt;a href=&quot;http://dbpedia.org/resource/Information&quot; id=&quot;link-id10641878&quot;&gt;information&lt;/a&gt; resource via the &lt;a href=&quot;http://dbpedia.org/resource/Resource_Description_Framework&quot; id=&quot;link-id0xddb5240&quot;&gt;RDF&lt;/a&gt; Sink feature of our WebDAV and &lt;a href=&quot;http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/OdsBriefcase&quot; id=&quot;link-id0x11199310&quot;&gt;ODS&lt;/a&gt;-Briefcase implementations.&lt;/p&gt;</atom:content>
  <atom:author>
    <atom:name>Kingsley Uyi Idehen</atom:name>
    <atom:email>kidehen@openlinksw.com</atom:email>
   </atom:author>
  <atom:category term="rdf" />
  <atom:category term="semanticweb" />
  <atom:category term="sparql" />
  <atom:category term="howto" />
  <atom:category term="screencast" />
  <atom:category term="history" />
  <atom:category term="ods" />
  <atom:category term="DataSpace" />
  <atom:category term="unified_storage" />
  <atom:updated>2008-05-05T20:06:42.4000-04:00</atom:updated>
 </atom:entry>
 <atom:entry>
  <atom:title>In Perpetual Pursuit of Context</atom:title>
  <atom:id>http://www.openlinksw.com/blog/~kidehen/?id=1356</atom:id>
  <atom:link href="http://www.openlinksw.com/blog/~kidehen/?id=1356" type="text/html" rel="alternate" />
  <atom:link href="http://www.openlinksw.com/GData/dav-blog-1/1356/1" rel="edit" />
  <atom:published>2008-05-02T19:18:33Z</atom:published>
  <atom:content type="html">&lt;p&gt;I&amp;#39;ve always been of the opinion that concise value proposition articulation shouldn&amp;#39;t be the Achilles heel of the &lt;a href=&quot;http://dbpedia.org/resource/Semantic_Web&quot; id=&quot;link-id158efe90&quot;&gt;Semantic Web&lt;/a&gt;. As the &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id13a2db40&quot;&gt;Linked Data&lt;/a&gt; wave climbs up the &amp;quot;Value Appreciation and Comprehension chain&amp;quot;, it&amp;#39;s getting clearer by the second that &amp;quot;&lt;a href=&quot;http://dbpedia.org/resource/Context_%28language_use%29&quot; id=&quot;link-id109316f0&quot;&gt;Context&lt;/a&gt;&amp;quot; is a point of confluence for &lt;a href=&quot;http://dbpedia.org/resource/Semantic_Web&quot; id=&quot;link-id108daa60&quot;&gt;Semantic Web&lt;/a&gt; Technologies and easy to comprehend value, from the perspectives of those outside the core community.&lt;/p&gt; &lt;p&gt;In today&amp;#39;s primarily Document-centric &lt;a href=&quot;http://dbpedia.org/resource/World_Wide_Web&quot;&gt;Web&lt;/a&gt;, the pursuit of &lt;a href=&quot;http://dbpedia.org/resource/Context_%28language_use%29&quot; id=&quot;link-id14edadd0&quot;&gt;Context&lt;/a&gt; is akin to pursuing a mirage in a desert of user generated content. 
The quest is labor intensive, and you ultimately end up without water at the end of the pursuit :-)&lt;/p&gt; &lt;p&gt;Listening to the &lt;a href=&quot;http://blogs.talis.com/nodalities/2008/05/christine-connors-talks-about-semantic-technologies-at-dow-jones.php&quot; id=&quot;link-id12d5e1c0&quot;&gt;Christine Connors&amp;#39; podcast interview with Talis&lt;/a&gt; simply reinforces my strong belief that &amp;quot;&lt;a href=&quot;http://dbpedia.org/resource/Context_%28language_use%29&quot; id=&quot;link-id0x1ec69518&quot;&gt;Context&lt;/a&gt;, Context, Context&amp;quot; is the &lt;a href=&quot;http://dbpedia.org/resource/Semantic_Web&quot; id=&quot;link-id0xa279438&quot;&gt;Semantic Web&lt;/a&gt;&amp;#39;s equivalent of Real Estate&amp;#39;s &amp;quot;Location, Location, Location&amp;quot; (ignore the &lt;a href=&quot;http://dbpedia.org/resource/Subprime_lending&quot; id=&quot;link-id140b8098&quot;&gt;subprime&lt;/a&gt; loans mess for now). The critical thing to note is that you cannot unravel &amp;quot;Context&amp;quot; from existing Web content without incorporating powerful disambiguation technology into an &amp;quot;&lt;a href=&quot;http://dbpedia.org/resource/Entity&quot; id=&quot;link-id15a2f380&quot;&gt;Entity&lt;/a&gt; Extraction&amp;quot; process. Of course, you cannot even consider seriously pursuing any &lt;a href=&quot;http://dbpedia.org/resource/Entity&quot; id=&quot;link-id10868a18&quot;&gt;entity&lt;/a&gt; extraction and disambiguation endeavor without a lookup backbone that exposes &amp;quot;&lt;a href=&quot;http://dbpedia.org/resource/Named_entity_recognition&quot; id=&quot;link-id168dc230&quot;&gt;Named Entities&lt;/a&gt;&amp;quot; and their relationships to &amp;quot;&lt;a href=&quot;http://dbpedia.org/resource/Topic&quot; id=&quot;link-id17cb1950&quot;&gt;Subject matter Concepts&lt;/a&gt;&amp;quot; (BTW - this is what &lt;a href=&quot;http://umbel.org/about/&quot; id=&quot;link-id14f406a0&quot;&gt;UMBEL&lt;/a&gt; is all about). 
Thus, when looking at the broad subject of the &lt;a href=&quot;http://dbpedia.org/resource/Semantic_Web&quot;&gt;Semantic Web&lt;/a&gt;, we can also look at &amp;quot;Context&amp;quot; as the vital point of confluence for the &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id12d67e38&quot;&gt;Data&lt;/a&gt; oriented (&lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id14f8daf0&quot;&gt;Linked Data&lt;/a&gt;) and the &amp;quot;Linguistic Meaning&amp;quot; oriented perspectives.&lt;/p&gt; &lt;p&gt;I am even inclined to state publicly that &amp;quot;Context&amp;quot; may ultimately be the foundation for a &lt;a href=&quot;http://www.openlinksw.com/weblog/public/search.vspx?blogid=127&amp;q=dimension%20web%204.0%20&amp;type=text&amp;output=html&quot; id=&quot;link-id17cb0708&quot;&gt;4th &amp;quot;Web Interaction Dimension&amp;quot;&lt;/a&gt; where practical use of &lt;a href=&quot;http://dbpedia.org/resource/Artificial_intelligence&quot; id=&quot;link-id10b15088&quot;&gt;AI&lt;/a&gt; leverages a &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id0x1ebf9310&quot;&gt;Linked Data&lt;/a&gt; &lt;a href=&quot;http://dbpedia.org/resource/Giant_Global_Graph&quot; id=&quot;link-id10b27018&quot;&gt;Web&lt;/a&gt; substrate en route to exposing new kinds of value :-)&lt;/p&gt; &lt;p&gt;&amp;quot;Context&amp;quot; may also be the focal point of concise value proposition articulation to &lt;a href=&quot;http://dbpedia.org/resource/Venture_Capital&quot; id=&quot;link-id10837578&quot;&gt;VCs&lt;/a&gt; as in: &amp;quot;My solution offers the ability to discover and exploit &amp;quot;Context&amp;quot; iteratively, at the rate of $X.XX per iteration, across a variety of market segments&amp;quot; :-)&lt;/p&gt;</atom:content>
  <atom:author>
    <atom:name>Kingsley Uyi Idehen</atom:name>
    <atom:email>kidehen@openlinksw.com</atom:email>
   </atom:author>
  <atom:category term="semanticweb" />
  <atom:category term="venture_capital" />
  <atom:updated>2008-05-03T15:07:32.000-04:00</atom:updated>
 </atom:entry>
 <atom:entry>
  <atom:title>XTech Talks covering Linked Data</atom:title>
  <atom:id>http://www.openlinksw.com/blog/~kidehen/?id=1355</atom:id>
  <atom:link href="http://www.openlinksw.com/blog/~kidehen/?id=1355" type="text/html" rel="alternate" />
  <atom:link href="http://www.openlinksw.com/GData/dav-blog-1/1355/4" rel="edit" />
  <atom:published>2008-05-02T14:53:08Z</atom:published>
  <atom:content type="html">&lt;p&gt;Courtesy of a post by &lt;a href=&quot;http://community.linkeddata.org/dataspace/person/bizer#this&quot; id=&quot;link-id10868548&quot;&gt;Chris Bizer&lt;/a&gt; to the &lt;a href=&quot;http://community.linkeddata.org/dataspace/organization/lod#this&quot; id=&quot;link-id15739748&quot;&gt;LOD&lt;/a&gt; community &lt;a href=&quot;http://lists.w3.org/Archives/Public/public-lod/&quot; id=&quot;link-id10fae0f8&quot;&gt;mailing list&lt;/a&gt;, here is a list of &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id140a0880&quot;&gt;Linked Data&lt;/a&gt; oriented talks at the upcoming &lt;a href=&quot;http://2008.xtech.org&quot; id=&quot;link-id12801f00&quot;&gt;XTech&lt;/a&gt; 2008 event (also see the &lt;a href=&quot;http://2008.xtech.org/public/schedule/grid&quot; id=&quot;link-id10f65940&quot;&gt;XTech 2008 Schedule&lt;/a&gt; which is &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id1258a4c8&quot;&gt;Linked Data&lt;/a&gt; friendly). 
Of course, I am posting this to my &lt;a href=&quot;http://dbpedia.org/resource/Blog&quot; id=&quot;link-id140a29c0&quot;&gt;Blog&lt;/a&gt; &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id12d5a640&quot;&gt;Data&lt;/a&gt; &lt;a href=&quot;http://en.wikipedia.org/wiki/Data_Spaces&quot; id=&quot;link-id10979b80&quot;&gt;Space&lt;/a&gt; with the sole purpose of adding &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id176be078&quot;&gt;data&lt;/a&gt; to the rapidly growing &lt;a href=&quot;http://dbpedia.org/resource/Giant_Global_Graph&quot; id=&quot;link-id1099aec8&quot;&gt;Giant Global Graph&lt;/a&gt; of &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id10d72d88&quot;&gt;Linked Data&lt;/a&gt;, basically adding to my collection of live &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id11014000&quot;&gt;Linked Data&lt;/a&gt; utility demos :-)&lt;/p&gt; &lt;p&gt;Here is the list:&lt;/p&gt; &lt;ol&gt; &lt;li&gt; &lt;a href=&quot;http://2008.xtech.org/public/schedule/detail/561&quot; id=&quot;link-id17df4d78&quot;&gt;Linked Data Deployment&lt;/a&gt; (&lt;a href=&quot;http://myopenlink.net/dataspace/person/danieljohnlewis#this&quot; id=&quot;link-id17c47d28&quot;&gt;Daniel Lewis&lt;/a&gt;, &lt;a href=&quot;http://www.openlinksw.com/dataspace/organization/openlink#this&quot; id=&quot;link-id108fce00&quot;&gt;OpenLink Software&lt;/a&gt;)&lt;/li&gt; &lt;li&gt; &lt;a href=&quot;http://2008.xtech.org/public/schedule/detail/524&quot; id=&quot;link-id1068c0e0&quot;&gt;The Programmes Ontology&lt;/a&gt; (Tom Scott, &lt;a href=&quot;http://dbpedia.org/resource/BBC&quot; id=&quot;link-id1566da50&quot;&gt;BBC&lt;/a&gt; and all) &lt;/li&gt; &lt;li&gt; &lt;a href=&quot;http://2008.xtech.org/public/schedule/detail/528&quot; id=&quot;link-id1072be40&quot;&gt;SemWebbing the London Gazette&lt;/a&gt; (Jeni Tennison, The Stationery Office) &lt;/li&gt; &lt;li&gt; &lt;a 
href=&quot;http://2008.xtech.org/public/schedule/detail/583&quot; id=&quot;link-id1099e4e0&quot;&gt;Searching, publishing and remixing a Web of Semantic Data&lt;/a&gt; (&lt;a href=&quot;http://community.linkeddata.org/dataspace/person/cygri#this&quot; id=&quot;link-id17e25b78&quot;&gt;Richard Cyganiak&lt;/a&gt;, DERI Galway) &lt;/li&gt; &lt;li&gt; &lt;a href=&quot;http://2008.xtech.org/public/schedule/detail/477&quot; id=&quot;link-idf9764c8&quot;&gt;Building a Semantic Web Search Engine: Challenges and Solutions&lt;/a&gt; (Aidan Hogan, DERI Galway) &lt;/li&gt; &lt;li&gt;&amp;#39;&lt;a href=&quot;http://2008.xtech.org/public/schedule/detail/550&quot; id=&quot;link-id140a3c50&quot;&gt;That&amp;#39;s not what you said yesterday!&lt;/a&gt;&amp;#39; - evolving your &lt;a href=&quot;http://dbpedia.org/resource/World_Wide_Web&quot;&gt;Web&lt;/a&gt; API (&lt;a href=&quot;http://iandavis.com/id/me&quot; id=&quot;link-id14f8d498&quot;&gt;Ian Davis&lt;/a&gt;, Talis) &lt;/li&gt; &lt;li&gt; &lt;a href=&quot;http://2008.xtech.org/public/schedule/detail/527&quot; id=&quot;link-id10c5a9c8&quot;&gt;Representing, indexing and mining scientific data using XML and RDF: Golem and CrystalEye&lt;/a&gt; (&lt;a href=&quot;http://wwmm.ch.cam.ac.uk/blogs/walkingshaw/&quot; id=&quot;link-id108c5e28&quot;&gt;Andrew Walkingshaw&lt;/a&gt;, &lt;a href=&quot;http://dbpedia.org/resource/University_of_Cambridge&quot; id=&quot;link-id10891560&quot;&gt;University of Cambridge&lt;/a&gt;)&lt;/li&gt; &lt;/ol&gt; &lt;p&gt;For the time challenged (i.e. 
those unable to view this post using its permalink / &lt;a href=&quot;http://dbpedia.org/resource/Uniform_Resource_Identifier&quot; id=&quot;link-id10db39f0&quot;&gt;URI&lt;/a&gt; as a &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id10f29bb8&quot;&gt;data&lt;/a&gt; source via the &lt;a href=&quot;http://demo.openlinksw.com/rdfbrowser&quot; id=&quot;link-id10f72778&quot;&gt;OpenLink RDF Browser&lt;/a&gt;, &lt;a href=&quot;http://zitgist.com/about/&quot; id=&quot;link-id107b73b0&quot;&gt;Zitgist&lt;/a&gt; &lt;a href=&quot;http://dataviewer.zitgist.com&quot; id=&quot;link-id1686d528&quot;&gt;Data Viewer&lt;/a&gt;, &lt;a href=&quot;http://www4.wiwiss.fu-berlin.de/rdf_browser&quot; id=&quot;link-id110479e8&quot;&gt;DISCO Hyperdata Browser&lt;/a&gt;, or &lt;a href=&quot;http://dig.csail.mit.edu/2005/ajar/release/tabulator/0.8/tab.html&quot; id=&quot;link-id140ba0e8&quot;&gt;Tabulator&lt;/a&gt;), the benefits of this post are as follows:&lt;/p&gt; &lt;ul&gt; &lt;li&gt;automatic &lt;a href=&quot;http://dbpedia.org/resource/Uniform_Resource_Identifier&quot; id=&quot;link-id172d2fc8&quot;&gt;URI&lt;/a&gt; generation for all linked items in this post&lt;/li&gt; &lt;li&gt;automatic propagation of tags to &lt;a href=&quot;http://del.icio.us&quot; id=&quot;link-id10547380&quot;&gt;del&lt;/a&gt;.&lt;a href=&quot;http://del.icio.us&quot; id=&quot;link-id1093cc10&quot;&gt;icio&lt;/a&gt;.&lt;a href=&quot;http://del.icio.us&quot; id=&quot;link-id168ce3a0&quot;&gt;us&lt;/a&gt;, &lt;a href=&quot;http://www.technorati.com&quot; id=&quot;link-id17aa8af0&quot;&gt;Technorati&lt;/a&gt;, and &lt;a href=&quot;http://www.pingthesemanticweb.com/about/&quot; id=&quot;link-id10868ad8&quot;&gt;PingTheSemanticWeb&lt;/a&gt; &lt;/li&gt; &lt;li&gt;automatic association of formal meanings to my Tags using the &lt;a href=&quot;http://moat-project.org/ontology&quot; id=&quot;link-id10c98608&quot;&gt;MOAT Ontology&lt;/a&gt; &lt;/li&gt; &lt;li&gt;automatic collation and 
generation of statistical &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id10a4d1d8&quot;&gt;data&lt;/a&gt; about my tags using the SCOT Ontology (*missing link is a callout to SCOT &lt;a href=&quot;http://dbpedia.org/resource/Tag&quot; id=&quot;link-id168b7c10&quot;&gt;Tag&lt;/a&gt; Ontology folks to sort the project&amp;#39;s home page &lt;a href=&quot;http://dbpedia.org/resource/Uniform_Resource_Locator&quot; id=&quot;link-id11fd4118&quot;&gt;URL&lt;/a&gt; at the very least*) &lt;/li&gt; &lt;li&gt;explicit typing of my Tags as &lt;a href=&quot;http://dbpedia.org/resource/SKOS&quot; id=&quot;link-id10940eb8&quot;&gt;SKOS&lt;/a&gt; Concepts. &lt;/li&gt; &lt;/ul&gt; &lt;p&gt;Put differently, I cost-effectively contribute to the &lt;a href=&quot;http://dbpedia.org/resource/Giant_Global_Graph&quot; id=&quot;link-id10a081a8&quot;&gt;GGG&lt;/a&gt; across all &lt;a href=&quot;http://www.openlinksw.com/weblog/public/search.vspx?blogid=127&amp;q=web%20dimensions&amp;type=text&amp;output=html&quot; id=&quot;link-id10597530&quot;&gt;Web interaction dimensions&lt;/a&gt; (1.0, 2.0, 3.0) :-)&lt;/p&gt;</atom:content>
  <atom:author>
    <atom:name>Kingsley Uyi Idehen</atom:name>
    <atom:email>kidehen@openlinksw.com</atom:email>
   </atom:author>
  <atom:category term="rdf" />
  <atom:category term="xml" />
  <atom:category term="semanticweb" />
  <atom:category term="skos" />
  <atom:category term="openlink" />
  <atom:category term="DataSpace" />
  <atom:updated>2008-05-05T17:07:17.000-04:00</atom:updated>
 </atom:entry>
 <atom:entry>
  <atom:title>SPARQL at WWW 2008</atom:title>
  <atom:id>http://www.openlinksw.com/blog/vdb/blog/?id=1354</atom:id>
  <atom:link href="http://www.openlinksw.com/blog/vdb/blog/?id=1354" type="text/html" rel="alternate" />
  <atom:link href="http://www.openlinksw.com/GData/dav-blog-1/1354/2" rel="edit" />
  <atom:published>2008-04-30T16:28:10Z</atom:published>
  <atom:content type="html">&lt;div&gt; &lt;div style=&quot;display:none;&quot;&gt;SPARQL at WWW 2008&lt;/div&gt; &lt;p&gt;Andy Seaborne and Eric Prud&amp;#39;hommeaux, editors of the &lt;a href=&quot;http://dbpedia.org/resource/SPARQL&quot; id=&quot;link-id0x1501d1a0&quot;&gt;SPARQL&lt;/a&gt; recommendation, convened a SPARQL birds of a feather session at &lt;a href=&quot;http://www2008.org/&quot; id=&quot;link-id0xb9d6c10&quot;&gt;WWW 2008&lt;/a&gt;. The administrative outcome was that implementors could now experiment with extensions, hopefully keeping each other current about their efforts and that towards the end of 2008, a new W3C working group might begin formalizing the experiences into a new SPARQL spec.&lt;/p&gt; &lt;p&gt;The session drew a good crowd, including many users and developers. The wishes were largely as expected, with a few new ones added. Many of the wishes already had diverse implementations, however most often without interop. I will below give some comments on the main issues discussed.&lt;/p&gt; &lt;/div&gt; &lt;li&gt; &lt;p&gt; &lt;b&gt;SPARQL Update&lt;/b&gt; - This is likely the most universally agreed upon extension. Implementations exist, largely along the lines of Andy Seaborne&amp;#39;s SPARUL spec, which is also likely material for a W3C member submission. The issue is without much controversy; transactions fall outside the scope, which is reasonable enough. With triple stores, we can define things as combinations of inserts and deletes, and isolation we just leave aside. If anything, operating on a transactional platform such as &lt;a href=&quot;http://virtuoso.openlinksw.com&quot; id=&quot;link-id0xc13fe98&quot;&gt;Virtuoso&lt;/a&gt;, one wishes to disable transactions for any operations such as bulk loads and long-running inserts and deletes. Transactionality has pretty much no overhead for a few hundred rows, but for a few hundred million rows the cost of locking and rollback is prohibitive. 
With Virtuoso, we have a row auto-commit mode which we recommend for use with &lt;a href=&quot;http://dbpedia.org/resource/Resource_Description_Framework&quot; id=&quot;link-id0xd7bff00&quot;&gt;RDF&lt;/a&gt;: It commits by itself now and then, optionally keeping a roll forward log, and is transactional enough not to leave half triples around, i.e., inserted in one index but not another.&lt;/p&gt; &lt;p&gt;As far as we are concerned, updating physical triples along the SPARUL lines is pretty much a done deal.&lt;/p&gt; &lt;p&gt;The matter of updating relational &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id0x140ea538&quot;&gt;data&lt;/a&gt; mapped to RDF is a whole other kettle of fish. On this, I should say that RDF has no special virtues for expressing transactions but rather has a special genius for integration. Updating is best left to web service interfaces that use &lt;a href=&quot;http://dbpedia.org/resource/SQL&quot; id=&quot;link-id0xa24e9558&quot;&gt;SQL&lt;/a&gt; on the inside. Anyway, updating union views, which most mappings will be, is complicated. Besides, for transactions, one usually knows exactly what one wishes to update.&lt;/p&gt; &lt;/li&gt; &lt;li&gt; &lt;p&gt; &lt;b&gt;Full Text&lt;/b&gt; - Many people expressed a desire for full text access. Here we run into a deplorable confusion with regexps. The closest SPARQL has to full text in its native form is regexps, but these are not really mappable to full text except in rare special cases and I would despair of explaining to an end user what exactly these cases are. So, in principle, some regexps are equivalent to full text but in practice I find it much preferable to keep these entirely separate.&lt;/p&gt; &lt;p&gt;It was noted that what the users want is a text box for search words. This is a front end to the CONTAINS predicate of most SQL implementations. Ours is MS SQL Server compatible and has a SPARQL version called &lt;code&gt;bif:contains&lt;/code&gt;. 
One must still declare which triples one wants indexed for full text, though. This admin overhead seems inevitable, as text indexing is a large overhead and not needed by all applications.&lt;/p&gt; &lt;p&gt;Also, text hits are not boolean; usually they come with a hit score. Thus, a SPARQL extension for this could look like &lt;/p&gt; &lt;blockquote&gt; &lt;code&gt;select * where { ?thing has_description ?d . ?d ftcontains &amp;quot;gizmo&amp;quot; ftand &amp;quot;widget&amp;quot; score ?score . }&lt;/code&gt; &lt;/blockquote&gt; &lt;p&gt;This would return all the subjects, descriptions, and scores, from subjects with a has_description property containing widget and gizmo. Extending the basic pattern is better than having the match in a filter, since the match binds a variable.&lt;/p&gt; &lt;p&gt;The &lt;a href=&quot;http://dbpedia.org/resource/XQuery&quot; id=&quot;link-id0x9ddb7240&quot;&gt;XQuery&lt;/a&gt;/&lt;a href=&quot;http://dbpedia.org/resource/XPath&quot; id=&quot;link-id0x9d84e070&quot;&gt;XPath&lt;/a&gt; groups have recently come up with a full-text spec, so I used their style of syntax above. We already have a full-text extension, as do some others, but for standardization, it is probably most appropriate to take the XQuery work as a basis. The XQuery full-text spec is quite complex, but I would expect most uses to get by with a small subset, and the structure seems better thought out, at first glance, than the more ad-hoc implementations in diverse SQLs.&lt;/p&gt; &lt;p&gt;Again, declaring any text index to support the search, as well as its timeliness or transactionality, is best left to implementations.&lt;/p&gt; &lt;/li&gt; &lt;li&gt; &lt;p&gt; &lt;b&gt;Federation&lt;/b&gt; - This is a tricky matter. ARQ has a SPARQL extension for sending a nested set of triple patterns to a specific end-point. 
The DARQ project has something more, including a selectivity model for SPARQL.&lt;/p&gt; &lt;p&gt;With federated SQL, life is simpler since after the views are expanded, we have a query where each table is at a known server and has more or less known statistics. Generally, execution plans where as much work as possible is pushed to the remote servers are preferred, and modeling the latencies is not overly hard. With SPARQL, each triple pattern could in principle come from any of the federated servers. Associating a specific end-point to a fragment of the query just passes the problem to the user. It is my guess that this is the best we can do without getting very elaborate, and possibly buggy, end-point content descriptions for routing federated queries.&lt;/p&gt; &lt;p&gt;Having said this, there remains the problem of join order. I suggested that we enhance the protocol by allowing asking an end-point for the query cost for a given SPARQL query. Since they all must have a cost model for optimization, this should not be an impossible request. A time cost and estimated cardinality would be enough. Making statistics available &lt;i&gt;à la&lt;/i&gt; DARQ was also discussed. Being able to declare cardinalities expected of a remote end-point is probably necessary anyway, since not all will implement the cost model interface. For standardization, agreeing on what is a proper description of content and cardinality and how fine grained this must be will be so difficult that I would not wait for it. A cost model interface would nicely hide this within the end-point itself.&lt;/p&gt; &lt;p&gt;With Virtuoso, we do not have a federated SPARQL scheme but we could have the ARQ-like service construct. We&amp;#39;d use our own cost model with explicit declarations of cardinalities of the remote data for guessing a join order. Still, this is a bit of work. 
We&amp;#39;ll see.&lt;/p&gt; &lt;p&gt;For practicality, the service construct coupled with join order hints is the best short term bet. Making this pretty enough for standardization is not self-evident, as it requires end-point description and/or cost model hooks for things to stay declarative.&lt;/p&gt; &lt;/li&gt; &lt;li&gt; &lt;p&gt; &lt;b&gt;End-point description&lt;/b&gt; - This question has been around for a while; I have &lt;a href=&quot;http://www.openlinksw.com/weblog/oerling/?id=1085&quot; id=&quot;link-id10fa7da8&quot;&gt;blogged about it earlier&lt;/a&gt;, but we are not really at a point where there would be even rough consensus about an end-point ontology. We should probably do something on our own to demonstrate some application of this, as we host lots of &lt;a href=&quot;http://community.linkeddata.org/dataspace/organization/lod#this&quot; id=&quot;link-id0xd048c68&quot;&gt;linked open data&lt;/a&gt; sets.&lt;/p&gt; &lt;/li&gt; &lt;li&gt; &lt;p&gt; &lt;b&gt;SQL equivalence&lt;/b&gt; - There were many requests for aggregation, some for subqueries and nesting, expressions in select, negation, existence and so on. I would call these all SQL equivalence. One use case was taking all the teams in the database and for all with over 5 members, add the big_team class and a property for member count.&lt;/p&gt; &lt;p&gt;With Virtuoso, we could write this as -- &lt;/p&gt; &lt;blockquote&gt; &lt;code&gt;construct { ?team a big_team . ?team member_count ?ct } from ... where {?team a team . { select ?team2 count (*) as ?ct where { ?m member_of ?team2 } . 
filter (?team = ?team2 and ?ct &amp;gt; 5) }}&lt;/code&gt; &lt;/blockquote&gt; &lt;p&gt;We have pretty much all the SQL equivalence features, as we have been working for some time at translating the &lt;a href=&quot;http://dbpedia.org/resource/TPC-H&quot; id=&quot;link-id0xb9d5200&quot;&gt;TPC-H&lt;/a&gt; workload into SPARQL.&lt;/p&gt; &lt;p&gt;The usefulness of these things is uncontested but standardization could be hard as there are subtle questions about variable scope and the like.&lt;/p&gt; &lt;/li&gt; &lt;li&gt; &lt;p&gt; &lt;b&gt;Inference&lt;/b&gt; - The SPARQL spec does not deal with transitivity or such matters because it is assumed that these are handled by an underlying inference layer. This is however most often not so. There was interest in more fine grained control of inference, for example declaring that just one property in a query would be transitive or that subclasses should be taken into account in only one triple pattern. As far as I am concerned, this is very reasonable, and we even offer extensions for this sort of thing in Virtuoso&amp;#39;s SPARQL. This however only makes sense if the inference is done at query time and pattern by pattern. For instance, if forward chaining is used, this no longer makes sense. Specifying that some forward chaining ought to be done at query time is impractical, as the operation can be very large and time consuming and it is the DBA&amp;#39;s task to determine what should be stored and for how long, how changes should be propagated, and so on. All these are application dependent and standardizing will be difficult.&lt;/p&gt; &lt;p&gt;Support for RDF features like lists and bags would all fall into the functions an underlying inference layer should perform. These things are of special interest when querying OWL models, for example.&lt;/p&gt; &lt;/li&gt; &lt;li&gt; &lt;p&gt; &lt;b&gt;Path expressions&lt;/b&gt; - Path expressions were requested by a few people. 
We have implemented some, as in &lt;/p&gt; &lt;blockquote&gt; &lt;code&gt;?product+?has_supplier+&amp;gt;s_name = &amp;quot;Gizmos, Inc.&amp;quot;.&lt;/code&gt; &lt;/blockquote&gt; This means that one supplier of product has name &amp;quot;Gizmos, Inc.&amp;quot;. This is a nice shorthand but we run into problems if we start supporting repetitive steps, optional steps, and the like.&lt;/li&gt; &lt;p&gt;In conclusion, update, full text, and basic counting and grouping would seem straightforward at this point. Nesting queries, value subqueries, views, and the like should not be too hard if an agreement is reached on scope rules. Inference and federation will probably need more experimentation but a lot can be had already with very simple fine grained control of backward chaining, if such applies, or with explicit end-point references and explicit join order. These are practical but not pretty enough for committee consensus, would be my guess. Anyway, it will be a few months before anything formal will happen.&lt;/p&gt;</atom:content>
  <atom:author>
    <atom:name>Virtuoso Data Space Bot</atom:name>
    <atom:email>kidehen@openlinksw.com</atom:email>
   </atom:author>
  <atom:category term="database" />
  <atom:category term="databases" />
  <atom:category term="rdf" />
  <atom:category term="xpath" />
  <atom:category term="xml" />
  <atom:category term="xquery" />
  <atom:category term="sql_server" />
  <atom:category term="semanticweb" />
  <atom:category term="sparql" />
  <atom:category term="virtuoso" />
  <atom:updated>2008-04-30T13:48:12.000-04:00</atom:updated>
 </atom:entry>
 <atom:entry>
  <atom:title>SPARQL at WWW 2008</atom:title>
  <atom:id>http://www.openlinksw.com/weblog/oerling/?id=1353</atom:id>
  <atom:link href="http://www.openlinksw.com/weblog/oerling/?id=1353" type="text/html" rel="alternate" />
  <atom:link href="http://www.openlinksw.com/GData/dav-blog-1/1353/1" rel="edit" />
  <atom:published>2008-04-30T15:59:15Z</atom:published>
  <atom:content type="html">&lt;div&gt; &lt;div&gt; Andy Seaborne and Eric Prud&amp;#39;hommeaux, editors of the &lt;a href=&quot;http://dbpedia.org/resource/SPARQL&quot; id=&quot;link-id10830dd8&quot;&gt;SPARQL&lt;/a&gt; recommendation, convened a &lt;a href=&quot;http://dbpedia.org/resource/SPARQL&quot; id=&quot;link-id0xa3028e8&quot;&gt;SPARQL&lt;/a&gt; birds of a feather session at &lt;a href=&quot;http://www2008.org/&quot; id=&quot;link-id16b82f18&quot;&gt;WWW 2008&lt;/a&gt;. The administrative outcome was that implementors could now experiment with extensions, hopefully keeping each other current about their efforts and that towards the end of 2008, a new W3C working group might begin formalizing the experiences into a new SPARQL spec. &lt;/div&gt; &lt;div&gt; The session drew a good crowd, including many users and developers. The wishes were largely as expected, with a few new ones added. Many of the wishes already had diverse implementations, however most often without interop. I will below give some comments on the main issues discussed. &lt;/div&gt; &lt;div&gt; - SPARQL Update - This is likely the most universally agreed upon extension. Implementations exist, largely along the lines of Andy Seaborne&amp;#39;s SPARUL spec, which is also likely material for a W3C member submission. The issue is without much controversy; transactions fall outside the scope, which is reasonable enough. With triple stores, we can define things as combinations of inserts and deletes, and isolation we just leave aside. If anything, operating on a transactional platform such as &lt;a href=&quot;http://virtuoso.openlinksw.com&quot; id=&quot;link-id103979f0&quot;&gt;Virtuoso&lt;/a&gt;, one wishes to disable transactions for any operations such as bulk loads and long-running inserts and deletes. Transactionality has pretty much no overhead for a few hundred rows, but for a few hundred million rows the cost of locking and rollback is prohibitive. 
With &lt;a href=&quot;http://virtuoso.openlinksw.com&quot; id=&quot;link-id0x15b9ff70&quot;&gt;Virtuoso&lt;/a&gt;, we have a row autocommit mode which we recommend for use with &lt;a href=&quot;http://dbpedia.org/resource/Resource_Description_Framework&quot; id=&quot;link-id109a9d50&quot;&gt;RDF&lt;/a&gt;: It commits by itself now and then, optionally keeping a roll forward log, and is transactional enough not to leave half triples around, i.e., inserted in one index but not another. &lt;/div&gt; &lt;div&gt; As far as we are concerned, updating physical triples along the SPARUL lines is pretty much a done deal. &lt;/div&gt; &lt;div&gt; The matter of updating relational &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id106f05e0&quot;&gt;data&lt;/a&gt; mapped to &lt;a href=&quot;http://dbpedia.org/resource/Resource_Description_Framework&quot; id=&quot;link-id0xa00f0a70&quot;&gt;RDF&lt;/a&gt; is a whole other kettle of fish. On this, I should say that RDF has no special virtues for expressing transactions but rather has a special genius for integration. Updating is best left to web service interfaces that use &lt;a href=&quot;http://dbpedia.org/resource/SQL&quot; id=&quot;link-id10aab0a8&quot;&gt;SQL&lt;/a&gt; on the inside. Anyway, updating union views, which most mappings will be, is complicated. Besides, for transactions, one usually knows exactly what one wishes to update. &lt;/div&gt; &lt;div&gt; Full Text - Many people expressed a desire for full text access. Here we run into a deplorable confusion with regexps. The closest SPARQL has to full text in its native form is regexps, but these are not really mappable to full text except in rare special cases and I would despair of explaining to an end user what exactly these cases are. So, in principle, some regexps are equivalent to full text but in practice I find it much preferable to keep these entirely separate. 
&lt;/div&gt; &lt;div&gt; It was noted that what the users want is a text box for search words. This is a front end to the CONTAINS predicate of most &lt;a href=&quot;http://dbpedia.org/resource/SQL&quot; id=&quot;link-id0x14d45478&quot;&gt;SQL&lt;/a&gt; implementations. Ours is MS SQL Server compatible and has a SPARQL version called bif:contains. One must still declare which triples one wants indexed for full text, though. This admin overhead seems inevitable, as text indexing is a large overhead and not needed by all applications. &lt;/div&gt; &lt;div&gt; Also, text hits are not boolean; usually they come with a hit score. Thus, a SPARQL extension for this could look like select * where { ?thing has_description ?d . ?d ftcontains &amp;quot;gizmo&amp;quot; ftand &amp;quot;widget&amp;quot; score ?score . } &lt;/div&gt; &lt;div&gt; This would return all the subjects, descriptions and scores from subjects with a has_description property containing widget and gizmo. Extending the basic pattern is better than having the match in a filter, since the match binds a variable. &lt;/div&gt; &lt;div&gt; The &lt;a href=&quot;http://dbpedia.org/resource/XQuery&quot; id=&quot;link-id106517a0&quot;&gt;XQuery&lt;/a&gt;/&lt;a href=&quot;http://dbpedia.org/resource/XPath&quot; id=&quot;link-id10d04ae0&quot;&gt;XPATH&lt;/a&gt; groups have recently come up with a full text spec, so I used their style of syntax above. We already have a full text extension, as do some others, but for standardization, it is probably most appropriate to take the &lt;a href=&quot;http://dbpedia.org/resource/XQuery&quot; id=&quot;link-id0xa27b3a98&quot;&gt;XQuery&lt;/a&gt; work as a basis. The XQuery full text spec is quite complex but I would expect most uses to get by with a small subset and the structure seems better thought out, at first glance, than the more ad hoc implementations in diverse SQLs. 
&lt;/div&gt; &lt;div&gt; Again, declaring any text index to support the search, as well as its timeliness or transactionality, is best left to implementations. &lt;/div&gt; &lt;div&gt; Federation - This is a tricky matter. ARQ has a SPARQL extension for sending a nested set of triple patterns to a specific end point. The DARQ project has something more, including a selectivity model for SPARQL. &lt;/div&gt; &lt;div&gt; With federated SQL, life is simpler since after the views are expanded, we have a query where each table is at a known server and has more or less known statistics. Generally, execution plans where as much work as possible is pushed to the remote servers are preferred and modeling the latencies is not overly hard. With SPARQL, each triple pattern could in principle come from any of the federated servers. Associating a specific end point to a fragment of the query just passes the problem to the user. It is my guess that this is the best we can do without getting very elaborate, and possibly buggy, end point content descriptions for routing federated queries. &lt;/div&gt; &lt;div&gt; Having said this, there remains the problem of join order. I suggested that we enhance the protocol by allowing asking an end point for the query cost for a given SPARQL query. Since they all must have a cost model for optimization, this should not be an impossible request. A time cost and estimated cardinality would be enough. Making statistics available a la DARQ was also discussed. Being able to declare cardinalities expected of a remote end point is probably necessary anyway, since not all will implement the cost model interface. For standardization, agreeing on what is a proper description of content and cardinality and how fine grained this must be will be so difficult that I would not wait for it. A cost model interface would nicely hide this within the end point itself. 
&lt;/div&gt; &lt;div&gt; With Virtuoso, we do not have a federated SPARQL scheme but we could have the ARQ-like service construct. We&amp;#39;d use our own cost model with explicit declarations of cardinalities of the remote &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id0xa496840&quot;&gt;data&lt;/a&gt; for guessing a join order. Still, this is a bit of work. We&amp;#39;ll see. &lt;/div&gt; &lt;div&gt; For practicality, the service construct coupled with join order hints is the best short term bet. Making this pretty enough for standardization is not self-evident, as it requires end point description and/or cost model hooks for things to stay declarative. &lt;/div&gt; &lt;div&gt; - End point description - This question has been around for a while; I have blogged about it earlier, but we are not really at a point where there would be even rough consensus about an end point ontology. We should probably do something on our own to demonstrate some application of this, as we host lots of &lt;a href=&quot;http://community.linkeddata.org/dataspace/organization/lod#this&quot; id=&quot;link-id10cdd138&quot;&gt;linked open data&lt;/a&gt; sets. &lt;/div&gt; &lt;div&gt; - SQL equivalence - There were many requests for aggregation, some for subqueries and nesting, expressions in select, negation, existence and so on. I would call these all SQL equivalence. One use case was taking all the teams in the database and for all with over 5 members, add the big_team class and a property for member count. &lt;/div&gt; &lt;div&gt; With Virtuoso, we could write this as &lt;/div&gt; &lt;pre&gt; construct { ?team a big_team . ?team member_count ?ct } from ... where {?team a team . { select ?team2 count (*) as ?ct where { ?m member_of ?team2 } . filter (?team = ?team2 and ?ct &amp;gt; 5) }} &lt;/pre&gt; 
&lt;div&gt; We have pretty much all the SQL equivalence features, as we have been working for some time at translating the &lt;a href=&quot;http://dbpedia.org/resource/TPC-H&quot; id=&quot;link-id13a7ad70&quot;&gt;TPC-H&lt;/a&gt; workload into SPARQL. &lt;/div&gt; &lt;div&gt; The usefulness of these things is uncontested but standardization could be hard as there are subtle questions about variable scope and the like. &lt;/div&gt; &lt;div&gt; - Inference - The SPARQL spec does not deal with transitivity or such matters because it is assumed that these are handled by an underlying inference layer. This is however most often not so. There was interest in more fine grained control of inference, for example declaring that just one property in a query would be transitive or that subclasses should be taken into account in only one triple pattern. As far as I am concerned, this is very reasonable and we even offer extensions for this sort of thing in Virtuoso&amp;#39;s SPARQL. This however only makes sense if the inference is done at query time and pattern by pattern. For instance, if forward chaining is used, this no longer makes sense. Specifying that some forward chaining ought to be done at query time is impractical, as the operation can be very large and time consuming and it is the DBA&amp;#39;s task to determine what should be stored and for how long, how changes should be propagated and so on. All these are application dependent and standardizing will be difficult. &lt;/div&gt; &lt;div&gt; Support for RDF features like lists and bags would all fall under the functions an underlying inference layer should perform. These things are of special interest when querying OWL models, for example. &lt;/div&gt; &lt;div&gt; Path expressions - Path expressions were requested by a few people. We have implemented some, as in ?product+?has_supplier+&amp;gt;s_name = &amp;quot;Gizmos, Inc.&amp;quot;. 
This means that one supplier of the product has the name &amp;quot;Gizmos, Inc.&amp;quot;. This is a nice shorthand but we run into problems if we start supporting repetitive steps, optional steps and the like. &lt;/div&gt; &lt;div&gt; In conclusion, update, full text and basic counting and grouping would seem straightforward at this point. Nesting queries, value subqueries, views and the like should not be too hard if an agreement is reached on scope rules. Inference and federation will probably need more experimentation but a lot can be had already with very simple fine grained control of backward chaining, if such applies, or with explicit end point references and explicit join order. These are practical but not pretty enough for committee consensus, would be my guess. Anyway, it will be a few months before anything formal happens. &lt;/div&gt; &lt;/div&gt;</atom:content>
  <atom:author>
    <atom:name>Orri Erling</atom:name>
    <atom:email>oerling@openlinksw.com</atom:email>
   </atom:author>
  <atom:category term="database" />
  <atom:category term="databases" />
  <atom:category term="rdf" />
  <atom:category term="xpath" />
  <atom:category term="xml" />
  <atom:category term="xquery" />
  <atom:category term="sql_server" />
  <atom:category term="semanticweb" />
  <atom:category term="sparql" />
  <atom:category term="virtuoso" />
  <atom:category term="dataspace" />
  <atom:updated>2008-04-30T12:28:09.12000-04:00</atom:updated>
 </atom:entry>
 <atom:entry>
  <atom:title>Clearing Up RDF misrepresentation once again!</atom:title>
  <atom:id>http://www.openlinksw.com/blog/~kidehen/?id=1352</atom:id>
  <atom:link href="http://www.openlinksw.com/blog/~kidehen/?id=1352" type="text/html" rel="alternate" />
  <atom:link href="http://www.openlinksw.com/GData/dav-blog-1/1352/1" rel="edit" />
  <atom:published>2008-04-30T15:51:17Z</atom:published>
  <atom:content type="html">&lt;p&gt; &lt;a href=&quot;http://myopenlink.net/dataspace/person/danieljohnlewis#this&quot; id=&quot;link-id12d57690&quot;&gt;Daniel Lewis&lt;/a&gt; has penned a post titled: &lt;a href=&quot;http://vanirsystems.com/danielsblog/2008/04/30/clearing-up-some-misconceptions-again/&quot; id=&quot;link-id10c99f18&quot;&gt;Clearing up some misconceptions..again&lt;/a&gt;, in response to &lt;a href=&quot;http://elgg.org/bwerdmuller/foaf#elgg2&quot; id=&quot;link-id14fe1bc8&quot;&gt;Ben Werdmuller&lt;/a&gt;&amp;#39;s post titled: &lt;a href=&quot;http://blogs.zdnet.com/social/?p=477&quot; id=&quot;link-id141cee58&quot;&gt;Introducing the Open Data Definition&lt;/a&gt;. &lt;/p&gt; &lt;p&gt;The great thing about the &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id105991a8&quot;&gt;Linked Data&lt;/a&gt; &lt;a href=&quot;http://dbpedia.org/resource/Giant_Global_Graph&quot; id=&quot;link-id10a6ec78&quot;&gt;Web&lt;/a&gt; is that it&amp;#39;s much easier to discover and respond to these points of view before the ink dries :-) Ben certainly needs to take a look at the &lt;a href=&quot;http://www.w3.org/RDF/FAQ&quot; id=&quot;link-id10f78958&quot;&gt;Semantic Web FAQ&lt;/a&gt;, before or after assimilating Daniel&amp;#39;s response.&lt;/p&gt;</atom:content>
  <atom:author>
    <atom:name>Kingsley Uyi Idehen</atom:name>
    <atom:email>kidehen@openlinksw.com</atom:email>
   </atom:author>
  <atom:category term="rdf" />
  <atom:category term="semanticweb" />
  <atom:category term="foaf" />
  <atom:category term="socialnetworking" />
  <atom:category term="DataSpace" />
  <atom:updated>2008-04-30T12:07:58.1000-04:00</atom:updated>
 </atom:entry>
 <atom:entry>
  <atom:title>Linked Data enters state of Evoluation</atom:title>
  <atom:id>http://www.openlinksw.com/blog/~kidehen/?id=1351</atom:id>
  <atom:link href="http://www.openlinksw.com/blog/~kidehen/?id=1351" type="text/html" rel="alternate" />
  <atom:link href="http://www.openlinksw.com/GData/dav-blog-1/1351/1" rel="edit" />
  <atom:published>2008-04-29T19:56:14Z</atom:published>
  <atom:content type="html">&lt;p&gt;During a brief chat with &lt;a href=&quot;http://community.linkeddata.org/dataspace/person/mhausenblas#this&quot; id=&quot;link-idfeb0100&quot;&gt;Michael Hausenblas&lt;/a&gt; about a new &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id1049feb0&quot;&gt;Linked Data&lt;/a&gt; project he is championing called: &lt;a href=&quot;http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/LForum&quot; id=&quot;link-id16a857d8&quot;&gt;LForum&lt;/a&gt;, I made a Freudian slip, in the form of the typo: &lt;strong&gt;Evoluation&lt;/strong&gt;, which at the time was supposed to have been: &lt;strong&gt;Evolution&lt;/strong&gt;. Anyway, we had a chuckle and realized we were on to something, so I proceeded to formalize the definition: &lt;/p&gt; &lt;blockquote&gt; &lt;cite&gt;Evoluation is evolution devoid of the randomness of mutation. A state of being in which it is possible to evaluate and choose evolutionary paths.&lt;/cite&gt; &lt;/blockquote&gt; &lt;p&gt; &lt;strong&gt;Evoluation&lt;/strong&gt; actually describes where we are today in relation to the &lt;a href=&quot;http://dbpedia.org/resource/World_Wide_Web&quot; id=&quot;link-id105c1518&quot;&gt;World Wide Web&lt;/a&gt;; to the &lt;a href=&quot;http://community.linkeddata.org/dataspace/organization/lod#this&quot; id=&quot;link-id103f9d00&quot;&gt;Linking Open Data community&lt;/a&gt; (&lt;a href=&quot;http://community.linkeddata.org/dataspace/organization/lod#this&quot; id=&quot;link-id1048c210&quot;&gt;LOD&lt;/a&gt;), it&amp;#39;s taking the path towards becoming a &lt;a href=&quot;http://dbpedia.org/resource/Giant_Global_Graph&quot; id=&quot;link-id104c3a20&quot;&gt;Giant Global Graph&lt;/a&gt; of &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id104968e0&quot;&gt;Linked Data&lt;/a&gt;; to the &lt;a href=&quot;http://dbpedia.org/resource/World_Wide_Web&quot;&gt;Web&lt;/a&gt; 2.0 community, it&amp;#39;s 
simply a collection of Web Services and associated APIs; and to many others, it remains an opaque collection of interlinked documents.&lt;/p&gt; &lt;p&gt;The great thing about the Web is that it allows netizens to explore a plethora of paths without adversely affecting the paths of others. That said, controlling one&amp;#39;s path may take mutation out of evolution, but we are still left with the requirement to adapt and eventually survive in a competitive environment. Thus, although we can evaluate and choose from the many paths the Web&amp;#39;s evolution offers us, the path that delivers the most benefits ultimately dominates. :-) &lt;/p&gt;</atom:content>
  <atom:author>
    <atom:name>Kingsley Uyi Idehen</atom:name>
    <atom:email>kidehen@openlinksw.com</atom:email>
   </atom:author>
  <atom:category term="webservices" />
  <atom:category term="web2.0" />
  <atom:category term="web20" />
  <atom:category term="semanticweb" />
  <atom:category term="DataSpace" />
  <atom:updated>2008-04-29T16:25:47.000-04:00</atom:updated>
 </atom:entry>
 <atom:entry>
  <atom:title>Linked Data and Information Architecture</atom:title>
  <atom:id>http://www.openlinksw.com/blog/vdb/blog/?id=1350</atom:id>
  <atom:link href="http://www.openlinksw.com/blog/vdb/blog/?id=1350" type="text/html" rel="alternate" />
  <atom:link href="http://www.openlinksw.com/GData/dav-blog-1/1350/2" rel="edit" />
  <atom:published>2008-04-29T14:37:22Z</atom:published>
  <atom:content type="html">&lt;div&gt; &lt;div style=&quot;display:none;&quot;&gt;Linked Data and Information Architecture&lt;/div&gt; &lt;p&gt;We had a workshop on &lt;a href=&quot;http://community.linkeddata.org/dataspace/organization/lod#this&quot; id=&quot;link-id0x1437ac70&quot;&gt;Linked Open Data&lt;/a&gt; (&lt;a href=&quot;http://community.linkeddata.org/dataspace/organization/lod#this&quot; id=&quot;link-id0x1315f788&quot;&gt;LOD&lt;/a&gt;) last week in &lt;a href=&quot;http://www2008.org/&quot; id=&quot;link-id0x13737468&quot;&gt;Beijing&lt;/a&gt;. You can see the papers in &lt;a href=&quot;http://events.linkeddata.org/ldow2008/#program&quot; id=&quot;link-id10651ab8&quot;&gt;the program&lt;/a&gt;. The event was a success with plenty of good talks and animated conversation. I will not go into every paper here but will comment a little on the conversation and draw some technology requirements going forward.&lt;/p&gt; &lt;p&gt;Tim Berners-Lee showed a read-write version of &lt;a href=&quot;http://dig.csail.mit.edu/2005/ajar/release/tabulator/0.8/tab.html&quot; id=&quot;link-id0x15633520&quot;&gt;Tabulator&lt;/a&gt;. This raises the question of updating on the &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id0x1350a178&quot;&gt;Data&lt;/a&gt; Web. The consensus was that one could assert what one wanted in one&amp;#39;s own space but that others&amp;#39; spaces would be read-only. What spaces one considered relevant would be the user&amp;#39;s or developer&amp;#39;s business, as in the document web.&lt;/p&gt; &lt;p&gt;It seems to me that a significant use case of LOD is an open-web situation where the user picks a broad read-only &amp;quot;data wallpaper&amp;quot; or backdrop of assertions, and then uses this combined with a much smaller, local, writable data set. This is certainly the case when editing data for publishing, as in Tim&amp;#39;s demo. 
This will also be the case when developing mesh-ups combining multiple distinct data sets bound together by sets of SameAs assertions, for example. Questions like, &amp;quot;What is the minimum subset of n data sets needed for deriving the result?&amp;quot; will be common. This will also be the case in applications using proprietary data combined with open data.&lt;/p&gt; &lt;p&gt;This means that databases will have to deal with queries that specify large lists of included graphs, all graphs in the store or all graphs with an exclusion list. All this is quite possible but again should be considered when architecting systems for an open &lt;a href=&quot;http://dbpedia.org/resource/Linked_Data&quot; id=&quot;link-id0xa27bae8&quot;&gt;linked data&lt;/a&gt; &lt;a href=&quot;http://dbpedia.org/resource/Giant_Global_Graph&quot; id=&quot;link-id0x155c3f18&quot;&gt;web&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;&amp;quot;There is data but what can we really do with it? How far can we trust it, and what can we confidently decide based on it?&amp;quot;&lt;/p&gt; &lt;p&gt;As an answer to this question, &lt;a href=&quot;http://zitgist.com/about/&quot; id=&quot;link-id0xd447580&quot;&gt;Zitgist&lt;/a&gt; has compiled the &lt;a href=&quot;http://umbel.org/about/&quot; id=&quot;link-id0x14735008&quot;&gt;UMBEL&lt;/a&gt; taxonomy using &lt;a href=&quot;http://dbpedia.org/resource/SKOS&quot; id=&quot;link-id0x15ab1c48&quot;&gt;SKOS&lt;/a&gt;. This draws on Wikipedia, OpenCyc, WordNet, and &lt;a href=&quot;http://www.mpi-inf.mpg.de/~suchanek/downloads/yago/&quot; id=&quot;link-id0x15d5aa88&quot;&gt;YAGO&lt;/a&gt;, hence the acronym WOWY. UMBEL is both a taxonomy and a set of instance data, containing a large set of &lt;a href=&quot;http://dbpedia.org/resource/Named_entity_recognition&quot; id=&quot;link-id0x9fe45d98&quot;&gt;named entities&lt;/a&gt;, including persons, organizations, geopolitical entities, and so forth. 
By extracting references to this set of named entities from documents and correlating this to the taxonomy, one gets a good idea of what a document (or part thereof) is about.&lt;/p&gt; &lt;p&gt;Kingsley presented this in the Zitgist demo. This is our answer to the criticism about &lt;a href=&quot;http://dbpedia.org/resource/DBpedia&quot; id=&quot;link-id0xa1920800&quot;&gt;DBpedia&lt;/a&gt; having errors in classification. DBpedia, as a bootstrap stage, is about giving names to all things. Subsequent efforts like UMBEL are about refining the relationships.&lt;/p&gt; &lt;p&gt;&amp;quot;Should there be a global &lt;a href=&quot;http://dbpedia.org/resource/Uniform_Resource_Identifier&quot; id=&quot;link-id0x12cd5290&quot;&gt;URI&lt;/a&gt; dictionary?&amp;quot;&lt;/p&gt; &lt;p&gt;There was a talk by Paolo Bouquet about the &lt;a href=&quot;http://dbpedia.org/resource/Entity&quot; id=&quot;link-id0x12d03400&quot;&gt;Entity&lt;/a&gt; Name System, a sort of data DNS, with the purpose of associating some description and rough classification to URIs. This would allow discovering URIs for reuse. I&amp;#39;d say that this is good if it can cut down on the SameAs proliferation and if this can be widely distributed and replicated for resilience, &lt;i&gt;à la&lt;/i&gt; DNS. On the other hand, it was pointed out that this was not quite in the LOD spirit, where parties would mint their own dereferenceable URIs, in their own domains. We&amp;#39;ll see.&lt;/p&gt; &lt;p&gt;&amp;quot;What to do when identity expires?&amp;quot;&lt;/p&gt; &lt;p&gt;Giovanni of Sindice said that a document should be removed from search if it was no longer available. Kingsley pointed out that resilience of reference requires some way to recover data. The data web cannot be less resilient than the document web, and there is a point to having access to history. 
He recommended hooking up with the &lt;a href=&quot;http://dbpedia.org/resource/Internet&quot; id=&quot;link-id0x143e4130&quot;&gt;Internet&lt;/a&gt; Archive, since they make long term persistence their business. In this way, if an application depends on data, and the URIs on which it depends are no longer dereferenceable or provide content from a new owner of the domain, those who need the old version can still get it and host it themselves.&lt;/p&gt; &lt;p&gt;It is increasingly clear that OWL SameAs is both the blessing and bane of linked data. We can easily have tens of URIs for the same thing, especially with people. Still, these should be considered the same.&lt;/p&gt; &lt;p&gt;Returning every synonym in a query answer hardly makes sense but accepting them as input seems almost necessary. This is what we do with &lt;a href=&quot;http://virtuoso.openlinksw.com&quot; id=&quot;link-id0x15a2a930&quot;&gt;Virtuoso&lt;/a&gt;&amp;#39;s SameAs support. Even so, this can easily double query times even when there are no synonyms.&lt;/p&gt; &lt;p&gt;Be that as it may, SameAs is here to stay; just consider the mapping of DBpedia to Geonames, for example.&lt;/p&gt; &lt;p&gt;Also, making aberrant SameAs statements can completely poison a data set and lead to absurd query results. Hence choosing which SameAs assertions from which source will be considered seems necessary. In an open web scenario, this leads inevitably to multi-graph queries that can be complex to write with regular &lt;a href=&quot;http://dbpedia.org/resource/SPARQL&quot; id=&quot;link-id0x12bb8ce8&quot;&gt;SPARQL&lt;/a&gt;. By extension, it seems that a good query would also include the graphs actually used for deriving each result row. This is of course possible but has some implications for how databases should be organized.&lt;/p&gt; &lt;p&gt;Yves Raimond gave a talk about deriving identity between Musicbrainz and Jamendo. I see the issue as a core question of linked data in general. 
The algorithm Yves presented started with attribute value similarities and then followed related entities. Artists would be the same if they had similar names and similar names of albums with similar song titles, for example. We can find the same basic question in any analysis, for example, looking at how news reporting differs between media, supposing there is adequate entity extraction.&lt;/p&gt; &lt;p&gt;There is basic graph diffing in &lt;a href=&quot;http://data.semanticweb.org/conference/iswc-aswc/2007/tracks/research/papers/533/html&quot; id=&quot;link-id0x153c1fa8&quot;&gt;RDFSync&lt;/a&gt;, for example. But here we are expanding the context significantly. We will traverse references to some depth, allow similarity matches, SameAs, and so forth. Having presumed identity of two URIs, we can then look at the difference in their environment to produce a human readable summary. This could then be evaluated for purposes of analysis or of combining content.&lt;/p&gt; &lt;p&gt;At first sight, these algorithms seem well parallelizable, as long as all threads have access to all data. For scaling, this probably means a message-bound distributed algorithm. This is something to look into for the next stage of linked data.&lt;/p&gt; &lt;p&gt;Some inference is needed, but if everybody has their own choice of data sets to query, then everybody would also have their own entailed triples. This will make for an explosion of entailed graphs if forward chaining is used. Forward chaining is very nice because it keeps queries simple and easy to optimize. With Virtuoso, we still favor backward chaining since we expect a great diversity of graph combinations and near infinite volume in the open web scenario. With private repositories of slowly changing data put together for a special application, the situation is different.&lt;/p&gt; &lt;p&gt;In conclusion, we have a real LOD movement with actual momentum and a good idea of what to do next. 
The next step is promoting this to the broader community, starting with &lt;a href=&quot;http://www.linkeddataplanet.com/&quot; id=&quot;link-id0x155d1d00&quot;&gt;Linked Data Planet&lt;/a&gt; in New York in June.&lt;/p&gt; &lt;/div&gt;</atom:content>
  <atom:author>
    <atom:name>Virtuso Data Space Bot</atom:name>
    <atom:email>kidehen@openlinksw.com</atom:email>
   </atom:author>
  <atom:category term="database" />
  <atom:category term="databases" />
  <atom:category term="semanticweb" />
  <atom:category term="web30" />
  <atom:category term="skos" />
  <atom:category term="sparql" />
  <atom:category term="history" />
  <atom:category term="visionary" />
  <atom:category term="virtuoso" />
  <atom:updated>2008-04-29T17:18:21.48000-04:00</atom:updated>
 </atom:entry>
 <atom:entry>
  <atom:title>On Sem Web Search</atom:title>
  <atom:id>http://www.openlinksw.com/blog/vdb/blog/?id=1349</atom:id>
  <atom:link href="http://www.openlinksw.com/blog/vdb/blog/?id=1349" type="text/html" rel="alternate" />
  <atom:link href="http://www.openlinksw.com/GData/dav-blog-1/1349/2" rel="edit" />
  <atom:published>2008-04-29T14:37:21Z</atom:published>
  <atom:content type="html">&lt;div&gt; &lt;div style=&quot;display:none;&quot;&gt;On Sem Web Search&lt;/div&gt; &lt;p&gt;&amp;quot;I give the search keywords and you give me a &lt;a href=&quot;http://dbpedia.org/resource/SPARQL&quot; id=&quot;link-id0xdc0c248&quot;&gt;SPARQL&lt;/a&gt; end-point and a query that will get the &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id0xbd54be0&quot;&gt;data&lt;/a&gt;.&amp;quot;&lt;/p&gt; &lt;p&gt;Thus did one SPARQL user describe the task of a semantic/data web search engine.&lt;/p&gt; &lt;p&gt;In &lt;a href=&quot;http://www.openlinksw.com/weblog/oerling/?id=1336&quot; id=&quot;link-idff98750&quot;&gt;a previous post&lt;/a&gt;, I suggested that if the data web were the size of the document web, we&amp;#39;d be looking at two orders of magnitude more search complexity. It just might be so.&lt;/p&gt; &lt;p&gt;In the conversation, I pointed out that a search engine might have a copy of everything and even a capability to do SPARQL and full text on it all, yet still the users would be better off doing the queries against the SPARQL end-points of the data publishers. It is a bit like the fact that not all web browsing runs off Google&amp;#39;s cache. With the data web, the point is even more pronounced, as serving a hit from Google&amp;#39;s cache is a small operation but a complex query might be a very large one.&lt;/p&gt; &lt;p&gt;Yet, the data web is about ad-hoc joining between data sets of different origins. Thus a search engine of the data web ought to be capable of joining also, even if large queries ought to be run against individual publishers&amp;#39; end-points or the user&amp;#39;s own data warehouse.&lt;/p&gt; &lt;p&gt;For ranking, the general consensus was that no single hit-ranking would be good for the data web. Thus word frequency-based hit-scores are OK for text hits but more is not obvious. 
I would think that some link analysis could apply but this will take some more experimentation.&lt;/p&gt; &lt;p&gt;For search summaries, if we have splitting of data sets into small fragments &lt;i&gt;à la&lt;/i&gt; Sindice, search summaries are pretty much the same as with just text search. If we store triples, then we can give text style summaries of text hits in literals and Fresnel lens views of the structured data around the literal. For showing a page of hits, the lenses must abbreviate heavily but this is still feasible. The engine would know about the most common ontologies and summarize instance data accordingly.&lt;/p&gt; &lt;p&gt;Chris Bizer pointed out that trust and provenance are critical, especially if an answer is arrived at by joining multiple data sets. The trust of the conclusion is no greater than that of the weakest participating document. Different users will have different trusted sources.&lt;/p&gt; &lt;p&gt;A mature data web search engine would combine a provenance/trust specification, a search condition consisting of SPARQL or full text or both, and a specification for hit rank. Again, most searches would use defaults, but these three components should in principle be orthogonally specifiable.&lt;/p&gt; &lt;p&gt;Many places may host the same data set either for download or SPARQL access. The &lt;a href=&quot;http://dbpedia.org/resource/Uniform_Resource_Identifier&quot; id=&quot;link-id0xac4d6c8&quot;&gt;URI&lt;/a&gt; of the data set is not its &lt;a href=&quot;http://dbpedia.org/resource/Uniform_Resource_Locator&quot; id=&quot;link-id0xdd50478&quot;&gt;URL&lt;/a&gt;. Different places may further host multiple data sets on one end-point. Thus the search engine ought to return all end-points where the set is to be found. The end-points themselves ought to be able to say what data sets they contain, under what graph IRIs. Since there is no consensus about end-point self description, this too would be left to the search engine. 
In practice, this could be accomplished by extending Sindice&amp;#39;s semantic site map specification. A possible query would be to find an end-point containing a set of named data sets. If none were found, the search engine itself could run a query joining all the sets since it at least would hold them all.&lt;/p&gt; &lt;p&gt;Since many places will host sets like Wordnet or Uniprot, indexing these once for each copy hardly makes sense. Thus a site should identify its data by the data set&amp;#39;s URI and not the copy&amp;#39;s URL.&lt;/p&gt; &lt;p&gt;It came up in the discussion that search engines should share a ping format so that a single message format would be enough to notify any engine about data being updated. This is already partly the case with Sindice and &lt;a href=&quot;http://www.pingthesemanticweb.com/&quot; id=&quot;link-id0xb0c8078&quot;&gt;PTSW&lt;/a&gt; (Ping The &lt;a href=&quot;http://dbpedia.org/resource/Semantic_Web&quot; id=&quot;link-id0xc28f9f0&quot;&gt;Semantic Web&lt;/a&gt;) sharing a ping format. &lt;/p&gt; &lt;p&gt;Further, since it is no trouble to publish a copy of the 45G Uniprot file but a fair amount of work to index it, search engines should be smart about processing requests to index things, since these can amount to a denial of service attack. &lt;/p&gt; &lt;p&gt;Probably very large data sets should be indexed only in the form supplied by their publisher, and others hosting copies would just state that they hold a copy. If the claim to the copy proved false, users could complain and the search engine administrator would remove the listing. It seems that some manual curating cannot be avoided here. &lt;/p&gt; &lt;h2&gt;On Data Web Search Business Model&lt;/h2&gt; &lt;p&gt;It seems there can be an overlap between the data web search and the data web hosting businesses. 
For example, Talis rents space for hosting &lt;a href=&quot;http://dbpedia.org/resource/Resource_Description_Framework&quot; id=&quot;link-id0xbececa0&quot;&gt;RDF&lt;/a&gt; data with SPARQL access. A search engine should offer basic indexing of everything for free, but could charge either data publishers or end users for running SPARQL queries across data sets. These do not have the nicely anticipatable and fairly uniform resource consumption of text lookups. In this manner, a search provider could cost-justify the capacity for allowing arbitrary queries. &lt;/p&gt; &lt;p&gt;The value of the data web consists of unexpected joining. Such joining takes place most efficiently if the sources are at least in some proximity, for example in the same data center. Thus the search provider could monetize functioning as the database provider for mesh-ups. In the document web, publishing pages is very simple and there is no great benefit from co-locating search and pages, rather the opposite. For the data web, the hosting with SPARQL and all is more complex and resembles providing search. Thus providing search can combine with providing SPARQL hosting, once we accept in principle that search should have arbitrary inter-document joining, even if it is at an extra premium.&lt;/p&gt; &lt;p&gt;The present search business model is advertising. If the data web is to be accessed by automated agents such as mesh-up code, display of ads is not self-evident. This is quite separate from the fact that semantics can lead to better ad targeting.&lt;/p&gt; &lt;p&gt;One model would be to do text lookups for free from a regular web page but show ads, just a la Google search ads. 
Using the service via web services for text or SPARQL would have a cost paid by the searching or publishing party and would not be financed by advertising.&lt;/p&gt; &lt;p&gt;In the case of data used in value-add data products (mesh-ups) that have financial value to their users, the original publisher of the data could even be paid for keeping the data up-to-date. This would hold for any time-sensitive feeds like news or financial feeds. Thus the hosting/search provider would be a broker of data-use fees and the data producer would be in the position of an AdSense inventory owner, i.e., a web site which shows AdSense ads. Organizing this under a hub providing back-office functions similar to an ad network could make sense even if the actual processing were divided among many sites.&lt;/p&gt; &lt;p&gt;Kingsley has repeatedly formulated the core value proposition of the semantic web in terms of dealing with &lt;a href=&quot;http://dbpedia.org/resource/Information&quot; id=&quot;link-id0xa951148&quot;&gt;information&lt;/a&gt; overload: There is the real-time enterprise and the real-time individual and both are beasts of perception. Their image is won and lost in the Internet online conversation space. We know that allegations, even if later proven false, will stick if left unchallenged. The function of semantics on the web is to allow one to track and manage where one stands. In fact, Garlik has made a business of just this, but now from a privacy and security angle. Garlik&amp;#39;s DataPatrol harvests data from diverse sources and allows assessing vulnerability to identity theft, for example.&lt;/p&gt; &lt;p&gt;If one is in the business of collating all the structured data in the world, as a data web search engine is, then providing custom alerts for either security or public image management is quite natural. 
This can be a very valuable service if it works well.&lt;/p&gt; &lt;p&gt;At OpenLink, we will now experiment with the Sindice/&lt;a href=&quot;http://zitgist.com/about/&quot; id=&quot;link-id0x14606ec8&quot;&gt;Zitgist&lt;/a&gt;/&lt;a href=&quot;http://www.pingthesemanticweb.com/&quot; id=&quot;link-id0xa08174e0&quot;&gt;PingTheSemanticWeb&lt;/a&gt; content. This is a regular part of the productization of &lt;a href=&quot;http://virtuoso.openlinksw.com&quot; id=&quot;link-id0xa127710&quot;&gt;Virtuoso&lt;/a&gt;&amp;#39;s cluster edition. We expect to release some results in the next 4 weeks. &lt;/p&gt; &lt;/div&gt;</atom:content>
  <atom:author>
    <atom:name>Virtuso Data Space Bot</atom:name>
    <atom:email>kidehen@openlinksw.com</atom:email>
   </atom:author>
  <atom:category term="database" />
  <atom:category term="databases" />
  <atom:category term="infomania" />
  <atom:category term="webservices" />
  <atom:category term="rdf" />
  <atom:category term="semanticweb" />
  <atom:category term="web30" />
  <atom:category term="sparql" />
  <atom:category term="openlink" />
  <atom:category term="virtuoso" />
  <atom:updated>2008-04-29T16:06:09.000-04:00</atom:updated>
 </atom:entry>
</atom:feed>