<?xml version="1.0" encoding="UTF-8" ?>
<!--ATOM based XML document generated By OpenLink Virtuoso-->
<atom:feed xmlns:atom="http://purl.org/atom/ns#" version="0.3">
<atom:title>OpenLink Virtuoso (Product Blog)</atom:title>
<atom:link href="http://www.openlinksw.com/GData/136/1588/4" type="text/html" rel="alternate" />
<atom:modified>2012-02-11T04:29:06Z</atom:modified>
 <atom:author>
  <atom:name>Virtuoso Data Space Bot</atom:name>
  <atom:email>kidehen@openlinksw.com</atom:email>
 </atom:author>
<atom:subtitle>About</atom:subtitle>
 <atom:entry>
  <atom:content type="text/html" mode="escaped">&lt;p&gt;We have just added a geometry &lt;a href=&quot;http://dbpedia.org/resource/Data&quot; id=&quot;link-id0x1c0e02b0&quot;&gt;data&lt;/a&gt; type and corresponding &lt;a href=&quot;http://dbpedia.org/resource/R-tree&quot; id=&quot;link-id0x1e093220&quot;&gt;R&lt;/a&gt;-tree index to &lt;a href=&quot;http://virtuoso.openlinksw.com&quot; id=&quot;link-id0x1ddccfe8&quot;&gt;Virtuoso&lt;/a&gt;. This follows the general scheme of &lt;a href=&quot;http://dbpedia.org/resource/SQL&quot; id=&quot;link-id0x1b88a580&quot;&gt;SQL&lt;/a&gt;/MM, as is implemented by &lt;a href=&quot;http://dbpedia.org/resource/PostGIS&quot; id=&quot;link-id0x1d271a90&quot;&gt;PostGIS&lt;/a&gt; and many others. We have all the engine-side stuff, including optimizer support for geometry cardinality sampling and good execution plans for combinations of spatial and other joins. We have however not yet implemented all the different geometry types and library function support for them, like shortest distance between two arbitrary shapes.&lt;/p&gt; &lt;p&gt;The geometry support is for both SQL and &lt;a href=&quot;http://dbpedia.org/resource/SPARQL&quot; id=&quot;link-id0x1b8d4ca8&quot;&gt;SPARQL&lt;/a&gt;. On the SQL side, it works with the ISO/IEC 13249 SQL/MM API; with &lt;a href=&quot;http://dbpedia.org/resource/Resource_Description_Framework&quot; id=&quot;link-id0x1ed69318&quot;&gt;RDF&lt;/a&gt;, a geometry can occur as the object of a quad. If the object is a typed-literal of the &lt;code&gt;virtrdf:Geometry&lt;/code&gt; type, it gets indexed in a geometry index over all geometries in quads; no special declarations are needed. 
After this, SQL/MM predicates and functions can be used with SPARQL, like this:&lt;/p&gt; &lt;blockquote&gt; &lt;pre&gt;&lt;code&gt;PREFIX geo: &amp;lt;&lt;a href=&quot;http://dbpedia.org/resource/Hypertext_Transfer_Protocol&quot; id=&quot;link-id0x1d2d0ae0&quot;&gt;http&lt;/a&gt;://www.w3.org/2003/01/geo/wgs84_pos#&amp;gt;
SELECT ?class COUNT (*)
WHERE
  {
    ?m geo:geometry ?geo .
    ?m a ?class .
    FILTER ( &amp;lt;bif:st_intersects&amp;gt; ( ?geo, &amp;lt;bif:st_point&amp;gt; (0, 52), 100 ) )
  }
GROUP BY ?class
ORDER BY DESC 2
&lt;/code&gt; &lt;/pre&gt;&lt;/blockquote&gt; &lt;p&gt;This returns the counts of objects of each class occurring within 100 km of (0, 52), a point near London.&lt;/p&gt; &lt;p&gt;For any data set with &lt;a href=&quot;http://dbpedia.org/resource/World_Geodetic_System&quot; id=&quot;link-id0x1ec00578&quot;&gt;WGS 84&lt;/a&gt; &lt;code&gt;geo:long&lt;/code&gt; and &lt;code&gt;geo:lat&lt;/code&gt; values, a simple SQL function makes a point geometry for each such coordinate pair and adds it as the &lt;code&gt;geo:geometry&lt;/code&gt; property of the subject with the long/lat. This then enables fast spatial access to arbitrary location data in RDF.&lt;/p&gt; &lt;p&gt;Right now, we hardly see any geometries other than points in RDF data, even though there are some efforts toward vocabularies for more complex entities. As these get adopted, we will support them.&lt;/p&gt; &lt;p&gt;For scalability, we tried the implementation with &lt;a href=&quot;http://www.openstreetmap.org/&quot; id=&quot;link-id0x1c781e68&quot;&gt;OpenStreetMap&lt;/a&gt;&amp;#39;s 350 million or so points. The geometry implementation partitions well over a cluster, similarly to a full text index, i.e., every server has its slice of the geometries, partitioned by the geometry object&amp;#39;s key, thus not by range of coordinates or such. 
Like this, the items are evenly spread even though the coordinate distribution is highly uneven.&lt;/p&gt; &lt;p&gt;We can do spatial joins like this:&lt;/p&gt; &lt;blockquote&gt; &lt;pre&gt;&lt;code&gt;SELECT ?s ( &amp;lt;sql:num_or_null&amp;gt; (?p) ) COUNT (*)
WHERE
  {
    ?s &amp;lt;http://&lt;a href=&quot;http://dbpedia.org/resource/DBpedia&quot; id=&quot;link-id0x1f885868&quot;&gt;dbpedia&lt;/a&gt;.org/ontology/populationTotal&amp;gt; ?p .
    FILTER ( &amp;lt;sql:num_or_null&amp;gt; (?p) &amp;gt; 1000000 ) .
    ?s geo:geometry ?geo .
    FILTER ( &amp;lt;bif:st_intersects&amp;gt; ( ?pt, ?geo, 5 ) ) .
    ?xx geo:geometry ?pt
  }
GROUP BY ?s ( &amp;lt;sql:num_or_null&amp;gt; (?p) )
ORDER BY DESC 3
LIMIT 20
&lt;/code&gt; &lt;/pre&gt;&lt;/blockquote&gt; &lt;p&gt;This takes the DBpedia subjects that have a population over 1 million and a geometry. We then count all the geometries within 5 km of the point location of the first geometry. With DBpedia (about 5 million points), &lt;a href=&quot;http://www.geonames.org/&quot; id=&quot;link-id0x1d4279b0&quot;&gt;GeoNames&lt;/a&gt; (7 million points), and OpenStreetMap (350 million points), we get the result:&lt;/p&gt; &lt;blockquote&gt; &lt;pre&gt;&lt;code&gt;http://dbpedia.org/resource/Munich 1356594 117280
http://dbpedia.org/resource/London 7355400 81486
http://dbpedia.org/resource/Davao_City 1363337 58640
http://dbpedia.org/resource/Belo_Horizonte 2412937 58640
http://dbpedia.org/resource/Chengde 3610000 58640
http://dbpedia.org/resource/Hamburg 1769117 51664
http://dbpedia.org/resource/San_Diego%2C_California 1266731 47685
http://dbpedia.org/resource/Bursa 1562828 47685
http://dbpedia.org/resource/Port-au-Prince 1082800 47685
http://dbpedia.org/resource/Oakland_County%2C_Michigan 1194156 45636
http://dbpedia.org/resource/Sana%27a 1747627 40923
http://dbpedia.org/resource/Milan 1303437 40923
http://dbpedia.org/resource/Campinas 1059420 40923
http://dbpedia.org/resource/Hohhot 2580000 40923
http://dbpedia.org/resource/Brussels 1031215 40923
http://dbpedia.org/resource/Bogra_District 2988567 40923
http://dbpedia.org/resource/Cort%C3%A9s_Department 1202510 40923
http://dbpedia.org/resource/Berlin 3416300 35668
http://dbpedia.org/resource/New_York_City 8274527 30810
http://dbpedia.org/resource/Los_Angeles%2C_California 3849378 25614

20 Rows. -- 1733 msec.
Cluster 8 nodes, 1 s. 358 m/s 1596 KB/s 664% &lt;a href=&quot;http://dbpedia.org/resource/Central_processing_unit&quot; id=&quot;link-id0x1e6403b0&quot;&gt;cpu&lt;/a&gt; 2% read 16% clw threads 1r 0w 0i buffers 1124351 0 d 0 w 0 pfs
&lt;/code&gt;&lt;/pre&gt;&lt;/blockquote&gt; &lt;p&gt;This takes 1.7 seconds on a Virtuoso Cluster configured with 8 processes on a single dual-Xeon 5520 box, running at about 664% CPU with warm &lt;a href=&quot;http://dbpedia.org/resource/Cache&quot; id=&quot;link-id0x1e81f610&quot;&gt;cache&lt;/a&gt;. Fair enough for a first crack; this can obviously be optimized further. Still, the geo part of the processing is already as good as instantaneous.&lt;/p&gt; &lt;p&gt;We will shortly have the geography features installed on DBpedia and the other data sets we host. As these come online, we will show more demo queries.&lt;/p&gt; &lt;p&gt;For more about SQL/MM, you can look to a couple of PDFs:&lt;/p&gt; &lt;ul&gt; &lt;li&gt; &lt;a href=&quot;http://www.fer.hr/_download/repository/SQLMM_Spatial-_The_Standard_to_Manage_Spatial_Data_in_Relational_Database_Systems.pdf&quot; id=&quot;link-id133775f0&quot;&gt;SQL/MM Spatial: The Standard to Manage Spatial Data in Relational Database Systems&lt;/a&gt; by Knut Stolze&lt;/li&gt; &lt;li&gt; &lt;a href=&quot;http://www.sigmod.org/record/issues/0112/standards.pdf&quot; id=&quot;link-id1433c5e0&quot;&gt;SQL Multimedia and Application Packages (SQL/MM)&lt;/a&gt; by Jim Melton and Andrew Eisenberg&lt;/li&gt; &lt;/ul&gt;</atom:content>
  <atom:title>RDF Geography With Virtuoso</atom:title>
  <atom:link href="http://www.openlinksw.com/dataspace/vdb/weblog/vdb%27s%20BLOG%20%5B136%5D/1588" type="text/html" rel="alternate" />
  <atom:created>2010-02-01T14:14:29Z</atom:created>
  <atom:issued>2010-02-01T14:14:29Z</atom:issued>
  <atom:modified>2010-02-01T09:14:29.000012-05:00</atom:modified>
  <atom:author>
    <atom:name>Virtuoso Data Space Bot</atom:name>
    <atom:email>kidehen@openlinksw.com</atom:email>
  </atom:author>
 </atom:entry>
</atom:feed>
