A simple guide usable by any Perl developer seeking to exploit SPARQL without hassles.
SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.
SPARQL queries are actually HTTP payloads (typically). Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL compliant data server and receive a payload for local processing.
#
# Demonstrating use of a single query to populate a
# Virtuoso Quad Store via Perl.
#
# HTTP URL is constructed accordingly, with the CSV query results format
# as the default via mime type.
#
use CGI qw/:standard/;
use LWP::UserAgent;
use Data::Dumper;
use Text::CSV_XS;

sub sparqlQuery {
    my $query=shift;
    my $baseURL=shift;
    my $format=shift;

    my %params=(
        "default-graph" => "", "should-sponge" => "soft", "query" => $query,
        "debug" => "on", "timeout" => "", "format" => $format,
        "save" => "display", "fname" => ""
    );

    my @fragments=();
    foreach my $k (keys %params) {
        my $fragment="$k=".CGI::escape($params{$k});
        push(@fragments,$fragment);
    }
    $query=join("&", @fragments);
    my $sparqlURL="${baseURL}?$query";

    my $ua = LWP::UserAgent->new;
    $ua->agent("MyApp/0.1 ");
    my $req = HTTP::Request->new(GET => $sparqlURL);
    my $res = $ua->request($req);
    my $str=$res->content;

    my @rows=();
    my $csv = Text::CSV_XS->new();
    foreach my $line ( split(/^/, $str) ) {
        $csv->parse($line);
        my @bits=$csv->fields();
        push(@rows, [ @bits ]);
    }
    return \@rows;
}

# Setting Data Source Name (DSN)
$dsn="http://dbpedia.org/resource/DBpedia";

# The Virtuoso pragma DEFINE get:soft "replace" instructs the SPARQL engine to
# perform an HTTP GET using the IRI in the FROM clause as the Data Source URL,
# en route to DBMS record inserts. Without the pragma (generic SPARQL), the
# query will not add records to the DBMS.
$query="DEFINE get:soft \"replace\"\nSELECT DISTINCT * FROM <$dsn> WHERE {?s ?p ?o}";

$data=sparqlQuery($query, "http://localhost:8890/sparql/", "text/csv");

print "Retrieved data:\n";
print Dumper($data);
Retrieved data:
$VAR1 = [
  [ 's', 'p', 'o' ],
  [ 'http://dbpedia.org/resource/DBpedia', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://www.w3.org/2002/07/owl#Thing' ],
  [ 'http://dbpedia.org/resource/DBpedia', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://dbpedia.org/ontology/Work' ],
  [ 'http://dbpedia.org/resource/DBpedia', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://dbpedia.org/class/yago/Software106566077' ],
  ...
CSV was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a Perl developer that already knows how to use Perl for HTTP based data access. SPARQL just provides an added bonus to URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.
A simple guide usable by any Javascript developer seeking to exploit SPARQL without hassles.
SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.
SPARQL queries are actually HTTP payloads (typically). Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL compliant data server and receive a payload for local processing.
/*
 * Demonstrating use of a single query to populate a
 * Virtuoso Quad Store via Javascript.
 *
 * HTTP URL is constructed accordingly, with the JSON query results format
 * as the default via mime type.
 */
function sparqlQuery(query, baseURL, format) {
    if(!format) format="application/json";
    var params={
        "default-graph": "", "should-sponge": "soft", "query": query,
        "debug": "on", "timeout": "", "format": format,
        "save": "display", "fname": ""
    };

    var querypart="";
    for(var k in params) {
        querypart+=k+"="+encodeURIComponent(params[k])+"&";
    }
    var queryURL=baseURL + '?' + querypart;

    var xmlhttp;
    if (window.XMLHttpRequest) {
        xmlhttp=new XMLHttpRequest();
    } else {
        xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
    }
    xmlhttp.open("GET",queryURL,false);  /* synchronous request, for simplicity */
    xmlhttp.send();
    return JSON.parse(xmlhttp.responseText);
}

/* Setting Data Source Name (DSN) */
var dsn="http://dbpedia.org/resource/DBpedia";

/* The Virtuoso pragma DEFINE get:soft "replace" instructs the Virtuoso SPARQL engine
   to perform an HTTP GET using the IRI in the FROM clause as the Data Source URL,
   with regards to DBMS record inserts. */
var query="DEFINE get:soft \"replace\"\nSELECT DISTINCT * FROM <"+dsn+"> WHERE {?s ?p ?o}";

var data=sparqlQuery(query, "/sparql/");
Place the snippet above inside a <script> element of an HTML document to see the query result.
JSON was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a Javascript developer that already knows how to use Javascript for HTTP based data access within HTML. SPARQL just provides an added bonus to URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.
A simple guide usable by any PHP developer seeking to exploit SPARQL without hassles.
SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.
SPARQL queries are actually HTTP payloads (typically). Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL compliant data server and receive a payload for local processing e.g. local object binding re. PHP.
#!/usr/bin/env php
<?php
#
# Demonstrating use of a single query to populate a
# Virtuoso Quad Store via PHP.
#
# HTTP URL is constructed accordingly, with the JSON query results format in mind.

function sparqlQuery($query, $baseURL, $format="application/json") {
    $params=array(
        "default-graph" => "", "should-sponge" => "soft", "query" => $query,
        "debug" => "on", "timeout" => "", "format" => $format,
        "save" => "display", "fname" => ""
    );

    $querypart="?";
    foreach($params as $name => $value) {
        $querypart=$querypart . $name . '=' . urlencode($value) . "&";
    }

    $sparqlURL=$baseURL . $querypart;
    return json_decode(file_get_contents($sparqlURL));
}

# Setting Data Source Name (DSN)
$dsn="http://dbpedia.org/resource/DBpedia";

# Virtuoso pragma for instructing the SPARQL engine to perform an HTTP GET
# using the IRI in the FROM clause as the Data Source URL
$query="DEFINE get:soft \"replace\" SELECT DISTINCT * FROM <$dsn> WHERE {?s ?p ?o}";

$data=sparqlQuery($query, "http://localhost:8890/sparql/");

print "Retrieved data:\n" . json_encode($data);
?>
Retrieved data: {"head": {"link":[],"vars":["s","p","o"]}, "results": {"distinct":false,"ordered":true, "bindings":[ {"s": {"type":"uri","value":"http:\/\/dbpedia.org\/resource\/DBpedia"},"p": {"type":"uri","value":"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type"},"o": {"type":"uri","value":"http:\/\/www.w3.org\/2002\/07\/owl#Thing"}}, {"s": {"type":"uri","value":"http:\/\/dbpedia.org\/resource\/DBpedia"},"p": {"type":"uri","value":"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type"},"o": {"type":"uri","value":"http:\/\/dbpedia.org\/ontology\/Work"}}, {"s": {"type":"uri","value":"http:\/\/dbpedia.org\/resource\/DBpedia"},"p": {"type":"uri","value":"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type"},"o": {"type":"uri","value":"http:\/\/dbpedia.org\/class\/yago\/Software106566077"}}, ...
JSON was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a PHP developer that already knows how to use PHP for HTTP based data access. SPARQL just provides an added bonus to URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.
A simple guide usable by any Python developer seeking to exploit SPARQL without hassles.
SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.
SPARQL queries are actually HTTP payloads (typically). Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL compliant data server and receive a payload for local processing e.g. local object binding re. Python.
#!/usr/bin/env python
#
# Demonstrating use of a single query to populate a
# Virtuoso Quad Store via Python.
#
import urllib, json

# HTTP URL is constructed accordingly, with the JSON query results format in mind.
def sparqlQuery(query, baseURL, format="application/json"):
    params={
        "default-graph": "", "should-sponge": "soft", "query": query,
        "debug": "on", "timeout": "", "format": format,
        "save": "display", "fname": ""
    }
    querypart=urllib.urlencode(params)
    response = urllib.urlopen(baseURL, querypart).read()
    return json.loads(response)

# Setting Data Source Name (DSN)
dsn="http://dbpedia.org/resource/DBpedia"

# Virtuoso pragma for instructing the SPARQL engine to perform an HTTP GET
# using the IRI in the FROM clause as the Data Source URL
query="""DEFINE get:soft "replace" SELECT DISTINCT * FROM <%s> WHERE {?s ?p ?o}""" % dsn

data=sparqlQuery(query, "http://localhost:8890/sparql/")

print "Retrieved data:\n" + json.dumps(data, sort_keys=True, indent=4)
#
# End
Retrieved data:
{
    "head": {
        "link": [],
        "vars": [ "s", "p", "o" ]
    },
    "results": {
        "bindings": [
            {
                "o": { "type": "uri", "value": "http://www.w3.org/2002/07/owl#Thing" },
                "p": { "type": "uri", "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" },
                "s": { "type": "uri", "value": "http://dbpedia.org/resource/DBpedia" }
            },
            ...
JSON was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a Python developer that already knows how to use Python for HTTP based data access. SPARQL just provides an added bonus to URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.
A simple guide usable by any Ruby developer seeking to exploit SPARQL without hassles.
SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.
SPARQL queries are actually HTTP payloads (typically). Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL compliant data server and receive a payload for local processing e.g. local object binding re. Ruby.
#!/usr/bin/env ruby
#
# Demonstrating use of a single query to populate a
# Virtuoso Quad Store.
#
require 'net/http'
require 'cgi'
require 'csv'

#
# We opt for CSV based output since handling this format is straightforward in Ruby, by default.
# HTTP URL is constructed accordingly, with CSV as the query results format in mind.
def sparqlQuery(query, baseURL, format="text/csv")
    params={
        "default-graph" => "", "should-sponge" => "soft", "query" => query,
        "debug" => "on", "timeout" => "", "format" => format,
        "save" => "display", "fname" => ""
    }

    querypart=""
    params.each { |k,v| querypart+="#{k}=#{CGI.escape(v)}&" }

    sparqlURL=baseURL+"?#{querypart}"
    response = Net::HTTP.get_response(URI.parse(sparqlURL))
    return CSV::parse(response.body)
end

# Setting Data Source Name (DSN)
dsn="http://dbpedia.org/resource/DBpedia"

# Virtuoso pragma for instructing the SPARQL engine to perform an HTTP GET
# using the IRI in the FROM clause as the Data Source URL
query="DEFINE get:soft \"replace\" SELECT DISTINCT * FROM <#{dsn}> WHERE {?s ?p ?o} "

# Assume use of a local installation of Virtuoso;
# otherwise you can change the URL to that of a public endpoint,
# for example DBpedia: http://dbpedia.org/sparql
data=sparqlQuery(query, "http://localhost:8890/sparql/")

puts "Got data:"
p data
#
# End
Got data: [["s", "p", "o"], ["http://dbpedia.org/resource/DBpedia", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://www.w3.org/2002/07/owl#Thing"], ["http://dbpedia.org/resource/DBpedia", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://dbpedia.org/ontology/Work"], ["http://dbpedia.org/resource/DBpedia", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://dbpedia.org/class/yago/Software106566077"], ...
CSV was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a Ruby developer that already knows how to use Ruby for HTTP based data access. SPARQL just provides an added bonus to URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.
The problems typically take the following form:
To start addressing these problems, here is a simple guide for generating and publishing Linked Data using Virtuoso.
Existing RDF data can be added to the Virtuoso RDF Quad Store via a variety of built-in data loader utilities.
Many options allow you to easily and quickly generate RDF data from other data sources:
Install the Faceted Browser VAD package (fct_dav.vad), which delivers the following:
Three simple steps allow you, your enterprise, and your customers to consume and exploit your newly deployed Linked Data --
http://<cname>[:<port>]/describe/?uri=<entity-uri>
where <cname>[:<port>] gets replaced by the host and port of your Virtuoso instance, and <entity-uri> gets replaced by the URI you want to see described -- for instance, the URI of one of the resources you let the Sponger handle.
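For illustration only (this snippet is not part of the original guide), here is a minimal Python 3 sketch of constructing and fetching such a description URL; the host, port, and entity URI below are placeholder assumptions you would replace with your own values.

# Minimal sketch: build and fetch a Virtuoso "describe" URL.
# Assumptions: a local Virtuoso instance at localhost:8890 and a DBpedia entity URI;
# substitute your own <cname>[:<port>] and <entity-uri>.
import urllib.parse
import urllib.request

host = "localhost:8890"                             # <cname>[:<port>]
entity_uri = "http://dbpedia.org/resource/DBpedia"  # <entity-uri>

describe_url = "http://%s/describe/?uri=%s" % (
    host, urllib.parse.quote(entity_uri, safe=""))
print(describe_url)

# Fetching the URL returns the entity description page delivered by the Faceted Browser.
with urllib.request.urlopen(describe_url) as response:
    print(response.status, response.headers.get("Content-Type"))

Opening the printed URL in a Web browser shows the same description page interactively.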
How Does Linked Data Address This Problem? It provides critical infrastructure for the WebID Protocol, which enables an innovative tweak of SSL/TLS.
What about OpenID? The WebID Protocol embraces and extends OpenID (in an open and positive way) via the WebID + OpenID Hybrid variant of the protocol -- the basic effect is that OpenID calls are re-routed to the WebID aspect, which simply removes Username and Password Authentication from the authentication challenge interaction pattern.
Anyway, Socialtext and Mike 2.0 (they aren't identical and juxtaposition isn't seeking to imply this) provide nice demonstrations of what socially enhanced collaboration for individuals and/or enterprises is all about:
As is typically the case in this emerging realm, the critical issue of discrete "identifiers" (record keys in a sense) for data items, data containers, and data creators (individuals and groups) is overlooked, albeit unintentionally.
Rather than using platform constrained identifiers such as:
It enables you to leverage the platform independence of HTTP scheme Identifiers (Generic URIs) such that Identifiers for:
simply become conduits into a mesh of HTTP -- referenceable and accessible -- Linked Data Objects endowed with High SDQ (Serendipitous Discovery Quotient). For example, my Personal WebID is all anyone needs to know if they want to explore:
Even when you reach a point of equilibrium where your daily activities trigger orchestration of CRUD (Create, Read, Update, Delete) operations against Linked Data Objects within your socially enhanced collaboration network, you still have to deal with the thorny issues of security, which include the following:
FOAF+SSL, an application of HTTP based Linked Data, enables you to enhance your Personal HTTP scheme based Identifier (or WebID) via the following steps (performed by a FOAF+SSL compliant platform):
Contrary to conventional experiences with all things PKI (Public Key Infrastructure) related, FOAF+SSL compliant platforms typically handle the PKI issues as part of the protocol implementation; thereby protecting you from any administrative tedium without compromising security.
Understanding how new technology innovations address long standing problems, or understanding how new solutions inadvertently fail to address old problems, provides time tested mechanisms for product selection and value proposition comprehension that ultimately save scarce resources such as time and money.
If you want to understand real world problem solution #1 with regards to HTTP based Linked Data, look no further than the issue of secure, socially aware, and platform independent identifiers for data objects that build bridges across erstwhile data silos.
If you want to cost-effectively experience what I've outlined in this post, take a look at OpenLink Data Spaces (ODS), which is a distributed collaboration engine (enterprise or individual) built around the Virtuoso database engine. It simply enhances existing collaboration tools via the following capabilities:
Addition of Social Dimensions via HTTP based Data Object Identifiers for all Data Items (if missing)
In this post I provide a brief re-introduction to this essential aspect of Virtuoso.
This component of Virtuoso is known as the Virtual Database Engine (VDBMS). It provides transparent high-performance and secure access to disparate data sources that are external to Virtuoso. It enables federated access and integration of data hosted by any ODBC- or JDBC-accessible RDBMS, RDF Store, XML database, or Document (Free Text)-oriented Content Management System. In addition, it facilitates integration with Web Services (SOAP-based SOA RPCs or REST-fully accessible Web Resources).
In the most basic sense, you shouldn't need to upgrade your existing database engine version simply because your current DBMS and Data Access Driver combo isn't compatible with ODBC-compliant desktop tools such as Microsoft Access, Crystal Reports, BusinessObjects, Impromptu, or other ODBC-, JDBC-, ADO.NET-, or OLE DB-compliant applications. Simply place Virtuoso in front of your so-called "legacy database," and let it deliver the compliance levels sought by these tools.
In addition, it's important to note that today's enterprise, through application evolution, company mergers, or acquisitions, is often faced with disparately-structured data residing in any number of line-of-business-oriented data silos. Compounding the problem is the exponential growth of user-generated data via new social media-oriented collaboration tools and platforms. For companies to cost-effectively harness the opportunities accorded by the increasing intersection between line-of-business applications and social media, virtualization of data silos must be achieved, and this virtualization must be delivered in a manner that doesn't prohibitively compromise performance or completely undermine security at either the enterprise or personal level. Again, this is what you get by simply installing Virtuoso.
The VDBMS may be used in a variety of ways, depending on the data access and integration task at hand. Examples include:
You can make a single ODBC, JDBC, ADO.NET, OLE DB, or XMLA connection to multiple ODBC- or JDBC-accessible RDBMS data sources, concurrently, with the ability to perform intelligent distributed joins against externally-hosted database tables. For instance, you can join internal human resources data against internal sales and external stock market data, even when the HR team uses Oracle, the Sales team uses Informix, and the Stock Market figures come from Ingres! (See the sketch after these examples.)
You can construct RDF Model-based Conceptual Views atop Relational Data Sources. This is about generating HTTP-based Entity-Attribute-Value (E-A-V) graphs using data culled "on the fly" from native or external data sources (Relational Tables/Views, XML-based Web Services, or User Defined Types).
You can also derive RDF Model-based Conceptual Views from Web Resource transformations "on the fly" -- the Virtuoso Sponger (RDFizing middleware component) enables you to generate RDF Model Linked Data via a RESTful Web Service or within the process pipeline of the SPARQL query engine (i.e., you simply use the URL of a Web Resource in the FROM clause of a SPARQL query).
It's important to note that Views take the form of HTTP links that serve as both Data Source Names and Data Source Addresses. This enables you to query and explore relationships across entities (i.e., People, Places, and other Real World Things) via HTTP clients (e.g., Web Browsers) or directly via SPARQL Query Language constructs transmitted over HTTP.
As an alternative to RDF, Virtuoso can expose ADO.NET Entity Frameworks-based Conceptual Views over Relational Data Sources. It achieves this by generating Entity Relationship graphs via its native ADO.NET Provider, exposing all externally attached ODBC- and JDBC-accessible data sources. In addition, the ADO.NET Provider supports direct access to Virtuoso's native RDF database engine, eliminating the need for resource intensive Entity Frameworks model transformations.
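Returning to the first example above (the single-connection distributed join), the following is a purely hypothetical Python/ODBC sketch of what that interaction could look like; the DSN, credentials, and linked-table names are placeholders for tables you would already have attached to Virtuoso from the external Oracle and Informix databases, not objects taken from this post.

# Hypothetical sketch: one ODBC connection to Virtuoso, joining tables previously
# attached (linked) from external Oracle and Informix databases.
# The DSN, credentials, and table names are placeholders, not real objects.
import pyodbc

conn = pyodbc.connect("DSN=Local Virtuoso;UID=dba;PWD=dba")
cursor = conn.cursor()

# Virtuoso performs the distributed join across the two remote sources.
cursor.execute("""
    SELECT e.EMPLOYEE_NAME, s.TOTAL_SALES
      FROM DB.DBA.ORACLE_HR_EMPLOYEES e
      JOIN DB.DBA.INFORMIX_SALES_FIGURES s
        ON s.EMPLOYEE_ID = e.EMPLOYEE_ID
""")

for row in cursor.fetchall():
    print(row.EMPLOYEE_NAME, row.TOTAL_SALES)

conn.close()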
2009 is over. Yeah, sure, trueg, we know that, it has been over for a while now! Ok, ok, I am a bit late, but still I would like to get this one out - if only for my archive. So here goes.
Let's start with the major topic of 2009 (and also the beginning of 2010): The new Nepomuk database backend: Virtuoso. Everybody who used Nepomuk had the same problems: you either used the sesame2 backend which depends on Java and steals all of your memory or you were stuck with Redland which had the worst performance and missed some SPARQL features making important parts of Nepomuk like queries unusable. So more than a year ago I had the idea to use the one GPL'ed database server out there that supported RDF in a professional manner: OpenLink's Virtuoso. It has all the features we need, has a very good performance, and scales up to dimensions we will probably never reach on the desktop (yeah, right, and 64k main memory will be enough forever!). So very early I started coding the necessary Soprano plugin which would talk to a locally running Virtuoso server through ODBC. But since I ran into tons of small problems (as always) and got sidetracked by other tasks I did not finish it right away. OpenLink, however, was very interested in the idea of their server being part of every KDE installation (why wouldn't they ;)). So they not only introduced a lite-mode which makes Virtuoso suitable for the desktop but also helped in debugging all the problems that I had left. Many test runs, patches, and a Virtuoso 5.0.12 release later I could finally announce the Virtuoso integration as usable.
Then, at the end of last year, I dropped the support for sesame2 and redland. Virtuoso is now the only supported database backend. The reason is simple: Virtuoso is way more powerful than the rest - not only in terms of performance - and it is fully implemented in C(++) without any traces of Java. Maybe even more important is the integration of the full text index which makes the previously used CLucene index unnecessary. Thus, we can finally combine full text and graph queries in one SPARQL query. This results in a cleaner API and way faster return of search results since there is no need to combine the results from several queries anymore. A direct result of that is the new Nepomuk Query API which I will discuss later.
So now the only thing I am waiting for is the first bugfix release of Virtuoso 6, i.e. 6.0.1 which will fix the bugs that make 6.0.0 fail with Nepomuk. Should be out any day now. :)
Querying data in Nepomuk pre-KDE-4.4 could be done in one of two ways: 1. Use the very limited capabilities of the ResourceManager to list resources with certain properties or of a certain type; or 2. Write your own SPARQL query using ugly QString::arg replacements.
With the introduction of Virtuoso and its awesome power we can now do pretty much everything in one query. This allowed me to finally create a query API for KDE: Nepomuk::Query::Query and friends. I won't go into much detail here since I did that before.
All in all you should remember one thing: whenever you think about writing your own SPARQL query in a KDE application - have a look at libnepomukquery. It is very likely that you can avoid the hassle of debugging a query by using the query API.
The first nice effect of the new API (apart from me using it all over the place obviously) is the new query interface in Dolphin. Internally it simply combines a bunch of Nepomuk::Query::Term objects into a Nepomuk::Query::AndTerm. All very readable and no ugly query strings.
An important part of the Nepomuk research project was the creation of a set of ontologies for describing desktop resources and their metadata. After the Xesam project under the umbrella of freedesktop.org had been convinced to use RDF for describing file metadata they developed their own ontology. Thanks to Evgeny (phreedom) Egorochkin and Antonie Mylka both the Xesam ontology and the Nepomuk Information Elements Ontology were already very close in design. Thus, it was relatively easy to merge the two and be left with only one ontology to support. Since then not only KDE but also Strigi and Tracker are using the Nepomuk ontologies.
At the Gran Canaria Desktop Summit I met some of the guys from Tracker and we tried to come up with a plan to create a joint project to maintain the ontologies. This got off to a rough start as nobody really felt responsible. So I simply took the initiative and released the shared-desktop-ontologies version 0.1 in November 2009. The result was a s***-load of hate-mails and bug reports due to me breaking the KDE build. But in the end it was worth it. Now the package is established and other projects can start to pick it up to create data compatible with the Nepomuk system and Tracker.
Today the ontologies (and the shared-desktop-ontologies package) are maintained in the Oscaf project at Sourceforge. The situation is far from perfect but it is a good start. If you need specific properties in the ontologies or are thinking about creating one for your own application - come and join us in the bug tracker...
It was at the Akonadi meeting that Will Stephenson and myself got into talking about mimicking some Zeitgeist functionality through Nepomuk. Basically it meant gathering some data when opening and when saving files. We quickly came up with a hacky patch for KIO and KFileDialog which covered most cases and allowed us to track when a file was modified and by which application. This little experiment did not leave that state though (it will, however, this year) but another one did: Zeitgeist also provides a fuse filesystem which allows browsing files by modification date. Well, whatever fuse can do, KIO can do as well. Introducing the timeline:/ KIO slave which gives a calendar view onto your files.
Well, I thought I would mention the Tips And Tricks section I wrote for the techbase. It might not be a big deal but I think it contains some valuable information in case you are using Nepomuk as a developer.
This time around I had the privilege to mentor two students in the Google Summer of Code. Alessandro Sivieri and Adam Kidder did outstanding work on Improved Virtual Folders and the Smart File Dialog.
Adam's work led me to some heavy improvements in the Nepomuk KIO slaves myself which I only finished this week (more details on that coming up). Alessandro continued his work on faceted file browsing in KDE and created:
Alessandro is following up on his work to make faceted file browsing a reality in 2010 (and KDE SC 4.5). Since it was too late to get faceted browsing into KDE SC 4.4 he is working on Sembrowser, a stand-alone faceted file browser which will be the grounds for experiments until the code is merged into Dolphin.
In 2009 I organized the first Nepomuk workshop in Freiburg, Germany. And also the second one. While I reported properly on the first one I still owe a summary for the second one. I will get around to that - sooner or later. ;)
Soprano gives us a nice command line tool to create a C++ namespace from an ontology file: onto2vocabularyclass. It produces nice convenience namespaces like Soprano::Vocabulary::NAO. Nepomuk adds another tool named nepomuk-rcgen. Both were a bit clumsy to use before. Now we have nice cmake macros which make it very simple to use both.
See the techbase article on how to use the new macros.
Without my knowledge (imagine that!) Andrew Lake created an amazing new media player named Bangarang - a Jamaican word for noise, chaos or disorder. This player is Nepomuk-enabled in the sense that it has a media library which lets you browse your media files based on the Nepomuk data. It remembers the number of times a song or a video has been played and when it was played last. It allows adding details such as the TV series name, season, episode number, or actors that are in the video - all through Nepomuk (I hope we will soon get tvdb integration).
I am especially excited about this since finally applications not written or mentored by me start contributing Nepomuk data.
2009 was also the year of the first Gnome-KDE joint-conference. Let me note it here for completeness and refer to my previous blog post reporting on my experiences on the island.
Well, that was by far not all I did in 2009 but I think I covered most of the important topics. And after all it is "just a blog entry" - there is no need for completeness. Thanks for reading.
Like Apache, Virtuoso is a bona-fide Web Application Server for PHP based applications. Unlike Apache, Virtuoso is also the following:
As a result of the above, when you deploy a PHP application using Virtuoso, you inherit the following benefits:
As indicated in prior posts, producing RDF Linked Data from the existing Web, where a lot of content is deployed by PHP based content managers, should simply come down to RDF Views over the SQL Schemas and deployment / publishing of the RDF Views in RDF Linked data form. In a nutshell, this is what Virtuoso delivers via its PHP runtime hosting and pre packaged VADs (Virtuoso Application Distribution packages), for popular PHP based applications such as: phpBB3, Drupal, WordPress, and MediaWiki.
In addition to the RDF Linked Data deployment, we've also taken the traditional LAMP installation tedium out of the typical PHP application deployment process. For instance, you don't have to rebuild PHP 3.5 (32 or 64 Bit) on Windows, Mac OS X, or Linux to get going; simply install Virtuoso, and then select a VAD package for the relevant application and you're set. If the application of choice isn't pre-packaged by us, simply install as you would when using Apache, which comes down to situating the PHP files in your Web structure under the Web Application's root directory.
At the current time, I've only provided links to ZIP files containing the Virtuoso installation "silent movies". This approach is a short-term solution to some of my current movie publishing challenges re. YouTube and Vimeo -- where the compressed output hasn't been of acceptable visual quality. Once resolved, I will publish much more "Multimedia Web" friendly movies :-)
The significant problems we face cannot be solved at the same level of thinking we were at when we created them.
This quote also applies to the current global financial mess because the essence of this crisis remains inextricably linked to dependency on outdated "closed world" systems.
We have a global human network that depends on systems driven by, and confined to, data silos! Every time you hear a CEO, Government Official, work colleague, neighbor, sibling, or relative tell you they didn't see it coming, just remember:
There won't be a depression because we can't afford one. Just like we couldn't afford to continue with the manner in which our systems work today. Unlike the '30s, we all know that there are no absolute safe havens right now; we have enough information at our disposal to eventually understand (post panic) that stuffing the mattress isn't an option (even government bonds won't cut it, ditto money market accounts).
Take a deep breath and tell traditional media to "shut up". As per usual, the traditional mass media wants to have it both ways by stoking the panic and maxing out on the frenzy with reckless abandon. If there is a time to appreciate the blogosphere, quality journalism, etc., it's now.
Anyway, as the saying goes: "It's always darkest before dawn", and as bizarre as this may sound in some quarters, things will ultimately change for the better. It just so happened that a really big cane was required in order for us to change our dysfunctional ways :-(
I recently wrote a post about "zero based cognition" that sought to bring attention to the power of "Human Thought" in relation to value creation.
Innovative creation and dissemination of value is how we will eventually get out of the current mess (as we've done in the past). The predictability of the aforementioned reality is significantly increased by the sheer link density and resulting "network effects" potential of the Internet and World Wide Web. Our ability to "connect the dots" as part of our value creation, dissemination, and consumption processing pipelines is what will ultimately separate the winners from the losers (individuals, enterprises, nations).
Ubiquity from Mozilla Labs provides an alternative entry point for experiencing the "Controller" aspect of the Web's natural compatibility with the MVC development pattern. As I've noted (in various posts), Web Services, as practiced by the REST oriented Web 2.0 community or the SOAP oriented SOA community within the enterprise, is fundamentally about the "Controller" aspect of MVC.
Ubiquity provides a commandline interface for direct invocation of Web Services. For instance, in our case, we can expose Virtuoso's in-built RDF Middleware ("Sponger") and Linked Data deployment services via a single command of the form: describe-resource <url>
To experience this neat addition to Firefox you need to do the following:
Enjoy!
Ansgar Bernardi, deputy head of the Knowledge Management Department at Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI, or the German Research Center for Artificial Intelligence) and Nepomuk's coordinator, explains, "The basic problem that we all face nowadays is how to handle vast amounts of information at a sensible rate." According to Bernardi, Nepomuk takes a traditional approach by creating a meta-data layer with well-defined elements that services can be built upon to create and manipulate the information.
The comment above echoes my sentiments about the imminence of "information overload" due to the vast amounts of user generated content on the Internet as a whole. We are going to need to process more and more data within a fixed 24 hour timeframe, while attempting to balance our professional and personal lives. Rest assured, this is a very serious issue, and you cannot even begin to address it without a Web of Linked Data.
"The first idea of building the semantic desktop arose from the fact that one of our colleagues could not remember the girlfriends of his friends," Bernard says, more than half-seriously. "Because they kept changing -- you know how it is. The point is, you have a vast amount of information on your desktop, hidden in files, hidden in emails, hidden in the names and structures of your folders. Nepomuk gives a standard way to handle such information."
If you get a personal URI for Entity "You", via a Linked Data aware platform (e.g. OpenLink Data Spaces) that virtualizes data across your existing Web data spaces (blogs, feed subscriptions, wikis, shared bookmarks, photo galleries, calendars, etc.), you then only have to remember your URI whenever you need to "Find" something, imagine that!
To conclude, "information overload" is the imminent challenge of our time, and the keys to challenge alleviation lie in our ability to construct and maintain (via solutions) few context lenses (URIs) that provide coherent conduits into the dense mesh of structured Linked Data on the Web.
Of course, I also believe that Linked Data serves Web Data Integration across the Internet very well too, and that it will be beneficial to businesses in a big way. No individual or organization is an island; I think the Internet and Web have done a good job of demonstrating that thus far :-) We're all data nodes in a Giant Global Graph.
Daniel Lewis did shed light on the read-write aspects of the Linked Data Web, which is actually very close to the callout for a Wikipedia for Data. TimBL has been working on this via Tabulator (see Tabulator Editing Screencast), Benjamin Nowack also added similar functionality to ARC, and of course we support the same SPARQL UPDATE into an RDF information resource via the RDF Sink feature of our WebDAV and ODS-Briefcase implementations.
The list is nice, but actual execution can be challenging. For instance, when writing a blog post, or constructing a WikiWord, would you have enough disposable time to go searching for these URIs? Or would you compromise and continue to inject "Literal" values into the Web, leaving it to the reasoning endowed human reader to connect the dots?
Anyway, OpenLink Data Spaces is now equipped with a Glossary system that allows me to manage terms, the meanings of terms, and the hyper-linking of phrases and words associated with my terms. The great thing about all of this is that everything I do is scoped to my Data Space (my universe of discourse); I don't break or impede the other meanings of these terms outside my Data Space. The Glossary system can be shared with anyone I choose to share it with, and even better, it makes my upstreaming (rules based replication) style of blogging even more productive :-)
Remember, on the Linked Data Web, who you know doesn't matter as much as what you are connected to, directly or indirectly. Jason Kolb covers this issue in his post: People as Data Connectors, and so does Frederick Giasson via a recent post titled: Networks are everywhere. For instance, this blog post (or the entire Blog) is a bona fide RDF Linked Data Source; you can use it as the Data Source of a SPARQL Query to find things that aren't even mentioned in this post, since all you are doing is beaming a query through my Data Space (a container of Linked Data Graphs). On that note, let's re-watch Jon Udell's "On-Demand-Blogosphere" screencast from 2006 :-)
Once you grasp the concept of entering values into the "Default Data Source URI field", take a look at: http://programmableweb.com and other URIs (hint: scroll through the results grid to the QEDWiki demo item)
What I am demonstrating is how existing Web Content hooks transparently into the "Data Web". Zero RDF Tax :-) Everything is good!
Note: Please look to the bottom of the screen for the "Run Query" Button. Remember, it's not quite Grandma's UI but should do for Infonauts etc.. A screencast will follow.
The Semantic Web is about granular exposure of the underlying web-of-data that fuels the World Wide Web. It models "Web Data" using a Directed Graph Data Model (back-to-the-future: Network Model Database) called RDF.
In line with contemporary database technology thinking, the Semantic Web also seeks to expose Web Data to architects, developers, and users via a concrete Conceptual Layer that is defined using RDF Schema.
The abstract nature of Conceptual Models implies that actual instance data (Entities, Attributes, and Relationships/Associations) occurs by way of "Logical to Conceptual" schema mapping and data generation that can involve a myriad of logical data sources (SQL, XML, Object databases, traditional web content, RSS/Atom feeds etc.). Thus, by implication, it is safe to assume that the Semantic Web's construction is basically a Data Integration and exposure effort. This is the point that Stefano alludes to in the blog post excerpts that follow:
The semantic web is really just data integration at a global scale. Some of this data might end up being consistent, detailed and small enough to perform symbolic reasoning on, but even if this is the case, that would be such a small, expensive and fragile island of knowledge that it would have the same impact on the world as calculus had on deciding to invade Iraq.
The biggest problem we face right now is a way to 'link' information that comes from different sources that can scale to hundreds of millions of statements (and hundreds of thousands of equivalences). Equivalences and subclasses are the only things that we have ever needed of OWL and RDFS, we want to 'connect' dots that otherwise would be unconnected. We want to suggest people to use whatever ontology pleases them and then think of just mapping it against existing ones later. This is easier to bootstrap than to force them to agree on a conceptualization before they even know how to start!
Additional insightful material from Stefano:
Benjamin Nowack also chimes in on this conversation via his simple guide to understanding Data, Information, and Knowledge in relation to the Semantic Web.
SPARQL (query language for the Semantic Web) basically enables me to query a collection of typed links (predicates/properties/attributes) in my Data Space (ODS based of course) without breaking my existing local bookmarks database or the one I maintain at del.icio.us.
I am also demonstrating how Web 2.0 concepts such as Tagging mesh nicely with the more formal concepts of Topics in the Semantic Web realm. The key to all of this is the ability to generate RDF Data Model Instance Data based on Shared Ontologies such as SIOC (from DERI's SIOC Project) and SKOS (again showing that Ontologies and Folksonomies are complementary).
This demo also shows that Ajax works well in the Semantic Web realm (or web dimension of interaction 3.0), especially when you have a toolkit with Data Aware controls (for SQL, RDF, and XML) such as OAT (OpenLink Ajax Toolkit). For instance, we've successfully used this to build a Visual Query Building Tool for SPARQL (alpha) that really takes a lot of the pain out of constructing SPARQL Queries (there is much more to come on this front re. handling of DISTINCT, FILTER, ORDER BY etc..).
For now, take a look at the SPARQL Query dump generated by this SIOC & SKOS SPARQL QBE Canvas Screenshot.
You can cut and paste the queries that follow into the Query Builder or use the screenshot to build your variation of this query sample. Alternatively, you can simply click on *This* SPARQL Protocol URL to see the query results in a basic HTML Table. And one last thing, you can grab the SPARQL Query File saved into my ODS-Briefcase (the WebDAV repository aspect of my Data Space).
Note the following SPARQL Protocol Endpoints:
My beautified Version of the SPARQL Generated by QBE (you can cut and paste into "Advanced Query" section of QBE) is presented below:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX dct: <http://purl.org/dc/elements/1.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT distinct ?forum_name, ?owner, ?post, ?title, ?link, ?url, ?tag
FROM <http://myopenlink.net/dataspace>
WHERE
  {
    ?forum a sioc:Forum ;
           sioc:type "bookmark" ;
           sioc:id ?forum_name ;
           sioc:has_member ?owner .
    ?owner sioc:id "kidehen" .
    ?forum sioc:container_of ?post .
    ?post dct:title ?title .
    optional { ?post sioc:link ?link }
    optional { ?post sioc:links_to ?url }
    optional { ?post sioc:topic ?topic .
               ?topic a skos:Concept ;
                      skos:prefLabel ?tag } .
  }
Unmodified dump from the QBE (this will be beautified automatically in due course by the QBE):
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX sioc: <http://rdfs.org/sioc/ns#> PREFIX dct: <http://purl.org/dc/elements/1.1/> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?var8 ?var9 ?var13 ?var14 ?var24 ?var27 ?var29 ?var54 ?var56 WHERE { graph ?graph { ?var8 rdf:type sioc:Forum . ?var8 sioc:container_of ?var9 . ?var8 sioc:type "bookmark" . ?var8 sioc:id ?var54 . ?var8 sioc:has_member ?var56 . ?var9 rdf:type sioc:Post . OPTIONAL {?var9 dc:title ?var13} . OPTIONAL {?var9 sioc:links_to ?var14} . OPTIONAL {?var9 sioc:link ?var29} . ?var9 sioc:has_creator ?var37 . OPTIONAL {?var9 sioc:topic ?var24} . ?var24 rdf:type skos:Concept . OPTIONAL {?var24 skos:prefLabel ?var27} . ?var56 rdf:type sioc:User . ?var56 sioc:id "kidehen" . } }
Current missing items re. Visual QBE for SPARQL are:
Quick Query Builder Tip: You will need to import the following (using the Import Button in the Ontologies & Schemas side-bar):
Browser Support: The SPARQL QBE is SVG based and currently works fine with the following browsers; Firefox 1.5/2.0, Camino (Cocoa variant of Firefox for Mac OS X), Webkit (Safari pre-release / advanced sibling), Opera 9.x. We are evaluating the use of the Adobe SVG plugin re. IE 6/7 support.
Of course this should be a screencast, but I am in the middle of a plethora of things right now :-)
GeoRSS & Geonames for Philanthropy: "
I heard about Kiva.ORG in a BusinessWeek podcast. After visiting its website, I think there are a few places where GeoRSS (in the RDF/A syntax) and Geonames can be used to enhance the site's functionality.
It's a microfinance website for people in developing countries. Its business model is in the intersection between peer-to-peer financing and philanthropy. The goal is to help developing country businesses to borrow small loans from a large group of Web users, so that they can avoid paying high interest to the banks.
For example, a person in Uganda can request a $500 loan and use it for buying and selling more poultry. One or more lenders (anyone on the Web) may decide to grant loans to that person in increments as tiny as $25. After a few years, that person will pay back the loans to the lenders.
I went to the website and discovered the site has a relatively weak search and browsing interface. In particular, there is no way to group loan requests based on geographical locations (e.g., countries, cities and regions).
Took a look at individual loan pages. Each page actually has standard ways to describe location information -- e.g., Location: Mbale, Uganda.
It should be relatively easy to add GeoRSS points (in the RDF/A syntax) to describe this location information (an alternative may be using Microformat Geo or W3C Geo). Once the location information is annotated, one can imagine building a map mashup to display loan requests in a geospatial perspective. One can also build search engines to support spatial queries such as "find me all loans from Mbale".
Since Kiva.ORG webmasters may not be GIS experts, it will be nice if we can find ways to automatically geocode location information and describe that using GeoRSS. This automatic geocoding procedure can be developed using Geonames's webservices. Take a string "Mbale" or "Uganda", and send it to Geonames's search service. The procedure will get back a JSON or XML description of the location, which includes latitude and longitude. This will then be used to annotate the location information in a Kiva loan page.
Can you think of other ways to help Kiva.ORG to become more "geospatially intelligent"?
You can learn more about Kiva.ORG at its website and listen to this podcast.
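As a rough illustration of the geocoding procedure described in the quoted post above (the GeoNames username and the exact response handling are assumptions, not details from the post), a small Python 3 sketch might look like this:

# Sketch: geocode a place string via the GeoNames search web service (JSON),
# returning latitude/longitude suitable for a GeoRSS point annotation.
# Assumes a registered GeoNames account; "demo" is a placeholder username.
import json
import urllib.parse
import urllib.request

def geocode(place, username="demo"):
    params = urllib.parse.urlencode({
        "q": place,
        "maxRows": 1,
        "username": username,
    })
    with urllib.request.urlopen("http://api.geonames.org/searchJSON?" + params) as resp:
        data = json.loads(resp.read())
    if not data.get("geonames"):
        return None
    hit = data["geonames"][0]
    return float(hit["lat"]), float(hit["lng"])

print(geocode("Mbale"))   # approximate latitude/longitude for Mbale, Uganda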
The screencasts covered the following functionality realms:
To bring additional clarity to the screencast demos and OAT in general, I have saved a number of documents that are by-products of activities in the screencasts:
Notes:
You can see a full collection of saved documents at the following locations:
Two graphs that explain most IT dysfunction (Part I): "
Inspired by reading about other people's blogging weaknesses, I've decided to finally get this one off the back burner and post it. I'm pretty sure that this isn't original, but I started thinking about this way back in 1996 (pre-social-bookmarking) and I've lost my pointer to whatever influenced it. Anybody who can set me straight - I'd appreciate it.
So here goes.
There are two graphs which, when seen together, explain a hell of a lot about various forms of dysfunction that you see in the technology world.
In this first graph, X represents relative "technical expertise" and Y represents the "perceived benefit" in the introduction of a new technology:
The summary is that technical neophytes (A) tend to see high potential benefit in new technologies, while people who have a bit of technology experience (B) grow increasingly cynical about technology claims and can rattle off the names of technologies that they have seen over-hyped and that have under-delivered. The interesting thing, though, is that, as people become really expert in technology (C), their view of the potential benefits in new technology starts to increase again. At the far right of this scale I'm talking about the real experts - the alpha-geeks of the world.
In the second graph, X again represents technical expertise, but Y represents "perceived risk" associated with the introduction of a new technology:
Here the curve is inverted, but the basic pattern is the same. The neophytes (A) are blissfully unaware of the things that can go wrong with the introduction of a new technology. The tech-savvy (B) are battle-scarred and have seen (and possibly caused) countless disasters. The alpha-geeks (C) have also seen their share of problems, but they have also learned from their mistakes and know how to avoid them in the future. The alpha-geeks understand how to manage the risk.
Now things get interesting when you map these two dynamics against each other:
You see that neophytes in group A have essentially the same world view as the alpha-geeks in group C, but for completely different reasons. The trouble starts when you realize that most senior executives, venture capitalists and members of the popular press are in group A. At the other extreme, most R&D groups, architecture groups, independent consultancies, technology pundits, etc. are in group C. There are a few problems with this:
- People in group A will often talk to and solicit advice from people in group C
- There are relatively few people in group C
- Most of the people who actually have to implement new technologies are in group B.
So you can start to see the problem.
In Part II I'll talk some more about group B and I'll discuss some of the classic patterns that emerge when A, B and C try to work with each other.
"
A powerful next generation server product that implements otherwise distinct server functionality within a single server product. Think of Virtuoso as the server software analog of a dual core processor where each core represents a traditional server functionality realm.
The Virtuoso History page tells the whole story.
90% of the aforementioned functionality has been available in Virtuoso since 2000 with the RDF Triple Store being the only 2006 item.
The Virtuoso build scripts have been successfully tested on Mac OS X (Universal Binary Target), Linux, FreeBSD, and Solaris (AIX, HP-UX, and True64 UNIX will follow soon). A Windows Visual Studio project file is also in the works (ETA some time this week).
Simple: there is no value in a product of this magnitude remaining the "best kept secret". That status works well for our competitors, but absolutely works against the legions of new generation developers, systems integrators, and knowledge workers that need to be aware of what is actually achievable today with the right server architecture.
GPL version 2.
Dual licensing.
The Open Source version of Virtuoso includes all of the functionality listed above. While the Virtual Database (distributed heterogeneous join engine) and Replication Engine (across heterogeneous data sources) functionality will only be available in the commercial version.
On SourceForge.
Of course!
Up until this point, the Virtuoso Product Blog has been a covert live demonstration of some aspects of Virtuoso (Content Management). My Personal Blog and the Virtuoso Product Blog are actual Virtuoso instances, and have been so since I started blogging in 2003.
Is there a product Wiki?
Sure! The Virtuoso Product Wiki is also an instance of Virtuoso demonstrating another aspect of the Content Management prowess of Virtuoso.
Yep! Virtuoso Online Documentation is hosted via yet another Virtuoso instance. This particular instance also attempts to demonstrate Free Text search combined with the ability to repurpose well formed content in a myriad of forms (Atom, RSS, RDF, OPML, and OCS).
The Virtuoso Online Tutorial Site has operated as a live demonstration and tutorial portal for a number of years. During the same timeframe (circa 2001) we also assembled a few Screencast style demos (their look and feel certainly shows their age; updates are in the works).
BTW - We have also updated the Virtuoso FAQ and released a number of missing Virtuoso White Papers (amongst many long overdue action items).
WinXP and OSX dual boot in MacBook Pro: "
Finally I've succeeded in installing Windows XP on a MacBook Pro. Now it can dual boot between Windows XP and Mac OS X. There are a few issues with Windows XP, but being able to boot smoothly between these 2 OSes is really amazing. I've followed this HOWTO, where more and more information is being added every few hours. I think most of the minor problems will be solved soon. If you want to install it for yourself or want more information, this wiki is the best place to go. Here I'm posting the photos of the major installation sequence and some problems I encountered.
Installation
1. Downloaded winxponmac0.1.zip
The Windows XP Pro CD that came with my Samsung notebook is SP1, but the patch works only with SP2. So this is what I did:
2. Downloaded WinXP SP2 separately.
3. Used the free tool nLite to integrate the WinXP SP2 with the XP Pro CD (SP1) and created the WinXP SP2 CD source.
4. Then followed Step-by-step-instruction
5. Started Windows XP installation.
6. I encountered a problem with the partition listing. I was presented with the following options.
According to the guide the correct option should be as follows:
If you choose Partition2 then you'll get the following error:
7. To solve the above problem I selected the first 'unpartitioned space,' then pressed 'C' to create a new partition, as described in this solution. After this, things went smoothly.
8. Finally it's installed
9. System Properties
10. Device Manager with unrecognized devices.
11. Downloaded the drivers from here. Ethernet works fine. Wireless doesn't work. If I press restart it will shut down.
12. Browsing my blog.
13. Boot Choice: Mac OSX
14. Boot Choice: Windows XP
Now there are a few driver issues; I'm quite sure they'll be solved soon.
" ]]>(Via The OSx86 Project.)
Anyway, read the post from Doc Searls titled: Saving the Net from the pipeholders
"I've spent much of the last two weeks writing an essay that just went up at Linux Journal: Saving the Net: How to Keep the Carriers from Flushing the Net Down the Tubes. It's probably the longest post I've ever put up on the Web. It's certainly the most important. And not just to me.
I started writing it after a recent surprise visit by David Isenberg to Santa Barbara. He's the one who got me -- and, I hope, us -- going.
I finished writing it yesterday after David Berlind published three excellent pieces, which I highly recommend reading, and acting upon.
For guidance during the rest of this thing (whether they knew it or not), I also want to thank David Weinberger, Dave Winer, Steve Gillmor, Kevin Werbach, Cory Doctorow, Don Marti, Richard M. Stallman, Eric S. Raymond, Susan Crawford, Larry Lessig, John Palfrey, Chris Nolan, Jeff Jarvis, Craig Burton, Andrew Sullivan, Paul Kunz, Dean Landsman, Matt Welch, Sheila Lennon, George Lakoff, Om Malik, Phil Hughes, J.D. Lasica, Virginia Postrel, Chris Anderson, Esther Dyson, Jim Thompson, Micah Sifry, John Perry Barlow, The EFF, the Berkman Center, the Personal Democracy Forum and others I'm overlooking but will fill in later when I have the time.
Although it's kinda huge, Saving the Net wasn't written as a Finished Work, but rather as a conversation starter -- a way to change a rock we're pushing uphill to a snowball we're rolling downhill.
Larry Lessig started rolling it at OSCON in 2002, and in various other ways before that, and the whole thing has been too damn sisyphean for too damn long. Time to change that.
There's a thesis involved: that the Net is in danger of becoming what Kevin Werbach calls 'a private toiled garden for the phone companies', but that the real enemy is in how we understand the Net itself. We have choices there, and those choices may mean life or death for the Net as most of us have known it -- and taken it for granted -- for the last decade or more.
A couple days ago I spoke to a group of about thirty local citizens here in Santa Barbara County, gathered in the County supervisors' conference room to discuss forming a broadband task force. Early on, I asked people what the Net was. The answers were varied, but had one thing in common: it was a place, and not just fiber and copper."
]]>Stop whatever you are doing ...: "
.. and go and read Tom Coates' explanation of his last project with the BBC. After 21 years working in broadcasting I reckon this is one of the coolest things to happen for a very, very long time.
The ramifications of this will go very deep indeed."
(Spotted Via The Obvious?.)
Yes, the ramifications are deep! Tom Coates' screencast demonstrates an internal variation of an activity that is taking place on many fronts (concurrently) across the NET. I tend to refer to this effort as "Self Annotation"; the very process that will ultimately take us straight to "Semantic Web". It is going to happen much quicker than anticipated because technology is taking the pain out of metadata annotation (e.g. what you do when you tag everything that is ultimately URI accessible). Technology is basically delivering what Jon Udell calls: "reducing the activation threshold".
Using my comments above for context placement, I suggest you take a look at, or re-read Jon Udell's post titled: Many Meanings of Metadata.
Once again, the Web 2.0 brouhaha (in every sense of the word) is a reaction to a critical inflection that ultimately transitions the "Semantic Web" from "Mirage" to "Nirvana". Put differently (with humor in mind solely!), Web 2.0 is what I tend to call a "John the Baptist" paradigm, and we all know what happened to him :-)
Web 2.0 is a conduit to a far more important destination. The tendency to treat Web 2.0 as a destination rather than a conduit has contributed to the recent spate of Bozo bit flipping posts all over the blogosphere (is this an attempt to behead John, metaphorically speaking?). Humor aside, a really important thing about the Web 2.0 situation is that when we make the quantum evolutionary leap (internet time, mind you) to the "Semantic Web" (or whatever groovy name we dig up for it in due course) we will certainly have a plethora of reference points (I mean Web 2.0 URIs) ensuring that we do not revisit the "Missing Link" evolutionary paradox :-)
BTW - You can see some example of my contribution to the ongoing annotation process by looking at:
A Webpage is Not An API or a Platform (The Populicio.us Remix): "
A few months ago in my post GMail Domain Change Exposes Bad Design and Poor Code, I wrote Repeat after me, a web page is not an API or a platform. It seems some people are still learning this lesson the hard way. In the post The danger of running a remix service Richard MacManus writes
Populicio.us was a service that used data from social bookmarking site del.icio.us, to create a site with enhanced statistics and a better variety of 'popular' links. However the Populicio.us service has just been taken off air, because its developer can no longer get the required information from del.icio.us. The developer of Populicio.us wrote:
'Del.icio.us doesn't serve its homepage as it did and I'm not able to get all needed data to continue Populicio.us. Right now Del.icio.us doesn't show all the bookmarked links in the homepage so there is no way I can generate real statistics.'
This plainly illustrates the danger for remix or mash-up service providers who rely on third party sites for their data. del.icio.us can not only giveth, it can taketh away.
It seems Richard Macmanus has missed the point. The issue isn't depending on a third party site for data. The problem is depending on screen scraping their HTML webpage. An API is a service contract which is unlikely to be broken without warning. A web page can change depending on the whims of the web master or graphic designer behind the site.
Versioning APIs is hard enough, let alone trying to figure out how to version an HTML website so screen scrapers are not broken. Web 2.0 isn't about screenscraping. Turning the Web into an online platform isn't about legitimizing bad practices from the early days of the Web. Screen scraping needs to die a horrible death. Web APIs and Web feeds are the way of the future.
" Amen! ]]>The value of the Internet as a repository of useful information is very low. Carl Shapiro in âInformation Rulesâ suggests that the amount of actually useful information on the Internet would fit within roughly 15,000 books, which is about half the size of an average mall bookstore. To put this in perspective: there are over 5 billion unique, static & publicly accessible web pages on the www. Apparently Only 6% of web sites have educational content (Maureen Henninger, âDonât just surf the net: Effective research strategiesâ. UNSW Press). Even of the educational content only a fraction is of significant informational value.
..As Stanford students, Larry Page and Sergey Brin looked at the same problem -- how to impart meaning to all the content on the Web -- and decided to take a different approach. The two developed sophisticated software that relied on other clues to discover the meaning of content, such as which Web sites the information was linked to. And in 1998 they launched Google..
You mean noise ranking. Now, I don't think Larry and Sergey set out to do this, but Google page ranks are ultimately based on the concept of "Google Juice" (aka links). The value quotient of this algorithm is accelerating at internet speed (ironically, but naturally). Human beings are smarter than computers; we just process data (not information!) much more slowly, that's all. Thus, we can conjure up numerous ways to bubble up the Google link ranking algorithms in no time (as is the case today).
..What most differentiates Google's approach from Berners-Lee's is that Google doesn't require people to change the way they post content..
The Semantic Web doesn't require anyone to change how they post content either! It just provides a roadmap for intelligent content management and consumption through innovative products.
..As Sergey Brin told Infoworld's 2002 CTO Forum, "I'd rather make progress by having computers understand what humans write, than by forcing humans to write in ways that computers can understand." In fact, Google has not participated at all in the W3C's formulation of Semantic Web standards, says Eric Miller..
Semantic Content generated by next generation content managers will make more progress, and they certainly won't require humans to write any differently. If anything, humans will find the process quite refreshing as and when participation is required e.g. clicking bookmarklets associated with tagging services such as 'del.icio.us', 'de.lirio.us', or Unalog and others. But this is only the beginning, if I can click on a bookmarklet to post this blog post to a tagging service, then why wouldn't I be able to incorporate the "tag service post" into the same process that saves my blog post (the post is content that ends up in a content management system aka blog server)?
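As a rough illustration of how lightweight that participation is, the kind of bookmarklet such tagging services hand out boils down to a few lines of JavaScript; the endpoint and parameter names in this sketch are made up rather than any particular service's documented interface:

// A sketch of what a "post this page" bookmarklet does. The endpoint and
// parameter names are hypothetical, not a real service's documented API.
function postCurrentPageToTaggingService() {
    var endpoint = "http://tagging.example.com/post";   // hypothetical service
    location.href = endpoint
        + "?url=" + encodeURIComponent(location.href)
        + "&title=" + encodeURIComponent(document.title);
}

A blog server could call the same kind of routine from its own save pipeline, which is exactly the unification hinted at above.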
Yet Google's impact on the Web is so dramatic that it probably makes more sense to call the next generation of the Web the "Google Web" rather than the "Semantic Web."
Ah! So you think we really want the noisy "Google Web" as opposed to a federation of distributed Information and Knowledge bases a la the "Semantic Web"? Somehow, I don't think so!
Today we are generally excited about "tagging", but somehow fail to see its correlation with the "Semantic Web". I have said this before, and I will say it again: the "Semantic Web" is going to be self-annotated by humans with the aid of intelligent and unobtrusive annotation technology solutions. These solutions will provide context and purpose by using our social essence as currency. The annotation effort will be subliminal; there won't be a "Semantic Web Day" parade or anything of the like. It will appear before us all, in all its glory, without any fanfare. Funnily enough, we might not even call it "The Semantic Web" -- who cares? But it will have the distinct attributes of being very "Quiet" and highly "Valuable"; with no burden on "how we write", but constructive burden on "why we write" as part of the content contribution process (less Google/Yahoo/etc. juice chasing, more knowledge assembly and exchange).
We are social creatures at our core. The Internet and Web have collectively reduced the connectivity hurdles that once made social network oriented solutions implausible. The eradication of these hurdles ultimately feeds the very impulses that trigger the critical self-annotation that is the basis of my fundamental belief in the realization of TBL's Semantic Web vision.
Here are a few links that resolve any confusion about this matter:
Or simply Google for "PHP and ODBC" or "PHP and iODBC" ...
Ajax, Hard Facts, Brass Tacks ... and Bad Slacks
Just as <a> and <form> pack an enormous amount of functionality into deceptively simple tags, so too can new declarative mark-up capture patterns that have emerged 'in the wild'.
<form action=/search name=f>
<input type=hidden name=hl value=en>
<input maxLength=256 size=55 name=q value="">
<input type=submit value="Google Search" name=btnG>
</form>
(<a> and <form> are pretty much the same thing.)
<xf:submission id="sub-search"
action="http://www.google.com/complete/search?hl=en"
method="get" separator="&"
replace="all"
/>
<xf:input ref="q">
<xf:label>Query:</xf:label>
</xf:input>
<xf:submit submission="sub-search">
<xf:label>Google Search</xf:label>
</xf:submit>
The replace attribute is actually optional in XForms, but I showed it in the previous mark-up so that you can compare it to this:
<xf:submission id="sub-search"
action="http://www.google.com/complete/search?hl=en"
method="get" separator="&"
replace="instance"
/>
(The replace attribute can take the values all, instance, or none.)
var req;
function loadXMLDoc(url) {
    if (window.XMLHttpRequest) {
        // native XMLHttpRequest object
        req = new XMLHttpRequest();
        req.onreadystatechange = readyStateChange;
        req.open("GET", url, true);
        req.send(null);
    } else if (window.ActiveXObject) {
        // IE/Windows ActiveX version
        req = new ActiveXObject("Microsoft.XMLHTTP");
        if (req) {
            req.onreadystatechange = readyStateChange;
            req.open("GET", url, true);
            req.send();
        }
    }
}
As the request's state changes, the readyStateChange() method is invoked:
function readyStateChange() {
    // '4' means document "loaded"
    if (req.readyState == 4) {
        // 200 means "OK"
        if (req.status == 200) {
            // do something here
        } else {
            // error processing here
        }
    }
}
... <a> in order to enter the exciting new world of 'hypertext' -- but XMLHttpRequest raises the bar again, and takes us right back into the heart of geek-world.
if (req.status == 200) {
    // do something here
} else {
    // error processing here
}
<xf:action ev:observer="sub-search" ev:event="xforms-submit-error">
<xf:message level="modal">
Submission failed
</xf:message>
</xf:action>
There is more to like about the submission part of XForms:
- it supports multipart/related;
- put means the same thing whether the target URL begins http: or file:, so a form with relative paths will run unchanged on a local machine or a web server;
- the submission element can be extended to read and write from an ADO database, allowing programmers to convert forms from using the web to using a local database by doing nothing more than changing a single target URL. (Try doing that with XMLHttpRequest!)
The submission part of XForms is in fact so powerful that it will eventually form a separate specification, for use in other languages.
And it isn't only submission. Remember building a pop-up hint out of a <div>, a CSS display: none;, a mouseover event handler and a timer? Nowadays the programmer with better things to do than work with spaghetti-JavaScript just uses the XForms <hint> element, and for free they get platform independence (and therefore accessibility), as well as the ability to insert any mark-up. And messages?
It finally dawned on me what OpenSearch does. Basically you tell it about different search engines by showing it how to query something in each, and get back an RSS return. Then when you search for some term, say foo+bar, it performs the search in all the engines you have configured it for. So it's a way to group a bunch of search engines together and command them all to look for the same thing. It is clever. It is something that hasn't been done before, to my knowledge. That's the good news. The bad news is that Amazon is a leading patent abuser. So as good as this idea is, it's bad for all the rest of us, unless they tell us that they're granting us some kind of license to use the idea. [via Scripting News]
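To make the mechanics concrete, here is a rough JavaScript sketch of that loop; the engine URLs are invented, and the only real OpenSearch convention assumed is the {searchTerms} slot in a URL template:

// A rough sketch: each engine is an OpenSearch-style URL template with a
// {searchTerms} slot. Fill in the term, fetch each engine's RSS, merge items.
// The engine URLs below are invented for illustration.
var engines = [
    "http://searchone.example.com/search?q={searchTerms}&format=rss",
    "http://searchtwo.example.org/find?query={searchTerms}&output=rss"
];

function searchAll(term, callback) {
    var results = [];
    var pending = engines.length;
    for (var i = 0; i < engines.length; i++) {
        var url = engines[i].replace("{searchTerms}", encodeURIComponent(term));
        var req = new XMLHttpRequest();
        req.onreadystatechange = (function (r) {
            return function () {
                if (r.readyState == 4) {
                    if (r.status == 200 && r.responseXML) {
                        var items = r.responseXML.getElementsByTagName("item");
                        for (var j = 0; j < items.length; j++) {
                            results.push(items[j]);   // collect <item> elements from every engine
                        }
                    }
                    if (--pending == 0) callback(results);   // all engines have answered
                }
            };
        })(req);
        req.open("GET", url, true);
        req.send(null);
    }
}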
When putting together a post yesterday about "Virtualization", I instinctively looked to Gurunet's "answers.com" service for information on the subject: Enterprise Information Integration (EII). Woe and behold! Here is what I found at the tail end of the answers.com article on this subject:
Now, I knew this was Wikipedia content repurposed by "answers.com", and I proceeded to clean up the article. The wikified article took a while to complete because, true to the "Wikipedia" ethos, I had to contribute knowledge as opposed to the original weenie marketing gunk. It's naturally easier to cut and paste marketing fluff for a misguided quick-win attempt than it is to embed links, add knowledge, and discern Wiki Markup (but "Wiki" don't play that!).
This little exercise has broader implications for marketing as a whole, especially for the IT sector. The end of days for "Misinformation based Marketing" is nigh! Wikis, Blogs, Search Engines, Web Services, and Social Networking are rapidly destroying the historically prohibitive costs associated with customer pursuit of facts.
I am very confident that product quality will soon overshadow market share as the key determinant of product selection on the part of customers (this is no longer a pipe dream!). I also have increased hope that IT product development and associated product marketing by technology vendors will veer in the same direction.
While each of us can alter and build our own multiple models, it is an uphill struggle once we are past the initial years in educational institutions. We stand at the crossroads in India. We have the advantage of demographics on our side. We need to address the twin challenges of educating India's youth and doing it right. Education done right can be India's biggest change agent. Conversely, putting people with limited and incomplete mental models in decision-making positions can worsen the situation dramatically.
So, what does it take for us to fix the problem at the source? Atanu Dey wrote about how to re-invent the education system recently on his blog:
I think that at a minimum, an educational system must teach people how to think. How to fast and how to wait would be good, but perhaps it is too much to ask for right now. Does such a system exist anywhere in the world? I don't know for sure, but I doubt it very sincerely. I realize of course that there are people who have gone through the current educational systems and they are also able to think. But I would be wary of ascribing that result to the present setup. It is more likely that despite the present system, those people have learnt how to think.
I believe that learning how to think may be something akin to learning a language. It appears that we have a language learning sub-system in our brains which shuts down sometime around age 12 or so. Before reaching that age, you can very easily learn languages; after that, learning languages is extremely hard. So also, I believe that if you catch a kid early enough, you can teach him or her to think. It is as if the brain circuits are just a lot of firmware in early childhood, and then as one grows up, the firmware hardens and becomes hardware that cannot be re-programmed.
Here is my prescription for a good education. Focus primarily on teaching how to think and on teaching people how to learn. Teaching how to think is like giving kids a very high-powered CPU. Teaching them how to learn gives them control of a very broadband channel through which they can have access to content that the CPU can process. Alternative analogy: good thinking skills are like having a good operating system, and good learning skills are like having a great set of applications.
Recommended Reading:
[via Abhay Bhagat] Michael Mauboussin writes:
Economists have successfully described the economics of both information and networks. These economic principles appear durable. It is the combination of information and network properties that creates opportunities for businesses and investors. Most investors have not internalized these ideas.
We believe the importance of information-based networks is increasing in today's global economy for four reasons:
1. Physical capital needs are lower than they were in the past. Information-based networks require less capital as they grow than physical networks do.
2. Networks demonstrate increasing returns. Most industries benefit from supply-side increasing returns to scale: higher volume leads to lower unit costs, up to a point. In contrast, successful networks generate increasing returns from the demand-side as users beget users.
3. Networks can form faster and more frequently than in the past. Because of plummeting communication and computing costs, the barriers to creating a network are declining. But even though the barriers to entry are low, the barriers to success remain high.
4. Networks can spread globally. Because many networks have high upfront costs and low incremental costs, they can expand rapidly within countries and across borders.
This report focuses on how to categorize networks, how they affect economic value, and how they form.
By Uche Ogbuji, IBM developerWorks
The world of XML and Web services is huge, and growing. developerWorks does much to map it out for you, but when you're looking for a schema or a public Web service to meet some pressing need, it's useful to have several key resources handy. This tip shows you how to comb through the enormous variety of Internet resources to find schemata and Web services using common search criteria. The best known source for finding public SOAP Web services is XMethods. It has a comprehensive list of SOAP services that you can sort by several criteria. It also provides a demo client so you can try out the services right from the index site. You can also keep track of the listings on XMethods programmatically using UDDI, RSS, and other means. Sites that provide directories of Web services include RemoteMethods.com and Web Service List. A chronicle of interesting Web services is Web service of the Day.
One resource that straddles the Web services/Semantic Web divide is WSindex.org, a directory of Web services, XML, SOAP, UDDI, WSDL, and Semantic Web resources. This site is a hierarchical and searchable directory.
http://www-106.ibm.com/developerworks/xml/library/x-tiplkws.html
By Peter Sefton, XML.org
The author explores some of the ways that OpenOffice.org's Writer application is open to customization and configuration. He covers a few techniques that will be of interest to template maintainers working with OpenOffice.org Writer: how to crack open the file format, how to maintain large sets of styles, and how to customize menus and macros, all without using anything except standard tools: zip, an XSLT processor, and a text editor. All this can, of course, be further automated with a programming language of some kind, even a batch file.
There are some changes coming in version 2 of OpenOffice.org, but all these techniques will be forwards compatible, although some things like the location and name of the menu-bar files look like they will change. If you are also trying to store and manipulate content in XML but want to use a word processing environment for authoring, then well-crafted templates are even more important.
http://www.xml.com/pub/a/2005/01/26/hacking-ooo.html
See also the OpenDocument 1.0 CD:
http://xml.coverpages.org/ni2005-01-04-a.html
Enjoy!
Questions about Longhorn, part 3: Avalon's enterprise mission
The slide shown at the right comes from a presentation entitled Windows client roadmap, given last month to the International .NET Association (INETA). When I see slides like this, I always want to change the word "How" to "Why" -- so, in this case, the question would become "Why do I have to pick between Windows Forms and Avalon?" Similarly, MSDN's Channel 9 ran a video clip of Joe Beda, from the Avalon team, entitled How should developers prepare for Longhorn/Avalon? that, at least for me, begs the question "Why should developers prepare for Longhorn/Avalon?"
I've been looking at decision trees like the one shown in this slide for more than a decade. It's always the same yellow-on-blue PowerPoint template, and always the same message: here's how to manage your investment in current Windows technologies while preparing to assimilate the new stuff. For platform junkies, the internal logic can be compelling. The INETA presentation shows, for example, how it'll be possible to use XAML to write WinForms apps that host combinations of WinForms and Avalon components, or to write Avalon apps that host either or both style of component. Cool! But...huh? Listen to how Joe Beda frames the "rich vs. reach" debate:
Avalon will be supplanting WinForms, but WinForms is more reach than it is rich. It's the reach versus rich thing, and in some ways there's a spectrum. If you write an ASP.NET thing and deploy via the browser, that's really reach. If you write a WinForms app, you can go down to Win98, I believe. Avalon's going to be Longhorn only.
So developers are invited to classify degrees of reach -- not only with respect to the Web, but even within Windows -- and to code accordingly. What's more, they're invited to consider WinForms, the post-MFC (Microsoft Foundation Classes) GUI framework in the .NET Framework, as "reachier" than Avalon. That's true by definition since Avalon's not here yet, but bizarre given that mainstream Windows developers can't yet regard .NET as a ubiquitous foundation, even though many would like to.
Beda recommends that developers isolate business logic and data-intensive stuff from the visual stuff -- which is always smart, of course -- and goes on to sketch an incremental plan for retrofitting Avalon goodness into existing apps. He concludes:
Avalon, and Longhorn in general, is Microsoft's stake in the ground, saying that we believe power on your desktop, locally sitting there doing cool stuff, is here to stay. We're investing on the desktop, we think it's a good place to be, and we hope we're going to start a wave of excitement leveraging all these new technologies that we're building.
It's not every decade that the Windows presentation subsystem gets a complete overhaul. As a matter of fact, it's never happened before. Avalon will retire the hodge-podge of DLLs that began with 16-bit Windows, and were carried forward (with accretion) to XP and Server 2003. It will replace this whole edifice with a new one that aims to unify three formerly distinct modes: the document, the user interface, and audio-visual media. This is a great idea, and it's a big deal. If you're a developer writing a Windows application that needs to deliver maximum consumer appeal three or four years from now, this is a wave you won't want to miss. But if you're an enterprise that will have to buy or build such applications, deploy them, and manage them, you'll want to know things like:
How much fragmentation can my developers and users tolerate within the Windows platform, never mind across platforms?
Will I be able to remote the Avalon GUI using Terminal Services and Citrix?
Is there any way to invest in Avalon without stealing resources from the Web and mobile stuff that I still have to support?
Then again, why even bother to ask these questions? It's not enough to believe that the return of rich-client technology will deliver compelling business benefits. (Which, by the way, I think it will.) You'd also have to be shown that Microsoft's brand of rich-client technology will trump all the platform-neutral variations. Perhaps such a case can be made, but the concept demos shown so far don't do so convincingly. The Amazon demo at the Longhorn PDC (Professional Developers Conference) was indeed cool, but you can see similar stuff happening in Laszlo, Flex, and other RIA (rich Internet application) environments today. Not, admittedly, with the same 3D effects. But if enterprises are going to head down a path that entails more Windows lock-in, Microsoft will have to combat the perception that the 3D stuff is gratuitous eye candy, and show order-of-magnitude improvements in users' ability to absorb and interact with information-rich services.
In section, 4.1 Human-friendly Syntax, you say "There must be a text-based form of the query language which can be read and written by users of the language", and you list the status as "pending".
As background for section 4.1, you may be interested in RDFQueryLangComparison1 (original text replaced with live link).
It shows how to write queries in a form that includes English meanings.
The example queries can be run by pointing a browser to www.reengineeringllc.com .
Perhaps importantly, given the intricacy of RDF for nonprogrammers, one can get an English explanation of the result of each query.
-- Dr. Adrian Walker of Internet Business Logic
The Semantic Web continues to take shape, and Infonauts (information-centric agents) are already emerging.
A great thing about the net is the "back to the future" nature of most Web and Internet technology. For instance, we are now frenzied about Service Oriented Architecture (SOA), Event Driven Architecture (EDA), Loose Coupling of Composite Services, etc. -- basically rehashing the CORBA vision.
I see the Semantic Web playing a similar role in relation to artificial intelligence.
BTW - It still always comes down to data, and as you can imagine, Virtuoso will be playing its usual role of alleviating the practical implementation and utilization challenges of all of the above :-)
I have little to add to this matter as our understanding and vision is aptly expressed via the architecture and feature set of Virtuoso (this area was actually addressed circa 1999).
We are heading into an era of multi-model databases: single database engines that are capable of effectively serving the requirements of the Hierarchical, Network, Relational, and Object database models. As we get closer to the unravelling of universal storage, hopefully this will get clearer.
Back to Dare's commentary:
C.J. Date, one of the most influential names in the relational database world, had some harsh words about XML's encroachment into the world of relational databases in a recent article entitled Date defends relational model that appeared on SearchDatabases.com. Key parts of the article are excerpted below
Date reserved his harshest criticism for the competition, namely object-oriented and XML-based DBMSs. Calling them "the latest fashions in the computer world," Date said he rejects the argument that relational DBMSs are yesterday's news. Fans of object-oriented database systems "see flaws in the relational model because they don't fully understand it," he said.
Date also said that XML enthusiasts have gone overboard.
"XML was invented to solve the problem of data interchange, but having solved that, they now want to take over the world," he said. "With XML, it's like we forget what we are supposed to be doing, and focus instead on how to do it."
Craig S. Mullins, the director of technology planning at BMC Software and a SearchDatabase.com expert, shares Date's opinion of XML. It can be worthwhile, Mullins said, as long as XML is only used as a method of taking data and putting it into a DBMS. But Mullins cautioned that XML data that is stored in relational DBMSs as whole documents will be useless if the data needs to be queried, and he stressed Date's point that XML is not a real data model.
Craig Mullins' points are more straightforward to answer, since his comments don't jibe with the current state of the art in the XML world. He states that you can't query XML documents stored in databases, but this is untrue. Almost three years ago, I was writing articles about querying XML documents stored in relational databases. Storing XML in a relational database doesn't mean it has to be stored as an opaque binary BLOB or as a big bunch of text which cannot effectively be queried. The next version of SQL Server will have extensive capabilities for querying XML data in a relational database and doing joins across relational and XML data; a lot of this functionality is described in the article on XML Support in SQL Server 2005. As for XML not having a data model, I beg to differ. There is a data model for XML that many applications and people adhere to, often without realizing that they are doing so. This data model is the XPath 1.0 data model, which is being updated to handle typed data as the XQuery and XPath 2.0 data model.
Now to tackle the meat of C.J. Date's criticism, which is that XML solves the problem of data interchange but is now showing up in the database. The first point I'd like to make is that there are two broad usage patterns of XML: it is used to represent both rigidly structured tabular data (e.g., relational data or serialized objects) and semi-structured data (e.g., office documents). The latter type of data will only grow now that office productivity software like Microsoft Office has enabled users to save their documents as XML instead of proprietary binary formats. In many cases, these documents cannot simply be shredded into relational tables. Sure, you can shred an Excel spreadsheet written in SpreadsheetML into relational tables, but is the same really feasible for a Word document written in WordprocessingML? Many enterprises would rather have their important business data stored and queried from a unified location, instead of the current situation where some data is in document management systems, some hangs around as random files in people's folders, while some sits in a database management system.
As for stating that critics of the relational model don't understand it, I disagree. One of the major benefits of using XML in relational databases is that it is a lot easier to deal with fluid schemas or data with sparse entries. When the shape of the data tends to change or is not fixed, the relational model is simply not designed to deal with this. Constantly changing your database schema is simply not feasible, and there is no easy way to provide the extensibility of XML, where one can say "after the X element, any element from any namespace can appear". How would one describe the capacity to store "any data" in a traditional relational database without resorting to an opaque blob?
I do tend to agree that some people are going overboard and trying to model their data hierarchically instead of relationally, which experience has taught us is a bad idea. Recently there was a thread on the XML-DEV mailing list entitled Designing XML to Support Information Evolution, in which Roger L. Costello described his travails trying to model his data, which was being transferred as XML, in a hierarchical manner. Michael Champion accurately described the process Roger Costello went through as having "rediscovered the relational model". In a response to that thread I wrote "Hierarchical databases failed for a reason".
Using hierarchy as a primary way to model data is bad for at least the following reasons:
- Hierarchies tend to encourage redundancy. Imagine I have a <Customer> element who has one or more <ShippingAddress> elements as children, as well as one or more <Order> elements as children. Each order was shipped to an address, so if modelled hierarchically each <Order> element also will have a <ShippingAddress> element, which leads to a lot of unnecessary duplication of data. In the real world, there are often multiple groups to which a piece of data belongs, which often cannot be modelled with a single hierarchy.
- Data is too tightly coupled. If I delete a <Customer> element, this means I've automatically deleted his entire order history since all the <Order> elements are children of <Customer>. Similarly, if I query for a <Customer>, I end up getting all the <Order> information as well.
To put it simply, experience has taught the software world that the relational model is a better way to model data than the hierarchical model. Unfortunately, in the rush to embrace XML, many are repeating the mistakes from decades ago in the new millennium.
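As an aside, here is a toy JavaScript sketch (with invented data) of the redundancy and coupling described above:

// Hierarchical shape: the shipping address is repeated inside every order,
// and deleting the customer node silently deletes the order history with it.
var customerHierarchical = {
    name: "Acme Corp",
    orders: [
        { id: 1, shippingAddress: "1 Main St, Springfield" },
        { id: 2, shippingAddress: "1 Main St, Springfield" }   // duplicated data
    ]
};

// Relational shape: customers, addresses and orders are separate "tables"
// that refer to one another by key, so nothing is duplicated or over-coupled.
var customers = { 7: { name: "Acme Corp" } };
var addresses = { 3: { customerId: 7, street: "1 Main St, Springfield" } };
var orders = {
    1: { customerId: 7, shipToAddressId: 3 },
    2: { customerId: 7, shipToAddressId: 3 }
};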
"Can I create an updateable SQL VIEW in Virtuoso that would comprise columns from 3rd party databases such as Oracle, SQL Server, and say MySQL".
The answer was yes, based on the fact that Virtuoso does support SQL INSTEAD-OF Triggers - even in Virtual Database mode.
I am certainly keen to see whether any other Virtual Database style products achieve this feat (one that is trying even for many homogeneous SQL database engines).
Dr. Paul Dorsey of Dulcian, Inc. wrote a very good article about this subject, and here is an excerpt from his article overview:
Views are an important part of application development. Since Oracle 7.3, we quickly recognized the importance of using Oracle's updateable view feature. An updateable view allows you to join several tables and perform updates against the driving table. For example, if you join EMP and DEPT in the traditional way and display columns from both tables, DML operations are possible against EMP but not DEPT.
For traditional relational database designs, this is enough functionality. For example, in a typical Forms application, when you are basing a block on a table, the additional columns that you want to display are lookups from other tables and can therefore be easily supported using traditional updateable views. These views are built using a combination of joins and outer joins or, in extreme cases, looking up additional information through functions embedded in the views. Under no circumstances should post query triggers be used to support this functionality. Post query triggers cause unnecessary network traffic and also embed the logic in the application rather than in the database or somewhere else where it can easily be reused.
What happens in a situation where the information you want to display in the block requires a query that is so complex that your ability to maintain (insert, update, delete) that information using a simple updateable view is eliminated? The updateable views are relatively restrictive. Only a single table can be updated. Joins must be created carefully and based on Foreign Key constraints in the database. No set operators such as UNION or MINUS can be used. For these reasons, it is common to end up with a block that cannot be updated as required. How do most developers handle this situation?
a) By placing complex logic in the form (WHEN-VALIDATE-ITEM triggers)
b) By writing procedures that access Forms' ability to replace the Insert, Update, Delete routines and place that logic in the form
These practices are just as undesirable as using POST-QUERY triggers. The logic is in the wrong place and is not reusable.
The INSTEAD-OF trigger views feature was introduced by Oracle in version 8.15. This feature enables developers to create views on single or multiple tables or any other view imaginable by writing INSTEAD-OF triggers that tell the view how to behave when Inserts, Updates or Deletes are issued. Peter Koletzke and I first wrote about this feature in our Oracle Press book Oracle Developer: Advanced Forms & Reports (2000). At the time, we gave the feature relatively brief mention because we believed that most of the systems we were building included blocks based on traditional updateable views, which allow updates to a single table. Now, there is a good reason to look more closely at INSTEAD-OF trigger views.
Database Journal also has an article on this subject.
By Lars Marius Garshol, Ontopia Technical Report
Information Architecture is the discipline dealing with the modern version of this problem: how to organize web sites so that users actually can find what they are looking for. Information architects have so far applied known and well-tried tools from library science to solve this problem, and now topic maps are sailing up as another potential tool for information architects. This raises the question of how topic maps compare with the traditional solutions. The paper argues that topic maps go beyond the traditional solutions, in the sense that they provide a framework within which those solutions can be represented as they are, but also extended in ways which significantly improve information retrieval. The paper tries to show that topic maps provide a common reference model that can be used to explain and understand many common techniques from library science and information architecture.
http://www.ontopia.net/topicmaps/materials/tm-vs-thesauri.html
See also (XML) Topic Maps:
http://xml.coverpages.org/topicMaps.html
To share your calendars, you need access to a WebDAV server. If you run your own web server, you can install mod_dav, a free Apache module that will turn your web server into a WebDAV server. Instructions on how to set it up are on their website. Once you set up your WebDAV server, you can publish your calendar to the site, then subscribe to it from any other Mozilla Calendar. Automatically updating the calendar will give you a poor man's calendar server.
Through WebDAV we will be able to share calendars across disparate calendaring tools (albeit with some degree of pain when Outlook is in the mix). Even better for me, I can post my shared calendar data via a Virtuoso instance (internally and externally since WebDAV is one of the many protocols that it implements), in short I could even seriously consider generating this on the fly and sharing it via this blog (Wow!).
We aren't too many miles away from open and standards compliant Unified Data Storage thanks to WebDAV.
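For the curious, "publishing" in this setup is nothing more exotic than an HTTP PUT of the iCalendar file to a DAV-enabled URL; a minimal JavaScript sketch follows, with a made-up server URL and authentication left out (Mozilla Calendar does the equivalent for you):

// A minimal sketch of publishing a calendar over WebDAV: an HTTP PUT of the
// iCalendar text to a DAV-enabled URL. The URL is hypothetical and
// authentication/error handling are omitted.
function publishCalendar(davUrl, icsText, callback) {
    var req = new XMLHttpRequest();
    req.onreadystatechange = function () {
        if (req.readyState == 4) {
            // any 2xx status (200/201/204) means the resource was stored
            callback(req.status >= 200 && req.status < 300);
        }
    };
    req.open("PUT", davUrl, true);
    req.setRequestHeader("Content-Type", "text/calendar");
    req.send(icsText);
}

// Subscribers simply point their calendar client at the same URL, e.g.
// publishCalendar("http://dav.example.com/calendars/mine.ics", icsData,
//                 function (ok) { alert(ok ? "published" : "failed"); });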
This further illuminates the content of my earlier post on this subject.
Reading the Longhorn SDK docs is a disorienting experience. Everything's familiar but different. Consider these three examples:
[Full story: Replace and defend via Jon's Radio]
"Replace & Defend" is certainly a strategy that would have awakened the entire non Microsoft Developer world during the recent PDC event. I know these events are all about preaching to the choir (Windows only developers), but as someone who has worked with Microsoft technologies as an ISV since the late 80's there is something about this events announcements that leave me concerned.
Ironically, these concerns aren't about the competitive aspects of their technology disruptions, but more along the lines of how Microsoft (I hope inadvertently) generates the kinds of sentiments echoed in the comments thread from Scoble's recent "How to hate Microsoft" post. As indicated in my response to that post, I don't believe Microsoft is as bad or evil as is instinctively assumed in many quarters, but I can certainly understand why they are hated by others, which is really unfortunate, especially bearing in mind that they have done more good than harm to date (in my humble opinion).
Anyway, back to my concerns post-PDC, which I break down as follows:
WinFS needs to architecturally separate the System Provider from the Data Provider (pretty much the OLE-DB architecture), with Microsoft naturally providing reference System Provider (pretty much what was demonstrated at PDC) and Data Provider (ADO.NET, OLE DB, and ODBC) implementations. Third parties can choose to produce custom WinFS Service or Data Providers which serve their data access needs. It's impractical to want to force every non-SQL Server customer over to SQL Server in order for them to exploit WinFS, and I certainly hope this isn't the definitive strategy at Microsoft.
There is a new HOWTO document that addresses an area of frequent confusion on Mac OS X: how to build PHP with an ODBC data access layer binding (the iODBC variant) using Mac OS X Frameworks as opposed to Darwin shared libraries.
This document basically brings clarity to both the Frameworks and Darwin Shared library approaches.
Lives could be saved if pioneering messaging trial is a success. Mobile phone photo messaging could help to save lives at the scene of an accident if a new service being tested in Scotland is successful. Fife Fire & Rescue Service has started trials using photo messaging to receive advice from doctors on how to deal with critical injuries at major incidents. Rescue officers will send photo messages of accidents via GPRS to the Accident and Emergency (A&E) unit at Dunfermline's Queen Margaret hospital, preparing emergency wards for the arrival of casualties and receiving help in return. 'We plan to send pictures of traffic accidents directly to the hospital, in order to get advice about how best to deal with the accident victims,' said Fife Fire & Rescue Service firemaster Mike Bitcon. Using the photographs, doctors can assess the injuries and prepare appropriately, as well as deciding if a doctor should be present at the scene of the accident. 'We're confident that this initiative will help to save lives,' said Bitcon.
This can all happen right now, and independent of any particular network carrier or device, via the inherent power of blogging (using mobile or multimedia blogging technology).
In the year 2000 the question of the shape and form of XML data was unclear to many, and reading the article below basically took me back in time to when we released Virtuoso 2.0 (we are now at release 3.0 commercially with a 3.2 beta dropping any minute).
RSS is a great XML application, and it does a great job of demonstrating how XML -- the new data access foundation layer -- will galvanize the next generation Web (I refer to this as Web 2.0).
RSS: INJAN (It's not just about news)
RSS is not just about news, according to Ian Davis on rss-dev.
He presents a nice list of alternatives, which I reproduce here (and to which I'd add, of course, bibliography management)
- Sitemaps: one of the S's in RSS stands for summary. A sitemap is a summary of the content on a site, the items are pages or content areas. This is clearly a non-chronological ordering of items. Is a hierarchy of RSS sitemaps implied here -- how would the linking between them work? How hard would it be to hack a web browser to pick up the RSS sitemap and display it in a sidebar when you visit the site?
- Small ads: also known as classifieds. These expire so there's some kind of dynamic going on here, but the ordering of items isn't necessarily chronological. How to describe the location of the seller, or the condition of the item, or even the price? Not every ad is selling something -- perhaps it's to rent out a room.
- Personals: similar model to the small ads. No prices though (I hope). Comes with a ready made vocabulary of terms that could be converted to an RDF schema. Probably should do that just for the hell of it anyway -- gsoh
- Weather reports: how about a week's worth of weather in an RSS channel? If an item is dated in the future, should an aggregator display it before time? Alternate representations include maps of temperature and pressure etc.
- Auctions: again, related to small ads, but these are much more time limited since there is a hard cutoff after which the auction is closed. The sequence of bids could be interesting -- would it make sense to thread them like a discussion so you can see the tactics?
- TV listings: this is definitely chronological but with a twist -- the items have durations. They also have other metadata such as cast lists, classification ratings, widescreen, stereo, program type. Some types have additional information such as director and production year.
- Top ten listings: top ten singles, books, DVDs, richest people, ugliest, rear of the year etc. Not chronological, but has a definite order. May update from day to day or even more often.
- Sales reporting: imagine if every department of a company reported their sales figures via RSS. Then the divisions aggregate the departmental figures and republish to the regional offices, who aggregate and add value up the chain. The chairman of the company subscribes to one super-aggregate feed.
- Membership lists / buddy lists: could I publish my buddy list from Jabber or other instant messengers? Maybe as an interchange format, or perhaps it could be used to look for shared contacts. Lots of potential overlap with FOAF here.
- Mailing lists: or in fact any messaging system such as Usenet. There are some efforts at doing this already (e.g. yahoogroups) but we need more information -- threads; references; headers; links into archives.
- Price lists / inventory: the items here are products or services. No particular ordering, but it'd be nice to be able to subscribe to a catalog of products and prices from a company. The aggregator should be able to pick out price rises or bargains given enough history.
Thus, if we can comprehend RSS (the blog article below does a great job), we should be able to see the fundamental challenges that lie before any organization seeking to exploit the potential of the imminent Web 2.0 inflection: how will you cost-effectively create XML data from existing data sources, without upgrading or switching database engines, operating systems, or programming languages? Put differently, how can you exploit this phenomenon without losing your ever-dwindling technology choices (believe me, choices are dwindling fast, but most are oblivious to this fact)?
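To ground the question a little: the mechanical core of the task is small, as the toy JavaScript sketch below shows (rows from an existing data source are represented as plain objects with invented field names); the real challenge is doing this at scale, against live databases, without disruptive migrations:

// A toy sketch: turn rows from an existing data source (here just an array
// of objects with invented field names) into an RSS 2.0 style XML string.
function rowsToRss(channelTitle, channelLink, rows) {
    function esc(s) {
        return String(s).replace(/&/g, "&amp;")
                        .replace(/</g, "&lt;")
                        .replace(/>/g, "&gt;");
    }
    var xml = '<?xml version="1.0"?>\n<rss version="2.0"><channel>'
            + "<title>" + esc(channelTitle) + "</title>"
            + "<link>" + esc(channelLink) + "</link>";
    for (var i = 0; i < rows.length; i++) {
        xml += "<item>"
             + "<title>" + esc(rows[i].title) + "</title>"
             + "<link>" + esc(rows[i].link) + "</link>"
             + "<description>" + esc(rows[i].description) + "</description>"
             + "</item>";
    }
    return xml + "</channel></rss>";
}

// e.g. rowsToRss("Orders", "http://example.com/orders",
//      [{ title: "Order 42", link: "http://example.com/orders/42",
//         description: "Shipped 2005-01-26" }]);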
Tim O'Reilly wrote some thoughts about network-aware software. A good summary and nice ideas about why not only blogs should be net-aware (and where even blogs can be improved ;) ).
"For the desktop, my personal vision is to see existing software instrumented to become increasingly web aware. It seems that Apple are doing a good job with this. (What does web aware mean for me? Being able to grok URIs, speaking WebDAV, and using open standard data formats.)" -- Edd Dumbill[via Bitflux Blog]
Rendezvous-like functionality for automatic discovery of and potential synchronization with other instances of the application on other computers. Apple is showing the power of this idea with iChat and iTunes, but it really could be applied in so many other places. For example, if every PIM supported this functionality, we could have the equivalent of "phonester" where you could automatically ask peers for contact information. Of course, that leads to guideline 2.
Another application is discovery of ODBC data sources and database servers. Rendezvous can also simplify security and administration of data sources accessible via either of these standard data access mechanisms. It can also apply to XML databases and to data sources exposed by XML databases.
The very point I continue to make about Internet Points of Presence being actual data access points -- in short, these end points should be served by database server processes. This is the very basis of Virtuoso; the inevitability of this realization remains the underpinning of this product. There are other products out there that have some sense of this vision too, but there is a little snag (at least so far in my research efforts), and that is the tendency to create a dedicated, independent server per protocol (an ultimate integration, administration, and maintenance nightmare).
We need to get with the program, technology is no silver bullet, we have brains for a reason, we simply need to exercise the brain muscle (this activity has been in rapid decline). The piece below pretty much sums up this sentiment:
Lack Of Internet Skills A Barrier To Progress At Work
I would guess this really depends on what your job entails, but a new survey has found that many people who lack internet "skills" feel that it has held them back at work. There are plenty of jobs where I would assume it would be a requirement that you know how to use the internet, while there are plenty of others where it shouldn't matter one way or the other. Also, I imagine this problem will begin to decrease over time as a new generation of workers shows up who were brought up on the internet. Of course, then we'll find out that a lack of "mobile phone text messaging" or some other random tech skill will be holding people back at work. These are all skills that can be picked up with a little bit of effort. If people think they need them to advance in their job, isn't it their responsibility to learn these skills? You make yourself employable by keeping up-to-date. [via Techdirt]
I say, "Get with the Program!".
Ingres (technically, Advantage Ingres Enterprise) is, arguably, the forgotten database. There used to be five major databases: Oracle, DB2, Sybase, Informix and Ingres. Then along came Microsoft and, if you listened to most press comment (or the lack of it), you would think that there were only two of these left, plus SQL Server. [From IT-Director]
Oracle, Microsoft, and IBM would certainly like the illusion of a 3-horse race, as this is the only way they can induce Ingres, Informix, and Sybase users to jump ship -- and this even though database migrations are by far the most risk-prone and problematic aspects of any IT infrastructure.
Here is the interesting logic from the self-made big three: if you want to take advantage of new paradigms and technologies such as XML, Web Services, and anything else in the pipeline, you have to move all your data out of these databases and then get all the mission-critical applications re-associated with one of their databases; and, by the way, when you do so it is advisable that you use native interfaces (so that sometime in the future you have no chance whatsoever of repeating this folly at their expense).
The simple fact of the matter (which the self-made big three do not want you to know) is that you can put ODBC, JDBC, or even platform-specific data access APIs such as OLE DB and ADO.NET atop any of these databases, and then explore and exploit the benefits of new technologies and paradigms, as long as the tool pool supports one or more of these standards.
Unfortunately, the no-brainer above appears to be the more difficult of the choices before decision makers. In other words, many would rather dig themselves into a deeper hole (unknowingly, I can only presume) that ultimately leads to technology lock-in.
The biggest challenge before any RDBMS-based infrastructure today isn't which of the self-made big three to migrate to wholesale, but rather how to make progressive use of the pool of disparate applications and application databases that proliferate across the enterprise.
This is another way of understanding the burgeoning market for Virtual Databases, which in my opinion represent the new frontier in database technology.