Virtuoso Chronicles from the Field: Nepomuk, KDE, and the quest for a sophisticated RDF DBMS.

For this particular user experience chronicle, I've simply inserted the content of Sebastian Trueg's post titled: What We Did Last Summer (And the Rest of 2009) – A Look Back Onto the Nepomuk Development Year ..., directly into this post, without any additional commentary or modification.

2009 is over. Yeah, sure, trueg, we know that, it has been over for a while now! Ok, ok, I am a bit late, but still I would like to get this one out - if only for my archive. So here goes.

Virtuoso

Let’s start with the major topic of 2009 (and also the beginning of 2010): The new Nepomuk database backend: Virtuoso. Everybody who used Nepomuk had the same problems: you either used the sesame2 backend which depends on Java and steals all of your memory or you were stuck with Redland which had the worst performance and missed some SPARQL features making important parts of Nepomuk like queries unusable. So more than a year ago I had the idea to use the one GPL’ed database server out there that supported RDF in a professional manner: OpenLink’s Virtuoso. It has all the features we need, has a very good performance, and scales up to dimensions we will probably never reach on the desktop (yeah, right, and 64k main memory will be enough forever!). So very early I started coding the necessary Soprano plugin which would talk to a locally running Virtuoso server through ODBC. But since I ran into tons of small problems (as always) and got sidetracked by other tasks I did not finish it right away. OpenLink, however, was very interested in the idea of their server being part of every KDE installation (why wouldn’t they ;)). So they not only introduced a lite-mode which makes Virtuoso suitable for the desktop but also helped in debugging all the problems that I had left. Many test runs, patches, and a Virtuoso 5.0.12 release later I could finally announce the Virtuoso integration as usable.

Then end of last year I dropped the support for sesame2 and redland. Virtuoso is now the only supported database backend. The reason is simple: Virtuoso is way more powerful than the rest - not only in terms of performance - and it is fully implemented in C(++) without any traces of Java. Maybe even more important is the integration of the full text index which makes the previously used CLucene index unnecessary. Thus, we can finally combine full text and graph queries in one SPARQL query. This results in a cleaner API and way faster return of search results since there is no need to combine the results from several queries anymore. A direct result of that is the new Nepomuk Query API which I will discuss later.

So now the only thing I am waiting for is the first bugfix release of Virtuoso 6, i.e. 6.0.1 which will fix the bugs that make 6.0.0 fail with Nepomuk. Should be out any day now. :)

The Nepomuk Query API

Querying data in Nepomuk pre-KDE-4.4 could be done in one of two ways: 1. Use the very limited capabilities of the ResourceManager to list resources with certain properties or of a certain type; or 2. Write your own SPARQL query using ugly QString::arg replacements.

With the introduction of Virtuoso and its awesome power we can now do pretty much everything in one query. This allowed me to finally create a query API for KDE: Nepomuk::Query::Query and friends. I won’t go into much detail here since I did that before.

All in all you should remember one thing: whenever you think about writing your own SPARQL query in a KDE application - have a look at libnepomukquery. It is very likely that you can avoid the hassle of debugging a query by using the query API.

The first nice effect of the new API (apart from me using it all over the place obviously) is the new query interface in Dolphin. Internally it simply combines a bunch of Nepomuk::Query::Term objects into a Nepomuk::Query::AndTerm. All very readable and no ugly query strings.

Dolphin Search Panel in KDE SC 4.4

Shared Desktop Ontologies

An important part of the Nepomuk research project was the creation of a set of ontologies for describing desktop resources and their metadata. After the Xesam project under the umbrella of freedesktop.org had been convinced to use RDF for describing file metadata they developed their own ontology. Thanks to Evgeny (phreedom) Egorochkin and Antonie Mylka both the Xesam ontology and the Nepomuk Information Elements Ontology were already very close in design. Thus, it was relatively easy to merge the two and be left with only one ontology to support. Since then not only KDE but also Strigi and Tracker are using the Nepomuk ontologies.

At the Gran Canaria Desktop Summit I met some of the guys from Tracker and we tried to come up with a plan to create a joint project to maintain the ontologies. This got off to a rough start as nobody really felt responsible. So I simply took the initiative and released the shared-desktop-ontologies version 0.1 in November 2009. The result was a s***-load of hate-mails and bug reports due to me breaking KDE build. But in the end it was worth it. Now the package is established and other projects can start to pick it up to create data compatible to the Nepomuk system and Tracker.

Today the ontologies (and the shared-desktop-ontologies package) are maintained in the Oscaf project at Sourceforge. The situation is far from perfect but it is a good start. If you need specific properties in the ontologies or are thinking about creating one for your own application - come and join us in the bug tracker…

Timeline KIO Slave

It was at the Akonadi meeting that Will Stephenson and myself got into talking about mimicking some Zeitgeist functionality through Nepomuk. Basically it meant gathering some data when opening and when saving files. We quickly came up with a hacky patch for KIO and KFileDialog which covered most cases and allowed us to track when a file was modified and by which application. This little experiment did not leave that state though (it will, however, this year) but another one did: Zeitgeist also provides a fuse filesystem which allows to browse the files by modification dates. Well, whatever fuse can do, KIO can do as well. Introducing the timeline:/ KIO slave which gives a calendar view onto your files.

Tips And Tricks

Well, I thought I would mention the Tips And Tricks section I wrote for the techbase. It might not be a big deal but I think it contains some valuable information in case you are using Nepomuk as a developer.

Google Summer Of Code 2009

This time around I had the privilege to mentor two students in the Google Summer of Code. Alessandro Sivieri and Adam Kidder did outstanding work on Improved Virtual Folders and the Smart File Dialog.

Adam’s work lead me to some heavy improvements in the Nepomuk KIO slaves myself which I only finished this week (more details on that coming up). Alessandro continued his work on faceted file browsing in KDE and created:

Sembrowser

Alessandro is following up on his work to make faceted file browsing a reality in 2010 (and KDE SC 4.5). Since it was too late to get faceted browsing into KDE SC 4.4 he is working on Sembrowser, a stand-alone faceted file browser which will be the grounds for experiments until the code is merged into Dolphin.

Faceted Browsing in KDE with Sembrowser

Nepomuk Workshops

In 2009 I organized the first Nepomuk workshop in Freiburg, Germany. And also the second one. While I reported properly on the first one I still owe a summary for the second one. I will get around to that - sooner or later. ;)

CMake Magic

Soprano gives us a nice command line tool to create a C++ namespace from an ontology file: onto2vocabularyclass. It produces nice convenience namespaces like Soprano::Vocabulary::NAO. Nepomuk adds another tool named nepomuk-rcgen. Both were a bit clumsy to use before. Now we have nice cmake macros which make it very simple to use both.

See the techbase article on how to use the new macros.

Bangarang

Without my knowledge (imagine that!) Andrew Lake created an amazing new media player named Bangarang - a Jamaican word for noise, chaos or disorder. This player is Nepomuk-enabled in the sense that it has a media library which lets you browse your media files based on the Nepomuk data. It remembers the number of times a song or a video has been played and when it was played last. It allows to add detail such as the TV series name, season, episode number, or actors that are in the video - all through Nepomuk (I hope we will soon get tvdb integration).

Edit metadata directly in Bangarang

Dolphin showing TV episode metadata created by Bangarang

And of course searching for it works, too...

And it is pretty, too...

I am especially excited about this since finally applications not written or mentored by me start contributing Nepomuk data.

Gran Canaria Desktop Summit

2009 was also the year of the first Gnome-KDE joint-conference. Let me make a bulletin for completeness and refer to my previous blog post reporting on my experiences on the island.

Well, that was by far not all I did in 2009 but I think I covered most of the important topics. And after all it is ‘just a blog entry’ - there is no need for completeness. Thanks for reading.