Kingsley Uyi Idehen
Lexington, United States
DBpedia + BBC (combined) Linked Data Space Installation Guide
What?
The DBpedia + BBC Combo Linked Dataset is a preconfigured Virtuoso Cluster (4 Virtuoso Cluster Nodes, each comprising one Virtuoso Instance; initial deployment is to a single Cluster Host, but the license may be converted for physically distributed deployment), available via the Amazon EC2 Cloud and preloaded with the DBpedia and BBC Linked Data sets.
Why?
The BBC has been publishing Linked Data from its Web Data Space for a number of years. In line
with best practices for injecting Linked Data into the World Wide Web (Web), the BBC datasets are
interlinked with other datasets such as DBpedia and
MusicBrainz.
Typical follow-your-nose exploration using a Web Browser (or
even via sophisticated SPARQL query crawls) isn't always practical
once you get past the initial euphoria that comes from
comprehending the Linked Data concept. As your queries get more
complex, the overhead of remote sub-queries grows until query
results take so long to return that you simply give up.
Thus, maximizing the effects of the BBC's efforts requires
Linked Data that shares locality in a Web-accessible Data Space —
i.e., where all Linked Data sets have been loaded into the same
data store or warehouse. This holds true even when leveraging
SPARQL-FED style virtualization — there's always a need to localize
data as part of any marginally-decent locality-aware
cost-optimization algorithm.
This DBpedia + BBC dataset, exposed via a preloaded and
preconfigured Virtuoso Cluster, delivers a practical point of
presence on the Web for immediate and cost-effective exploitation
of Linked Data at the individual and/or service specific
levels.
How?
To work through this guide, you'll need to start with 90 GB of free
disk space. (Only 41 GB will be consumed after you delete the
installer archives, but starting with 90+ GB ensures enough work
space for the installation.)
Install Virtuoso
- Download Virtuoso installer archive(s). You must deploy the Personal or Enterprise Edition; the Open Source Edition does not support Shared-Nothing Cluster Deployment.
- Obtain a Virtuoso Cluster license.
- Install Virtuoso.
- Set key environment variables and start the OpenLink License Manager using this command (it may vary depending on your shell and install directory):
. /opt/virtuoso/virtuoso-enterprise.sh
- Optional: To keep the default single-server configuration file and demo database intact, set the VIRTUOSO_HOME environment variable to a different directory, e.g.,
export VIRTUOSO_HOME=/opt/virtuoso/cluster-home/
Note: You will have to adjust this setting every time you shift between this cluster setup and your single-server setup. Either may be made your environment's default through the virtuoso-enterprise.sh and related scripts.
- Set up your cluster by running the mkcluster.sh script. Note that initial deployment of the DBpedia + BBC Combo requires a 4 node cluster, which is the default for this script.
- Start the Virtuoso Cluster with this command:
virtuoso-start.sh
- Stop the Virtuoso Cluster with this command:
virtuoso-stop.sh
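Taken together, the steps above amount to a short shell session. The sketch below assumes the /opt/virtuoso install location and cluster-home directory used in this guide and a Bourne-style shell; adjust paths to match your own installation.

# Consolidated sketch of the cluster setup sequence described above.
# Paths follow the examples in this guide and are not requirements.
. /opt/virtuoso/virtuoso-enterprise.sh            # set environment variables; start the License Manager
export VIRTUOSO_HOME=/opt/virtuoso/cluster-home/  # optional: leave the single-server setup untouched
mkcluster.sh                                      # create the 4-node cluster (this script's default)
virtuoso-start.sh                                 # start the Virtuoso Cluster
# ... work with the cluster ...
virtuoso-stop.sh                                  # stop the Virtuoso Cluster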
Using the DBpedia + BBC Combo dataset
- Navigate to your installation directory.
- Download the combo dataset installer script — bbc-dbpedia-install.sh.
- For best results, set the downloaded script to fully executable using this command:
chmod 755 bbc-dbpedia-install.sh
- Shut down any Virtuoso instances that may be currently running.
- Optional: As above, if you have decided to keep the default single-server configuration file and demo database intact, set the VIRTUOSO_HOME environment variable appropriately, e.g.,
export VIRTUOSO_HOME=/opt/virtuoso/cluster-home/
- Run the combo dataset installer script with this command:
sh bbc-dbpedia-install.sh
Verify installation
The combo dataset typically deploys to EC2 virtual machines in
under 90 minutes; your time will vary depending on your network
connection speed, machine speed, and other variables.
Once the script completes, perform the following steps:
- Verify that the Virtuoso Conductor (HTTP-based Admin UI) is in place via:
http://localhost:[port]/conductor
- Verify that the Virtuoso SPARQL endpoint is in place via:
http://localhost:[port]/sparql
- Verify that the Precision Search & Find UI is in place via:
http://localhost:[port]/fct
- Verify that the Virtuoso hosted PivotViewer is in place via:
http://localhost:[port]/PivotViewer
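If you prefer to script these checks, the sketch below probes each of the four endpoints and prints the HTTP status code returned. It assumes Virtuoso's default HTTP listener port of 8890; substitute the port your instance actually uses.

# Hypothetical smoke test for a freshly installed DBpedia + BBC combo instance.
PORT=8890   # adjust to match the HTTP listener configured in virtuoso.ini
for path in conductor sparql fct PivotViewer; do
  # -s silences progress output, -o /dev/null discards the body,
  # -w prints just the HTTP status code for each endpoint
  echo -n "/$path: "
  curl -s -o /dev/null -w "%{http_code}\n" "http://localhost:${PORT}/${path}"
done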
02/17/2011 17:15 GMT-0500 | Modified: 03/29/2011 10:09 GMT-0500
Re-introducing the Virtuoso Virtual Database Engine
In recent times a lot of the commentary and focus re. Virtuoso
has centered on the RDF Quad Store and Linked Data. What sometimes gets overlooked
is the sophisticated Virtual Database Engine that provides the
foundation for all of Virtuoso's data integration
capabilities.
In this post I provide a brief re-introduction to this essential
aspect of Virtuoso.
What is it?
This component of Virtuoso is known as the Virtual Database
Engine (VDBMS). It provides transparent high-performance and secure
access to disparate data sources that are external to Virtuoso. It
enables federated access and integration of data hosted by any
ODBC- or JDBC-accessible RDBMS, RDF Store, XML database, or
Document (Free Text)-oriented Content Management System. In
addition, it facilitates integration with Web Services
(SOAP-based SOA RPCs or REST-fully accessible Web Resources).
Why is it important?
In the most basic sense, you shouldn't need to upgrade your
existing database engine version simply because your current DBMS
and Data Access Driver combo isn't compatible with ODBC-compliant
desktop tools such as Microsoft Access, Crystal Reports,
BusinessObjects, Impromptu, or other ODBC-, JDBC-, ADO.NET-, or OLE DB-compliant applications.
Simply place Virtuoso in front of your so-called "legacy database,"
and let it deliver the compliance levels sought by these tools.
In addition, it's important to note that today's enterprise,
through application evolution, company mergers, or acquisitions, is
often faced with disparately-structured data residing in any number
of line-of-business-oriented data silos. Compounding the problem is
the exponential growth of user-generated data via new social
media-oriented collaboration tools and platforms. For companies to
cost-effectively harness the opportunities accorded by the
increasing intersection between line-of-business applications and
social media, virtualization of data silos must be achieved, and
this virtualization must be delivered in a manner that doesn't
prohibitively compromise performance or completely undermine
security at either the enterprise or personal level. Again, this is
what you get by simply installing Virtuoso.
How do I use it?
The VDBMS may be used in a variety of ways, depending on the
data access and integration task at hand. Examples include:
Relational Database Federation
You can make a single ODBC, JDBC, ADO.NET, OLE DB, or XMLA
connection to multiple ODBC- or JDBC-accessible RDBMS data sources,
concurrently, with the ability to perform intelligent distributed
joins against externally-hosted database tables. For instance, you
can join internal human resources data against internal sales and
external stock market data, even when the HR team uses Oracle, the Sales team uses Informix, and the Stock Market figures come
from Ingres!
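As a rough sketch of what that looks like in practice: once the remote Oracle, Informix, and Ingres tables have been linked into Virtuoso, they can be joined with ordinary SQL, for example via Virtuoso's isql command-line client. The table and column names below are purely hypothetical placeholders for already-attached remote tables, the dba/dba credentials are the stock defaults, and the isql invocation style (port, user, password, EXEC=) should be checked against your installation.

# Hypothetical distributed join across tables linked in from three different RDBMS engines.
isql 1111 dba dba EXEC="
  SELECT e.name, SUM(o.amount) AS total_sales, q.last_price
    FROM HR_EMPLOYEES e                            -- linked from Oracle (placeholder name)
    JOIN SALES_ORDERS o ON o.emp_id = e.emp_id     -- linked from Informix (placeholder name)
    JOIN STOCK_QUOTES q ON q.ticker = o.ticker     -- linked from Ingres (placeholder name)
   GROUP BY e.name, q.last_price;
"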
Conceptual Level Data Access using the RDF Model
You can construct RDF Model-based Conceptual Views atop
Relational Data Sources. This is about generating HTTP-based
Entity-Attribute-Value (E-A-V) graphs
using data culled "on the fly" from native or external data sources
(Relational Tables/Views, XML-based Web Services, or User Defined
Types).
You can also derive RDF Model-based Conceptual Views from Web
Resource transformations "on the fly" -- the Virtuoso Sponger (RDFizing middleware component)
enables you to generate RDF Model Linked Data via a RESTful Web
Service or within the process pipeline of the SPARQL query engine (i.e., you simply use the
URL of a Web Resource in the FROM clause of a
SPARQL query).
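A minimal sketch of that last point, assuming a local Virtuoso SPARQL endpoint on the default port 8890 that is permitted to retrieve remote resources via the Sponger; the target URL is just an example Web page, and depending on configuration a Virtuoso-specific pragma may be needed to force retrieval.

# The FROM clause names an ordinary Web resource; the Sponger RDFizes it on the fly
# before the triple pattern is evaluated (endpoint configuration permitting).
curl -s -H "Accept: application/sparql-results+json" "http://localhost:8890/sparql" \
  --data-urlencode "query=
    SELECT ?s ?p ?o
    FROM <http://www.bbc.co.uk/music>
    WHERE { ?s ?p ?o }
    LIMIT 25"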
It's important to note that Views take the form of HTTP links
that serve as both Data Source Names and Data Source Addresses.
This enables you to query and explore relationships across entities
(i.e., People, Places, and other Real World Things) via HTTP
clients (e.g., Web Browsers) or directly via SPARQL Query Language
constructs transmitted over HTTP.
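For example (a sketch using a public DBpedia identifier): dereferencing one and the same entity URI with different Accept headers yields either a human-oriented HTML page or machine-readable RDF for that entity.

# Ask for an HTML description of the entity (what a Web browser would receive)
curl -sL -H "Accept: text/html" "http://dbpedia.org/resource/BBC" -o BBC.html
# Ask for the same entity as RDF (Turtle): same identifier, different representation
curl -sL -H "Accept: text/turtle" "http://dbpedia.org/resource/BBC"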
Conceptual Level Data Access using ADO.NET Entity Frameworks
As an alternative to RDF, Virtuoso can expose ADO.NET Entity
Frameworks-based Conceptual Views over Relational Data Sources. It
achieves this by generating Entity Relationship graphs via its
native ADO.NET Provider, exposing all externally attached ODBC- and
JDBC-accessible data sources. In addition, the ADO.NET Provider
supports direct access to Virtuoso's native RDF database engine,
eliminating the need for resource intensive Entity Frameworks model
transformations.
02/17/2010 16:38 GMT-0500 | Modified: 02/17/2010 16:46 GMT-0500
What is the DBpedia Project? (Updated)
The recent Wikipedia imbroglio centered around DBpedia is the fundamental driver for this particular blog post. At the time of writing, the DBpedia project definition in Wikipedia remains unsatisfactory due to the following shortcomings:
- inaccurate and incomplete definition of the Project's What,
Why, Who, Where, When, and How
- inaccurate reflection of project essence, by skewing focus
towards data
extraction and data set dump production, which is at best a quarter
of the project.
Here are some insights on DBpedia, from the perspective of
someone intimately involved with the other three-quarters of the
project.
What is DBpedia?
A live Web accessible RDF
model database (Quad Store) derived from Wikipedia content
snapshots, taken periodically. The RDF database underlies a
Linked Data Space comprised of: HTML (and most recently
HTML+RDFa) based data browser pages and a SPARQL endpoint.
Note: DBpedia 3.4 now exists in snapshot (warehouse) and Live Editions (currently being hot-staged). This post is about the snapshot (warehouse) edition; I'll drop a different post about the DBpedia Live Edition, where a new Delta-Engine covers both extraction and database record replacement in real time.
When was it Created?
As an idea under the moniker "DBpedia," it was conceptualized in late 2006 by researchers at the University of Leipzig (led by Sören Auer) and Freie Universität Berlin (led by Chris Bizer). The first public instance of DBpedia (as described above) was released in February 2007. The official DBpedia coming-out party occurred at WWW2007 in Banff, during the inaugural Linked Data gathering, where it showcased the virtues and immense potential of TimBL's Linked Data meme.
Who's Behind It?
OpenLink Software (developers of OpenLink Virtuoso and providers of Web Hosting infrastructure), the University of Leipzig, and Freie Universität Berlin. In addition, there is a burgeoning community of collaborators and contributors responsible for DBpedia-based applications, cross-linked data sets, ontologies (OpenCyc, SUMO, UMBEL, and YAGO), and other utilities. Finally, DBpedia wouldn't be possible without the global content contribution and curation efforts of Wikipedians, a point typically overlooked (albeit inadvertently).
How is it Constructed?
The steps are as follows:
- RDF data set dump preparation via Wikipedia content extraction
and transformation to RDF model data, using the N3 data
representation format - Java and PHP
extraction code produced and maintained by the teams at Leipzig and
Berlin
- Deployment of Linked Data that enables Data browsing and
exploration using any HTTP aware user agent (e.g. basic Web
Browsers) - handled by OpenLink Virtuoso (handled by Berlin via the
Pubby Linked Data Server during the early months of the DBpedia
project)
- SPARQL compliant Quad Store, enabling direct access to database
records via SPARQL (Query language, REST or SOAP Web Service, plus
a variety of query results serialization formats) - OpenLink
Virtuoso since first public release of DBpedia
In a nutshell, there are four distinct and vital components to DBpedia. Thus, DBpedia doesn't exist if all the project offered were a collection of RDF data dumps. Likewise, it doesn't exist without a fully populated SPARQL compliant Quad Store. Last but not least, it doesn't exist if the fully loaded SPARQL compliant Quad Store isn't up to the cocktail of challenges (query load and complexity) presented by live Web database accessibility.
Why is it Important?
It remains a live exemplar for any individual or organization seeking to publish or exploit HTTP based Linked Data on the World Wide Web. Its existence continues to stimulate growth in both the density and quality of the burgeoning Web of Linked Data.
How Do I Use it?
In the most basic sense, simply browse the HTML based resource descriptor pages en route to discovering previously undiscovered relationships that exist across named entities and subject matter concepts / headings. Beyond that, simply look at DBpedia as a master lookup table in a Web hosted distributed database setup, enabling you to mesh your local domain specific details with DBpedia records via structured relations (triples, i.e., 3-tuple records) comprised of HTTP URIs from both realms, e.g., via owl:sameAs relations.
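As a sketch of that meshing, the query below asks the public DBpedia SPARQL endpoint for the owl:sameAs links attached to one DBpedia record; your own data space would publish analogous owl:sameAs triples pointing from your local URIs to DBpedia's.

# List the owl:sameAs links for a single DBpedia record (Berlin, as an example)
curl -s -G "http://dbpedia.org/sparql" \
  -H "Accept: application/sparql-results+json" \
  --data-urlencode "query=
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    SELECT ?linkedRecord
    WHERE { <http://dbpedia.org/resource/Berlin> owl:sameAs ?linkedRecord }"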
What Can I Use it For?
Expanding on the Master-Details point above, you can use its
rich URI corpus to alleviate tedium associated
with activities such as:
- List maintenance - e.g., Countries, States, Companies, Units of Measurement, Subject Headings, etc. (see the query sketch after this list)
- Tagging - as a complement to existing practices
- Analytical Research - you're only a LINK (URI) away from previously difficult-to-attain research data spread across a broad range of topics
- Closed Vocabulary Construction - rather than commence the
futile quest of building your own closed vocabulary, simply
leverage Wikipedia's human curated vocabulary as our common
base.
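The list-maintenance item lends itself to a one-line illustration. The sketch below pulls an English-labelled sample of countries from the public DBpedia endpoint; it assumes the dbo:Country class from the DBpedia ontology, and the LIMIT is only there to keep the sample small.

# Fetch a sample list of countries with English labels from DBpedia
curl -s -G "http://dbpedia.org/sparql" \
  -H "Accept: application/sparql-results+json" \
  --data-urlencode "query=
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?country ?name
    WHERE { ?country a dbo:Country ; rdfs:label ?name . FILTER ( lang(?name) = 'en' ) }
    ORDER BY ?name
    LIMIT 50"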
01/31/2010 17:43 GMT-0500 | Modified: 09/15/2010 18:10 GMT-0500
5 Very Important Things to Note about HTTP based Linked Data
- It isn't World Wide Web Specific (HTTP != World
Wide Web)
- It isn't Open Data Specific
- It isn't about "Free" (Beer or Speech)
- It isn't about Markup (so don't expect to grok it via a "markup first" approach)
- It's about Hyperdata - the use of HTTP and REST to
deliver a powerful platform agnostic mechanism for Data Reference,
Access, and Integration.
When trying to understand HTTP based Linked Data, especially if you're well versed in DBMS technology use (User, Power User, Architect, Analyst, DBA, or Programmer), think:
- Open Database Connectivity (ODBC) without operating system, data model,
or wire-protocol specificity or lock-in potential
- Java Database Connectivity (JDBC) without programming language
specificity
- ADO.NET without .NET runtime specificity and .NET bound language specificity
- OLE-DB without Windows operating system & programming
language specificity
- XMLA without XML format specificity - with Tabular and
Multidimensional results formats expressible in a variety of data
representation formats.
- All of the above scoped to the Record rather than Container level, with Generic HTTP scheme URIs associated with each Record, Field, and (optionally) Field value, as sketched below.
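To make the analogy concrete, here is a sketch of record-level access with nothing but HTTP: a DESCRIBE request against a public SPARQL endpoint returns the record identified by a generic HTTP URI, with the representation chosen via the Accept header rather than any driver- or OS-specific machinery. The endpoint and record are public DBpedia examples.

# Record-level access over plain HTTP: no driver manager, no OS- or language-specific stack
curl -s -G "http://dbpedia.org/sparql" \
  -H "Accept: text/turtle" \
  --data-urlencode "query=DESCRIBE <http://dbpedia.org/resource/BBC>"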
Remember, the need for Data Access & Integration technology is the byproduct of the following realities:
- Human curated data is ultimately dirty, because:
- our thick thumbs, inattention, distractions, and general discomfort with typing make typos prevalent;
- database engines exist for a variety of data models - Graph, Relational, Hierarchical;
- within databases you have different record container/partition names, e.g., Table Names;
- within a database record container you have records that are really aspects of the same thing (different keys exist in a plethora of operational / line-of-business systems that expose aspects of the same entity, e.g., customer data that spans Accounts, CRM, and ERP application databases);
- different field names (one database has "EMP" while another has "Employee") for the same record.
- Units of measurement are driven by locale: the UK office wants to see sales in Pounds Sterling while the French office prefers Euros, etc.
- All of the above is subject to context halos, which can be quite granular re. sensitivity; e.g., staff travel between locations, altering locales and roles; basically, profiles matter a lot.
01/31/2010 17:31 GMT-0500 | Modified: 02/01/2010 09:00 GMT-0500
Virtuoso Chronicles from the Field: Nepomuk, KDE, and the quest for a sophisticated RDF DBMS.
For this particular user experience chronicle, I've simply
inserted the content of Sebastian Trueg's post titled: What We Did Last Summer (And the Rest of 2009)
– A Look Back Onto the Nepomuk Development Year ..., directly
into this post, without any additional commentary or
modification.
2009 is over. Yeah, sure, trueg, we know that, it has been
over for a while now! Ok, ok, I am a bit late, but still I
would like to get this one out - if only for my archive. So here
goes.
Let’s start with the major topic of 2009 (and also the beginning
of 2010): The new Nepomuk database backend: Virtuoso. Everybody who used Nepomuk had the
same problems: you either used the sesame2 backend which depends on Java and
steals all of your memory or you were stuck with Redland which had the
worst performance and missed some SPARQL features making important parts of
Nepomuk like queries unusable. So more than a year ago I had
the idea to use the one GPL’ed database server out there that
supported RDF in a professional manner: OpenLink’s
Virtuoso. It has all the features we need,
has a very good performance, and scales up to dimensions we will
probably never reach on the desktop (yeah, right, and 64k main
memory will be enough forever!). So very early I started
coding the necessary Soprano plugin which would talk to a locally
running Virtuoso server through ODBC. But since I ran into tons of small
problems (as always) and got sidetracked by other tasks I did not
finish it right away. OpenLink, however, was very interested in the
idea of their server being part of every KDE installation (why
wouldn’t they ;)). So they not only introduced a lite-mode which makes Virtuoso suitable
for the desktop but also helped in debugging all the problems that
I had left. Many test runs, patches, and a Virtuoso 5.0.12 release
later I could finally announce the Virtuoso
integration as usable.
Then, at the end of last year, I dropped support for sesame2 and redland. Virtuoso is now the only supported database backend. The
reason is simple: Virtuoso is way more powerful than the rest - not
only in terms of performance - and it is fully implemented in
C(++) without any traces of Java. Maybe even
more important is the integration of the full text index which
makes the previously used CLucene index unnecessary. Thus, we can
finally combine full text and graph queries in one SPARQL query.
This results in a cleaner API and way faster return of search
results since there is no need to combine the results from several
queries anymore. A direct result of that is the new Nepomuk Query API which I will discuss
later.
So now the only thing I am waiting for is the first bugfix
release of Virtuoso 6, i.e. 6.0.1 which will fix the bugs that make
6.0.0 fail with Nepomuk. Should be out any day now. :)
The Nepomuk Query API
Querying data in
Nepomuk pre-KDE-4.4 could be done in one of two ways: 1. Use the
very limited capabilities of the ResourceManager to list resources with
certain properties or of a certain type; or 2. Write your own
SPARQL query using ugly QString::arg
replacements.
With the introduction of Virtuoso and its awesome power we can
now do pretty much everything in one query. This allowed me to finally create a query API for KDE:
Nepomuk::Query::Query and friends. I won’t
go into much detail here since I did that before.
All in all you should remember one thing: whenever you think
about writing your own SPARQL query in a KDE application - have a
look at libnepomukquery. It is very likely that you can avoid the
hassle of debugging a query by using the query API.
The first nice effect of the new API (apart from me using it all
over the place obviously) is the new query interface in Dolphin.
Internally it simply combines a bunch of Nepomuk::Query::Term objects into a
Nepomuk::Query::AndTerm. All very readable
and no ugly query strings.
Dolphin Search Panel in KDE SC 4.4
Shared Desktop Ontologies
An important part of the Nepomuk
research project was the creation of a set of ontologies for describing desktop resources
and their metadata. After the Xesam
project under the umbrella of freedesktop.org had been convinced to use
RDF for describing file metadata they developed their own ontology.
Thanks to Evgeny (phreedom) Egorochkin and Antonie Mylka both the
Xesam ontology and the Nepomuk Information Elements Ontology were already
very close in design. Thus, it was relatively easy to merge the two
and be left with only one ontology to support. Since then not only
KDE but also Strigi and Tracker are using the Nepomuk ontologies.
At the Gran Canaria Desktop Summit I met some of the guys from
Tracker and we tried to come up with a plan to create a joint
project to maintain the ontologies. This got off to a rough start
as nobody really felt responsible. So I simply took the initiative
and released the shared-desktop-ontologies version 0.1 in
November 2009. The result was a s***-load of hate-mails and bug reports due to me breaking the KDE build. But in the end it was worth
it. Now the package is established and other projects can start to
pick it up to create data compatible to the Nepomuk system and
Tracker.
Today the ontologies (and the shared-desktop-ontologies package)
are maintained in the Oscaf project at Sourceforge. The situation
is far from perfect but it is a good start. If you need specific
properties in the ontologies or are thinking about creating one for
your own application - come and join us in the bug tracker…
Timeline KIO Slave
It was at the Akonadi meeting that Will Stephenson and myself
got into talking about mimicking some Zeitgeist functionality through Nepomuk.
Basically it meant gathering some data when opening and when saving
files. We quickly came up with a hacky patch for KIO and KFileDialog which covered most cases and
allowed us to track when a file was modified and by which
application. This little experiment did not leave that state though
(it will, however, this year) but another one did: Zeitgeist also provides a fuse filesystem which allows you to browse files by modification date. Well, whatever fuse can do, KIO can do as well.
Introducing the timeline:/ KIO slave which
gives a calendar view onto your files.
Tips And Tricks
Well, I thought I would mention the Tips And Tricks section I wrote for the
techbase. It might not be a big deal but I
think it contains some valuable information in case you are using
Nepomuk as a developer.
Google Summer Of Code 2009
This time around I had the privilege to mentor two students in the Google Summer
of Code. Alessandro Sivieri and Adam Kidder did outstanding work on
Improved Virtual Folders and the Smart File Dialog.
Adam’s work led me to some heavy improvements in the Nepomuk KIO slaves myself, which I only finished this week (more details on
that coming up). Alessandro continued his work on faceted file
browsing in KDE and created:
Sembrowser
Alessandro is following up on his work to make faceted file
browsing a reality in 2010 (and KDE SC 4.5). Since it was too late
to get faceted browsing into KDE SC 4.4 he is working on Sembrowser, a stand-alone faceted file
browser which will be the grounds for experiments until the code is
merged into Dolphin.
Faceted Browsing in KDE with
Sembrowser
Nepomuk Workshops
In 2009 I organized the first Nepomuk workshop in Freiburg,
Germany. And also the second one. While I reported properly on the first one I
still owe a summary for the second one. I will get around to that -
sooner or later. ;)
CMake Magic
Soprano gives us a nice command line tool to
create a C++ namespace from an ontology file: onto2vocabularyclass. It produces nice
convenience namespaces like Soprano::Vocabulary::NAO. Nepomuk adds
another tool named nepomuk-rcgen. Both were a bit clumsy to
use before. Now we have nice cmake macros which make it very simple
to use both.
See the techbase article on how to use the new
macros.
Bangarang
Without my knowledge (imagine that!) Andrew Lake created
an amazing new media player named Bangarang - a Jamaican word for noise,
chaos or disorder. This player is Nepomuk-enabled in the sense
that it has a media library which lets you browse your media files
based on the Nepomuk data. It remembers the number of times a song
or a video has been played and when it was played last. It allows you to add details such as the TV series name, season, episode number,
or actors that are in the video - all through Nepomuk (I hope we
will soon get tvdb integration).
Edit metadata directly in Bangarang
Dolphin showing TV episode metadata
created by Bangarang
And of course searching for it works,
too...
And it is pretty, too...
I am especially excited about this since finally applications
not written or mentored by me start contributing Nepomuk data.
Gran Canaria Desktop Summit
2009 was also the year of the first Gnome-KDE joint-conference.
Let me make a bulletin for completeness and refer to my previous blog post reporting on my
experiences on the island.
Well, that was by far not all I did in 2009 but I think I
covered most of the important topics. And after all it is ‘just a
blog entry’ - there is no need for completeness. Thanks for
reading.
"
01/28/2010 11:14 GMT-0500 | Modified: 02/01/2010 09:02 GMT-0500
Time for RDBMS Primacy Downgrade is Nigh! (No Embedded Images Edition - Update 1)
As the world works its way through a "once in a generation" economic crisis, the long overdue downgrade of the RDBMS from its pivotal position at the apex of the data access and data management pyramid is nigh.
What is the Data Access and Data Management Value Pyramid?
It is a top-down view of the data access and data management value chain (see: AVF Pyramid Diagram). The term apex simply indicates value primacy, which takes the form of a data access API based entry point into a DBMS realm -- aligned to an underlying data model. Examples of data access APIs include: Native Call Level Interfaces (CLIs), ODBC, JDBC, ADO.NET, OLE-DB, XMLA, and Web Services.
The degree to which ad-hoc views of data managed by a DBMS can
be produced and dispatched to relevant data consumers (e.g.
people), without compromising concurrency, data durability, and
security, collectively determine the "Agility Value Factor" (AVF)
of a given DBMS. Remember, agility as the cornerstone of
environmental adaptation is as old as the concept of evolution, and
intrinsic to all pursuits of primacy.
In simpler business oriented terms, look at AVF as the degree to
which DBMS technology affects the ability to effectively implement
"Market Leadership Discipline" along the following pathways:
innovation, operational excellence, or customer intimacy.
Why has RDBMS Primacy Endured?
Historically, at least since the late '80s, the RDBMS genre of
DBMS has consistently offered the highest AVF relative to other
DBMS genres en route to primacy within the value pyramid. The
desire to improve on paper reports and spreadsheets is basically
what DBMS technology has fundamentally addressed to date, even
though conceptual level interaction with data has never been its
forte.
See: RDBMS Primacy Diagram.
For more than 10 years -- at the very least -- the limitations of the traditional RDBMS in the realm of conceptual level interaction with data across diverse data sources and schemas (enterprise, Web, and Internet) have been crystal clear to many RDBMS technology practitioners, as indicated by some of the quotes excerpted below:
"Future of Database Research is excellent, but what is the
future of data?"
"..it is hard for me to disagree with the conclusions in this
report. It captures exactly the right thoughts, and should be a
must read for everyone involved in the area of databases and
database research in particular."
-- Dr. Anant Jingran, CTO, IBM Information Management Systems, commenting on
the 2007 RDBMS technology retreat attended by a
number of key DBMS technology pioneers and researchers.
"One size fits all: A concept whose time has come
and gone
-
They are direct descendants of System R and Ingres and were architected more than 25
years ago
-
They are advocating "one size fits all"; i.e. a single
engine that solves all DBMS needs.
-- Prof. Michael Stonebreaker, one of the founding
fathers of the RDBMS industry.
Until this point in time, the requisite confluence of
"circumstantial pain" and "open standards" based technology
required to enable an objective "compare and contrast" of RDBMS
engine virtues and viable alternatives hasn't occurred. Thus, the
RDBMS has endured its position of primacy, albeit on a "one size fits all" basis.
Circumstantial Pain
As mentioned earlier, we are in the midst of an economic crisis
that is ultimately about a consistent inability to connect dots
across a substrate of interlinked data sources that transcend
traditional data access boundaries with high doses of schematic
heterogeneity. Ironically, in the era of the dot-com, we haven't been able to make meaningful connections between relevant "real-world things" that extend beyond primitive data hosted in database tables and content management style document containers; we've struggled to achieve this in the most basic sense, let alone evolve our ability to connect in line with the exponential rate at which the Internet & Web
are spawning "universes of discourse" (data spaces) that emanate
from user activity (within the enterprise and across the
Internet & Web). In a nutshell, we haven't been able to upgrade
our interaction with data such that "conceptual models" and
resulting "context lenses" (or facets) become concrete;
by this I mean: real-world entity interaction making its way into the
computer realm as opposed to the impedance we all suffer today when
we transition from conceptual model interaction (real-world) to
logical model interaction (when dealing with RDBMS based data
access and data management).
Here are some simple examples of what I can only best describe
as: "critical dots unconnected", resulting from an inability to
interact with data conceptually:
Government (Globally) -
Financial regulatory bodies couldn't effectively discern that a
Credit Default Swap is an Insurance policy in
all but literal name. And in not doing so the cost of an
unregulated insurance policy laid the foundation for
exacerbating the toxicity of fatally flawed mortgage backed
securities. Put simply: a flawed insurance policy was the fallback
on a toxic security that financiers found exotic based on
superficial packaging.
Enterprises -
Banks still don't understand that capital really does exist in tangible and intangible forms, with the intangible being the variant that is inherently dynamic. For example, a tech company's intellectual capital far exceeds the value of its fixtures, fittings, and buildings, but you'd be amazed to find that in most cases this vital asset has no significant value when banks get down to the nitty-gritty of debt collateral; instead, a buffer of flawed securitization has occurred atop a borderline static asset class covering the aforementioned buildings, fixtures, and fittings.
In the general enterprise arena, IT executives continued to "rip
and replace" existing technology without ever effectively
addressing the timeless inability to connect data across disparate
data silos generated by internal enterprise applications, let alone
the broader need to mesh data from the inside with external data
sources. No correlation was made between the growth of buzzwords and the compounding nature of data integration challenges. It's 2009, and only a minuscule number of executives dare fantasize about being anywhere within reach of the "relevant information at your fingertips" vision.
Looking more holistically at data interaction in general,
whether you interact with data in the enterprise space (i.e., at
work) or on the Internet or Web, you ultimately are delving into a
mishmash of disparate computer systems, applications, services (Web
or SOA), and databases (of the RDBMS variety in a majority of
cases) associated with a plethora of disparate schemas. Yes, but
even today "rip and replace" is still the norm pushed by most
vendors, pitting one monoculture against another, as exemplified by
irrelevances such as: FOSS/LAMP vs Commercial or Web vs.
Enterprise, when none of this matters if the data access and
integration issues are recognized let alone addressed (see:
Applications are Like Fish and Data Like
Wine).
Like the current credit-crunch, exponential growth of data
originating from disparate application databases and associated
schemas, within shrinking processing time frames, has triggered a
rethinking of what defines data access and data management value
today en route to an inevitable RDBMS downgrade within the value
pyramid.
Technology
There have been many attempts to address real-world modeling
requirements across the broader DBMS community from Object
Databases to Object-Relational Databases, and more recently the
emergence of simple Entity-Attribute-Value model DBMS engines. In
all cases failure has come down to the existence of one or more of
the following deficiencies, across each potential alternative:
- Query language standardization - nothing close to SQL
standardization
- Data Access API standardization - nothing close to ODBC, JDBC,
OLE-DB, or ADO.NET
- Wire protocol standardization - nothing close to HTTP
- Distributed Identity infrastructure - nothing close to the
non-repudiatable digital Identity that foaf+ssl accords
- Use of Identifiers as network based pointers to data sources -
nothing close to RDF based Linked Data
- Negotiable data representation - nothing close to Mime and HTTP
based Content Negotiation
- Scalability especially in the era of Internet & Web
scale.
Entity-Attribute-Value with Classes & Relationships
(EAV/CR) data models
A common characteristic shared by all post-relational database management systems (from Object Relational to pure Object) is an
orientation towards variations of EAV/CR based data models.
Unfortunately, all efforts in the EAV/CR realm have typically
suffered from at least one of the deficiencies listed above. In
addition, the same "one DBMS model fits all" approach that lies at
the heart of the RDBMS downgrade also exists in the EAV/CR
realm.
What Comes Next?
The RDBMS is not going away (ever), but its era of primacy -- by
virtue of its placement at the apex of the data access and data
management value pyramid -- is over! I make this bold claim for the
following reasons:
- The Internet aided "Global Village" has brought "Open World" vs "Closed World" assumption issues to the fore
e.g., the current global economic crisis remains centered on the
inability to connect dots across "Open World" and "Closed World"
data frontiers
- Entity-Attribute-Value with Classes & Relationships
(EAV/CR) based DBMS models are more effective when dealing with
disparate data associated with disparate schemas, across disparate
DBMS engines, host operating systems, and networks.
Based on the above, it is crystal clear that a different kind of
DBMS -- one with higher AVF relative to the RDBMS -- needs to sit
atop today's data access and data management value pyramid. The
characteristics of this DBMS must include the following:
- Every item of data (Datum/Entity/Object/Resource) has
Identity
- Identity is achieved via Identifiers that aren't locked at the
DBMS, OS, Network, or Application levels
- Object Identifiers and Object values are independent
(extricably linked by association)
- Object values should be de-referencable via Object
Identifier
- Representation of de-referenced value graph (entity,
attributes, and values mesh) must be negotiable (i.e. content
negotiation)
- Structured query language must provide mechanisms for Creation, Deletion, Updates, and Querying of data objects (see the sketch after this list)
- Performance & Scalability across "Closed World"
(enterprise) and "Open World" (Internet & Web) realms.
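As a sketch of the creation and querying characteristics (and of negotiable representation), the pair of requests below targets a local Virtuoso instance on the default port 8890. Whether the /sparql endpoint accepts updates this way depends entirely on its configuration and the privileges granted to the SPARQL account, so treat this as illustrative; the graph name, entity URI, and property are hypothetical.

# Create: add one entity-attribute-value statement to a named graph
# (requires an endpoint configured to accept SPARQL updates)
curl -s "http://localhost:8890/sparql" \
  --data-urlencode "query=
    INSERT DATA { GRAPH <urn:example:crm> {
      <http://example.com/customer/42> <http://example.com/schema#status> \"active\" } }"
# Query: read the statement back, asking for a JSON results representation
curl -s -H "Accept: application/sparql-results+json" "http://localhost:8890/sparql" \
  --data-urlencode "query=
    SELECT ?p ?o FROM <urn:example:crm>
    WHERE { <http://example.com/customer/42> ?p ?o }"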
Quick recap: I am not saying that RDBMS engine technology is
dead or obsolete. I am simply stating that the era of RDBMS primacy
within the data access and data management value pyramid is
over.
The problem domain (conceptual model views over heterogeneous
data sources) at the apex of the aforementioned pyramid has simply
evolved beyond the natural capabilities of the RDBMS, which is rooted in "Closed World" assumptions re. data definition, access,
and management. The need to maintain domain based conceptual
interaction with data is now palpable at every echelon within our
"Global Village" - Internet, Web, Enterprise, Government etc.
It is my personal view that an EAV/CR model based DBMS, with
support for the seven items enumerated above, can trigger the long
anticipated RDBMS downgrade. Such a DBMS would be inherently
multi-model because you would need the best of RDBMS and EAV/CR
model engines in a single product, with in-built support for HTTP
and other Internet protocols in order to effectively address data
representation and serialization issues.
EAV/CR Oriented Data Access & Management Technology
Examples of contemporary EAV/CR frameworks that provide concrete
conceptual layers for data access and data management currently
include:
The frameworks above provide the basis for a revised AVF
pyramid, as depicted below, that reflects today's data access and
management realities i.e., an Internet & Web driven global
village comprised of interlinked distributed data objects,
compatible with "Open World" assumptions.
See: New EAV/CR Primacy Diagram.
01/27/2009 19:19 GMT-0500 | Modified: 03/17/2009 11:50 GMT-0500
New ADO.NET 3.x Provider for Virtuoso Released (Update 2)
I am pleased to announce the immediate availability of the
Virtuoso ADO.NET 3.5 data provider for
Microsoft's .NET platform.
What is it?
A data access driver/provider that provides conceptual entity oriented access to RDBMS data managed by Virtuoso. Naturally,
it also uses Virtuoso's in-built virtual / federated database layer to provide access to
ODBC and JDBC accessible RDBMS engines such as:
Oracle (7.x to latest), SQL
Server (4.2 to latest), Sybase, IBM Informix (5.x to latest), IBM DB2,
Ingres (6.x to latest), Progress (7.x to
OpenEdge), MySQL, PostgreSQL, Firebird, and others using our ODBC or JDBC
bridge drivers.
Benefits?
Technical:
It delivers an Entity-Attribute-Value + Classes &
Relationships model over disparate data sources that are
materialized as .NET Entity Framework Objects, which are then
consumable via ADO.NET Data Object Services, LINQ for Entities, and
other ADO.NET data consumers.
The provider is fully integrated into Visual Studio 2008 and
delivers the same "ease of use" offered by Microsoft's own SQL
Server provider, but across Virtuoso, Oracle, Sybase, DB2,
Informix, Ingres, Progress (OpenEdge), MySQL, PostgreSQL,
Firebird, and others. The same benefits also apply uniformly to
Entity Frameworks compatibility.
Bearing in mind that Virtuoso is a multi-model (hybrid) data
manager, this also implies that you can use .NET Entity Frameworks
against all data managed by Virtuoso. Remember, Virtuoso's SQL
channel is a conduit to Virtuoso's core; thus, RDF (courtesy of
SPASQL as already implemented re. Jena/Sesame/Redland providers), XML, and other data
forms stored in Virtuoso also become accessible via .NET's Entity
Frameworks.
Strategic:
You can choose which entity oriented data access model works
best for you: RDF Linked Data & SPARQL or .NET Entity Frameworks &
Entity SQL. Either way, Virtuoso delivers
a commercial grade, high-performance, secure, and scalable
solution.
How do I use it?
Simply follow one of the guides below:
Note: When working with external or 3rd party databases,
simply use the Virtuoso Conductor to link the external data source
into Virtuoso. Once linked, the remote tables will simply be
treated as though they are native Virtuoso tables leaving the
virtual database engine to handle the rest.
This is similar to the role the Microsoft JET engine played in the
early days of ODBC, so if you've ever linked an ODBC data source
into Microsoft Access, you are ready to do the same using
Virtuoso.
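For the command-line inclined, the same linking can also be expressed in SQL through Virtuoso's remote table attachment facility. The statement below is only a sketch: the DSN, qualified table name, local name, and credentials are placeholders, and the exact ATTACH TABLE clause order should be confirmed against the Virtuoso Virtual Database documentation before use.

# Hypothetical: link a remote SQL Server table into Virtuoso over an existing ODBC DSN
isql 1111 dba dba EXEC="
  ATTACH TABLE Northwind.dbo.Customers AS CUSTOMERS
    FROM 'sqlserver_dsn' USER 'sa' PASSWORD 'secret';
"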
01/08/2009 04:36 GMT-0500 | Modified: 01/08/2009 09:12 GMT-0500
Introducing Virtuoso Universal Server (Cloud Edition) for Amazon EC2
What is it?
A pre-installed edition of Virtuoso
for Amazon's EC2 Cloud platform.
What does it offer?
From a Web
Entrepreneur perspective it offers:
- Low cost entry point to a game-changing Web 3.0+ (and beyond)
platform that combines SQL, RDF, XML, and Web Services functionality
- Flexible variable cost model (courtesy of EC2
DevPay) tightly bound to revenue generated by your
services
- Delivers federated and/or centralized model flexibility for your SaaS based solutions
- Simple entry point for developing and deploying sophisticated
database driven applications (SQL or RDF Linked Data Web oriented)
- Complete framework for exploiting OpenID, OAuth (including Role
enhancements) that simplifies exploitation of these vital Identity
and Data Access
technologies
- Easily implement RDF Linked Data based Mail, Blogging, Wikis,
Bookmarks, Calendaring, Discussion Forums, Tagging,
Social-Networking as Data Space (data containers) features of your
application or service offering
- Instant alleviation of challenges (e.g. service costs and
agility) associated with Data Portability and Open Data Access across
Web 2.0 data silos
- LDAP integration for Intranet / Extranet style applications.
From the DBMS engine perspective it provides you with one or
more pre-configured instances of Virtuoso that enable immediate
exploitation of the following services:
- RDF Database (a Quad Store with SPARQL & SPARUL Language & Protocol
support)
- SQL Database (with ODBC, JDBC, OLE-DB, ADO.NET, and XMLA driver access)
- XML Database (XML Schema, XQuery/Xpath,
XSLT, Full Text Indexing)
- Full Text Indexing.
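Once an instance is up, the services listed above can be smoke-tested remotely with nothing more than HTTP. The sketch below assumes the default Virtuoso HTTP port of 8890 is open in the instance's security group; the hostname is a placeholder for your instance's public DNS name, and the SQL/ODBC/JDBC services listen separately on Virtuoso's standard port 1111.

# Placeholder hostname: substitute your EC2 instance's public DNS name
EC2_HOST="ec2-203-0-113-10.compute-1.amazonaws.com"
# RDF database service: a trivial SPARQL ASK against the instance's endpoint
curl -s -H "Accept: application/sparql-results+json" \
  "http://${EC2_HOST}:8890/sparql" --data-urlencode "query=ASK { ?s ?p ?o }"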
From a Middleware perspective it provides:
- RDF Views (Wrappers / Semantic Covers) over SQL, XML, and other
data sources accessible via SOAP or REST style Web Services
- Sponger Service for converting non RDF information resources into RDF Linked Data "on the fly" via a large
collection of pre-installed RDFizer Cartridges.
From the Web Server Platform perspective it provides an alternative to LAMP stack components such as MySQL and Apache by offering:
- HTTP Web Server
- WebDAV Server
- Web Application Server (includes PHP
runtime hosting)
- SOAP or REST style Web Services Deployment
- RDF Linked Data Deployment
- SPARQL (SPARQL Query Language) and SPARUL (SPARQL Update
Language) endpoints
- Virtuoso Hosted PHP packages for MediaWiki, Drupal, Wordpress, and phpBB3
(just install the relevant Virtuoso Distro. Package).
From the general System Administrator's perspective it
provides:
- Online Backups (Backup Set dispatched to S3 buckets, FTP, or
HTTP/WebDAV server locations)
- Synchronized Incremental Backups to Backup Set locations
- Backup Restore from Backup Set location (without exiting to EC2
shell).
Higher level user oriented offerings include:
- OpenLink Data Explorer front-end for exploring the burgeoning
Linked Data Web
- Ajax based SPARQL Query Builder (iSPARQL) that enables SPARQL
Query construction by Example
- Ajax based SQL Query Builder (QBE) that enables SQL Query
construction by Example.
For Web 2.0 / 3.0 users, developers, and entrepreneurs, it offers Distributed Collaboration Tools & Social Media realm functionality, courtesy of ODS, which includes:
- Point of presence on the Linked Data Web that meshes your
Identity and your Data via URIs
- System generated Social Network Profile & Contact Data via FOAF
- System generated SIOC (Semantically Interconnected Online
Community) Data Space (that includes a Social Graph)
exposing all your Web data in RDF Linked Data form
- System generated OpenID and automatic integration with
FOAF
- Transparent Data Integration across Facebook, Digg, LinkedIn,
FriendFeed, Twitter, and any other Web 2.0 data space equipped with
RSS / Atom support and/or REST style Web Services
- In-built support for SyncML which enables data synchronization
with Mobile Phones.
How Do I Get Going with It?
11/28/2008 19:27 GMT-0500 | Modified: 11/28/2008 16:06 GMT-0500