State of the Semantic Web, Part 2 - The Technical Questions (updated)

Here I will talk about some more technical questions that came up. This is mostly general; Virtuoso specific questions and answers are separate.

"How to Bootstrap? Where will the triples come from?"

There are already wrappers producing RDF from many applications. Since any structured or semi-structured data can be converted to RDF and often there is even a pre-existing terminology for the application domain, the availability of the data per se is not the concern.

The triples may come from any application or database, but they will not come from the end user directly. There was a good talk about photograph annotation in Vienna, describing many ways of deriving metadata for photos. The essential wisdom is annotating on the spot and wherever possible doing so automatically. The consumer is very unlikely to go annotate photos after the fact. Further, one can infer that photos made with the same camera around the same time are from the same location. There are other such heuristics. In this use case, the end user does not need to see triples. There is some benefit though in using commonly used geographical terminology for linking to other data sources.

"How will one develop applications?"

I'd say one will develop them much the same way as thus far. In PHP, for example. Whether one's query language is SPARQL or SQL does not make a large difference in how basic web UI is made.

A SPARQL end-point is no more an end-user item than a SQL command-line is.

A common mistake among techies is that they think the data structure and user experience can or ought to be of the same structure. The UI dialogs do not, for example, have to have a 1:1 correspondence with SQL tables.

The idea of generating UI from data, whether relational or data-web, is so seductive that generation upon generation of developers fall for it, repeatedly. Even I, at OpenLink, after supposedly having been around the block a couple of times made some experiments around the topic. What does make sense is putting a thin wrapper or HTML around the application, using XSLT and such for formatting. Since the model does allow for unforeseen properties of data, one can build a viewer for these alongside the regular forms. For this, Ajax technologies like OAT (the OpenLink AJAX Toolkit) will be good.

The UI ought not to completely hide the URIs of the data from the user. It should offer a drill down to faceted views of the triples for example. Remember when Xerox talked about graphical user interfaces in 1980? "Don't mode me in" was the slogan, as I recall.

Since then, we have vacillated between modal and non-modal interaction models. Repetitive workflows like order entry go best modally and are anyway being replaced by web services. Also workflows that are very infrequent benefit from modality; take personal network setup wizards, for example. But enabling the knowledge worker is a domain that by its nature must retain some respect for human intelligence and not kill this by denying access to the underlying data, including provenance and URIs. Face it: the world is not getting simpler. It is increasingly data dependent and when this is so, having semantics and flexibility of access for the data is important.

For a real-time task-oriented user interface like a fighter plane cockpit, one will not show URIs unless specifically requested. For planning fighter sorties though, there is some potential benefit in having all data such as friendly and hostile assets, geography, organizational structure, etc., as linked data. It makes for more flexible querying. Linked data does not per se mean open, so one can be joinable with open data through using the same identifiers even while maintaining arbitrary levels of security and compartmentalization.

For automating tasks that every time involve the same data and queries, RDF has no intrinsic superiority. Thus the user interfaces in places where RDF will have real edge must be more capable of ad hoc viewing and navigation than regular real-time or line of business user interfaces.

The OpenLink Data Explorer idea of a "data behind the web page" view goes in this direction. Read the web as before, then hit a switch to go to the data view. There are and will be separate clarifications and demos about this.

"What of the proliferation of standards? Does this not look too tangled, no clear identity? How would one know where to begin?"

When SWEO was beginning, there was an endlessly protracted discussion of the so-called layer cake. This acronym jungle is not good messaging. Just say linked, flexibly repurpose-able data, and rich vocabularies and structure. Just the right amount of structure for the application, less rigid and easier to change than relational.

Do not even mention the different serialization formats. Just say that it fits on top of the accepted web infrastructure — HTTP, URIs, and XML where desired.

It is misleading to say inference is a box at some specific place in the diagram. Inference of different types may or may not take place at diverse points, whether presentation or storage, on demand or as a preprocessing step. Since there is structure and semantics, inference is possible if desired.

"Can I make a social network application in RDF only, with no RDBMS?"

Yes, in principle, but what do you have in mind? The answer is very context dependent. The person posing the question had an E-learning system in mind, with things such as course catalogues, course material, etc. In such a case, RDF is a great match, especially since the user count will not be in the millions. No university has that many students and anyway they do not hang online browsing the course catalogue.

On the other hand, if I think of making a social network site with RDF as the exclusive data model, I see things that would be very inefficient. For example, keeping a count of logins or the last time of login would be by default several times less efficient than with a RDBMS.

If some application is really large scale and has a knowable workload profile, like any social network does, then some task-specific data structure is simply economical. This does not mean that the application language cannot be SPARQL but this means that the storage format must be tuned to favor some operations over others, relational style. This is a matter of cost more than of feasibility. Ten servers cost less than a hundred and have failures ten times less frequently.

In the near term we will see the birth of an application paradigm for the data web. The data will be open, exposed, first-class citizen; yet the user experience will not have to be in a 1:1 image of the data.

Orri Erling's Weblog

Details

Subscribe

Tag Cloud

Post Categories

Recent Articles

"How to Bootstrap? Where will the triples come from?"

"How will one develop applications?"

"What of the proliferation of standards? Does this not look too tangled, no clear identity? How would one know where to begin?"

"Can I make a social network application in RDF only, with no RDBMS?"

Comments

Post Comment

Orri Erling's Weblog

Details

Subscribe

Tag Cloud

Post Categories

Recent Articles

"How to Bootstrap? Where will the triples come from?"

"How will one develop applications?"

"What of the proliferation of standards? Does this not look too tangled, no clear identity? How would one know where to begin?"

"Can I make a social network application in RDF only, with no RDBMS?"

Related

Comments

Post Comment