Danny Ayers responds, via his post titled Sampling, to Stefano Mazzocchi's post about Data Integration using Semantic Web Technologies:

"There is a potential problem with republication of transformed data, in that right away there may be inconsistency with the original source data. Here provenance tracking (probably via named graphs) becomes a must-have. The web data space itself can support very granular separation. Whatever, data integration is a hard problem. But if you have a uniform language for describing resources, at least it can be possible."

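To make the named-graph point concrete, here is a minimal sketch in plain Python (no RDF library). It assumes statements are stored as quads, where the fourth element names the graph a statement came from. The graph names, the `ex:` identifiers, and the `find_conflicts` helper are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical graph names: one for the original source, one for the
# republished (transformed) copy of the same data.
SOURCE = "http://example.org/graphs/source"
REPUBLISHED = "http://example.org/graphs/republished"

# Each statement is a quad: (subject, predicate, object, graph).
quads = [
    ("ex:product42", "ex:price", "19.99", SOURCE),
    ("ex:product42", "ex:label", "Widget", SOURCE),
    ("ex:product42", "ex:price", "21.50", REPUBLISHED),
    ("ex:product42", "ex:label", "Widget", REPUBLISHED),
]

def find_conflicts(quads):
    """Return the (subject, predicate) pairs whose values differ between graphs."""
    by_key = defaultdict(lambda: defaultdict(set))
    for s, p, o, g in quads:
        by_key[(s, p)][g].add(o)

    conflicts = {}
    for key, per_graph in by_key.items():
        # Graphs disagree when they hold different value sets for the same key.
        distinct_value_sets = {frozenset(values) for values in per_graph.values()}
        if len(distinct_value_sets) > 1:
            conflicts[key] = {g: set(values) for g, values in per_graph.items()}
    return conflicts

conflicts = find_conflicts(quads)
```

Because every statement carries its graph name, the disagreement over `ex:price` can be traced back to the graph that asserted each value, instead of being lost in a merged data set.
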
Alex James also chimes in with valuable insights in his post: Sampling the global data model, where he concludes:

"Exactly we need to use projected views, or conceptual models.

See a projected view can be thought of as a conceptual model that has some mapping to a *sampling* of the global data model.

The benefits of introducing this extra layer are many and varied: Simplicity, URI predictability, Domain Specificity and the ability to separate semantics from lower level details like data mapping.

Unfortunately if you look at today’s ORMs you will quickly notice that they simply map directly from Object Model to Data Model in one step.

This naïve approach provides no place to manage the mapping to a conceptual model that sampling the world’s data requires.

What we need to solve the problems Stefano sees is to bring together the world of mapping and semantics. And the place they will meet is simply the Conceptual Model."
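
One way to picture the extra layer Alex describes is the rough Python sketch below. It is not his design, nor any ORM's API. It assumes two made-up sources (`crm_rows`, `billing_rows`), a hypothetical base URI, and a `CONCEPTUAL_MODEL` dictionary that declares the mapping as data rather than burying it in object code.

```python
# Data model layer: rows as they might sit in two hypothetical sources.
crm_rows = [{"cust_id": 7, "fname": "Ada", "lname": "Lovelace"}]
billing_rows = [{"customer": 7, "amt_cents": 1250}]
SOURCES = {"crm": crm_rows, "billing": billing_rows}

BASE = "http://example.org/"  # hypothetical base URI

# Conceptual layer: each concept names its source, its key column, and how
# domain-level property names map onto source-level column names.
CONCEPTUAL_MODEL = {
    "Customer": {
        "source": "crm",
        "key": "cust_id",
        "properties": {"givenName": "fname", "familyName": "lname"},
    },
    "Invoice": {
        "source": "billing",
        "key": "customer",
        "properties": {"amountCents": "amt_cents"},
    },
}

def project(concept):
    """Yield entities for one concept: a projected view over the raw rows."""
    spec = CONCEPTUAL_MODEL[concept]
    for row in SOURCES[spec["source"]]:
        # Predictable URI built from the concept name and the key value.
        entity = {
            "@id": f"{BASE}{concept.lower()}/{row[spec['key']]}",
            "@type": concept,
        }
        for prop, column in spec["properties"].items():
            entity[prop] = row[column]
        yield entity

customers = list(project("Customer"))
invoices = list(project("Invoice"))
```

Object code would consume `customers` and `invoices` without ever seeing `fname` or `amt_cents`, so a change in a source schema touches only the mapping in `CONCEPTUAL_MODEL`.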

Data Integration challenges arise because the following facts hold true all of the time (whether we like it or not):

  1. Data Heterogeneity is a fact of life at the intranet and internet levels
  2. Data is rarely clean
  3. Data Integration prowess is ultimately measured by pain alleviation
  4. At some point human participation is required, but the trick is to move human activity up the value chain
  5. Glue code size and Data Integration success are inversely related
  6. Data Integration is best addressed via "M" rather than "C" (if we use the MVC pattern as a guide; "V" is dead on arrival for the scrapers out there)

In 1997 we commenced the Virtuoso Virtual DBMS Project, which morphed into the Virtuoso Universal Server: a fusion of DBMS functionality and Middleware functionality in a single product. The goal of this undertaking remains alleviation of the costs associated with Data Integration challenges by virtualizing data at the Logical and Conceptual Layers.

The Logical Data Layer has been concrete for a while (e.g., Relational DBMS Engines). What hasn't reached the mainstream is the Concrete Conceptual Model, but this is changing fast courtesy of the activity taking place in the realm of RDF.

RDF provides an Open and Standards compliant vehicle for developing and exploiting Concrete Conceptual Data Models that ultimately move the Human aspect of the "Data Integration alleviation quest" higher up the value chain.