Danny Ayers responds, via his post titled "Sampling," to
Stefano Mazzocchi's post about Data Integration using Semantic
Web Technologies:
"There is a potential problem with republication of transformed
data, in that right away there may be inconsistency with the
original source data. Here provenance tracking (probably via named
graphs) becomes a must-have. The web data space itself can support
very granular separation. Whatever, data integration is a hard
problem. But if you have a uniform language for describing
resources, at least it can be possible."
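The provenance tracking Danny describes can be sketched in miniature. Below is a hypothetical plain-Python stand-in for a named-graph (quad) store, not a real RDF engine: every triple carries a fourth element naming the graph it came from, so republished, transformed data can coexist with, and be checked against, the original source rather than silently overwriting it. The graph IRIs and `ex:` terms are made up for illustration.

```python
from collections import defaultdict

class QuadStore:
    """Toy quad store: triples are grouped by the named graph they came from."""

    def __init__(self):
        # graph IRI -> set of (subject, predicate, object) triples
        self.graphs = defaultdict(set)

    def add(self, graph, s, p, o):
        self.graphs[graph].add((s, p, o))

    def values(self, graph, s, p):
        """Objects asserted for (s, p) within a single named graph."""
        return {o for (s2, p2, o) in self.graphs[graph] if s2 == s and p2 == p}

    def conflicts(self, source, derived, s, p):
        """Where a derived graph disagrees with its source, provenance lets
        us surface both claims instead of losing one of them."""
        a, b = self.values(source, s, p), self.values(derived, s, p)
        return (a, b) if a and b and a != b else None

store = QuadStore()
store.add("http://example.org/graph/source", "ex:Danny", "ex:name", "Danny Ayers")
store.add("http://example.org/graph/transformed", "ex:Danny", "ex:name", "D. Ayers")

# Both claims survive, each attributable to its own graph.
print(store.conflicts("http://example.org/graph/source",
                      "http://example.org/graph/transformed",
                      "ex:Danny", "ex:name"))
```

In a real deployment the same idea is expressed with RDF datasets and SPARQL `GRAPH` patterns; the point here is only that the fourth element is what makes granular separation of sources possible.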
Alex James also chimes in with valuable insights in his post
titled "Sampling the global data model," where he concludes:
"Exactly, we need to use projected views, or conceptual
models.
See a projected view can be thought of as a conceptual model
that has some mapping to a *sampling* of the global data model.
The benefits of introducing this extra layer are many and
varied: Simplicity, URI predictability, Domain Specificity and the
ability to separate semantics from lower level details like data
mapping.
Unfortunately if you look at today’s ORMs you will quickly
notice that they simply map directly from Object Model to Data
Model in one step.
This naïve approach provides no place to manage the mapping to a
conceptual model that sampling the world’s data requires.
What we need to solve the problems Stefano sees is to bring
together the world of mapping and semantics. And the place they
will meet is simply the Conceptual Model."
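The extra layer Alex describes can be sketched as follows. This is a hypothetical illustration (the source names, columns, and conceptual attributes are invented): instead of an ORM mapping objects straight onto one physical schema in a single step, a conceptual model sits in between, and each heterogeneous source gets its own mapping into it.

```python
# Conceptual model: one "Person" concept with stable attribute names.
CONCEPT = ("name", "email")

# Per-source mappings: physical column -> conceptual attribute.
# Each mapping is a "sampling" of the global data model.
mappings = {
    "crm_db": {"full_name": "name", "mail": "email"},
    "ldap":   {"cn": "name", "mail": "email"},
}

def project(source, row):
    """Project a physical row into the conceptual model, dropping
    columns the conceptual model does not sample."""
    return {mappings[source][col]: val
            for col, val in row.items() if col in mappings[source]}

crm_row  = {"full_name": "Ada Lovelace", "mail": "ada@example.org", "crm_id": 7}
ldap_row = {"cn": "Ada Lovelace", "mail": "ada@example.org"}

# Two differently shaped rows land on the same conceptual record.
assert project("crm_db", crm_row) == project("ldap", ldap_row)
```

The benefit Alex lists as "separating semantics from data mapping" shows up directly: the semantics live in `CONCEPT`, while the messy per-source details are quarantined in `mappings`.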
Data Integration challenges arise because the following facts
hold true all of the time (whether we like it or not):
- Data Heterogeneity is a fact of life at the intranet and
internet levels
- Data is rarely clean
- Data Integration prowess is ultimately measured by pain
alleviation
- At some point human participation is required, but the trick is
to move human activity up the value chain
- Glue code size and Data Integration success are inversely
related
- Data Integration is best addressed via "M" rather than "C" (if
we use the MVC pattern as a guide; "V" is dead on arrival for the
screen-scrapers out there)
In 1997 we commenced the Virtuoso Virtual DBMS
Project, which morphed into the Virtuoso
Universal Server: a fusion of DBMS functionality and Middleware
functionality in a single product. The goal of this undertaking
remains the alleviation of the costs associated with Data
Integration challenges by virtualizing data at the Logical and
Conceptual layers.
The Logical Data Layer has been concrete for a while (e.g.,
Relational DBMS engines); what hasn't reached the mainstream is the
Concrete Conceptual Model, but this is changing fast courtesy
of the activity taking place in the realm of RDF.
RDF provides an open, standards-compliant vehicle for
developing and exploiting Concrete Conceptual Data Models that
ultimately move the Human aspect of the "Data Integration
alleviation quest" higher up the value chain.
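To make that concrete, here is a minimal sketch (with an invented namespace) of what a conceptual record looks like once expressed RDF-style: a flat record from any source becomes uniform subject-predicate-object triples over a shared vocabulary, so the remaining human work is curating that vocabulary rather than writing per-source glue code.

```python
# Hypothetical vocabulary namespace for the conceptual model.
EX = "http://example.org/schema/"

def to_triples(subject_iri, record):
    """Turn a flat conceptual record into RDF-style triples."""
    return [(subject_iri, EX + attr, value)
            for attr, value in sorted(record.items())]

triples = to_triples("http://example.org/person/ada",
                     {"name": "Ada Lovelace", "email": "ada@example.org"})

for s, p, o in triples:
    print(s, p, o)
```

Because every source is described with the same triple shape, merging two sources is set union rather than bespoke glue code, which is the inverse relationship between glue-code size and integration success noted above.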