Kingsley Uyi Idehen
Lexington, United States
DBpedia + BBC (combined) Linked Data Space Installation Guide
What?
The DBpedia + BBC Combo Linked Dataset is a preconfigured Virtuoso Cluster (4 Virtuoso Cluster Nodes, each comprising one Virtuoso Instance; initial deployment is to a single Cluster Host, but the license may be converted for physically distributed deployment), available via the Amazon EC2 Cloud and preloaded with the DBpedia and BBC Linked Data sets.
Why?
The BBC has been publishing Linked Data from its Web Data Space for a number of years. In line
with best practices for injecting Linked Data into the World Wide Web (Web), the BBC datasets are
interlinked with other datasets such as DBpedia and
MusicBrainz.
Typical follow-your-nose exploration using a Web Browser (or
even via sophisticated SPARQL query crawls) isn't always practical
once you get past the initial euphoria that comes from
comprehending the Linked Data concept. As your queries get more
complex, the overhead of remote sub-queries grows until query
results take so long to return that you simply give up.
Thus, maximizing the effects of the BBC's efforts requires
Linked Data that shares locality in a Web-accessible Data Space —
i.e., where all Linked Data sets have been loaded into the same
data store or warehouse. This holds true even when leveraging
SPARQL-FED style virtualization — there's always a need to localize
data as part of any marginally-decent locality-aware
cost-optimization algorithm.
This DBpedia + BBC dataset, exposed via a preloaded and
preconfigured Virtuoso Cluster, delivers a practical point of
presence on the Web for immediate and cost-effective exploitation
of Linked Data at the individual and/or service specific
levels.
How?
To work through this guide, you'll need to start with 90 GB of free
disk space. (Only 41 GB will be consumed after you delete the
installer archives, but starting with 90+ GB ensures enough work
space for the installation.)
Install Virtuoso
- Download Virtuoso installer archive(s). You must deploy the Personal or Enterprise Edition; the Open Source Edition does not support Shared-Nothing Cluster Deployment.
- Obtain a Virtuoso Cluster license.
- Install Virtuoso.
- Set key environment variables and start the OpenLink License Manager using this command (it may vary depending on your shell and install directory):
. /opt/virtuoso/virtuoso-enterprise.sh
- Optional: To keep the default single-server configuration file and demo database intact, set the VIRTUOSO_HOME environment variable to a different directory, e.g.,
export VIRTUOSO_HOME=/opt/virtuoso/cluster-home/
Note: You will have to adjust this setting every time you shift between this cluster setup and your single-server setup. Either may be made your environment's default through the virtuoso-enterprise.sh and related scripts.
- Set up your cluster by running the mkcluster.sh script. Note that initial deployment of the DBpedia + BBC Combo requires a 4 node cluster, which is the default for this script.
- Start the Virtuoso Cluster with this command:
virtuoso-start.sh
- Stop the Virtuoso Cluster with this command:
virtuoso-stop.sh
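Taken together, the steps above amount to a short shell session. The sketch below assumes the /opt/virtuoso install location and cluster-home directory used in this guide and a Bourne-style shell; adjust paths to match your own installation.

# Consolidated sketch of the cluster setup sequence described above.
# Paths follow the examples in this guide and are not requirements.
. /opt/virtuoso/virtuoso-enterprise.sh            # set environment variables; start the License Manager
export VIRTUOSO_HOME=/opt/virtuoso/cluster-home/  # optional: leave the single-server setup untouched
mkcluster.sh                                      # create the 4-node cluster (this script's default)
virtuoso-start.sh                                 # start the Virtuoso Cluster
# ... work with the cluster ...
virtuoso-stop.sh                                  # stop the Virtuoso Cluster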
Using the DBpedia + BBC Combo dataset
- Navigate to your installation directory.
- Download the combo dataset installer script — bbc-dbpedia-install.sh.
- For best results, set the downloaded script to fully executable using this command:
chmod 755 bbc-dbpedia-install.sh
- Shut down any Virtuoso instances that may be currently running.
- Optional: As above, if you have decided to keep the default single-server configuration file and demo database intact, set the VIRTUOSO_HOME environment variable appropriately, e.g.,
export VIRTUOSO_HOME=/opt/virtuoso/cluster-home/
- Run the combo dataset installer script with this command:
sh bbc-dbpedia-install.sh
Verify installation
The combo dataset typically deploys to EC2 virtual machines in
under 90 minutes; your time will vary depending on your network
connection speed, machine speed, and other variables.
Once the script completes, perform the following steps:
- Verify that the Virtuoso Conductor (HTTP-based Admin UI) is in place via:
http://localhost:[port]/conductor
- Verify that the Virtuoso SPARQL endpoint is in place via:
http://localhost:[port]/sparql
- Verify that the Precision Search & Find UI is in place via:
http://localhost:[port]/fct
- Verify that the Virtuoso hosted PivotViewer is in place via:
http://localhost:[port]/PivotViewer
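If you prefer to script these checks, the sketch below probes each of the four endpoints and prints the HTTP status code returned. It assumes Virtuoso's default HTTP listener port of 8890; substitute the port your instance actually uses.

# Hypothetical smoke test for a freshly installed DBpedia + BBC combo instance.
PORT=8890   # adjust to match the HTTP listener configured in virtuoso.ini
for path in conductor sparql fct PivotViewer; do
  # -s silences progress output, -o /dev/null discards the body,
  # -w prints just the HTTP status code for each endpoint
  echo -n "/$path: "
  curl -s -o /dev/null -w "%{http_code}\n" "http://localhost:${PORT}/${path}"
done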
02/17/2011 17:15 GMT-0500 | Modified: 03/29/2011 10:09 GMT-0500
Re-introducing the Virtuoso Virtual Database Engine
In recent times a lot of the commentary and focus re. Virtuoso
has centered on the RDF Quad Store and Linked Data. What sometimes gets overlooked
is the sophisticated Virtual Database Engine that provides the
foundation for all of Virtuoso's data integration
capabilities.
In this post I provide a brief re-introduction to this essential
aspect of Virtuoso.
What is it?
This component of Virtuoso is known as the Virtual Database
Engine (VDBMS). It provides transparent high-performance and secure
access to disparate data sources that are external to Virtuoso. It
enables federated access and integration of data hosted by any
ODBC- or JDBC-accessible RDBMS, RDF Store, XML database, or
Document (Free Text)-oriented Content Management System. In
addition, it facilitates integration with Web Services
(SOAP-based SOA RPCs or REST-fully accessible Web Resources).
Why is it important?
In the most basic sense, you shouldn't need to upgrade your
existing database engine version simply because your current DBMS
and Data Access Driver combo isn't compatible with ODBC-compliant
desktop tools such as Microsoft Access, Crystal Reports,
BusinessObjects, Impromptu, or other ODBC-, JDBC-, ADO.NET-, or OLE DB-compliant applications.
Simply place Virtuoso in front of your so-called "legacy database,"
and let it deliver the compliance levels sought by these tools.
In addition, it's important to note that today's enterprise,
through application evolution, company mergers, or acquisitions, is
often faced with disparately-structured data residing in any number
of line-of-business-oriented data silos. Compounding the problem is
the exponential growth of user-generated data via new social
media-oriented collaboration tools and platforms. For companies to
cost-effectively harness the opportunities accorded by the
increasing intersection between line-of-business applications and
social media, virtualization of data silos must be achieved, and
this virtualization must be delivered in a manner that doesn't
prohibitively compromise performance or completely undermine
security at either the enterprise or personal level. Again, this is
what you get by simply installing Virtuoso.
How do I use it?
The VDBMS may be used in a variety of ways, depending on the
data access and integration task at hand. Examples include:
Relational Database Federation
You can make a single ODBC, JDBC, ADO.NET, OLE DB, or XMLA
connection to multiple ODBC- or JDBC-accessible RDBMS data sources,
concurrently, with the ability to perform intelligent distributed
joins against externally-hosted database tables. For instance, you
can join internal human resources data against internal sales and
external stock market data, even when the HR team uses Oracle, the Sales team uses Informix, and the Stock Market figures come
from Ingres!
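As a rough sketch of what that looks like in practice: once the remote Oracle, Informix, and Ingres tables have been linked into Virtuoso, they can be joined with ordinary SQL, for example via Virtuoso's isql command-line client. The table and column names below are purely hypothetical placeholders for already-attached remote tables, the dba/dba credentials are the stock defaults, and the isql invocation style (port, user, password, EXEC=) should be checked against your installation.

# Hypothetical distributed join across tables linked in from three different RDBMS engines.
isql 1111 dba dba EXEC="
  SELECT e.name, SUM(o.amount) AS total_sales, q.last_price
    FROM HR_EMPLOYEES e                            -- linked from Oracle (placeholder name)
    JOIN SALES_ORDERS o ON o.emp_id = e.emp_id     -- linked from Informix (placeholder name)
    JOIN STOCK_QUOTES q ON q.ticker = o.ticker     -- linked from Ingres (placeholder name)
   GROUP BY e.name, q.last_price;
"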
Conceptual Level Data Access using the RDF Model
You can construct RDF Model-based Conceptual Views atop
Relational Data Sources. This is about generating HTTP-based
Entity-Attribute-Value (E-A-V) graphs
using data culled "on the fly" from native or external data sources
(Relational Tables/Views, XML-based Web Services, or User Defined
Types).
You can also derive RDF Model-based Conceptual Views from Web
Resource transformations "on the fly" -- the Virtuoso Sponger (RDFizing middleware component)
enables you to generate RDF Model Linked Data via a RESTful Web
Service or within the process pipeline of the SPARQL query engine (i.e., you simply use the
URL of a Web Resource in the FROM clause of a
SPARQL query).
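A minimal sketch of that last point, assuming a local Virtuoso SPARQL endpoint on the default port 8890 that is permitted to retrieve remote resources via the Sponger; the target URL is just an example Web page, and depending on configuration a Virtuoso-specific pragma may be needed to force retrieval.

# The FROM clause names an ordinary Web resource; the Sponger RDFizes it on the fly
# before the triple pattern is evaluated (endpoint configuration permitting).
curl -s -H "Accept: application/sparql-results+json" "http://localhost:8890/sparql" \
  --data-urlencode "query=
    SELECT ?s ?p ?o
    FROM <http://www.bbc.co.uk/music>
    WHERE { ?s ?p ?o }
    LIMIT 25"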
It's important to note that Views take the form of HTTP links
that serve as both Data Source Names and Data Source Addresses.
This enables you to query and explore relationships across entities
(i.e., People, Places, and other Real World Things) via HTTP
clients (e.g., Web Browsers) or directly via SPARQL Query Language
constructs transmitted over HTTP.
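For example (a sketch using a public DBpedia identifier): dereferencing one and the same entity URI with different Accept headers yields either a human-oriented HTML page or machine-readable RDF for that entity.

# Ask for an HTML description of the entity (what a Web browser would receive)
curl -sL -H "Accept: text/html" "http://dbpedia.org/resource/BBC" -o BBC.html
# Ask for the same entity as RDF (Turtle): same identifier, different representation
curl -sL -H "Accept: text/turtle" "http://dbpedia.org/resource/BBC"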
Conceptual Level Data Access using ADO.NET Entity Frameworks
As an alternative to RDF, Virtuoso can expose ADO.NET Entity
Frameworks-based Conceptual Views over Relational Data Sources. It
achieves this by generating Entity Relationship graphs via its
native ADO.NET Provider, exposing all externally attached ODBC- and
JDBC-accessible data sources. In addition, the ADO.NET Provider
supports direct access to Virtuoso's native RDF database engine,
eliminating the need for resource intensive Entity Frameworks model
transformations.
02/17/2010 16:38 GMT-0500 | Modified: 02/17/2010 16:46 GMT-0500
What is the DBpedia Project? (Updated)
The recent Wikipedia imbroglio centered around DBpedia is the fundamental driver for this particular blog post. At the time of writing, the DBpedia project definition in Wikipedia remains unsatisfactory due to the following shortcomings:
- inaccurate and incomplete definition of the Project's What,
Why, Who, Where, When, and How
- inaccurate reflection of project essence, by skewing focus
towards data
extraction and data set dump production, which is at best a quarter
of the project.
Here are some insights on DBpedia, from the perspective of
someone intimately involved with the other three-quarters of the
project.
What is DBpedia?
A live Web accessible RDF
model database (Quad Store) derived from Wikipedia content
snapshots, taken periodically. The RDF database underlies a
Linked Data Space comprised of: HTML (and most recently
HTML+RDFa) based data browser pages and a SPARQL endpoint.
Note: DBpedia 3.4 now exists in snapshot (warehouse) and Live Editions (currently being hot-staged). This post is about the snapshot (warehouse) edition; I'll drop a different post about the DBpedia Live Edition, where a new Delta-Engine covers both extraction and database record replacement in real time.
When was it Created?
As an idea under the moniker "DBpedia," it was conceptualized in late 2006 by researchers at the University of Leipzig (led by Sören Auer) and Freie Universität Berlin (led by Chris Bizer). The first public instance of DBpedia (as described above) was released in February 2007. The official DBpedia coming-out party occurred at WWW2007 in Banff, during the inaugural Linked Data gathering, where it showcased the virtues and immense potential of TimBL's Linked Data meme.
Who's Behind It?
OpenLink Software (developers of OpenLink Virtuoso and providers of Web Hosting infrastructure), the University of Leipzig, and Freie Universität Berlin. In addition, there is a burgeoning community of collaborators and contributors responsible for DBpedia-based applications, cross-linked data sets, ontologies (OpenCyc, SUMO, UMBEL, and YAGO), and other utilities. Finally, DBpedia wouldn't be possible without the global content contribution and curation efforts of Wikipedians, a point typically overlooked (albeit inadvertently).
How is it Constructed?
The steps are as follows:
- RDF data set dump preparation via Wikipedia content extraction
and transformation to RDF model data, using the N3 data
representation format - Java and PHP
extraction code produced and maintained by the teams at Leipzig and
Berlin
- Deployment of Linked Data that enables Data browsing and
exploration using any HTTP aware user agent (e.g. basic Web
Browsers) - handled by OpenLink Virtuoso (handled by Berlin via the
Pubby Linked Data Server during the early months of the DBpedia
project)
- SPARQL compliant Quad Store, enabling direct access to database
records via SPARQL (Query language, REST or SOAP Web Service, plus
a variety of query results serialization formats) - OpenLink
Virtuoso since first public release of DBpedia
In a nutshell, there are four distinct and vital components to DBpedia. Thus, DBpedia doesn't exist if all the project offered were a collection of RDF data dumps. Likewise, it doesn't exist without a fully populated SPARQL compliant Quad Store. Last but not least, it doesn't exist if the fully loaded SPARQL compliant Quad Store isn't up to the cocktail of challenges (query load and complexity) presented by live Web database accessibility.
Why is it Important?
It remains a live exemplar for any individual or organization seeking to publish or exploit HTTP based Linked Data on the World Wide Web. Its existence continues to stimulate growth in both the density and quality of the burgeoning Web of Linked Data.
How Do I Use it?
In the most basic sense, simply browse the HTML based resource descriptor pages en route to discovering previously undiscovered relationships that exist across named entities and subject matter concepts / headings. Beyond that, simply look at DBpedia as a master lookup table in a Web hosted distributed database setup, enabling you to mesh your local domain specific details with DBpedia records via structured relations (triples, i.e., 3-tuple records) comprised of HTTP URIs from both realms, e.g., via owl:sameAs relations.
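As a sketch of that meshing, the query below asks the public DBpedia SPARQL endpoint for the owl:sameAs links attached to one DBpedia record; your own data space would publish analogous owl:sameAs triples pointing from your local URIs to DBpedia's.

# List the owl:sameAs links for a single DBpedia record (Berlin, as an example)
curl -s -G "http://dbpedia.org/sparql" \
  -H "Accept: application/sparql-results+json" \
  --data-urlencode "query=
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    SELECT ?linkedRecord
    WHERE { <http://dbpedia.org/resource/Berlin> owl:sameAs ?linkedRecord }"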
What Can I Use it For?
Expanding on the Master-Details point above, you can use its
rich URI corpus to alleviate tedium associated
with activities such as:
- List maintenance - e.g., Countries, States, Companies, Units of Measurement, Subject Headings, etc. (see the query sketch after this list)
- Tagging - as a complement to existing practices
- Analytical Research - you're only a LINK (URI) away from previously difficult-to-attain research data spread across a broad range of topics
- Closed Vocabulary Construction - rather than commence the
futile quest of building your own closed vocabulary, simply
leverage Wikipedia's human curated vocabulary as our common
base.
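The list-maintenance item lends itself to a one-line illustration. The sketch below pulls an English-labelled sample of countries from the public DBpedia endpoint; it assumes the dbo:Country class from the DBpedia ontology, and the LIMIT is only there to keep the sample small.

# Fetch a sample list of countries with English labels from DBpedia
curl -s -G "http://dbpedia.org/sparql" \
  -H "Accept: application/sparql-results+json" \
  --data-urlencode "query=
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?country ?name
    WHERE { ?country a dbo:Country ; rdfs:label ?name . FILTER ( lang(?name) = 'en' ) }
    ORDER BY ?name
    LIMIT 50"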
01/31/2010 17:43 GMT-0500 | Modified: 09/15/2010 18:10 GMT-0500
5 Very Important Things to Note about HTTP based Linked Data
- It isn't World Wide Web Specific (HTTP != World
Wide Web)
- It isn't Open Data Specific
- It isn't about "Free" (Beer or Speech)
- It isn't about Markup (so don't expect to grok it via a "markup first" approach)
- It's about Hyperdata - the use of HTTP and REST to
deliver a powerful platform agnostic mechanism for Data Reference,
Access, and Integration.
When trying to understand HTTP based Linked Data, especially if you're well versed in DBMS technology use (User, Power User, Architect, Analyst, DBA, or Programmer), think:
- Open Database Connectivity (ODBC) without operating system, data model,
or wire-protocol specificity or lock-in potential
- Java Database Connectivity (JDBC) without programming language
specificity
- ADO.NET without .NET runtime specificity and .NET bound language specificity
- OLE-DB without Windows operating system & programming
language specificity
- XMLA without XML format specificity - with Tabular and
Multidimensional results formats expressible in a variety of data
representation formats.
- All of the above scoped to the Record rather than Container level, with Generic HTTP scheme URIs associated with each Record, Field, and (optionally) Field value, as sketched below.
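To make the analogy concrete, here is a sketch of record-level access with nothing but HTTP: a DESCRIBE request against a public SPARQL endpoint returns the record identified by a generic HTTP URI, with the representation chosen via the Accept header rather than any driver- or OS-specific machinery. The endpoint and record are public DBpedia examples.

# Record-level access over plain HTTP: no driver manager, no OS- or language-specific stack
curl -s -G "http://dbpedia.org/sparql" \
  -H "Accept: text/turtle" \
  --data-urlencode "query=DESCRIBE <http://dbpedia.org/resource/BBC>"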
Remember, the need for Data Access & Integration technology is the byproduct of the following realities:
- Human curated data is ultimately dirty, because:
- our thick thumbs, inattention, distractions, and general discomfort with typing make typos prevalent;
- database engines exist for a variety of data models - Graph, Relational, Hierarchical;
- within databases you have different record container/partition names, e.g., Table Names;
- within a database record container you have records that are really aspects of the same thing (different keys exist in a plethora of operational / line-of-business systems that expose aspects of the same entity, e.g., customer data that spans Accounts, CRM, and ERP application databases);
- different field names (one database has "EMP" while another has "Employee") for the same record.
- Units of measurement are driven by locale: the UK office wants to see sales in Pounds Sterling while the French office prefers Euros, etc.
- All of the above is subject to context halos, which can be quite granular re. sensitivity; e.g., staff travel between locations, altering locales and roles; basically, profiles matter a lot.
01/31/2010 17:31 GMT-0500 | Modified: 02/01/2010 09:00 GMT-0500
Virtuoso Chronicles from the Field: Nepomuk, KDE, and the quest for a sophisticated RDF DBMS.
For this particular user experience chronicle, I've simply
inserted the content of Sebastian Trueg's post titled: What We Did Last Summer (And the Rest of 2009)
– A Look Back Onto the Nepomuk Development Year ..., directly
into this post, without any additional commentary or
modification.
2009 is over. Yeah, sure, trueg, we know that, it has been
over for a while now! Ok, ok, I am a bit late, but still I
would like to get this one out - if only for my archive. So here
goes.
Let’s start with the major topic of 2009 (and also the beginning
of 2010): The new Nepomuk database backend: Virtuoso. Everybody who used Nepomuk had the
same problems: you either used the sesame2 backend which depends on Java and
steals all of your memory or you were stuck with Redland which had the
worst performance and missed some SPARQL features making important parts of
Nepomuk like queries unusable. So more than a year ago I had
the idea to use the one GPL’ed database server out there that
supported RDF in a professional manner: OpenLink’s
Virtuoso. It has all the features we need,
has a very good performance, and scales up to dimensions we will
probably never reach on the desktop (yeah, right, and 64k main
memory will be enough forever!). So very early I started
coding the necessary Soprano plugin which would talk to a locally
running Virtuoso server through ODBC. But since I ran into tons of small
problems (as always) and got sidetracked by other tasks I did not
finish it right away. OpenLink, however, was very interested in the
idea of their server being part of every KDE installation (why
wouldn’t they ;)). So they not only introduced a lite-mode which makes Virtuoso suitable
for the desktop but also helped in debugging all the problems that
I had left. Many test runs, patches, and a Virtuoso 5.0.12 release
later I could finally announce the Virtuoso
integration as usable.
Then, at the end of last year, I dropped support for sesame2 and redland. Virtuoso is now the only supported database backend. The
reason is simple: Virtuoso is way more powerful than the rest - not
only in terms of performance - and it is fully implemented in
C(++) without any traces of Java. Maybe even
more important is the integration of the full text index which
makes the previously used CLucene index unnecessary. Thus, we can
finally combine full text and graph queries in one SPARQL query.
This results in a cleaner API and way faster return of search
results since there is no need to combine the results from several
queries anymore. A direct result of that is the new Nepomuk Query API which I will discuss
later.
So now the only thing I am waiting for is the first bugfix
release of Virtuoso 6, i.e. 6.0.1 which will fix the bugs that make
6.0.0 fail with Nepomuk. Should be out any day now. :)
The Nepomuk Query API
Querying data in
Nepomuk pre-KDE-4.4 could be done in one of two ways: 1. Use the
very limited capabilities of the ResourceManager to list resources with
certain properties or of a certain type; or 2. Write your own
SPARQL query using ugly QString::arg
replacements.
With the introduction of Virtuoso and its awesome power we can
now do pretty much everything in one query. This allowed me to finally create a query API for KDE:
Nepomuk::Query::Query and friends. I won’t
go into much detail here since I did that before.
All in all you should remember one thing: whenever you think
about writing your own SPARQL query in a KDE application - have a
look at libnepomukquery. It is very likely that you can avoid the
hassle of debugging a query by using the query API.
The first nice effect of the new API (apart from me using it all
over the place obviously) is the new query interface in Dolphin.
Internally it simply combines a bunch of Nepomuk::Query::Term objects into a
Nepomuk::Query::AndTerm. All very readable
and no ugly query strings.
Dolphin Search Panel in KDE SC 4.4
Shared Desktop Ontologies
An important part of the Nepomuk
research project was the creation of a set of ontologies for describing desktop resources
and their metadata. After the Xesam
project under the umbrella of freedesktop.org had been convinced to use
RDF for describing file metadata they developed their own ontology.
Thanks to Evgeny (phreedom) Egorochkin and Antonie Mylka both the
Xesam ontology and the Nepomuk Information Elements Ontology were already
very close in design. Thus, it was relatively easy to merge the two
and be left with only one ontology to support. Since then not only
KDE but also Strigi and Tracker are using the Nepomuk ontologies.
At the Gran Canaria Desktop Summit I met some of the guys from
Tracker and we tried to come up with a plan to create a joint
project to maintain the ontologies. This got off to a rough start
as nobody really felt responsible. So I simply took the initiative
and released the shared-desktop-ontologies version 0.1 in
November 2009. The result was a s***-load of hate-mails and bug reports due to me breaking the KDE build. But in the end it was worth
it. Now the package is established and other projects can start to
pick it up to create data compatible to the Nepomuk system and
Tracker.
Today the ontologies (and the shared-desktop-ontologies package)
are maintained in the Oscaf project at Sourceforge. The situation
is far from perfect but it is a good start. If you need specific
properties in the ontologies or are thinking about creating one for
your own application - come and join us in the bug tracker…
Timeline KIO Slave
It was at the Akonadi meeting that Will Stephenson and myself
got into talking about mimicking some Zeitgeist functionality through Nepomuk.
Basically it meant gathering some data when opening and when saving
files. We quickly came up with a hacky patch for KIO and KFileDialog which covered most cases and
allowed us to track when a file was modified and by which
application. This little experiment did not leave that state though
(it will, however, this year) but another one did: Zeitgeist also provides a fuse filesystem which allows you to browse files by modification date. Well, whatever fuse can do, KIO can do as well.
Introducing the timeline:/ KIO slave which
gives a calendar view onto your files.
Tips And Tricks
Well, I thought I would mention the Tips And Tricks section I wrote for the
techbase. It might not be a big deal but I
think it contains some valuable information in case you are using
Nepomuk as a developer.
Google Summer Of Code 2009
This time around I had the privilege to mentor two students in the Google Summer
of Code. Alessandro Sivieri and Adam Kidder did outstanding work on
Improved Virtual Folders and the Smart File Dialog.
Adam’s work led me to some heavy improvements in the Nepomuk KIO slaves myself, which I only finished this week (more details on
that coming up). Alessandro continued his work on faceted file
browsing in KDE and created:
Sembrowser
Alessandro is following up on his work to make faceted file
browsing a reality in 2010 (and KDE SC 4.5). Since it was too late
to get faceted browsing into KDE SC 4.4 he is working on Sembrowser, a stand-alone faceted file
browser which will be the grounds for experiments until the code is
merged into Dolphin.
Faceted Browsing in KDE with
Sembrowser
Nepomuk Workshops
In 2009 I organized the first Nepomuk workshop in Freiburg,
Germany. And also the second one. While I reported properly on the first one I
still owe a summary for the second one. I will get around to that -
sooner or later. ;)
CMake Magic
Soprano gives us a nice command line tool to
create a C++ namespace from an ontology file: onto2vocabularyclass. It produces nice
convenience namespaces like Soprano::Vocabulary::NAO. Nepomuk adds
another tool named nepomuk-rcgen. Both were a bit clumsy to
use before. Now we have nice cmake macros which make it very simple
to use both.
See the techbase article on how to use the new
macros.
Bangarang
Without my knowledge (imagine that!) Andrew Lake created
an amazing new media player named Bangarang - a Jamaican word for noise,
chaos or disorder. This player is Nepomuk-enabled in the sense
that it has a media library which lets you browse your media files
based on the Nepomuk data. It remembers the number of times a song
or a video has been played and when it was played last. It allows you to add details such as the TV series name, season, episode number,
or actors that are in the video - all through Nepomuk (I hope we
will soon get tvdb integration).
Edit metadata directly in Bangarang
Dolphin showing TV episode metadata
created by Bangarang
And of course searching for it works,
too...
And it is pretty, too...
I am especially excited about this since finally applications
not written or mentored by me start contributing Nepomuk data.
Gran Canaria Desktop Summit
2009 was also the year of the first Gnome-KDE joint-conference.
Let me make a bulletin for completeness and refer to my previous blog post reporting on my
experiences on the island.
Well, that was by far not all I did in 2009 but I think I
covered most of the important topics. And after all it is ‘just a
blog entry’ - there is no need for completeness. Thanks for
reading.
"
01/28/2010 11:14 GMT-0500 | Modified: 02/01/2010 09:02 GMT-0500
Time for RDBMS Primacy Downgrade is Nigh! (No Embedded Images Edition - Update 1)
As the world works its way through a "once in a generation" economic crisis, the long overdue downgrade of the RDBMS from its pivotal position at the apex of the data access and data management pyramid is nigh.
What is the Data Access and Data Management Value Pyramid?
It is a top-down view of the data access and data management value chain (see: AVF Pyramid Diagram). The term apex simply indicates value primacy, which takes the form of a data access API based entry point into a DBMS realm -- aligned to an underlying data model. Examples of data access APIs include: Native Call Level Interfaces (CLIs), ODBC, JDBC, ADO.NET, OLE-DB, XMLA, and Web Services.
The degree to which ad-hoc views of data managed by a DBMS can
be produced and dispatched to relevant data consumers (e.g.
people), without compromising concurrency, data durability, and
security, collectively determine the "Agility Value Factor" (AVF)
of a given DBMS. Remember, agility as the cornerstone of
environmental adaptation is as old as the concept of evolution, and
intrinsic to all pursuits of primacy.
In simpler business oriented terms, look at AVF as the degree to
which DBMS technology affects the ability to effectively implement
"Market Leadership Discipline" along the following pathways:
innovation, operational excellence, or customer intimacy.
Why has RDBMS Primacy Endured?
Historically, at least since the late '80s, the RDBMS genre of
DBMS has consistently offered the highest AVF relative to other
DBMS genres en route to primacy within the value pyramid. The
desire to improve on paper reports and spreadsheets is basically
what DBMS technology has fundamentally addressed to date, even
though conceptual level interaction with data has never been its
forte.
See: RDBMS Primacy Diagram.
For more than 10 years -- at the very least -- the limitations of the traditional RDBMS in the realm of conceptual level interaction with data across diverse data sources and schemas (enterprise, Web, and Internet) have been crystal clear to many RDBMS technology practitioners, as indicated by some of the quotes excerpted below:
"Future of Database Research is excellent, but what is the
future of data?"
"..it is hard for me to disagree with the conclusions in this
report. It captures exactly the right thoughts, and should be a
must read for everyone involved in the area of databases and
database research in particular."
-- Dr. Anant Jingran, CTO, IBM Information Management Systems, commenting on
the 2007 RDBMS technology retreat attended by a
number of key DBMS technology pioneers and researchers.
"One size fits all: A concept whose time has come
and gone
-
They are direct descendants of System R and Ingres and were architected more than 25
years ago
-
They are advocating "one size fits all"; i.e. a single
engine that solves all DBMS needs.
-- Prof. Michael Stonebreaker, one of the founding
fathers of the RDBMS industry.
Until this point in time, the requisite confluence of
"circumstantial pain" and "open standards" based technology
required to enable an objective "compare and contrast" of RDBMS
engine virtues and viable alternatives hasn't occurred. Thus, the
RDBMS has endured its position of primacy, albeit on a "one size fits all" basis.
Circumstantial Pain
As mentioned earlier, we are in the midst of an economic crisis
that is ultimately about a consistent inability to connect dots
across a substrate of interlinked data sources that transcend
traditional data access boundaries with high doses of schematic
heterogeneity. Ironically, in the era of the dot-com, we haven't been able to make meaningful connections between relevant "real-world things" that extend beyond primitive data hosted in database tables and content management style document containers; we've struggled to achieve this in the most basic sense, let alone evolve our ability to connect in line with the exponential rate at which the Internet & Web
are spawning "universes of discourse" (data spaces) that emanate
from user activity (within the enterprise and across the
Internet & Web). In a nutshell, we haven't been able to upgrade
our interaction with data such that "conceptual models" and
resulting "context lenses" (or facets) become concrete;
by this I mean: real-world entity interaction making its way into the
computer realm as opposed to the impedance we all suffer today when
we transition from conceptual model interaction (real-world) to
logical model interaction (when dealing with RDBMS based data
access and data management).
Here are some simple examples of what I can only best describe
as: "critical dots unconnected", resulting from an inability to
interact with data conceptually:
Government (Globally) -
Financial regulatory bodies couldn't effectively discern that a
Credit Default Swap is an Insurance policy in
all but literal name. And in not doing so the cost of an
unregulated insurance policy laid the foundation for
exacerbating the toxicity of fatally flawed mortgage backed
securities. Put simply: a flawed insurance policy was the fallback
on a toxic security that financiers found exotic based on
superficial packaging.
Enterprises -
Banks still don't understand that capital really does exist in tangible and intangible forms, with the intangible being the variant that is inherently dynamic. For example, a tech company's intellectual capital far exceeds the value of its fixtures, fittings, and buildings, but you'd be amazed to find that in most cases this vital asset has no significant value when banks get down to the nitty-gritty of debt collateral; instead, a buffer of flawed securitization has occurred atop a borderline static asset class covering the aforementioned buildings, fixtures, and fittings.
In the general enterprise arena, IT executives continued to "rip
and replace" existing technology without ever effectively
addressing the timeless inability to connect data across disparate
data silos generated by internal enterprise applications, let alone
the broader need to mesh data from the inside with external data
sources. No correlation was made between the growth of buzzwords and the compounding nature of data integration challenges. It's 2009, and only a minuscule number of executives dare fantasize about being anywhere within reach of the "relevant information at your fingertips" vision.
Looking more holistically at data interaction in general,
whether you interact with data in the enterprise space (i.e., at
work) or on the Internet or Web, you ultimately are delving into a
mishmash of disparate computer systems, applications, services (Web
or SOA), and databases (of the RDBMS variety in a majority of
cases) associated with a plethora of disparate schemas. Yes, but
even today "rip and replace" is still the norm pushed by most
vendors, pitting one monoculture against another, as exemplified by
irrelevances such as: FOSS/LAMP vs Commercial or Web vs.
Enterprise, when none of this matters if the data access and
integration issues are recognized let alone addressed (see:
Applications are Like Fish and Data Like
Wine).
Like the current credit-crunch, exponential growth of data
originating from disparate application databases and associated
schemas, within shrinking processing time frames, has triggered a
rethinking of what defines data access and data management value
today en route to an inevitable RDBMS downgrade within the value
pyramid.
Technology
There have been many attempts to address real-world modeling
requirements across the broader DBMS community from Object
Databases to Object-Relational Databases, and more recently the
emergence of simple Entity-Attribute-Value model DBMS engines. In
all cases failure has come down to the existence of one or more of
the following deficiencies, across each potential alternative:
- Query language standardization - nothing close to SQL
standardization
- Data Access API standardization - nothing close to ODBC, JDBC,
OLE-DB, or ADO.NET
- Wire protocol standardization - nothing close to HTTP
- Distributed Identity infrastructure - nothing close to the
non-repudiatable digital Identity that foaf+ssl accords
- Use of Identifiers as network based pointers to data sources -
nothing close to RDF based Linked Data
- Negotiable data representation - nothing close to Mime and HTTP
based Content Negotiation
- Scalability especially in the era of Internet & Web
scale.
Entity-Attribute-Value with Classes & Relationships
(EAV/CR) data models
A common characteristic shared by all post-relational database management systems (from Object Relational to pure Object) is an
orientation towards variations of EAV/CR based data models.
Unfortunately, all efforts in the EAV/CR realm have typically
suffered from at least one of the deficiencies listed above. In
addition, the same "one DBMS model fits all" approach that lies at
the heart of the RDBMS downgrade also exists in the EAV/CR
realm.
What Comes Next?
The RDBMS is not going away (ever), but its era of primacy -- by
virtue of its placement at the apex of the data access and data
management value pyramid -- is over! I make this bold claim for the
following reasons:
- The Internet aided "Global Village" has brought "Open World" vs "Closed World" assumption issues to the fore
e.g., the current global economic crisis remains centered on the
inability to connect dots across "Open World" and "Closed World"
data frontiers
- Entity-Attribute-Value with Classes & Relationships
(EAV/CR) based DBMS models are more effective when dealing with
disparate data associated with disparate schemas, across disparate
DBMS engines, host operating systems, and networks.
Based on the above, it is crystal clear that a different kind of
DBMS -- one with higher AVF relative to the RDBMS -- needs to sit
atop today's data access and data management value pyramid. The
characteristics of this DBMS must include the following:
- Every item of data (Datum/Entity/Object/Resource) has
Identity
- Identity is achieved via Identifiers that aren't locked at the
DBMS, OS, Network, or Application levels
- Object Identifiers and Object values are independent
(extricably linked by association)
- Object values should be de-referencable via Object
Identifier
- Representation of de-referenced value graph (entity,
attributes, and values mesh) must be negotiable (i.e. content
negotiation)
- Structured query language must provide mechanisms for Creation, Deletion, Updates, and Querying of data objects (see the sketch after this list)
- Performance & Scalability across "Closed World"
(enterprise) and "Open World" (Internet & Web) realms.
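As a sketch of the creation and querying characteristics (and of negotiable representation), the pair of requests below targets a local Virtuoso instance on the default port 8890. Whether the /sparql endpoint accepts updates this way depends entirely on its configuration and the privileges granted to the SPARQL account, so treat this as illustrative; the graph name, entity URI, and property are hypothetical.

# Create: add one entity-attribute-value statement to a named graph
# (requires an endpoint configured to accept SPARQL updates)
curl -s "http://localhost:8890/sparql" \
  --data-urlencode "query=
    INSERT DATA { GRAPH <urn:example:crm> {
      <http://example.com/customer/42> <http://example.com/schema#status> \"active\" } }"
# Query: read the statement back, asking for a JSON results representation
curl -s -H "Accept: application/sparql-results+json" "http://localhost:8890/sparql" \
  --data-urlencode "query=
    SELECT ?p ?o FROM <urn:example:crm>
    WHERE { <http://example.com/customer/42> ?p ?o }"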
Quick recap: I am not saying that RDBMS engine technology is
dead or obsolete. I am simply stating that the era of RDBMS primacy
within the data access and data management value pyramid is
over.
The problem domain (conceptual model views over heterogeneous
data sources) at the apex of the aforementioned pyramid has simply
evolved beyond the natural capabilities of the RDBMS, which is rooted in "Closed World" assumptions re. data definition, access,
and management. The need to maintain domain based conceptual
interaction with data is now palpable at every echelon within our
"Global Village" - Internet, Web, Enterprise, Government etc.
It is my personal view that an EAV/CR model based DBMS, with
support for the seven items enumerated above, can trigger the long
anticipated RDBMS downgrade. Such a DBMS would be inherently
multi-model because you would need the best of RDBMS and EAV/CR
model engines in a single product, with in-built support for HTTP
and other Internet protocols in order to effectively address data
representation and serialization issues.
EAV/CR Oriented Data Access & Management Technology
Examples of contemporary EAV/CR frameworks that provide concrete
conceptual layers for data access and data management currently
include:
The frameworks above provide the basis for a revised AVF
pyramid, as depicted below, that reflects today's data access and
management realities i.e., an Internet & Web driven global
village comprised of interlinked distributed data objects,
compatible with "Open World" assumptions.
See: New EAV/CR Primacy Diagram.
01/27/2009 19:19 GMT-0500 | Modified: 03/17/2009 11:50 GMT-0500
New ADO.NET 3.x Provider for Virtuoso Released (Update 2)
I am pleased to announce the immediate availability of the
Virtuoso ADO.NET 3.5 data provider for
Microsoft's .NET platform.
What is it?
A data access driver/provider that provides conceptual entity oriented access to RDBMS data managed by Virtuoso. Naturally,
it also uses Virtuoso's in-built virtual / federated database layer to provide access to
ODBC and JDBC accessible RDBMS engines such as:
Oracle (7.x to latest), SQL
Server (4.2 to latest), Sybase, IBM Informix (5.x to latest), IBM DB2,
Ingres (6.x to latest), Progress (7.x to
OpenEdge), MySQL, PostgreSQL, Firebird, and others using our ODBC or JDBC
bridge drivers.
Benefits?
Technical:
It delivers an Entity-Attribute-Value + Classes &
Relationships model over disparate data sources that are
materialized as .NET Entity Framework Objects, which are then
consumable via ADO.NET Data Object Services, LINQ for Entities, and
other ADO.NET data consumers.
The provider is fully integrated into Visual Studio 2008 and
delivers the same "ease of use" offered by Microsoft's own SQL
Server provider, but across Virtuoso, Oracle, Sybase, DB2,
Informix, Ingres, Progress (OpenEdge), MySQL, PostgreSQL,
Firebird, and others. The same benefits also apply uniformly to
Entity Frameworks compatibility.
Bearing in mind that Virtuoso is a multi-model (hybrid) data
manager, this also implies that you can use .NET Entity Frameworks
against all data managed by Virtuoso. Remember, Virtuoso's SQL
channel is a conduit to Virtuoso's core; thus, RDF (courtesy of
SPASQL as already implemented re. Jena/Sesame/Redland providers), XML, and other data
forms stored in Virtuoso also become accessible via .NET's Entity
Frameworks.
Strategic:
You can choose which entity oriented data access model works
best for you: RDF Linked Data & SPARQL or .NET Entity Frameworks &
Entity SQL. Either way, Virtuoso delivers
a commercial grade, high-performance, secure, and scalable
solution.
How do I use it?
Simply follow one of the guides below:
Note: When working with external or 3rd party databases,
simply use the Virtuoso Conductor to link the external data source
into Virtuoso. Once linked, the remote tables will simply be
treated as though they are native Virtuoso tables leaving the
virtual database engine to handle the rest.
This is similar to the role the Microsoft JET engine played in the
early days of ODBC, so if you've ever linked an ODBC data source
into Microsoft Access, you are ready to do the same using
Virtuoso.
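For the command-line inclined, the same linking can also be expressed in SQL through Virtuoso's remote table attachment facility. The statement below is only a sketch: the DSN, qualified table name, local name, and credentials are placeholders, and the exact ATTACH TABLE clause order should be confirmed against the Virtuoso Virtual Database documentation before use.

# Hypothetical: link a remote SQL Server table into Virtuoso over an existing ODBC DSN
isql 1111 dba dba EXEC="
  ATTACH TABLE Northwind.dbo.Customers AS CUSTOMERS
    FROM 'sqlserver_dsn' USER 'sa' PASSWORD 'secret';
"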
01/08/2009 04:36 GMT-0500 | Modified: 01/08/2009 09:12 GMT-0500
Introducing Virtuoso Universal Server (Cloud Edition) for Amazon EC2
What is it?
A pre-installed edition of Virtuoso
for Amazon's EC2 Cloud platform.
What does it offer?
From a Web
Entrepreneur perspective it offers:
- Low cost entry point to a game-changing Web 3.0+ (and beyond)
platform that combines SQL, RDF, XML, and Web Services functionality
- Flexible variable cost model (courtesy of EC2
DevPay) tightly bound to revenue generated by your
services
- Delivers federated and/or centralized model flexibility for your SaaS based solutions
- Simple entry point for developing and deploying sophisticated
database driven applications (SQL or RDF Linked Data Web oriented)
- Complete framework for exploiting OpenID, OAuth (including Role
enhancements) that simplifies exploitation of these vital Identity
and Data Access
technologies
- Easily implement RDF Linked Data based Mail, Blogging, Wikis,
Bookmarks, Calendaring, Discussion Forums, Tagging,
Social-Networking as Data Space (data containers) features of your
application or service offering
- Instant alleviation of challenges (e.g. service costs and
agility) associated with Data Portability and Open Data Access across
Web 2.0 data silos
- LDAP integration for Intranet / Extranet style applications.
From the DBMS engine perspective it provides you with one or
more pre-configured instances of Virtuoso that enable immediate
exploitation of the following services:
- RDF Database (a Quad Store with SPARQL & SPARUL Language & Protocol
support)
- SQL Database (with ODBC, JDBC, OLE-DB, ADO.NET, and XMLA driver access)
- XML Database (XML Schema, XQuery/Xpath,
XSLT, Full Text Indexing)
- Full Text Indexing.
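Once an instance is up, the services listed above can be smoke-tested remotely with nothing more than HTTP. The sketch below assumes the default Virtuoso HTTP port of 8890 is open in the instance's security group; the hostname is a placeholder for your instance's public DNS name, and the SQL/ODBC/JDBC services listen separately on Virtuoso's standard port 1111.

# Placeholder hostname: substitute your EC2 instance's public DNS name
EC2_HOST="ec2-203-0-113-10.compute-1.amazonaws.com"
# RDF database service: a trivial SPARQL ASK against the instance's endpoint
curl -s -H "Accept: application/sparql-results+json" \
  "http://${EC2_HOST}:8890/sparql" --data-urlencode "query=ASK { ?s ?p ?o }"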
From a Middleware perspective it provides:
- RDF Views (Wrappers / Semantic Covers) over SQL, XML, and other
data sources accessible via SOAP or REST style Web Services
- Sponger Service for converting non RDF information resources into RDF Linked Data "on the fly" via a large
collection of pre-installed RDFizer Cartridges.
From the Web Server Platform perspective it provides an alternative to LAMP stack components such as MySQL and Apache by offering:
- HTTP Web Server
- WebDAV Server
- Web Application Server (includes PHP
runtime hosting)
- SOAP or REST style Web Services Deployment
- RDF Linked Data Deployment
- SPARQL (SPARQL Query Language) and SPARUL (SPARQL Update
Language) endpoints
- Virtuoso Hosted PHP packages for MediaWiki, Drupal, Wordpress, and phpBB3
(just install the relevant Virtuoso Distro. Package).
From the general System Administrator's perspective it
provides:
- Online Backups (Backup Set dispatched to S3 buckets, FTP, or
HTTP/WebDAV server locations)
- Synchronized Incremental Backups to Backup Set locations
- Backup Restore from Backup Set location (without exiting to EC2
shell).
Higher level user oriented offerings include:
- OpenLink Data Explorer front-end for exploring the burgeoning
Linked Data Web
- Ajax based SPARQL Query Builder (iSPARQL) that enables SPARQL
Query construction by Example
- Ajax based SQL Query Builder (QBE) that enables SQL Query
construction by Example.
For Web 2.0 / 3.0 users, developers, and entrepreneurs, it offers Distributed Collaboration Tools & Social Media realm functionality, courtesy of ODS, which includes:
- Point of presence on the Linked Data Web that meshes your
Identity and your Data via URIs
- System generated Social Network Profile & Contact Data via FOAF
- System generated SIOC (Semantically Interconnected Online
Community) Data Space (that includes a Social Graph)
exposing all your Web data in RDF Linked Data form
- System generated OpenID and automatic integration with
FOAF
- Transparent Data Integration across Facebook, Digg, LinkedIn,
FriendFeed, Twitter, and any other Web 2.0 data space equipped with
RSS / Atom support and/or REST style Web Services
- In-built support for SyncML which enables data synchronization
with Mobile Phones.
How Do I Get Going with It?
11/28/2008 19:27 GMT-0500 | Modified: 11/28/2008 16:06 GMT-0500