The problems typically take the following form:
To start addressing these problems, here is a simple guide for generating and publishing Linked Data using Virtuoso.
Existing RDF data can be added to the Virtuoso RDF Quad Store via a variety of built-in data loader utilities.
Many options allow you to easily and quickly generate RDF data from other data sources:
Install the Faceted Browser VAD package (fct_dav.vad), which delivers the following:
Three simple steps allow you, your enterprise, and your customers to consume and exploit your newly deployed Linked Data --
http://<cname>[:<port>]/describe/?uri=<entity-uri>
where <cname>[:<port>] gets replaced by the host and port of your Virtuoso instance, and <entity-uri> gets replaced by the URI you want to see described -- for instance, the URI of one of the resources you let the Sponger handle.
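By way of illustration, here is a minimal Python sketch of the pattern; the host, port, and entity URI are placeholders for your own instance's values (8890 is only the common default Virtuoso HTTP port):

    # Sketch: fetch the description of an entity from a Virtuoso
    # /describe endpoint. Host/port and the entity URI are placeholders.
    from urllib.parse import quote
    from urllib.request import urlopen

    host = "localhost:8890"
    entity = "http://dbpedia.org/resource/Paris"   # any entity URI

    url = "http://%s/describe/?uri=%s" % (host, quote(entity, safe=""))
    print(urlopen(url).read()[:500])   # start of the description page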
In this post I provide a brief re-introduction to this essential aspect of Virtuoso.
This component of Virtuoso is known as the Virtual Database Engine (VDBMS). It provides transparent high-performance and secure access to disparate data sources that are external to Virtuoso. It enables federated access and integration of data hosted by any ODBC- or JDBC-accessible RDBMS, RDF Store, XML database, or Document (Free Text)-oriented Content Management System. In addition, it facilitates integration with Web Services (SOAP-based SOA RPCs or REST-fully accessible Web Resources).
In the most basic sense, you shouldn't need to upgrade your existing database engine simply because your current DBMS and data access driver combo isn't compatible with ODBC-compliant desktop tools such as Microsoft Access, Crystal Reports, BusinessObjects, Impromptu, or other ODBC-, JDBC-, ADO.NET-, or OLE DB-compliant applications. Simply place Virtuoso in front of your so-called "legacy database," and let it deliver the compliance levels sought by these tools.
In addition, it's important to note that today's enterprise, through application evolution, company mergers, or acquisitions, is often faced with disparately-structured data residing in any number of line-of-business-oriented data silos. Compounding the problem is the exponential growth of user-generated data via new social media-oriented collaboration tools and platforms. For companies to cost-effectively harness the opportunities accorded by the increasing intersection between line-of-business applications and social media, virtualization of data silos must be achieved, and this virtualization must be delivered in a manner that doesn't prohibitively compromise performance or completely undermine security at either the enterprise or personal level. Again, this is what you get by simply installing Virtuoso.
The VDBMS may be used in a variety of ways, depending on the data access and integration task at hand. Examples include:
You can make a single ODBC, JDBC, ADO.NET, OLE DB, or XMLA connection to multiple ODBC- or JDBC-accessible RDBMS data sources, concurrently, with the ability to perform intelligent distributed joins against externally-hosted database tables. For instance, you can join internal human resources data against internal sales and external stock market data, even when the HR team uses Oracle, the Sales team uses Informix, and the Stock Market figures come from Ingres!
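To make that concrete, here is a hedged Python sketch using pyodbc against a Virtuoso DSN; the DSN, credentials, and the linked table names are hypothetical stand-ins:

    # Sketch: one ODBC connection to Virtuoso, joining tables that live
    # in two different external engines linked into Virtuoso. The DSN
    # and the qualified table names are hypothetical.
    import pyodbc

    con = pyodbc.connect("DSN=Virtuoso;UID=demo;PWD=demo")
    cur = con.cursor()

    # ORA_HR..EMPLOYEES and INF_SALES..ORDERS stand in for tables linked
    # from Oracle and Informix; Virtuoso plans the distributed join.
    cur.execute("""
        SELECT e.NAME, SUM(o.AMOUNT)
          FROM ORA_HR..EMPLOYEES e
          JOIN INF_SALES..ORDERS o ON o.EMP_ID = e.ID
         GROUP BY e.NAME
    """)
    for row in cur.fetchall():
        print(row)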
You can construct RDF Model-based Conceptual Views atop Relational Data Sources. This is about generating HTTP-based Entity-Attribute-Value (E-A-V) graphs using data culled "on the fly" from native or external data sources (Relational Tables/Views, XML-based Web Services, or User Defined Types).
You can also derive RDF Model-based Conceptual Views from Web Resource transformations "on the fly" -- the Virtuoso Sponger (RDFizing middleware component) enables you to generate RDF Model Linked Data via a RESTful Web Service or within the process pipeline of the SPARQL query engine (i.e., you simply use the URL of a Web Resource in the FROM clause of a SPARQL query).
It's important to note that Views take the form of HTTP links that serve as both Data Source Names and Data Source Addresses. This enables you to query and explore relationships across entities (i.e., People, Places, and other Real World Things) via HTTP clients (e.g., Web Browsers) or directly via SPARQL Query Language constructs transmitted over HTTP.
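As a hedged illustration of that SPARQL-over-HTTP pattern (the endpoint host and target page are assumptions, and the get:soft pragma is a Virtuoso-specific sponging hint whose exact form may vary by version):

    # Sketch: hand the Sponger a Web page URL inside a SPARQL query's
    # FROM clause, posed over HTTP against a Virtuoso SPARQL endpoint.
    from urllib.parse import urlencode
    from urllib.request import urlopen

    query = """
    define get:soft "soft"
    SELECT ?s ?p ?o
    FROM <http://www.example.com/some-page.html>
    WHERE { ?s ?p ?o }
    LIMIT 10
    """
    params = urlencode({"query": query,
                        "format": "application/sparql-results+json"})
    print(urlopen("http://localhost:8890/sparql?" + params).read()[:500])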
As an alternative to RDF, Virtuoso can expose ADO.NET Entity Frameworks-based Conceptual Views over Relational Data Sources. It achieves this by generating Entity Relationship graphs via its native ADO.NET Provider, exposing all externally attached ODBC- and JDBC-accessible data sources. In addition, the ADO.NET Provider supports direct access to Virtuoso's native RDF database engine, eliminating the need for resource intensive Entity Frameworks model transformations.
Note to Web Programmers: Linked Data is about Data (Wine) and not about Code (Fish). Thus, it isn't a "programmers only zone"; far from it. More than anything else, it's inherently inclusive and spreads its participation net widely across Data Architects, Data Integrators, Power Users, Knowledge Workers, Information Workers, Data Analysts, etc. Basically, everyone who can "click on a link" is invited to this particular party; remember, it is about "Linked Data" not "Linked Code", after all. :-)
Here is an example of a Linked Data value pyramid that I am stumbling across -- with some frequency -- these days (note: 1 being the pyramid apex):
Basically, Linked Data deployment (assigning dereferenceable HTTP URIs to DBMS records, their attributes, and [optionally] attribute values) is occurring last. Even worse, this happens in the context of Linked Open Data oriented endeavors, resulting in nothing but confusion or inadvertent perpetuation of the overarching, pragmatically challenged "Semantic Web" stereotype.
As you can imagine, hitting SPARQL as your introduction to Linked Data is akin to hitting SQL as your introduction to Relational Database Technology: neither is an elevator-style value-prop relay mechanism.
In the relational realm, killer demos always started with desktop productivity tools (spreadsheets, report-writers, SQL QBE tools, etc.) accessing relational data sources en route to unveiling the "Productivity" and "Agility" value prop. that such binding delivered; i.e., the desktop applications (clients) and the databases (servers) are distinct, but operate in a mutually beneficial manner to all, courtesy of data access standards such as ODBC (Open Database Connectivity).
In the Linked Data realm, learning to embrace and extend best practices from the relational DBMS realm remains a challenge. A lot of this has to do with hangovers from a misguided perception that RDF databases will somehow completely replace RDBMS engines, rather than complement them. Thus, you have a counterproductive variant of NIH (Not Invented Here) in play, taking us to the dreaded realm of "Break the Pot and You Own It" (exemplified by the 11+ year Semantic Web Project comprehension and appreciation odyssey).
From my vantage point, here is how I believe the Linked Data value pyramid should be layered, especially when communicating the essential value prop.:
When trying to understand HTTP based Linked Data, especially if you're well versed in DBMS technology use (User, Power User, Architect, Analyst, DBA, or Programmer) think:
Remember, the need for Data Access & Integration technology is the by-product of the following realities:
2009 is over. Yeah, sure, trueg, we know that, it has been over for a while now! Ok, ok, I am a bit late, but still I would like to get this one out - if only for my archive. So here goes.
Let's start with the major topic of 2009 (and also the beginning of 2010): the new Nepomuk database backend, Virtuoso. Everybody who used Nepomuk had the same problems: you either used the sesame2 backend, which depends on Java and steals all of your memory, or you were stuck with Redland, which had the worst performance and missed some SPARQL features, making important parts of Nepomuk, like queries, unusable. So more than a year ago I had the idea to use the one GPL'ed database server out there that supported RDF in a professional manner: OpenLink's Virtuoso. It has all the features we need, has very good performance, and scales up to dimensions we will probably never reach on the desktop (yeah, right, and 64k main memory will be enough forever!). So very early I started coding the necessary Soprano plugin which would talk to a locally running Virtuoso server through ODBC. But since I ran into tons of small problems (as always) and got sidetracked by other tasks, I did not finish it right away. OpenLink, however, was very interested in the idea of their server being part of every KDE installation (why wouldn't they ;)). So they not only introduced a lite-mode which makes Virtuoso suitable for the desktop but also helped in debugging all the problems that I had left. Many test runs, patches, and a Virtuoso 5.0.12 release later, I could finally announce the Virtuoso integration as usable.
Then at the end of last year I dropped support for sesame2 and Redland. Virtuoso is now the only supported database backend. The reason is simple: Virtuoso is way more powerful than the rest - not only in terms of performance - and it is fully implemented in C(++) without any traces of Java. Maybe even more important is the integration of the full text index, which makes the previously used CLucene index unnecessary. Thus, we can finally combine full text and graph queries in one SPARQL query. This results in a cleaner API and way faster return of search results, since there is no need to combine the results from several queries anymore. A direct result of that is the new Nepomuk Query API which I will discuss later.
So now the only thing I am waiting for is the first bugfix release of Virtuoso 6, i.e. 6.0.1 which will fix the bugs that make 6.0.0 fail with Nepomuk. Should be out any day now. :)
Querying data in Nepomuk pre-KDE-4.4 could be done in one of two ways: 1. Use the very limited capabilities of the ResourceManager to list resources with certain properties or of a certain type; or 2. Write your own SPARQL query using ugly QString::arg replacements.
With the introduction of Virtuoso and its awesome power we can now do pretty much everything in one query. This allowed me to finally create a query API for KDE: Nepomuk::Query::Query and friends. I won't go into much detail here since I did that before.
All in all you should remember one thing: whenever you think about writing your own SPARQL query in a KDE application - have a look at libnepomukquery. It is very likely that you can avoid the hassle of debugging a query by using the query API.
The first nice effect of the new API (apart from me using it all over the place obviously) is the new query interface in Dolphin. Internally it simply combines a bunch of Nepomuk::Query::Term objects into a Nepomuk::Query::AndTerm. All very readable and no ugly query strings.
An important part of the Nepomuk research project was the creation of a set of ontologies for describing desktop resources and their metadata. After the Xesam project under the umbrella of freedesktop.org had been convinced to use RDF for describing file metadata they developed their own ontology. Thanks to Evgeny (phreedom) Egorochkin and Antonie Mylka both the Xesam ontology and the Nepomuk Information Elements Ontology were already very close in design. Thus, it was relatively easy to merge the two and be left with only one ontology to support. Since then not only KDE but also Strigi and Tracker are using the Nepomuk ontologies.
At the Gran Canaria Desktop Summit I met some of the guys from Tracker and we tried to come up with a plan to create a joint project to maintain the ontologies. This got off to a rough start as nobody really felt responsible. So I simply took the initiative and released shared-desktop-ontologies version 0.1 in November 2009. The result was a s***-load of hate-mails and bug reports due to me breaking the KDE build. But in the end it was worth it. Now the package is established and other projects can start to pick it up to create data compatible with the Nepomuk system and Tracker.
Today the ontologies (and the shared-desktop-ontologies package) are maintained in the Oscaf project at Sourceforge. The situation is far from perfect but it is a good start. If you need specific properties in the ontologies or are thinking about creating one for your own application - come and join us in the bug tracker...
It was at the Akonadi meeting that Will Stephenson and I got talking about mimicking some Zeitgeist functionality through Nepomuk. Basically it meant gathering some data when opening and when saving files. We quickly came up with a hacky patch for KIO and KFileDialog which covered most cases and allowed us to track when a file was modified and by which application. This little experiment did not leave that state though (it will, however, this year), but another one did: Zeitgeist also provides a fuse filesystem which allows browsing files by modification date. Well, whatever fuse can do, KIO can do as well. Introducing the timeline:/ KIO slave, which gives a calendar view onto your files.
Well, I thought I would mention the Tips And Tricks section I wrote for the techbase. It might not be a big deal but I think it contains some valuable information in case you are using Nepomuk as a developer.
This time around I had the privilege to mentor two students in the Google Summer of Code. Alessandro Sivieri and Adam Kidder did outstanding work on Improved Virtual Folders and the Smart File Dialog.
Adam's work led me to some heavy improvements in the Nepomuk KIO slaves myself, which I only finished this week (more details on that coming up). Alessandro continued his work on faceted file browsing in KDE and created:
Alessandro is following up on his work to make faceted file browsing a reality in 2010 (and KDE SC 4.5). Since it was too late to get faceted browsing into KDE SC 4.4 he is working on Sembrowser, a stand-alone faceted file browser which will be the grounds for experiments until the code is merged into Dolphin.
In 2009 I organized the first Nepomuk workshop in Freiburg, Germany. And also the second one. While I reported properly on the first one I still owe a summary for the second one. I will get around to that - sooner or later. ;)
Soprano gives us a nice command line tool to create a C++ namespace from an ontology file: onto2vocabularyclass. It produces nice convenience namespaces like Soprano::Vocabulary::NAO. Nepomuk adds another tool named nepomuk-rcgen. Both were a bit clumsy to use before. Now we have nice cmake macros which make it very simple to use both.
See the techbase article on how to use the new macros.
Without my knowledge (imagine that!) Andrew Lake created an amazing new media player named Bangarang - a Jamaican word for noise, chaos, or disorder. This player is Nepomuk-enabled in the sense that it has a media library which lets you browse your media files based on the Nepomuk data. It remembers the number of times a song or a video has been played and when it was played last. It allows you to add details such as the TV series name, season, episode number, or actors that are in the video - all through Nepomuk (I hope we will soon get tvdb integration).
I am especially excited about this since finally applications not written or mentored by me start contributing Nepomuk data.
2009 was also the year of the first Gnome-KDE joint-conference. Let me add a bullet for completeness and refer to my previous blog post reporting on my experiences on the island.
Well, that was by far not all I did in 2009 but I think I covered most of the important topics. And after all it is "just a blog entry" - there is no need for completeness. Thanks for reading.
Dr. Dre is one of the artists in the Linked Data Space we host for the BBC. He is also referenced in music oriented data spaces such as DBpedia, MusicBrainz and Last.FM (to name a few).
How do I obtain a holistic view of the entity "Dr. Dre" across the BBC, MusicBrainz, and Last.FM data spaces? We know the BBC published Linked Data, but what about Last.FM and MusicBrainz? Both of these data spaces only expose XML or JSON data via REST APIs.
The following took place:
The new enhanced URI for Dr. Dre now provides a rich holistic view of the aforementioned "Artist" entity. This URI is usable anywhere on the Web for Linked Data Conduction :-)
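"Usable anywhere" means any plain HTTP client can ask for the structured description. A minimal sketch follows; the entity URI below is a hypothetical stand-in for the enhanced Dr. Dre URI:

    # Sketch: dereference an entity URI, negotiating for RDF instead of
    # HTML. The URI is a placeholder, not the real enhanced URI.
    from urllib.request import Request, urlopen

    uri = "http://example.org/entity/dr-dre"
    req = Request(uri, headers={"Accept": "application/rdf+xml"})
    print(urlopen(req).read()[:500])   # RDF/XML description of the entity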
Depicted below is a top-down view of the data access and data management value chain. The term "apex" simply indicates value primacy, which takes the form of a data access API based entry point into a DBMS realm -- aligned to an underlying data model. Examples of data access APIs include: Native Call Level Interfaces (CLIs), ODBC, JDBC, ADO.NET, OLE-DB, XMLA, and Web Services.
See: AVF Pyramid Diagram.
The degree to which ad-hoc views of data managed by a DBMS can be produced and dispatched to relevant data consumers (e.g., people), without compromising concurrency, data durability, and security, determines the "Agility Value Factor" (AVF) of a given DBMS. Remember, agility as the cornerstone of environmental adaptation is as old as the concept of evolution, and intrinsic to all pursuits of primacy.
In simpler, business-oriented terms, look at AVF as the degree to which DBMS technology affects the ability to effectively implement "Market Leadership Discipline" along the following pathways: innovation, operational excellence, or customer intimacy.
Historically, at least since the late '80s, the RDBMS genre of DBMS has consistently offered the highest AVF relative to other DBMS genres en route to primacy within the value pyramid. The desire to improve on paper reports and spreadsheets is basically what DBMS technology has fundamentally addressed to date, even though conceptual level interaction with data has never been its forte.
See: RDBMS Primacy Diagram.
For more than 10 years -- at the very least -- the limitations of the traditional RDBMS in the realm of conceptual level interaction with data across diverse data sources and schemas (enterprise, Web, and Internet) have been crystal clear to many RDBMS technology practitioners, as indicated by some of the quotes excerpted below:
"Future of Database Research is excellent, but what is the future of data?"
"..it is hard for me to disagree with the conclusions in this report. It captures exactly the right thoughts, and should be a must read for everyone involved in the area of databases and database research in particular."-- Dr. Anant Jingran, CTO, IBM Information Management Systems, commenting on the 2007 RDBMS technology retreat attended by a number of key DBMS technology pioneers and researchers.
"One size fits all: A concept whose time has come and gone
- They are direct descendants of System R and Ingres and were architected more than 25 years ago
- They are advocating "one size fits all"; i.e. a single engine that solves all DBMS needs.
-- Prof. Michael Stonebraker, one of the founding fathers of the RDBMS industry.
Until this point in time, the requisite confluence of "circumstantial pain" and "open standards" based technology required to enable an objective "compare and contrast" of RDBMS engine virtues and viable alternatives hasn't occurred. Thus, the RDBMS has endured its position of primacy, albeit on a "one size fits all" basis.
As mentioned earlier, we are in the midst of an economic crisis that is ultimately about a consistent inability to connect dots across a substrate of interlinked data sources that transcend traditional data access boundaries with high doses of schematic heterogeneity. Ironically, in the era of the dot-com, we haven't been able to make meaningful connections between relevant "real-world things" that extend beyond primitive data hosted in database tables and content management style document containers; we've struggled to achieve this in the most basic sense, let alone evolve our ability to connect in line with the exponential rate at which the Internet & Web are spawning "universes of discourse" (data spaces) that emanate from user activity (within the enterprise and across the Internet & Web). In a nutshell, we haven't been able to upgrade our interaction with data such that "conceptual models" and resulting "context lenses" (or facets) become concrete; by this I mean: real-world entity interaction making its way into the computer realm, as opposed to the impedance we all suffer today when we transition from conceptual model interaction (real-world) to logical model interaction (when dealing with RDBMS based data access and data management).
Here are some simple examples of what I can only best describe as: "critical dots unconnected", resulting from an inability to interact with data conceptually:
Government (Globally) - Financial regulatory bodies couldn't effectively discern that a Credit Default Swap is an Insurance policy in all but literal name. And in not doing so, the cost of an unregulated insurance policy laid the foundation for exacerbating the toxicity of fatally flawed mortgage backed securities. Put simply: a flawed insurance policy was the fallback on a toxic security that financiers found exotic based on superficial packaging.
Enterprises - Banks still don't understand that capital really does exist in tangible and intangible forms, with the intangible being the variant that is inherently dynamic. For example, a tech company's intellectual capital far exceeds the value of fixtures, fittings, and buildings, but you'd be amazed to find that in most cases this vital asset has no significant value when banks get down to the nitty gritty of debt collateral; instead, a buffer of flawed securitization has occurred atop a borderline static asset class covering the aforementioned buildings, fixtures, and fittings.
In the general enterprise arena, IT executives continued to "rip and replace" existing technology without ever effectively addressing the timeless inability to connect data across disparate data silos generated by internal enterprise applications, let alone the broader need to mesh data from the inside with external data sources. No correlations were made between the growth of buzzwords and the compounding nature of data integration challenges. It's 2009, and only a minuscule number of executives dare fantasize about being anywhere within distance of the "relevant information at your fingertips" vision.
Looking more holistically at data interaction in general, whether you interact with data in the enterprise space (i.e., at work) or on the Internet or Web, you ultimately are delving into a mishmash of disparate computer systems, applications, services (Web or SOA), and databases (of the RDBMS variety in a majority of cases) associated with a plethora of disparate schemas. Yet even today "rip and replace" is still the norm pushed by most vendors, pitting one monoculture against another, as exemplified by irrelevances such as FOSS/LAMP vs. Commercial or Web vs. Enterprise, when none of this matters if the data access and integration issues aren't recognized, let alone addressed (see: Applications are Like Fish and Data Like Wine).
Like the current credit-crunch, exponential growth of data originating from disparate application databases and associated schemas, within shrinking processing time frames, has triggered a rethinking of what defines data access and data management value today en route to an inevitable RDBMS downgrade within the value pyramid.
There have been many attempts to address real-world modeling requirements across the broader DBMS community from Object Databases to Object-Relational Databases, and more recently the emergence of simple Entity-Attribute-Value model DBMS engines. In all cases failure has come down to the existence of one or more of the following deficiencies, across each potential alternative:
A common characteristic shared by all post-relational database management systems (from Object-Relational to pure Object) is an orientation towards variations of EAV/CR based data models. Unfortunately, all efforts in the EAV/CR realm have typically suffered from at least one of the deficiencies listed above. In addition, the same "one DBMS model fits all" approach that lies at the heart of the RDBMS downgrade also exists in the EAV/CR realm.
The RDBMS is not going away (ever), but its era of primacy -- by virtue of its placement at the apex of the data access and data management value pyramid -- is over! I make this bold claim for the following reasons:
Based on the above, it is crystal clear that a different kind of DBMS -- one with higher AVF relative to the RDBMS -- needs to sit atop today's data access and data management value pyramid. The characteristics of this DBMS must include the following:
Quick recap, I am not saying that RDBMS engine technology is dead or obsolete. I am simply stating that the era of RDBMS primacy within the data access and data management value pyramid is over.
The problem domain (conceptual model views over heterogeneous data sources) at the apex of the aforementioned pyramid has simply evolved beyond the natural capabilities of the RDBMS, which is rooted in "Closed World" assumptions re. data definition, access, and management. The need to maintain domain based conceptual interaction with data is now palpable at every echelon within our "Global Village" - Internet, Web, Enterprise, Government, etc.
It is my personal view that an EAV/CR model based DBMS, with support for the seven items enumerated above, can trigger the long anticipated RDBMS downgrade. Such a DBMS would be inherently multi-model, because you would need the best of the RDBMS and EAV/CR model engines in a single product, with in-built support for HTTP and other Internet protocols in order to effectively address data representation and serialization issues.
Examples of contemporary EAV/CR frameworks that provide concrete conceptual layers for data access and data management currently include:
The frameworks above provide the basis for a revised AVF pyramid, as depicted below, that reflects today's data access and management realities; i.e., an Internet & Web driven global village comprised of interlinked distributed data objects, compatible with "Open World" assumptions.
See: New EAV/CR Primacy Diagram.
A data access driver/provider that provides conceptual entity oriented access to RDBMS data managed by Virtuoso. Naturally, it also uses Virtuoso's in-built virtual / federated database layer to provide access to ODBC and JDBC accessible RDBMS engines such as: Oracle (7.x to latest), SQL Server (4.2 to latest), Sybase, IBM Informix (5.x to latest), IBM DB2, Ingres (6.x to latest), Progress (7.x to OpenEdge), MySQL, PostgreSQL, Firebird, and others using our ODBC or JDBC bridge drivers.
It delivers an Entity-Attribute-Value + Classes & Relationships model over disparate data sources that are materialized as .NET Entity Framework Objects, which are then consumable via ADO.NET Data Object Services, LINQ for Entities, and other ADO.NET data consumers.
The provider is fully integrated into Visual Studio 2008 and delivers the same "ease of use" offered by Microsoft's own SQL Server provider, but across Virtuoso, Oracle, Sybase, DB2, Informix, Ingres, Progress (OpenEdge), MySQL, PostgreSQL, Firebird, and others. The same benefits also apply uniformly to Entity Frameworks compatibility.
Bearing in mind that Virtuoso is a multi-model (hybrid) data manager, this also implies that you can use .NET Entity Frameworks against all data managed by Virtuoso. Remember, Virtuoso's SQL channel is a conduit to Virtuoso's core; thus, RDF (courtesy of SPASQL as already implemented re. Jena/Sesame/Redland providers), XML, and other data forms stored in Virtuoso also become accessible via .NET's Entity Frameworks.
You can choose which entity oriented data access model works best for you: RDF Linked Data & SPARQL or .NET Entity Frameworks & Entity SQL. Either way, Virtuoso delivers a commercial grade, high-performance, secure, and scalable solution.
Note: When working with external or 3rd party databases, simply use the Virtuoso Conductor to link the external data source into Virtuoso. Once linked, the remote tables will simply be treated as though they are native Virtuoso tables, leaving the virtual database engine to handle the rest. This is similar to the role the Microsoft JET engine played in the early days of ODBC, so if you've ever linked an ODBC data source into Microsoft Access, you are ready to do the same using Virtuoso.
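As a quick hedged sketch (the DSN and credentials are placeholders), once the linking is done the remote tables surface through the very same connection as native ones:

    # Sketch: after linking a remote data source via the Conductor, its
    # tables appear in Virtuoso's catalog alongside native ones.
    import pyodbc

    con = pyodbc.connect("DSN=Virtuoso;UID=demo;PWD=demo")
    for t in con.cursor().tables():
        print(t.table_cat, t.table_schem, t.table_name)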
A pre-installed edition of Virtuoso for Amazon's EC2 Cloud platform.
From the DBMS engine perspective it provides you with one or more pre-configured instances of Virtuoso that enable immediate exploitation of the following services:
From a Middleware perspective it provides:
From the Web Server Platform perspective, it provides an alternative to LAMP stack components such as MySQL and Apache by offering:
From the general System Administrator's perspective it provides:
Higher level user oriented offerings include:
For Web 2.0 / 3.0 users, developers, and entrepreneurs, it offers Distributed Collaboration Tools & Social Media realm functionality, courtesy of ODS, including:
Like Apache, Virtuoso is a bona-fide Web Application Server for PHP based applications. Unlike Apache, Virtuoso is also the following:
As result of the above, when you deploy a PHP application using Virtuoso, you inherit the following benefits:
As indicated in prior posts, producing RDF Linked Data from the existing Web, where a lot of content is deployed by PHP based content managers, should simply come down to RDF Views over the SQL Schemas and deployment / publishing of the RDF Views in RDF Linked Data form. In a nutshell, this is what Virtuoso delivers via its PHP runtime hosting and pre-packaged VADs (Virtuoso Application Distribution packages) for popular PHP based applications such as phpBB3, Drupal, WordPress, and MediaWiki.
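As a hedged illustration only -- the graph IRI and the SIOC-based terms below are assumptions for the sake of example, not the exact mapping the VAD packages install -- querying such an RDF View might look like:

    # Sketch: query an RDF View over a blog's SQL schema via SPARQL.
    # Graph IRI, endpoint host, and ontology terms are hypothetical.
    from urllib.parse import urlencode
    from urllib.request import urlopen

    query = """
    PREFIX sioc: <http://rdfs.org/sioc/ns#>
    PREFIX dc:   <http://purl.org/dc/elements/1.1/>
    SELECT ?post ?title
    FROM <http://localhost:8890/wordpress>
    WHERE { ?post a sioc:Post ; dc:title ?title }
    LIMIT 10
    """
    params = urlencode({"query": query, "format": "text/csv"})
    print(urlopen("http://localhost:8890/sparql?" + params).read().decode())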
In addition to the RDF Linked Data deployment, we've also taken the traditional LAMP installation tedium out of the typical PHP application deployment process. For instance, you don't have to rebuild PHP 3.5 (32 or 64 Bit) on Windows, Mac OS X, or Linux to get going; simply install Virtuoso, then select a VAD package for the relevant application, and you're set. If the application of choice isn't pre-packaged by us, simply install as you would when using Apache, which comes down to situating the PHP files in your Web structure under the Web Application's root directory.
At the current time, I've only provided links to ZIP files containing the Virtuoso installation "silent movies". This approach is a short-term solution to some of my current movie publishing challenges re. YouTube and Vimeo -- where the compressed output hasn't been of acceptable visual quality. Once resolved, I will publish much more "Multimedia Web" friendly movies :-)
From the RWW Top-Down category, which I interpret as technologies that produce RDF from non-RDF data sources. Our product portfolio is comprised of the following: Virtuoso Universal Server, OpenLink Data Spaces, OpenLink Ajax Toolkit, and OpenLink Data Explorer (which includes Ubiquity commands).
Of course you could have simply looked up OpenLink Software's FOAF based Profile page (*note the Linked Data Explorer tab*), or simply passed the FOAF profile page URL to a Linked Data aware client application such as: OpenLink Data Explorer, Zitgist Data Viewer, Marbles, and Tabulator, and obtained information. Remember, OpenLink Software is an Entity of Type: foaf:Organization, on the burgeoning Linked Data Web :-)
CrunchBase: When we released the CrunchBase API, you were one of the first developers to step up and quickly released a CrunchBase Sponger Cartridge. Can you explain what a CrunchBase Sponger Cartridge is?
Me: A Sponger Cartridge is a data access driver for Web Resources that plugs into our Virtuoso Universal Server (DBMS and Linked Data Web Server combo amongst other things). It uses the internal structure of a resource and/or a web service associated with a resource, to materialize an RDF based Linked Data graph that essentially describes the resource via its properties (Attributes & Relationships).
CrunchBase: And what inspired you to create it?
Me: Bengee built a new space with your data, and we've built a space on the fly from your data which still resides in your domain. Either solution extols the virtues of Linked Data i.e. the ability to explore relationships across data items with high degrees of serendipity (also colloquially known as: following-your-nose pattern in Semantic Web circles).
Bengee posted a notice to the Linking Open Data Community's public mailing list announcing his effort. Bearing in mind the fact that we've been using middleware to mesh the realms of Web 2.0 and the Linked Data Web for a while, it was a no-brainer to knock something up based on the conceptual similarities between Wikicompany and CrunchBase. In a sense, a quadrant of orthogonality is what immediately came to mind re. Wikicompany, CrunchBase, Bengee's RDFization efforts, and ours.
Bengee created an RDF based Linked Data warehouse based on the data exposed by your API, which is exposed via the Semantic CrunchBase data space. In our case we've taken the "RDFization on the fly" approach, which produces a transient Linked Data View of the CrunchBase data exposed by your APIs. Our approach is in line with our world view: all resources on the Web are data sources, and the Linked Data Web is about incorporating HTTP into the naming scheme of these data sources so that the conventional URL based hyperlinking mechanism can be used to access a structured description of a resource, which is then transmitted using a range of negotiable representation formats. In addition, based on the fact that we house and publish a lot of Linked Data on the Web (e.g. DBpedia, PingTheSemanticWeb, and others), we've also automatically meshed CrunchBase data with related data in DBpedia and Wikicompany.
CrunchBase: Do you know of any apps that are using CrunchBase Cartridge to enhance their functionality?
Me: Yes, the OpenLink Data Explorer which provides CrunchBase site visitors with the option to explore the Linked Data in the CrunchBase data space. It also allows them to "Mesh" (rather than "Mash") CrunchBase data with other Linked Data sources on the Web without writing a single line of code.
CrunchBase: You have been immersed in the Semantic Web movement for a while now. How did you first get interested in the Semantic Web?
Me: We saw the Semantic Web as a vehicle for standardizing conceptual views of heterogeneous data sources via context lenses (URIs). In 1998, as part of our strategy to expand our business beyond the development and deployment of ODBC, JDBC, and OLE-DB data providers, we decided to build a Virtual Database Engine (see: Virtuoso History), and in doing so we sought a standards based mechanism for the conceptual output of the data virtualization effort. As of the time of the seminal unveiling of the Semantic Web in 1998, we were clear about two things in relation to the effects of the Web and Internet data management infrastructure inflections: 1) existing DBMS technology had reached its limits; 2) Web Servers would ultimately hit their functional limits. These fundamental realities compelled us to develop Virtuoso with an eye to leveraging the Semantic Web as a vehicle for completing its technical roadmap.
CrunchBase: Can you put into layman's terms exactly what RDF and SPARQL are and why they are important? Do they only matter for developers or will they extend past developers at some point and be used by website visitors as well?
Me: RDF (Resource Description Framework) is a Graph based Data Model that facilitates resource description using the Subject, Predicate, and Object principle. Associated with the core data model, as part of the overall framework, are a number of markup languages for expressing your descriptions (just as you express presentation markup semantics in HTML or document structure semantics in XML) that include: RDFa (simple extension of HTML markup for embedding descriptions of things in a page), N3 (a human friendly markup for describing resources), RDF/XML (a machine friendly markup for describing resources).
SPARQL is the query language associated with the RDF Data Model, just as SQL is the query language associated with the Relational Database Model. Thus, when you have RDF based structured and linked data on the Web, you can query against the Web using SPARQL just as you would against an Oracle/SQL Server/DB2/Informix/Ingres/MySQL/etc. DBMS using SQL. That's it in a nutshell.
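A tiny, self-contained sketch of both ideas, using the Python rdflib library; the people and URIs are made up:

    # Sketch: a small RDF graph (Subject, Predicate, Object triples)
    # queried with SPARQL. Requires the rdflib package.
    from rdflib import Graph

    g = Graph()
    g.parse(data="""
    @prefix ex:   <http://example.org/> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .

    ex:dre a foaf:Person ; foaf:name "Dr. Dre" ; foaf:knows ex:snoop .
    """, format="turtle")

    # SPARQL plays the role here that SQL plays for relational data.
    results = g.query("""
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT ?name WHERE { ?s foaf:name ?name }
    """)
    for (name,) in results:
        print(name)   # -> Dr. Dre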
CrunchBase: On your website you wrote about "RDF and SPARQL as productivity boosters in everyday web development". Can you elaborate on why you believe that to be true?
Me: I think the ability to discern a formal description of anything via its discrete properties is of immense value re. productivity, especially when the capability in question results in a graph of Linked Data that isn't confined to a specific host operating system, database engine, application or service, programming language, or development framework. RDF Linked Data is about infrastructure for the true materialization of the "Information at Your Fingertips" vision of yore. Even though it's taken the emergence of RDF Linked Data to make the aforementioned vision tractable, the comprehension of the vision's intrinsic value has been clear for a very long time. Most organizations and/or individuals are quite familiar with the adage "Knowledge is Power"; well, there isn't any knowledge without accessible information, and there isn't any accessible information without accessible data. The Web has always been grounded in accessibility to data (albeit via compound container documents called Web Pages).
Bottom line, RDF based Linked Data is about Open Data access by reference using URIs (HTTP based Entity IDs / Data Object IDs / Data Source Names), and as I said earlier, the intrinsic value is pretty obvious bearing in mind the costs associated with integrating disparate and heterogeneous data sources -- across intranets, extranets, and the Internet.
CrunchBase: In his definition of Web 3.0, Nova Spivack proposes that the Semantic Web, or Semantic Web technologies, will be the force behind much of the innovation that will occur during Web 3.0. Do you agree with Nova Spivack? What role, if any, do you feel the Semantic Web will play in Web 3.0?
Me: I agree with Nova. But I see Web 3.0 as a phase within the Semantic Web innovation continuum. Web 3.0 exists because Web 2.0 exists. Both of these Web versions express usage and technology focus patterns. Web 2.0 is about the use of Open Source technologies to fashion Web Services that are ultimately used to drive proprietary Software as Service (SaaS) style solutions. Web 3.0 is about the use of "Smart Data Access" to fashion a new generation of Linked Data aware Web Services and solutions that exploit the federated nature of the Web to maximum effect; proprietary branding will simply be conveyed via quality of data (cleanliness, context fidelity, and comprehension of privacy) exposed by URIs.
Here are some examples of the CrunchBase Linked Data Space, as projected via our CrunchBase Sponger Cartridge:
By coincidence, Glenn and I presented at this month's Cambridge Semantic Web Gathering.
I've provided a dump of Glenn's issues and my responses below:
RDF is a Graph based Data Model; it stands for Resource Description Framework. The metadata angle comes from its Meta Content Framework (MCF) origins. You can express and serialize data based on the RDF Data Model using: Turtle, N3, TriX, N-Triples, and RDF/XML.
These are just appeasement:
- old query paradigm: fishing in dark water with superstitiously tied lures; only works well in carefully stocked lakes
- we don't ask questions by defining answer shapes and then hoping they're dredged up whole.
SPARQL, MQL, and Entity-SQL are Graph Model oriented Query Languages. Query Languages always accompany Database Engines. SQL is the Relational Model equivalent.
Noble attempt to ground the abstract, but:
- URI dereferencing/namespace/open-world issues focus too much technical attention on cross-source cases where the human issues dwarf the technical ones anyway
- FOAF query over the people in this room? forget it.
- link asymmetry doesn't scale
- identity doesn't scale
- generating RDF from non-graph sources: more appeasement, right where the win from actually converting could be biggest!
Innovative use of HTTP to deliver "Data Access by Reference" to the Linked Data Web.
When you have a Data Model, Database Engine, and Query Language, the next thing you need is a Data Access mechanism that provides "Data Access by Reference". ODBC and JDBC (amongst others) provide "Data Access by Reference" via Data Source Names. Linked Data is about the same thing (URIs are Data Source Names) with the following differences:
Hugely motivating and powerful idea, worthy of a superhero (Graphius!), but:
- giant and global parts are too hard, and starting global makes every problem harder
- local projects become unmanageable in global context (Cyc, Freebase data-modeling lists...).
And thus my plea, again. Forget "semantic" and "web", let's fix the database tech first:
- node/arc data-model, path-based exploratory query-model
- data-graph applications built easily on top of this common model; building them has to be easy, because if it's hard, they'll be bad
- given good database tech, good web data-publishing tech will be trivial!
- given good tools for graphs, the problems of uniting them will be only as hard as they have to be.
Giant Global Graph is just another moniker for a "Web of Linked Data" or "Linked Data Web".
Multi-Model Database technology that meshes the best of the Graph & Relational Models exists. In a nutshell, this is what Virtuoso is all about, and it's existed for a very long time :-)
Virtuoso is also a Virtual DBMS engine (so you can see Heterogeneous Relational Data via Graph Model Context Lenses). Naturally, it is also a Linked Data Deployment platform (or Linked Data Server).
The issue isn't the "Semantic Web" moniker per se; it's about how Linked Data (the foundation layer of the Semantic Web) gets introduced to users. As I said during the MIT Gathering: "The Web is experienced via Web Browsers primarily, so any enhancement to the Web must be exposed via traditional Web Browsers", which is why we've opted to simply add "View Linked Data Sources" to the existing set of common Browser options that includes:
By exposing the Linked Data Web option as described above, you enable the Web user to knowingly transition from the traditional Rendered (X)HTML page view to the Linked Data View (i.e., the structured data behind the page). This simple "User Interaction" tweak makes the notion of exploiting a Structured Web somewhat clearer.
The Linked Data Web isn't a panacea. It's just an addition to the existing Web that enriches the things you can do with the Web. Its predominance, like any application feature, will be subject to the degree to which it delivers tangible value or materializes internal and external opportunity costs.
Note: The Web isn't ubiquitous today because all its users grokked HTML Markup. Its ubiquity is a function of opportunity costs: there simply came a point in the Web bootstrap when nobody could afford the opportunity costs associated with being off the Web. The same thing will play out with Linked Data and the broader Semantic Web vision.
Naturally, we've decided to join the CrunchBase RDFization party, and have just completed a Virtuoso Sponger Cartridge (an RDFizer) for CrunchBase. What we add in our particular cartridge is additional meshing with DBpedia and Wikicompany Linked Data Spaces, plus RDFization of the CrunchBase (X)HTML pages :-)
As I've postulated for a while, Linked Data is about data "Meshing" and "Meshups". This isn't a buzzword play; I am pointing out an important distinction between "Mashups" and "Meshups", which goes as follows: "Mashups" are about code level joining devoid of structured modelling, hence the revelation of code as opposed to data when you look behind a "Mashup". "Meshups", on the other hand, are about joining disparate structured data sources across the Web. And when you look behind a "Meshup" you see structured data (preferably Linked Data) that enables further "Meshing".
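In code terms, here is a hedged sketch of why a "Meshup" composes at the data level (rdflib again; both source URLs are placeholders for real RDF documents):

    # Sketch: a "Meshup" -- merging structured data about one company
    # from two Linked Data sources into a single graph. Requires rdflib;
    # both URLs are hypothetical placeholders.
    from rdflib import Graph

    g = Graph()
    g.parse("http://example.org/crunchbase/acme.rdf")    # source 1
    g.parse("http://example.org/wikicompany/acme.rdf")   # source 2

    # Shared URIs join the two descriptions with no code-level glue;
    # the merged graph remains structured data, ready for more meshing.
    print(len(g), "triples after meshing")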
I truly believe that we are now inches away from critical mass re. Linked Data, and because we are dealing with data, the network-effect will be sky-high! I shudder to think about the state of the Linked Data Web in 12 months time. Yes, I am giving the explosion 12 months (or less). These are very exciting times.
Demo Links:
For the best experience, I encourage you to look at the OpenLink Data Explorer extension for Firefox (2.x - 3.x). This enables you to go to CrunchBase (X)HTML pages (and other sites on the Web, of course), and then simply use the "View | Linked Data Sources" main or context menu sequence to unveil the Linked Data Sources associated with any Web Page.
Of course there is much more to come!
My use of "old media" implies a place that still seeks subscriber data (no OpenID, etc.), for the umpteenth time, as the toll fee for discourse development and participation on the Web.
Anyway, here is what I attempted to post as a comment to Dan Grigorovici's post titled: Where is the Semantic Web Killer App?
Dan,
An intriguing post to say the least :-)
"Linked Data" and "Semantic Web" aren't synonymous, they are simply connected, infrastructure DNA-wise. You can have "Semantic Web" style graphs (i.e RDF Data) and not have "Linked Data" as per Linked Data deployment tenets and best practices, a very important point.
I've stated repeatedly that the "Linked Data" emphasis has more to do with focusing on a point of crystallization within the larger "Semantic Web" vision, so here is a quick recap:
A term coined by TimBL that describes an application of HTTP to the time-tested process of "Data Access by Reference". "Linked Data" adds vital items to the "Data Access by Reference" pattern that were hitherto unattainable:
So we have HTTP based URIs as the Data Source Names for a "Linked Data Web"; i.e., a Web of inter-connected Data Source Names that de-emphasizes the importance of their host containers (Compound Documents / Information Resources).
The business case or value proposition of "Linked Data" is synonymous with the value proposition of data access technologies such as ODBC, JDBC, ADO.NET, OLE-DB, XMLA, and others (enterprise or consumer) in relation to the Individual and Enterprise pursuit of agility, in a realm where data is growing exponentially and the maximum processing time in a single day remains 24 hours. Data Access & Data Integration are timeless challenges due to the following constants:
Note: The line between the Enterprise and the Individual continues to blur by the second. This is something I covered during my Linked Data Planet keynote, which, like most things I put on the Web (via this blog data space), is a live and practical demonstration of the virtues of Linked Data courtesy of RDFa, the Bibliographic Ontology, and dereferenceable URIs (i.e., HTTP-based Data Source Names for Documents and the Entities they host).
The keynote, Creating, Deploying, and Exploiting Linked Data, sought to achieve one fundamental goal: demystify the concept of "Linked Data" using anecdotal material that resonates with enterprise decision makers.
To my pleasure, 90% of the audience members confirmed familiarity with the "Data Source Name" concept of Open Database Connectivity (ODBC). Thus, all I had to do was map "Linked Data" to ODBC, and then unveil the fundamental add-ons that "Linked Data" delivers:
I believe a majority of attendees came to realize that the combination above injects a new Web interaction dynamic: access to "Subject Matter Concepts" and Named Entities contained within a page via HTTP-based Data Source Names (URIs).
BTW - My presentation is a Linked Data Space in its own right, courtesy of the Bibliographic Ontology (which provides slide show modeling) and RDFa, which allows me to embed annotations into my Slidy-based presentation :-)
ODBC identifies data sources using Data Source Names (DSNs).
WODBC (Web Open Database Connectivity) delivers open data access to Web Databases / Data Spaces. The Data Source Naming scheme, URI or IRI, is HTTP-based, thereby enabling data access by reference via the Web.
ODBC DSNs bind ODBC client applications to Tables, Views, and Stored Procedures.
WODBC DSNs bind you to a Data Space (e.g., my FOAF-based Profile Page, where you can use the "Explore Data" tab to look around if you are a human visitor) or to a specific Entity within a Data Space (i.e., the Person Entity "Me").
ODBC Drivers are built using APIs (DBMS Call Level Interfaces) provided by DBMS vendors. Thus, a DBMS vendor can choose not to release an API, or do so selectively, for competitive advantage or market disruption purposes (it's happened!).
WODBC Drivers are also built using APIs (Web Services associated with a Web Data Space). These drivers are also referred to as RDF Middleware or RDFizers. The "Web" component of WODBC ensures openness: you publish data with URIs from your Linked Data Server and that's it; your data space or specific data entities are live and accessible (by reference) over the Web!
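To make the parallel concrete, here is a minimal sketch (the connect string, URIs, and query are all illustrative, not taken from a real deployment). The ODBC path names a pre-configured DSN; the WODBC path names an HTTP URI, which Virtuoso's SPARQL processor can fetch on demand via its get:soft pragma:

-- ODBC: data access by reference via a Data Source Name
-- (illustrative connect string and query):
--   DSN=HR;UID=scott;PWD=tiger  then  SELECT * FROM EMPLOYEES;

-- WODBC: data access by reference via an HTTP URI, issued here
-- from Virtuoso's iSQL (the FROM graph is fetched if not already local):
sparql
define get:soft "soft"
prefix foaf: <http://xmlns.com/foaf/0.1/>
select ?name
from <http://www.openlinksw.com/dataspace/person/kidehen#this>
where { ?s foaf:name ?name } ;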
So we have come full circle (or cycle): the Web is becoming more of a structured database every day! What's new is old, and what's old is new!
Data Access is everything; without "Data" there is no information or knowledge. Without "Data" there's no notion of vitality, purpose, or value.
URIs make or break everything in the Linked Data Web just as ODBC DSNs do within the enterprise.
I've deliberately left JDBC, ADO.NET, and OLE-DB out of this piece due to their programming-language and framework specificity. None of these mechanisms match the platform availability breadth of ODBC.
The Web as a true M-V-C pattern is now crystallizing. The "M" (Model) component of M-V-C is finally rising to the realm of broad attention, courtesy of the "Linked Data" meme and "Semantic Web" vision.
By the way, M-V-C lines up nicely with Web 1.0 (Web Forms / Pages), Web 2.0 (Web Services based APIs), and Web 3.0 (Data Web, Web of Data, or Linked Data Web) :-)
The Linked Data Web (aka Linked Data) describes RDF data injected into the Web, where the Data Object Identifiers in an RDF graph (a collection of RDF triples) are HTTP-based URIs. The net effect of this approach to Data Object Identity is that it facilitates "Open Data Access by Reference" on the Web (aka data dereferencing).
If you recall pre-Web-ubiquity times, in the enterprise realm for instance, Open Database Connectivity (ODBC) emerged as a mechanism for separating Data Access and Data Management in the database-oriented Client-Server model. Although ODBC gave you access to data, the data access entry point took the form of a data-access-specific naming mechanism called a "Data Source Name" (DSN). ODBC DSNs typically exposed Tables or Views. The same thing applies to JDBC, where a non-HTTP-based URN scheme applies.
Zip forward to where we are today on the Web; the Web is evolving from a Document centric Database to a Distributed Object Database, and you should see that in Linked Data we are now truly looking at the best of all worlds: Web Open Database Connectivity (WODBC) with the following advantages:
To conclude, we now have "Semantics Inside" (RDF or non-RDF), "Semantic Web" (RDF graphs with Object Identifiers that may or may not be HTTP-based), and "Linked Data Web" (RDF graphs with Object Identifiers that must be HTTP-based and dereferenceable) oriented applications in the emerging landscape associated with the "Semantics" moniker.
As per usual, this post is a record in my Blog oriented Data Space on the Web. The permalink of this post is a URI constructed with Giant Global Graph enrichment in mind :-)
Daniel simplifies my post by using diagrams to depict the different paths for PHP based applications exposing Linked Data - especially those that already provide a significant amount of the content that drives Web 2.0.
If all the content in Web 2.0 information resources is distillable into discrete data objects endowed with HTTP-based IDs (URIs), with zero "RDF handcrafting tax", what do we end up with? A Giant Global Graph of Linked Data; the Web as a Database.
So, what used to apply exclusively within enterprise settings re. Oracle, DB2, Informix, Ingres, Sybase, Microsoft SQL Server, MySQL, PostgreSQL, Progress OpenEdge, and others, now applies to the Web. The Web becomes the "Distributed Database Bus" that connects database records across disparate databases (or Data Spaces). These databases manage and expose records that are remotely accessible "by reference" via HTTP.
As I've stated at every opportunity in the past, Web 2.0 is the greatest thing that ever happened to the Semantic Web vision :-) Without the "Web 2.0 Data Silo Conundrum" we wouldn't have the cry for "Data Portability" that brings a lot of clarity to some fundamental Web 2.0 limitations that end-users ultimately find unacceptable.
In the late '80s, the SQL Access Group (now part of X/Open) addressed a similar problem with RDBMS silos within the enterprise, which led to the SAG CLI that exists today as Open Database Connectivity.
In a sense we now have WODBC (Web Open Database Connectivity), comprised of Web Services based CLIs and/or traditional back-end DBMS CLIs (ODBC, JDBC, ADO.NET, OLE-DB, or Native), a Query Language (SPARQL Query Language), and a Wire Protocol (the HTTP-based SPARQL Protocol), delivering Web infrastructure equivalents of SQL and RDA, but much better, and with much broader scope for delivering profound value due to the Web's inherent openness. Today's PHP, Python, Ruby, Tcl, Perl, or ASP.NET developer is the enterprise 4GL developer of yore, without enterprise confinement. We could even be talking about 5GL development once Linked Data interaction is meshed with dynamic languages (delivering higher levels of abstraction at the language and data interaction levels). Even the underlying schemas and basic design will evolve from (solely) Closed World to a mesh of Closed & Open World view schemas.
In the form above (the norm), WordPress data can be injected into the Linked Data Web via RDFization middleware such as the Virtuoso Sponger (built into all Virtuoso instances) and Triplr. The downside of this approach is that the blog owner doesn't necessarily possess full control over their contributions to the emerging Giant Global Graph of Linked Data.
Another route to Linked Data exposure is via Virtuoso's Meta Schema Language for producing RDF Views over ODBC/JDBC-accessible Data Sources, which enables the following setup:
Alternatively, you can also exploit Virtuoso as the SQL DBMS, RDF DBMS, Application Server, and Linked Data Deployment platform:
How Do I map the WordPress SQL Schema to RDF using Virtuoso?
Read the Meta Schema Language guide or simply apply our "WordPress SQL Schema to RDF" script to your Virtuoso hosted instance. Of course, there are other mappings that cover other PHP applications deployed via Virtuoso.
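For a flavor of what such a mapping looks like, here is a rough, hypothetical sketch against a stock wp_posts table (the prefixes, graph IRI, and property choices are mine for illustration; the guide has the definitive Meta Schema Language syntax). First an IRI class mints HTTP URIs for posts, then a quad-map pattern binds table columns to triples:

sparql
prefix wp: <http://example.com/schemas/wordpress#>
create iri class wp:post_iri "http://example.com/wordpress/post/%d"
    ( in ID integer not null ) ;

sparql
prefix wp: <http://example.com/schemas/wordpress#>
prefix sioc: <http://rdfs.org/sioc/ns#>
prefix dc: <http://purl.org/dc/elements/1.1/>
alter quad storage virtrdf:DefaultQuadStorage
from DB.DBA.wp_posts as posts
{
  create wp:PostGraph as graph iri ("http://example.com/wordpress#")
  {
    wp:post_iri (posts.ID) a sioc:Post ;
        dc:title posts.post_title ;
        sioc:content posts.post_content .
  }
} ;

The rows stay in the SQL tables; the corresponding triples are produced at query time, so nothing is duplicated.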
Yes, integration is hard, but I do profoundly believe that what's been happening on the Web over the last 10 or so years also applies to the Enterprise, and by this I absolutely do not mean "Enterprise 2.0" since "2.0" and productive agility do not compute in my realm of discourse.
Large collections of RSS feeds, Wikiwords, shared bookmarks, discussion forums, etc., when disconnected at the data level (i.e., hosted in pages with no access to the "data behind"), simply offer information deluge and inertia (there are only so many hours for processing opaque information sources in a given day).
Enterprises fundamentally need to process information efficiently as part of a perpetual assessment of their relative competitive Strengths, Weaknesses, Opportunities, and Threats (SWOT) in existing and/or future markets. Historically, IT acquisitions have run counter to the aforementioned quest for "Agility", due to the predominance of the "rip and replace" approach to technology acquisition that repeatedly creates and perpetuates information silos across Application, Database, Operating System, and Development Environment boundaries. The sequence of events typically occurs as follows:
In the early to mid '90s (pre ubiquitous Web), operating system, programming language, and development framework independence inside the enterprise was technically achievable via ODBC (due to its platform independence). That said, DBMS-specific ODBC channels alone couldn't address the holistic requirements associated with Conceptual Views of disparate data sources, hence the need for Data Access Virtualization via Virtual Database Engine technology.
Just as is the case on the Web today, with the emergence of the "Linked Data" meme, enterprises now have a powerful mechanism for exploiting the Data Integration benefits associated with generating Data Objects from disparate data sources, endowed with HTTP based IDs (URIs).
Conceptualizing access to data exposed by database APIs, SOA-based Web Services (SOAP-style), Web 2.0 APIs (REST-style), XML Views of SQL Data (SQLX), pure XML, etc., is the problem area addressed by RDF-aware middleware (RDFizers, e.g., the Virtuoso Sponger).
Here are examples of what SQL Rows exposed as RDF Data Objects (identified using HTTP based URIs) would look like outside or behind a corporate firewall:
What's Good for the Web Goose (Personal Data Space URIs) is good for the Enterprise Gander (Enterprise Data Space URIs).
The aforementioned qualification is increasingly necessary for the following reasons:
The terms GGG, Linked Data, Data Web, Web of Data, and Web 3.0 (when I use this term) all imply URI-driven Open Data Access for the Web Database (call this ODBC for the Web): the ability to point to records across data spaces without any adverse effect on the remote data spaces. It's really important to note that none of the aforementioned terms has anything to do with the linguistic meaning of a blurb. Building a smarter document exposed via a URL, without exposing descriptive data links, doesn't provide open access to the data sources behind the information.
As human beings we are all endowed with reasoning capability. But we can't reason without access to data. A dearth of openly accessible structured data is the source of many ills in cyberspace and across society in general. Today we still have Subjectivity reigning over Objectivity due to the prohibitive costs of open data access.
We can't cost-effectively pursue objectivity without cost-effective infrastructure for creating alternative views of the data behind information sources (e.g., Web pages). More Objectivity and less Subjectivity is what the next Web frontier is about. At OpenLink we simply use the moniker: Analysis for All! Everyone becomes a data analyst in some form, and even better, the analyses are easily accessible to anyone connected to the Web. Of course, you will be able to share particular analyses with your private network of friends and family, or if you so choose, not at all :-)
To recap: it's important to note that Linked Data is the foundation layer of the Semantic Web vision. It not only facilitates open data access, it also enables data integration (Meshing as opposed to Mashing) across disparate data schemas.
As demonstrated by DBpedia and the Linked Data Solar system emerging around it, if you URI everything, then everything is Cool.
Linked Data and Information Silos are mutually exclusive concepts. Thus, you cannot produce a web accessible Information Silo and then refer to it as "Semantic Web" technology. Of course, it might be very Semantic, but it's fundamentally devoid of critical "Semantic Web" essence (DNA).
My acid test for any Semantic Web solution is simply this (using a Web User Agent or Client):
Here is the Acid test against my Data Space:
Now I can simply state the following using Linked Data (hyperdata) links:
OpenLink Software's product portfolio is comprised of the following product families:
We no longer have to explain (repeatedly) why our drivers exist in Express, Lite, and Multi-Tier Edition formats, or why you ultimately need Multi-Tier Drivers over Single-Tier Drivers (Express or Lite Editions), since you ultimately need high performance, data encryption, and policy-based security across each of the data access driver formats.
How Do I create the missing Bitmap Indexes?
Go to the HTML based Virtuoso Conductor, iSQL command line interface, or an ODBC / JDBC / ADO.NET / OLE DB client and execute:
CREATE BITMAP index RDF_QUAD_POGS on DB.DBA.RDF_QUAD (P,O,G,S);
CREATE BITMAP index RDF_QUAD_PSOG on DB.DBA.RDF_QUAD (P,S,O,G);
CREATE BITMAP index RDF_QUAD_SOPG on DB.DBA.RDF_QUAD (S,O,P,G);
Jason recently moved to Massachusetts, which led to me pinging him about our earlier blogosphere encounter and the emergence of a Data Portability Community. I also informed him that TimBL, myself, and a number of other Semantic Web technology enthusiasts frequently meet on the 2nd Tuesday of each month at the MIT-hosted Cambridge Semantic Web Gatherings, to discuss, demonstrate, and debate all aspects of the Semantic Web. Luckily (for both of us), Jason attended the last event, and we got to meet each other in person.
Following our face-to-face meeting in Cambridge, a number of follow-on conversations ensued covering Linked Data and practical applications of the Semantic Web vision. Jason writes about our exchanges in a recent post titled: The Semantic Web. His passion for Data Portability enabled me to use OpenID and FOAF integration to connect the Semantic Web and Data Portability via the Linked Data concept.
During our conversations, Jason also alluded to the fact that he had already encountered OpenLink Software while working with our ODBC Drivers (part of our UDA product family) for IBM Informix (Single-Tier or Multi-Tier Editions) a few years ago (an interesting random connection).
As I've stated in the past, I've always felt that the Semantic Web vision will materialize by way of a global epiphany. The countdown to this inevitable event started at the birth of the blogosphere, ironically. And it accelerated more recently through the emergence of Web 2.0 and Social Networking, even more ironically :-)
The blogosphere started the process of Data Space coalescence via RSS/Atom-based semi-structured data enclaves; Web 2.0 propagated Web Service usage en route to creating service-provider-controlled data and information silos; Social Networking brought attention to the fact that User Generated Data wasn't actually owned or controlled by the Data Creators; etc.
The emergence of "Data Portability" has created a palatable moniker for a clearly defined, and slightly easier to understand, problem: the meshing of Data and Identity in cyberspace, i.e., individual points of presence in cyberspace, in the form of "Personal Data Spaces in the Clouds" (think: doing really powerful stuff with .name domains). In a sense, this is the critical inflection point between the document-centric "Web of Linked Documents" and the data-centric "Web of Linked Data". There is absolutely no other way to solve this problem in a manner that alleviates the imminent challenges presented by information overload, resulting from the exponential growth of user-generated data across the Internet and enterprise Intranets.
Writing a JDBC Driver for SPARQL is a little overkill. OpenOffice.org simply needs to make XML or Web Data (HTML, XHTML, and XML) bona fide data sources within its "Pivot Table" functionality realm. All that would then be required is a SPARQL SELECT query transported via the SPARQL Protocol, with results sent back using the SPARQL XML results serialization format (all part of a single SPARQL Protocol URL).
Excel successfully consumes the following information resource URI: http://tinyurl.com/yvoccj (a tiny url for a SPARQL SELECT against my FOAF file).
Alternatively, and currently achievable, you could simply use SPASQL (SPARQL within SQL) using a DBMS engine that supports SQL, SPARQL, and SPASQL, e.g., Virtuoso.
Virtuoso SPASQL support is exposed via its ODBC and/or JDBC Drivers. Thus you can do things such as:
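Here is a minimal sketch (the FOAF query, variable, and alias names are illustrative): a dynamic-SQL-capable client submits a statement containing a SPARQL derived table, and the SPARQL variables surface as ordinary SQL columns:

select full_name
  from (sparql prefix foaf: <http://xmlns.com/foaf/0.1/>
        select ?full_name where { ?s foaf:name ?full_name }) as people ;

Because this is plain dynamic SQL from the client's perspective, the same statement works unchanged from PHP, Ruby, Excel, or any other ODBC/JDBC-bound environment.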
BTW - My New Year's resolution: get my act together and shrink the ever-increasing list of "simple & practical Virtuoso use case demos" on my todo list, which now spans all the way back to 2006 :-(
On a different, but related, thread, Mike Bergman recently penned a post titled: What is the Structured Web?. Both of these public contributions shed light on the "Information BUS" essence of the World Wide Web by describing the evolving nature of the payload shuttled by the BUS.
An Information BUS is middleware infrastructure for shuttling "Information" between endpoints using a messaging protocol.
The Web is the dominant Information BUS within the Network Computer we know as the "Internet". It uses HTTP to shuttle information payloads between "Data Sources" and "Information Consumers" - this is what happens when we interact with the Web via User Agents / Clients (e.g., browsers).
Information is HTTP-transported streams of contextualized data; hence the terms "Information Resource" and "Non-Information Resource" when reading material related to http-range-14 and Web Architecture. For example, an (X)HTML document is a specific data context (representation) that enables us to perceive, or comprehend, a data stream originating from a Web Server as a Web Page. On the other hand, if the payload lacks contextualized data, a fundamental Web requirement, then the resource is referred to as a "Non-Information" resource. Of course, there is really no such thing as a "Non-Information" resource, but with regard to Web Architecture, it's the short way of saying "the Web transmits Information only". That said, I prefer to refer to these "Non-Information" resources as "Data Sources", a term well understood in the world of Data Access Middleware (ODBC, JDBC, OLE DB, ADO.NET, etc.) and Database Management Systems (Relational, Object-Relational, Object, etc.).
Examples of Information Resource and Data Source URIs:
Explanation: The Information Resource is a conduit to the Entity identified by the Data Source URI (an entity in my RDF Data Space that is the Subject or Object of one or more Triple-based Statements; the triples in question can be represented as an RDF resource when transmitted over the Web via an Information Resource that takes the form of a SPARQL REST Service URL or a physical RDF-based Information Resource URL).
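A minimal sketch of the distinction (the entity URI is illustrative): issuing a SPARQL DESCRIBE against a Data Source URI, here from Virtuoso's iSQL, returns an RDF document, and that document (not the entity itself) is the Information Resource:

sparql
describe <http://www.openlinksw.com/dataspace/person/kidehen#this> ;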
Prior to the emergence of the Semantic Data Web, the payloads shuttled across the Web Information BUS consisted primarily of the following:
The Semantic Data Web simply adds RDF to the payload formats that travel the Web Information BUS. RDF addresses formal data structure, which XML doesn't cover since it is semi-structured (distinct data entities aren't formally discernible). In a nutshell, an RDF payload is basically a conceptual-model database packaged as an Information Resource. It's comprised of granular data items called "Entities" that expose fine-grained property values, individual and/or group characteristics (attributes), and relationships (associations) with other Entities.
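To make "conceptual model database as payload" concrete, here is a tiny hypothetical sketch using Virtuoso's SPARUL extension (graph IRI and data are invented): one Entity (#alice), one attribute value (foaf:name), and one relationship (foaf:knows) to another Entity:

sparql
prefix foaf: <http://xmlns.com/foaf/0.1/>
insert into graph <http://example.org/payload-demo>
{
  <http://example.org/payload-demo#alice> a foaf:Person ;
      foaf:name "Alice" ;
      foaf:knows <http://example.org/payload-demo#bob> .
} ;

Serialize that graph (as RDF/XML, N3, or Turtle) and ship it over HTTP, and you have exactly the kind of structured payload described above.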
The Web is in the final stages of the 3rd phase of its evolution, a phase characterized by the shuttling of structured data payloads (RDF) alongside less data-oriented payloads (HTML, XHTML, XML, etc.). As you can see, Linked Data and Structured Data are both terms used to describe the addition of more data-centric payloads to the Web. Thus, you could view the process of creating a Structured Web of Linked Data as follows:
The Semantic Data Web is an evolution of the current Web (an Information Space) that adds structured data payloads (RDF) to current, less data oriented, structured payloads (HTML, XHTML, XML, and others).
The Semantic Data Web is increasingly seen as an inevitability because it's rapidly reaching the point of critical mass (i.e., network-effect kick-in). As a result, Data Web emphasis is moving away from "What is the Semantic Data Web?" to "How will the Semantic Data Web make our globally interconnected village an even better place?", relative to the contributions accrued from the Web thus far. Remember, the initial "Document Web" (Web 1.0) bootstrapped because of the benefits it delivered to blurb-style content publishing (remember the term electronic brochure-ware?). Likewise, in the case of the "Services Web" (Web 2.0), the bootstrap occurred because it delivered platform independence to Web Application Developers, enabling them to expose application logic behind Web Services. It is my expectation that the Data Integration prowess of the Data Web will create a value exchange realm for data architects and other practitioners from the database and data access realms.
A vital component of the new Virtuoso release is the finalization of our SQL to RDF mapping functionality -- enabling the declarative mapping of SQL Data to RDF. Additional technical insight covering other new features (delivered and pending) is provided by Orri Erling, as part of a series of post-Banff posts.
A majority of the world's data (especially in the enterprise realm) resides in SQL Databases. In addition, Open Access to the data residing in said databases remains the biggest challenge to enterprises for the following reasons:
Enterprises have known from the beginning of modern corporate times that data access, discovery, and manipulation capabilities are inextricably linked to the "Real-time Enterprise" nirvana (hence my use of 0.0 before this becomes 3.0).
In my experience, as someone who has operated in the data access and data integration realms since the late '80s, I've painfully observed enterprises pursue, but never successfully attain, full control over enterprise data (the prized asset of any organization), such that data-, information-, and knowledge-workers are just a click away from commencing coherent, platform- and database-independent data drill-downs and/or discovery that transcend intranet, internet, and extranet boundaries: serendipitous interaction with relevant data, without compromise!
Okay, situation analysis done, we move on...
At our most recent (12th June) monthly Semantic Web Gathering, I unveiled to TimBL and a host of other attendees a simple, but powerful, demonstration of how Linked Data, as an aspect of the Semantic Data Web, can be applied to enterprise data integration challenges.
The vision of data, information, or knowledge at your fingertips is nigh! Thanks to the infrastructure provided by the Semantic Data Web (URIs, RDF Data Model, variety of RDF Serialization Formats[1][2][3], and Shared Data Dictionaries / Schemas / Ontologies [1][2][3][4][5]) it's now possible to Virtualize enterprise data from the Physical Storage Level, through the Logical Data Management Levels (Relational), up to a Concrete Conceptual Model (Graph) without operating system, development environment or framework, or database engine lock-in.
We'll produce a shared ontology for the CRM and Business Reporting domains. I hope this experiment clarifies how this is quite achievable by converting XML Schemas to RDF Data Dictionaries (RDF Schemas or Ontologies). Stay tuned :-)
Also watch TimBL amplify and articulate Linked Data value in a recent interview.
To deliver a mechanism that facilitates the crystallization of this reality is a contribution of boundless magnitude (as we shall all see in due course). Thus, it is easy to understand why even Her Majesty, the Queen of England, simply had to get in on the act and appoint TimBL to the "British Order of Merit" :-)
Note: All of the demos above now work with IE & Safari (a "remember what Virtuoso is epiphany") by simply putting Virtuoso's DBMS hosted XSLT engine to use :-) This also applies to my earlier collection of demos from the Hello Data Web and other Data Web & Linked Data related demo style posts.
From my perspective, I prefer to align my articulation of the changes occurring across our industry (courtesy of the Internet inflection) with the MVC pattern.
Re. the Web Versions (or Dimensions of Interaction):
The same applies to evolution of Openness:
In the (C)ontroller realm where the focal point is Application Logic, data access issues aren't obvious (*I recall my battles with Richard Stallman re. the appropriate Open Source License variant for iODBC during the embryonic years of database and data access technology on Linux*). Data is an enigma in this realm, unfortunately. This implies that "Data Lock-in" occurs deliberately, but in most cases, inadvertently when we make Application Logic the focal point of everything. Another example is Web 2.0 in which the norm (unfortunately) is to suck in your data, and then refuse to give you complete ownership over how it is used (including the fact that you may want to share it elsewhere).
Open Data is a really big deal, which is why the SWEO-supported Linking Open Data project is a very big deal. The good news is that this movement is gathering momentum at an exponential rate :-)
OAT: OpenAjax Alliance Compliant Toolkit: "
Ondrej Zara and his team at OpenLink Software have created an OpenLink Software JS Toolkit, known as OAT. It is a full-blown JS framework, suitable for developing rich applications with a special focus on data access.
OAT works standalone, offers a vast number of widgets, and has some rarely seen features, such as on-demand library loading (which reduces the total amount of downloaded JS code).
OAT is one of the first JS toolkits to show full OpenAjax Alliance conformance: see the appropriate wiki page and conformance test page.
There is a lot to see with this toolkit:
You can see some of the widgets in a Kitchen sink application
Sample data access applications:
OAT is Open Source and GPL'ed over at SourceForge, and the team has recently managed to incorporate our OAT data access layer as a module to the dojo datastore.
(Via Ajaxian Blog.)
This is a corrected version of the initial post. Unfortunately, the initial post was inadvertently littered with invalid links :-( Also, since the original post we have released OAT 1.2 that includes integration of our iSPARQL QBE into the OAT Form Designer application.
Re. data access, it is important to note that OAT's Ajax Database Connectivity layer supports data binding to the following data source types:
OAT also includes a number of prototype applications that are completely developed using OAT Controls and Libraries:
Note: Pick "Local DSN" from page initialization dialog's drop-down list control when prompted
Here is a very important excerpt:
...And then something happened. Visual Basic became popular as a scriptable "automation language". ODBC, being a C-style interface, was not directly consumable from VB. However, some of you clever folks figured out that Microsoft Access supported executing queries against ODBC Datasources, and that Access did support scriptable automation through its Data Access Object (DAO) API. Voila! Now you could write applications against ODBC sources using VB.
However, DAO went through Access's internal "Jet" (Joint Engine Technology) database engine, which defaulted to building local keysets for each result in order to do advanced query processing and cursoring against the remote data. This was fine if you needed that functionality, but it imposed significant performance overhead and additional round trips when you didn't.
Enter the Visual Basic team who, responding to customer demand for better performance against ODBC sources, came up with something called Remote Data Objects (RDO). RDO implemented the same DAO programming patterns directly against ODBC, rather than going through Jet. RDO was extremely popular among VB developers, but the fact that we had two different sets of automation objects for accessing ODBC sources caused confusion.
But apparently not enough confusion, because our solution was to introduce "ODBCDirect". Despite its name, ODBCDirect was not a new API; it was just a mode we added to DAO that set defaults in such a way as to avoid the overhead of building keysets and such.
...
To this very day (unfortunately!), ODBC has been maligned due to a perpetuated misunderstanding of Jet's DAO layer, which sits atop ODBC providing advanced query processing (i.e., Virtual DBMS functionality) alongside a client-side keyset cursor-model implementation.
BTW - We have just released a collection of high-performance data providers for ActiveRecord. Our providers deliver consistent functionality to RoR developers across Virtuoso, Oracle, SQL Server, Sybase, DB2, Ingres, Informix, and others, without compromising performance or cross-platform portability.
A declarative language adapted from SPARQL's graph pattern language (N3/Turtle) for mapping SQL Data to RDF Ontologies. We currently refer to this as a Graph Pattern based RDF VIEW Definition Language.
It provides an effective mechanism for exposing existing SQL Data as virtual RDF Data Sets (Graphs), obviating the data duplication associated with generating physical RDF Graphs from SQL Data en route to persistence in a dedicated Triple Store.
Enterprise applications (traditional and web based) and most Web Applications (Web 1.0 and Web 2.0) sit atop relational databases, implying that SQL/RDF model and data integration is an essential element of the burgeoning "Data Web" (Semantic Web - Layer 1) comprehension and adoption process.
In a nutshell, this is a quick route for non disruptive exposure of existing SQL Data to SPARQL supporting RDF Tools and Development Environments.
CREATE GRAPH IRI("http://myopenlink.net/dataspace")
CREATE IRI CLASS odsWeblog:feed_iri "http://myopenlink.net/dataspace/kidehen/weblog/MyFeeds" ( in memb varchar not null, in inst varchar not null)
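Once such declarations are in place, the virtual graph answers SPARQL exactly as if the triples were physically stored; a quick sanity check (the graph IRI follows the fragment above):

sparql
select ?s ?p ?o
from <http://myopenlink.net/dataspace>
where { ?s ?p ?o }
limit 10 ;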
OAT offers a broad JavaScript-based, browser-independent widget set for building data-source-independent rich internet applications that are usable across the full range of Ajax-capable web browsers.
OAT supports binding to the following data sources via its Ajax Database Connectivity Layer:
SQL Data via XML for Analysis (XMLA)
Web Data via SPARQL, GData, and OpenSearch Query Services
Web Services specific Data via service specific binding to SOAP and REST style web services
The toolkit includes a collection of powerful rich internet application prototypes, including: SQL Query By Example, Visual Database Modeling, and a Data-bound Web Form Designer.
Project homepage on sourceforge.net:
http://sourceforge.net/projects/oat
Source Code:
http://sourceforge.net/projects/oat/files
Live demonstration:
http://www.openlinksw.com/oat/
The screencasts covered the following functionality realms:
To bring additional clarity to the screencast demos and OAT in general, I have saved a number of documents that are by-products of activities in the screencasts:
Notes:
You can see a full collection of saved documents at the following locations:
Virtuoso extends its SQL3 implementation with syntax for integrating SPARQL into queries and subqueries. Thus, as part of a SQL SELECT query or subquery, one can write the SPARQL keyword and a SPARQL query as part of the query text processed by Virtuoso's SQL Query Processor.
Using Virtuoso's command-line or Web-based iSQL utility, type in the following (note: "SQL>" is the command-line prompt of the native iSQL utility):
SQL> sparql select distinct ?p where { graph ?g { ?s ?p ?o } };
Which will return the following:
p
varchar
----------
http://example.org/ns#b
http://example.org/ns#d
http://xmlns.com/foaf/0.1/name
http://xmlns.com/foaf/0.1/mbox
...
SQL> select distinct subseq (p, strchr (p, '#')) as fragment
       from (sparql select distinct ?p where { graph ?g { ?s ?p ?o } }) as all_predicates
      where p like '%#%' ;
fragment
varchar
----------
#query
#data
#name
#comment
...
You can pass parameters to a SPARQL query using a Virtuoso-specific syntax extension: '??' or '$?' indicates a positional parameter, similar to '?' in standard SQL. '??' can be used in graph patterns or anywhere else a SPARQL variable is accepted. The value of a parameter should be passed in SQL form, i.e., it should be a number or an untyped string. An IRI ID cannot be passed, but an absolute IRI can. Using this notation, a dynamic-SQL-capable client (ODBC, JDBC, ADO.NET, OLE DB, XMLA, or others) can execute parameterized SPARQL queries using the parameter binding concepts that are commonplace in dynamic SQL. This implies that existing SQL applications and development environments (PHP, Ruby, Python, Perl, VB, C#, Java, etc.) are capable of issuing SPARQL queries via their existing SQL-bound data access channels against RDF data stored in Virtuoso.
Note: This is the Virtuoso equivalent of a recently published example using Jena (a Java based RDF Triple Store).
Create a Virtuoso function by executing the following:
SQL> create function param_passing_demo ()
{
  declare stat, msg varchar;
  declare mdata, rset any;
  exec ('sparql select ?s where { graph ?g { ?s ?? ?? }}',
        stat, msg,
        vector ('http://www.w3.org/2001/sw/DataAccess/tests/data/Sorting/sort-0#int1', 4), -- Vector of two parameters
        10,    -- Max. result-set rows
        mdata, -- Variable for handling result-set metadata
        rset   -- Variable for handling query result-set
       );
  return rset[0][0];
}
Test the new "param_passing_demo" function by executing the following:
SQL> select param_passing_demo ();
Which returns:
callret
VARCHAR
_______________________________________________________________________________
http://www.w3.org/2001/sw/DataAccess/tests/data/Sorting/sort-0#four

1 Rows. -- 00000 msec.
A SPARQL ASK query can be used as an argument of the SQL EXISTS predicate.
create function sparql_ask_demo () returns varchar
{
  if (exists (sparql ask where { graph ?g { ?s ?p 4 } }))
    return 'YES';
  else
    return 'NO';
};
Test by executing:
SQL> select sparql_ask_demo ();
Which returns:
_________________________
YES
I added the missing piece regarding the "Virtuoso Conductor" (the Web based Admin UI for Virtuoso) to the original post below. I also added a link to our live SPARQL Demo so that anyone interested can start playing around with SPARQL and SPARQL integrated into SQL right away.
Another good thing about this post is the vast number of valuable links it contains. To really appreciate this point, simply visit my Linkblog (excuse the current layout :-)), a tab if you come in via the front door of this Data Space (what I used to call my Weblog Home Page).
]]>"Free" Databases: Express vs. Open-Source RDBMSs: "Open-source relational database management systems (RDBMSs) are gaining IT mindshare at a rapid pace. As an example, BusinessWeek's February 6, 2006 ' Taking On the Database Giants ' article asks 'Can open-source upstarts compete with Oracle, IBM, and Microsoft?' and then provides the answer: 'It's an uphill battle, but customers are starting to look at the alternatives.'
There's no shortage of open-source alternatives to look at. The BusinessWeek article concentrates on MySQL, which BW says 'is trying to be the Ikea of the database world: cheap, needs some assembly, but has a sleek, modern design and does the job.' The article also discusses Postgre[SQL] and Ingres, as well as EnterpriseDB, an Oracle clone created from PostgreSQL code*. Sun includes PostgreSQL with Solaris 10 and, as of April 6, 2006, with Solaris Express.**
*Frank Batten, Jr., the investor who originally funded Red Hat, invested a reported $16 million into Great Bridge with the hope of making a business out of providing paid support to PostgreSQL users. Great Bridge stayed in business only 18 months, having missed an opportunity to sell the business to Red Hat and finding that selling $50,000-per-year support packages for an open-source database wasn't easy. As Batten concluded, 'We could not get customers to pay us big dollars for support contracts.' Perhaps EnterpriseDB will be more successful with a choice of $5,000, $3,000, or $1,000 annual support subscriptions.
**Interestingly, Oracle announced in November 2005 that Solaris 10 is 'its preferred development and deployment platform for most x64 architectures, including x64 (x86, 64-bit) AMD Opteron and Intel Xeon processor-based systems and Sun's UltraSPARC(R)-based systems.'
There is a surfeit of reviews of current MySQL, PostgreSQL and, to a lesser extent, Ingres implementations. These three open-source RDBMSs come with their own or third-party management tools. These systems compete against free versions of commercial (proprietary) databases: SQL Server 2005 Express Edition (and its MSDE 2000 and 1.0 predecessors), Oracle Database 10g Express Edition, IBM DB2 Express-C, and Sybase ASE Express Edition for Linux, where database size and processor count limitations aren't important. Click here for a summary of recent InfoWorld reviews of the full versions of these four databases plus MySQL, which should be valid for Express editions also. The FTPOnline Special Report article, 'Microsoft SQL Server Turns 17,' that contains the preceding table is here (requires registration.)
SQL Server 2005 Express Edition SP-1 Advanced Features
SQL Server 2005 Express Edition with Advanced Features enhances SQL Server 2005 Express Edition (SQL Express or SSX) dramatically, so it deserves special treatment here. SQL Express gains full text indexing and now supports SQL Server Reporting Services (SSRS) on the local SSX instance. The SP-1 with Advanced Features setup package, which Microsoft released on April 18, 2006, installs the release version of SQL Server Management Studio Express (SSMSE) and the full version of Business Intelligence Development Studio (BIDS) for designing and editing SSRS reports. My 'Install SP-1 for SQL Server 2005 and Express' article for FTPOnline's SQL Server Special Report provides detailed, illustrated installation instructions for and related information about the release version of SP-1. SP-1 makes SSX the most capable of all currently available Express editions of commercial RDBMSs for Windows.
OpenLink Software's Virtuoso Open-Source Edition
OpenLink Software announced an open-source version of its Virtuoso Universal Server commercial DBMS on April 11, 2006. On the initial date of this post, May 2, 2006, Virtuoso Open-Source Edition (VOS) was virtually under the radar as an open-source product. According to this press release, the new edition includes:
- SPARQL compliant RDF Triple Store
- SQL-200n Object-Relational Database Engine (SQL, XML, and Free Text)
- Integrated BPEL Server and Enterprise Service Bus
- WebDAV and Native File Server
- Web Application Server that supports PHP, Perl, Python, ASP.NET, JSP, etc.
- Runtime Hosting for Microsoft .NET, Mono, and Java
VOS only lacks the virtual server and replication features that are offered by the commercial edition. VOS includes a Web-based administration tool called the "Virtuoso Conductor." According to Kingsley Idehen's Weblog, 'The Virtuoso build scripts have been successfully tested on Mac OS X (Universal Binary Target), Linux, FreeBSD, and Solaris (AIX, HP-UX, and Tru64 UNIX will follow soon). A Windows Visual Studio project file is also in the works (ETA some time this week).'
InfoWorld's Jon Udell has tracked Virtuoso's progress since 2002, with an additional article in 2003 and a one-hour podcast with Kingsley Idehen on April 26, 2006. A major talking point for Virtuoso is its support for Atom 0.3 syndication and publication, Atom 1.0 syndication and (forthcoming) publication, and future support for Google's GData protocol, as mentioned in this Idehen post. Yahoo!'s Jeremy Zawodny points out that the 'fingerprints' of Adam Bosworth, Google's VP of Engineering and the primary force behind the development of Microsoft Access, 'are all over GData.' Click here to display a list of all OakLeaf posts that mention Adam Bosworth.
One application for the GData protocol is querying and updating the Google Base database independently of the Google Web client, as mentioned by Jeremy: 'It's not about building an easier onramp to Google Base. ... Well, it is. But, again, that's the small stuff.' Click here for a list of posts about my experiences with Google Base. Watch for a future OakLeaf post on the subject as the GData APIs gain ground.
Open-Source and Free Embedded Database Contenders
Open-source and free embedded SQL databases are gaining importance as the number and types of mobile devices and OSs proliferate. Embedded databases usually consist of Java classes or Windows DLLs that are designed to minimize file size and memory consumption. Embedded databases avoid the installation hassles, heavy resource usage and maintenance cost associated with client/server RDBMSs that run as an operating system service.
Andrew Hudson's December 2005 'Open Source databases rounded up and rodeoed' review for The Inquirer provides brief descriptions of one commercial and eight open-source database purveyors/products: Sleepycat, MySQL, PostgreSQL, Ingres, InnoBase, Firebird, IBM Cloudscape (a.k.a. Derby), Genezzo, and Oracle. Oracle Sleepycat* isn't an SQL database, Oracle InnoDB* is an OEM database engine that's used by MySQL, and Genezzo is a multi-user, multi-server distributed database engine written in Perl. These special-purpose databases are beyond the scope of this post.
* Oracle purchased Sleepycat Software, Inc. in February 2006 and purchased Innobase OY in October 2005. The press release states: 'Oracle intends to continue developing the InnoDB technology and expand our commitment to open source software.'
Derby is an open-source release by the Apache Software Foundation of the Cloudscape Java-based database that IBM acquired when it bought Informix in 2001. IBM offers a commercial release of Derby as IBM Cloudscape 10.1. Derby is a Java class library with a relatively light footprint (2 MB), which makes it suitable for client/server synchronization with the IBM DB2 Everyplace Sync Server in mobile applications. The IBM DB2 Everyplace Express Edition isn't open source or free*, so it doesn't qualify for this post. The same is true for the corresponding Sybase SQL Anywhere components.**
* IBM DB2 Everyplace Express Edition with synchronization costs $379 per server (up to two processors) and $79 per user. DB2 Everyplace Database Edition (without DB2 synchronization) is $49 per user. (Prices are based on those when IBM announced version 8 in November 2003.)
** Sybase's iAnywhere subsidiary calls SQL Anywhere 'the industry's leading mobile database.' A Sybase SQL Anywhere Personal DB seat license with synchronization to SQL Anywhere Server is $119; the cost without synchronization wasn't available from the Sybase Web site. Sybase SQL Anywhere and IBM DB2 Everyplace perform similar replication functions.
Sun's Java DB, another commercial version of Derby, comes with the Solaris Enterprise Edition, which bundles Solaris 10, the Java Enterprise System, developer tools, desktop infrastructure and N1 management software. A recent Between the Lines blog entry by ZDNet's David Berlind waxes enthusiastic over the use of Java DB embedded in a browser to provide offline persistence. RedMonk analyst James Governor and eWeek's Lisa Vaas wrote about the use of Java DB as a local data store when Tim Bray announced Sun's Derby derivative and Francois Orsini demonstrated Java DB embedded in the Firefox browser at the ApacheCon 2005 conference.
Firebird is derived from Borland's InterBase 6.0 code, the first commercial relational database management system (RDBMS) to be released as open source. Firebird has excellent support for SQL-92 and comes in three versions: Classic, SuperServer and Embedded for Windows, Linux, Solaris, HP-UX, FreeBSD and MacOS X. The embedded version has a 1.4-MB footprint. Release Candidate 1 for Firebird 2.0 became available on March 30, 2006 and is a major improvement over earlier versions. Borland continues to promote InterBase, now at version 7.5, as a small-footprint, embedded database with commercial Server and Client licenses.
SQLite is a featherweight C library for an embedded database that implements most SQL-92 entry- and transitional-level requirements (some through the JDBC driver) and supports transactions within a tiny 250-KB code footprint. Wrappers support a multitude of languages and operating systems, including Windows CE, SmartPhone, Windows Mobile, and Win32. SQLite's primary SQL-92 limitations are lack of nested transactions, inability to alter a table design once committed (other than with RENAME TABLE and ADD COLUMN operations), and foreign-key constraints. SQLite provides read-only views, triggers, and 256-bit encryption of database files. A downside is that the entire database file is locked while a transaction is in progress. SQLite uses file access permissions in lieu of GRANT and REVOKE commands. Using SQLite involves no license; its code is entirely in the public domain. The Mozilla Foundation's Unified Storage wiki says this about SQLite: 'SQLite will be the back end for the unified store [for Firefox]. Because it implements a SQL engine, we get querying 'for free', without having to invent our own query language or query execution system. Its code-size footprint is moderate (250k), but it will hopefully simplify much existing code so that the net code-size change should be smaller. It has exceptional performance, and supports concurrent access to the database. Finally, it is released into the public domain, meaning that we will have no licensing issues.'
Vieka Technology, Inc.'s eSQL 2.11 is a port of SQLite to Windows Mobile (Pocket PC and Smartphone) and Win32, and includes development tools for Windows devices and PCs, as well as a .NET native data provider. A conventional ODBC driver also is available. eSQL for Windows (Win32) is free for personal and commercial use; eSQL for Windows Mobile requires a license for commercial (for-profit or business) use.
HSQLDB isn't on most reviewers' radar, which is surprising because it's the default database for OpenOffice.org (OOo) 2.0's Base suite member. HSQLDB 1.8.0.1 is an open-source (BSD license) Java embedded database engine based on Thomas Mueller's original Hypersonic SQL Project. Using OOo's Base feature requires installing the Java 2.0 Runtime Engine (which is not open-source) or the presence of an alternative open-source engine, such as Kaffe. My prior posts about OOo Base and HSQLDB are here, here and here.
The HSQLDB 1.8.0 documentation on SourceForge states the following regarding SQL-92 and later conformance: "HSQLDB 1.8.0 supports the dialect of SQL defined by SQL standards 92, 99 and 2003. This means where a feature of the standard is supported, e.g. left outer join, the syntax is that specified by the standard text. Many features of SQL92 and 99 up to Advanced Level are supported, and there is support for most of SQL 2003 Foundation and several optional features of this standard. However, certain features of the Standards are not supported, so no claim is made for full support of any level of the standards."
Other less well-known embedded databases designed for or suited to mobile deployment are Mimer SQL Mobile and VistaDB 2.1. Neither product is open-source, and both require paid licensing; VistaDB requires a small up-front payment by developers but offers royalty-free distribution.
Java DB, Firebird embedded, SQLite and eSQL 2.11 are contenders for lightweight PC and mobile device database projects that aren't Windows-only.
SQL Server 2005 Everywhere
If you're a Windows developer, SQL Server Mobile is the logical embedded database choice for mobile applications for Pocket PCs and Smartphones. Microsoft's April 19, 2006 press release delivered the news that SQL Server 2005 Mobile Edition (SQL Mobile or SSM) would gain a big brother: SQL Server 2005 Everywhere Edition.
Currently, the SSM client is licensed (at no charge) to run in production on devices with Windows CE 5.0, Windows Mobile 2003 for Pocket PC or Windows Mobile 5.0, or on PCs with Windows XP Tablet Edition only. SSM also is licensed for development purposes on PCs running Visual Studio 2005. Smart Device replication with SQL Server 2000 SP3 and later databases has been the most common application so far for SSM.
By the end of 2006, Microsoft will license SSE for use on all PCs running any Win32 version or the preceding device OSs. A version of SQL Server Management Studio Express (SSMSE), updated to support SSE, is expected to release by the end of the year. These features will qualify SSE as the universal embedded database for Windows client and smart-device applications.
For more details on SSE, read John Galloway's April 11, 2006 blog post and my 'SQL Server 2005 Mobile Goes Everywhere' article for the FTPOnline Special Report on SQL Server."(Via OakLeaf Systems.)
The engineers bring computers and the software programs they are working on and literally plug them together to see how their programs interoperate. “We work around the clock for a week. We torture our machines in the pursuit of interoperability,” he told a rapt courtroom.
“Can you do this test with Microsoft?” Judge Cooke asked.
“Yes, but they don’t turn up,” Tridgell said.
In an interview after the court had adjourned for the day, Tridgell explained that for the past six years Microsoft has boycotted the event.
“They used to come. It used to be held in Seattle, close to Microsoft’s headquarters,” he said.
But the software giant turned its back on the rest of the software community in the late 1990s once it had developed a server operating system it believed it could corner the market with. This marked a turning point for the software industry, Tridgell said. He spoke nostalgically about the days before Microsoft went its separate way. “It’s not like it used to be. I’d like it to get back to that,” he said.
The market for workgroup server operating systems lies at the heart of the European Commission’s antitrust decision against Microsoft. Sun Microsystems Inc., a player in this market, complained to the European competition regulator in 1998 that Microsoft was competing unfairly. That complaint sparked the five year-long antitrust investigation.
To remedy the situation, the Commission ordered Microsoft to divulge interoperability protocols within its own Windows workgroup server operating system. With this information, rival server systems should be able to communicate as fluently with Windows on PCs as Microsoft’s own server system.
Two years on from the historic antitrust ruling, the Commission contends that Microsoft still hasn’t provided the necessary information, and the Commission is poised to issue a new antitrust ruling against the company for failing to comply with its 2004 decision.
Even if Microsoft does comply, it isn’t certain that Tridgell and others from the free and open source sides of the software community will be granted access to the information.
At the time of the antitrust ruling, Microsoft said the remedy proposed by the then competition commissioner, Mario Monti, would result in its valuable intellectual property being given away if it fell into the hands of open source developers.
Andrew's testimony reflects an experience familiar to many ISVs that worked closely with Microsoft in the early to mid '90s. In our case, the technology was ODBC (Open Database Connectivity). The cost of achieving ODBC compliance and interoperability grew exponentially as Microsoft veered towards a platform- and database-specific monoculture.
There is a countdown pretty much in motion as a result of the latest move by Oracle. If Open Source developers want to alleviate the inevitable despair, they will need to revisit the issue of decoupling those MySQL-specific applications via a re-binding effort to database-independent call level interfaces such as ODBC (using iODBC or unixODBC).
For those MySQL users who think binding to ODBC is too hard, simply take a look at the two-year-old MySQL2ODBC SDK. It will not stop you from using MySQL; it simply separates your intellectual capital (the application logic) from the data storage (DBMS engine). You can still use ODBC to talk to MySQL, but you won't be locked into an inheritance tree that is susceptible to the inevitable strategic assaults on MySQL.
In the past I have expressed views that echo the essence of John's piece. It has been pretty darn clear to me that Microsoft is struggling as a result of its inability to handle challenges associated with the metaphoric "computing vase", which it sought to own solely, owing to its proclivity for crushing and/or alienating erstwhile technology partners as part of this quest (a process that commenced a long time ago, culminating in the contradiction and ultimate paradox called IE7; remember, not too long ago it was impossible to separate IE from Windows! It could only exist as an OS extension, etc.).
Windows in its current incarnation fails to provide a productive working environment: you either have a plethora of viruses and spyware contending for your computing resources, or you have all the software in place to protect against these assaults, rendering the computing resources equally busy. The computing power lag is simply too much when using Windows, and this is its Achilles heel!
I have been using Windows since version 2.0, and although I have always found the Mac OS variants to be superior on the UI front, I never found any of the historic versions to be viable alternatives. In my case, this is all about providing a productive work environment across the following usage modes, in descending order of priority:
1. Power User (Outlook, Excel, Word, and other desktop productivity tools)
2. Product Testing and QA
3. Programmer Buddy (a Microsoft term)
4. Programming (for the most part prototyping)
The release of Mac OS X Tiger led me down an evaluation path that I have repeated many times in the past: test the viability of moving wholesale from Windows to Mac OS X and remaining functional (if really lucky, exceeding existing productivity levels). This time around, I found that I could actually migrate over 6 years' worth of emails, contacts, presentations, documents, and spreadsheets from Windows to Mac OS X. I also discovered that success extended all the way to my data-linked documents that are transparently bound to back-end databases (in my case the norm rather than the exception, via ODBC).
I now use Mac OS X as my prime working platform (I still have to use Windows, as the platform remains strategic for all our product offerings), and I am absolutely loving it! The joint feelings of euphoria and confusion that I experienced post-migration were similar to how I felt after making the transition from "stick shift" to "automatic" geared cars (as I transitioned my residence from the UK to the U.S.). At the time, I couldn't understand why anyone (other than a grand prix driver) would ever drive a "stick shift" by choice.
Today, I can't understand why I stuck with Windows for so long at the expense of my daily working productivity. The biggest bonus from this transition is that Mac OS X has made it easier for me to engage less technical individuals (family & friends) in the sheer joy and potential of Information Technology across a variety of realms, as opposed to being confined solely to the "business computing" realm. I can demonstrate the power and potential of the Internet, Web, Web Services, Blogosphere, and Wikisphere with much more sanity and coherence now that my machine responds in a timely fashion during these demos, among other benefits.
Some may deem this Windows bashing, but those who take the time to look a little deeper will see that this is simply "straight shooting" from a real computer user (I like my computers to deliver on their huge potential as promised; I don't compromise this basic expectation; my computer and associated software should save me time and ramp up my productivity!). If Microsoft is still the company that it once was, it would simply use this kind of commentary to rally its troops and get its act together! That's what I would do if a customer felt so badly about our technology (UDA or Virtuoso).
Here are a few links that resolve any confusion about this matter:
Or simply Google on PHP and ODBC, or PHP and iODBC ...
There are a whopping 44,000 SAP customers running on Oracle databases, and IBM wants them. To get them, for the first time ever, it has optimized its enterprise database for a specific vendor's applications. The new version of DB2, 8.2.2, will include a slew of SAP-optimized features, including self-tuning, self-configuration, silent install, dynamic storage allocation, and more.
Wouldn't SAP be better served by simply making their application database-independent via ODBC? This process really could have commenced years ago and prevented today's dilemma: your partner has become your most aggressive competitor!
SAP tuned specifically for DB2, or SAP tuned likewise for Microsoft SQL Server, simply reeks of: "Same Sh*t, Different Pile". Microsoft and IBM will emulate Oracle in due course regarding their assault on SAP's market if DBMS specificity remains SAP's data access API strategy (this is a simple fact).
SAP should be using its quest for DBMS independence to stimulate or contribute ODBC enhancements (should ODBC be lacking in areas critical to its application needs; it is available in Open Source form and across all major platforms). Should the ODBC API not be the problem, then it can push ODBC Driver vendors (DBMS vendors such as IBM included) to get their Drivers into shape (should they be lacking; I know our ODBC Drivers are absolutely fine for this kind of task).
Database specificity gets application vendors nowhere. You can only control your business development destiny by being database-independent. When applications are database-independent, the intellectual capital that drives them is preserved. This is akin to building physical and logical firewalls around the ecosystem created by your products, and it is much better than being a pseudo DBMS engine reseller for a future competitor.
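To make that concrete, here is a minimal sketch of database independence through ODBC, written in Python with the pyodbc binding as a stand-in for any ODBC-compliant environment; the DSN names, table, and columns are hypothetical placeholders, not anything shipped by any vendor mentioned here.

    import pyodbc

    def open_orders_total(dsn):
        """Run the same query against whichever DBMS the DSN points at."""
        conn = pyodbc.connect("DSN=" + dsn)
        try:
            cur = conn.cursor()
            cur.execute("SELECT SUM(amount) FROM orders WHERE status = ?", "OPEN")
            row = cur.fetchone()
            return float(row[0] or 0)
        finally:
            conn.close()

    # The same function, untouched, serves DB2, Oracle, or SQL Server back ends;
    # only the DSN -- a configuration entry, not code -- changes.
    for dsn in ("DB2_PROD", "ORACLE_PROD", "SQLSERVER_PROD"):
        print(dsn, open_orders_total(dsn))

Swapping back ends becomes a configuration exercise rather than a porting project, which is precisely the preservation of intellectual capital argued for above.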
Have RSS feeds killed the email star? silicon.com Feb 28 2005 12:58PM GMT
DB2 users of PeopleSoft, and IBM (the DB2 developer and vendor), suspect that Oracle will try to use its ownership of PeopleSoft to covertly coerce DB2 users into becoming Oracle DBMS users. This strategy would take the form of discrimination in new features and fixes, as somewhat echoed in these excerpts:
"..In the crescendo surrounding the Oracle-PeopleSoft merger, one question has been repeatedly drowned out: What happens to users of PeopleSoft's DB2 database? Oracle chief Larry Ellison has repeatedly assured DB2 users--and IBM--that Oracle will continue to support DB2 and PeopleSoft's interfaces to IBM's WebSphere platform. But IBM isn't taking any chances, announcing an initiative to alter DB2 to work with products from Oracle rival SAP."
"..IBM has good reason to be concerned. Oracle vies with SAP as the leading vendor for enterprise applications, but it's under pressure to show concrete benefits from the merger by combining assets and pumping up revenue. One obvious tactic will be to use the PeopleSoft applications to steer enterprise customers toward the Oracle database by optimizing performance and features toward the Oracle back end."
If PeopleSoft's application core were ODBC-based, the vulnerability to this predictable competitive tactic would at the very least be significantly alleviated. DB2 end-users, and IBM as the product vendor, would have a much stronger basis for countering Oracle by taking them to task about their claimed inability to implement new application functionality enhancements against DB2, especially as this would have morphed into a generic database issue as opposed to a DB2-specific issue -- by virtue of the application and data access layer separation provided by ODBC's architecture.
Anyway, back to cognitive dissonance. Could this be the reason for the following?
And more...
Email As A Platform It looks like more people are starting to realize that email is more than it seems. Especially given the drastic increase in storage size of web-based email applications, more people are realizing that email is basically a personal database. People simply store information in their email, from contact information that was emailed to them to schedule information to purchase tracking from emailed receipts. Lots of people email messages to themselves, realizing that email is basically the best "permanent" filing system they have. That's part of the reason why good email search is so important. Of course, what the article doesn't discuss is the next stage of this evolution. If you have a database of important information, the next step is to build useful applications on top of it. In other words, people are starting to realize that email, itself, is a platform for personal information management.
How secure is your data? Looking at your information management resources through a would-be intruder's eyes can help you find (and fix) vulnerabilities.
Naturally :-)
When E. F. Codd developed his relational data model in 1970, the business world was a different place. Almost 35 years after his seminal work appeared, the RDBMSs that sprang from Codd's ideas are the standard for storing corporate information. And, with government and industry regulations dictating what kinds of information companies have to store, manage, and audit (and for how long), protecting this information is more important than ever. Unfortunately, it's also more challenging.
Even in 1985, when Dr. Codd published 12 guidelines for RDBMSs, there was little concern for data security. In those days, gaining access to a database was so difficult that advanced security features on the database were irrelevant.
Today, RDBMSs carry the lifeblood of every organization. Note the use of the plural: organizations now have many databases that are decentralized in terms of use and security controls. E-business demands that data access be extended to customers, partners, suppliers, and other parties who were rarely considered in the early data management days. With all this availability -- not to mention pressure from an array of government and industry regulations (see the sidebar, "Security and Compliance") -- the need to control exactly who can access or modify data is becoming paramount.
Absolute facts that are still, at best, partially understood. For instance, we are still in a so-called "Information Age" in which standards-based data access remains an object of contempt rather than an absolute necessity.
There are a number of prevailing myths about standards-based data access that continue to cloak reality:
Even if the above were true (which I strongly refute), what about the general security vulnerabilities that affect both native and standards-compliant data access interfaces?
Aaron's article does a good job of highlighting 6 areas of vulnerability:
What I have been able to do very quickly (thanks to blogging, and the power of a blog engine that supports WebDAV) is write a tabulated response to each of the items (bar Fixpaks), indicating how the OpenLink Multi-Tier Data Access Drivers (for ODBC, JDBC, ADO.NET, and OLE DB) protect corporate databases from each of these vulnerabilities.
To cut a long story short, we are increasingly living a contradiction where the terms "simple" and "free" are supposed to lead us to products that can adequately handle the challenges of an increasingly sophisticated grid of interconnecting points.
I have been asked on numerous occasions, "How can you build a company and business based on data access technology?". My reply is always the same: "because everything comes down to data". If the data is compromised in any way, then kiss Information, Knowledge, and everything else goodbye!
So we all buy and deploy copies of InfoPath, and then get rid of our non SQL Server and ACCESS databases? Wow!
How about InfoPath emitting XForms compliant forms? Even better, what about
We have got to take time to understand the Data Access Layer; if we don't, we will ultimately pay a hefty price (IMHO).
This blog post is also hilarious, especially if you have encountered the mercurial "Murphy" during live product demos.
So, today I went to hell. And then I came back. It was a short trip.
This year, I am giving a presentation on Mono at BrainShare in Salt Lake City, an intro to Mono for developers. I got a pretty good turnout, with a few Ximian people in the back (including Joe, whom I saw for the first time without a hat).
So I plug in my PowerBook 12" as I always do, but for some reason I have a hard time getting the projector to display its output. After struggling a little I resort to using the desktop provided by Novell, running Ximian Desktop 2 (and some version of SuSE Linux).
So I upload my presentation to www.frenchguys.com from my mac and then download it back to the desktop. Now I can make my presentation, which goes well. Then I get to a slide that just says: DEMO. Hmmm. Demo. I don't have Mono installed on that generic machine I was just given. I am going to need magic. So to magic I resort.
[via Monologue]
MySQL and the GPL: an interesting read, with thoughts and discussion about MySQL and "their" interpretation (backed by the FSF) of the GPL, on Sterling's Blog.
Unfortunately I couldn't locate the Kingsley Idehen vs. Richard Stallman FreeODBC mailing list debate archive re. iODBC anywhere on the net, so this interview link will have to suffice.
Note: ODBC Drivers that use the 4.1 client libraries are "derivative works", and their developers will have to release source code -- which means we won't be updating our MySQL ODBC Drivers, because we won't be forced into releasing the source code of our ODBC Drivers.
Databases get a grip on XML
From InfoWorld.
The next iteration of the SQL standard was supposed to arrive in 2003. But SQL standardization has always been a glacially slow process, so nobody should be surprised that SQL:2003 -- now known as SQL:200n -- isn't ready yet. Even so, 2003 was a year in which XML-oriented data management, one of the areas addressed by the forthcoming standard, showed up on more and more developers' radar screens. >> READ MORE
This article rounds up products for 2003 in the critical area of Enterprise Database Technology. It certainly provides an apt reflection of how Virtuoso compares with offerings from some of the larger (but certainly slower to implement) database vendors in this space. As usual, Jon Udell's quote pretty much sums this up:
"While the spotlight shone on the heavyweight contenders, a couple of agile innovators made noteworthy advances in 2003. OpenLink Software?s Virtuoso 3.0, which we reviewed in March, stole thunder from all three major players. Like Oracle, it offers a WebDAV-accessible XML repository. Like DB2 Information Integrator, it functions as database middleware that can perform federated ?joins? across SQL and XML sources. And like the forthcoming Yukon, it embeds the .Net CLR (Common Language Runtime), or in the case of Linux, Novell/Ximian?s Mono."
Albeit still somewhat unknown to the broader industry, we have remained true to our "innovator" discipline, which remains our chosen path to market leadership. Thus, it's worth a quick recap of Virtuoso's release history and features as we get set to up the ante even further in 2004:
1998 - Virtuoso's initial public beta release, with functional emphasis on the Virtual Database Engine for ODBC and JDBC Data Sources.
1999 - Virtuoso's official commercial release, with emphasis still on Virtual Database functionality for ODBC- and JDBC-accessible SQL Databases.
2000 - Virtuoso 2.0 adds XML Storage, XPath, XML Schema, XQuery, XSL-T, WebDAV, SOAP, UDDI, HTTP, Replication, Free Text Indexing (*feature update*), POP3, and NNTP support.
2002 - Virtuoso 2.7 extends its virtualization prowess beyond data access via enhancements to its Web Services protocol stack implementation, enabling SQL Stored Procedures to be published as Web Services. It also debuts its Object-Relational engine enhancements, which include the incorporation of Java and Microsoft .NET objects into its User Defined Type, User Defined Function, and Stored Procedure offerings.
2003 - Virtuoso 3.0 extends data and application logic virtualization into the Application Server realm (making it basically a Virtual Application Server too!) by adding support for ASP.NET, PHP, and Java Server Pages runtime hosting (making applications built using any of these languages deployable with Virtuoso across all supported platforms).
Collectively, these releases have contributed to a very premeditated architecture and vision that will ultimately unveil the inherent power of critical I.S. infrastructure virtualization along the following lines: data storage, data access, and application logic, via coherent integration of SQL, XML, Web Services, and Persistent Stored Modules (.NET, Java, and other object-based component building blocks).
Reading the Longhorn SDK docs is a disorienting experience. Everything's familiar but different. Consider these three examples:
[Full story: Replace and defend via Jon's Radio]
"Replace & Defend" is certainly a strategy that would have awakened the entire non Microsoft Developer world during the recent PDC event. I know these events are all about preaching to the choir (Windows only developers), but as someone who has worked with Microsoft technologies as an ISV since the late 80's there is something about this events announcements that leave me concerned.
Ironically, these concerns aren't about the competitive aspects of their technology disruptions, but more about how Microsoft (I hope inadvertently) generates the kinds of sentiments echoed in the comments thread of Scoble's recent "How to hate Microsoft" post. As indicated in my response to that post, I don't believe Microsoft is as bad or evil as is instinctively assumed in many quarters, but I can certainly understand why they are hated by others, which is really unfortunate, especially bearing in mind that they have done more good than harm to date (in my humble opinion).
Anyway, back to my concerns post-PDC, which I break down as follows:
WinFS needs to architecturally separate the System Provider from the Data Provider (pretty much the OLE DB architecture), with Microsoft naturally providing reference System Provider (pretty much what was demonstrated at PDC) and Data Provider (ADO.NET, OLE DB, and ODBC) implementations. Third parties could then choose to produce custom WinFS Service or Data Providers which serve their data access needs. It's impractical to want to force every non-SQL Server customer over to SQL Server in order for them to exploit WinFS, and I certainly hope this isn't the definitive strategy at Microsoft.
There is a new HOWTO document that addresses an area of frequent confusion on Mac OS X: how do you build PHP with an ODBC data access layer binding (the iODBC variant) using Mac OS X Frameworks as opposed to Darwin Shared Libraries?
This document basically brings clarity to both the Frameworks and Darwin Shared library approaches.
NETWORK WORLD NEWSLETTER: MARK GIBBS ON WEB APPLICATIONS
Today's focus: A Virtuoso of a server
By Mark Gibbs
One of the bigger drags of Web applications development is that building a system of even modest complexity is a lot like herding cats - you need a database, an applications server, an XML engine, etc., etc. And as they all come from different vendors you are faced with solving the constellation of integration issues that inevitably arise.
If you are lucky, your integration results in a smoothly functioning system. If not, you have a lot of spare parts flying in loose formation with the risk of a crash and burn at any moment.
An alternative is to look for all of these features and services in a single package but you'll find few choices in this arena.
One that is available and looks very promising is OpenLink's Virtuoso (see links below).
Virtuoso is described as a cross platform (runs on Windows, all Unix flavors, Linux, and Mac OS X) universal server that provides databases, XML services, a Web application server and supporting services all in a single package.
OpenLink's list of supported standards is impressive and includes .Net, Mono, J2EE, XML Web Services (Simple Object Access Protocol, Web Services Description Language, WS-Security, Universal Description, Discovery and Integration), XML, XPath, XQuery, XSL-T, WebDav, HTTP, SMTP, LDAP, POP3, SQL-92, ODBC, JDBC and OLE-DB.
Virtuoso provides an HTTP-compliant Web Server; native XML document creation, storage and management; a Web services platform for creation, hosting and consumption of Web services; content replication and synchronization services; free text index server, mail delivery and storage and an NNTP server.
Another interesting feature is that with Virtuoso you can create Web services from existing SQL Stored Procedures, Java classes, C++ classes, and 'C' functions, as well as create dynamic XML documents from ODBC and JDBC data sources.
This is an enormous product and implies a serious commitment on the part of adopters due to its scope and range of services.
Virtuoso is enormous by virtue of its architectural ambitions, but actual disk requirements are
Howl is Rendezvous for Windows and Linux. [via Scripting News]
In the case of UDA you can configure ODBC- and JDBC-consumable data source names that are hosted on the server. Users can now simply pick DSNs from a combo box, and they are ready to make connections to remote databases from any ODBC, JDBC, OLE DB, or ADO.NET application. Another benefit of Zeroconfig is that it facilitates centralized server-side configuration, which further enhances our server-side session rules book, which serves all our Multi-Tier data access drivers.
In the case of Virtuoso you are able to bind to pre-configured Virtuoso instances in exactly the same way.
Our Zeroconfig support has been implemented across Solaris, AIX, Digital UNIX, IRIX, and HP-UX, amongst others, but this is a project of interest all the same, and we may end up contributing to this effort.
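For the curious, here is a rough sketch of what DSN discovery over a Zeroconfig/Rendezvous-style mechanism looks like from a client's perspective, written with the Python zeroconf library; the "_odbc._tcp.local." service type is purely illustrative, not the type any actual implementation advertises.

    import time
    from zeroconf import Zeroconf, ServiceBrowser, ServiceListener

    class DSNListener(ServiceListener):
        """Collect data source advertisements as they appear on the LAN."""
        def add_service(self, zc, type_, name):
            info = zc.get_service_info(type_, name)
            if info:
                print("found: %s -> %s:%d" % (name, info.parsed_addresses()[0], info.port))
        def remove_service(self, zc, type_, name):
            print("gone: %s" % name)
        def update_service(self, zc, type_, name):
            pass  # required by the listener interface

    zc = Zeroconf()
    # "_odbc._tcp.local." is an invented service type, used here for illustration.
    browser = ServiceBrowser(zc, "_odbc._tcp.local.", DSNListener())
    try:
        time.sleep(10)   # browse for a few seconds, then tidy up
    finally:
        zc.close()

The client never touches a configuration file; the combo box of DSNs described above is simply the accumulated output of a listener like this one.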
Email: Killer App Or Just A Killer? While many people consider email to be the "killer app" that brought the internet into homes and businesses, now some are saying that email has become annoying and costly. The most obvious issue is with spam, but there are other things as well. Maintaining an email server is a pain, and keeping email free from viruses is an additional cost. For companies that monitor email (and there are more and more), that's yet another expense. Finally, since there are questions about email security, some companies are telling employees not to use email for sensitive material. Thus, for many companies, email is only useful for informal communications, and you can only find those messages once you wade through all the spam and viruses - or so this article would have you believe. It's really not that bad - and there are reasonable technology solutions that should be able to keep most companies afloat with minimal costs. Yes, it's annoying, but the benefits of having email certainly outweigh the annoyances associated with it. [via Techdirt]
One of the reasons for e-mail enabling Virtuoso (circa 2000) was to set the stage for addressing what I anticipated would ultimately become the Spam Monster. This is how the solution was envisioned.
Build a Driver/Sink that can be attached to SMTP agents such as Sendmail, Exim, Exchange, etc., such that mail received is actually stored in a DBMS engine (in this case Virtuoso, or an ODBC-accessible database). Once the mail is in the database, it is then possible for triggers to handle filtering of the MIME headers and mail body (using regular expressions). The end result is that spam and virulent mails are already filtered prior to POP or IMAP retrieval.
With the emergence of Bayesian spam filters and other anti-spam solutions, there remains the possibility of pursuing the best of both worlds: enhance the DB engine via its extensions API (in the case of Virtuoso, which supports Python, Perl, etc.), or enhance the Mail Driver/Sink by extending it in a similar manner (a little more work if extensibility isn't part of the original Mail Sink design). My preference is obviously to handle this at the database level, so that the Bayesian spam filter becomes a trigger on the table into which the mail is stored.
With a database in the mix, I pretty much have a rules-based engine for e-mail, and also a pretty flexible mechanism for dealing with false positives (nothing's perfect!), since they remain in the database too, just not automatically part of the IMAP or POP retrieval process.
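Sketched below, in Python rather than Virtuoso's own SQL procedures (with SQLite standing in for the actual store, and the pattern list and table schema invented for illustration), is the shape of that insert-time filtering idea.

    import re
    import sqlite3  # stand-in for Virtuoso or any ODBC-accessible store

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE mail (headers TEXT, body TEXT, status TEXT)")

    BLOCK_PATTERNS = [
        re.compile(r"lottery win|unclaimed funds", re.IGNORECASE),
        re.compile(r"^X-Mailer:.*BulkBlaster", re.IGNORECASE | re.MULTILINE),
    ]

    def classify(headers, body):
        """Return 'spam' or 'ok'; a Bayesian scorer could slot in here later."""
        text = headers + "\n" + body
        return "spam" if any(p.search(text) for p in BLOCK_PATTERNS) else "ok"

    def store_message(headers, body):
        # The point of the architecture: classification happens at insert time,
        # so POP/IMAP views can exclude spam before retrieval ever happens,
        # while false positives stay queryable in the table.
        db.execute("INSERT INTO mail (headers, body, status) VALUES (?, ?, ?)",
                   (headers, body, classify(headers, body)))

    store_message("From: a@example.com", "You have a lottery win waiting!")
    store_message("From: b@example.com", "Lunch tomorrow?")
    print(db.execute("SELECT status, body FROM mail").fetchall())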
At the end of the day, e-mail is data, and we simply need to look at data
[via The Scobleizer Weblog]
Now this is good news from Microsoft! This means that products like Virtuoso can now compete head-on with Yukon (on a level playing field, when it arrives) as far as Visual Studio .NET integration goes. Hopefully I will no longer have to rant about any of the following:
I wonder if the same degree of openness could extend to Web Matrix? That would be something indeed!
Tim O'Reilly wrote some thoughts about network-aware software. A good sum-up and nice ideas on why not only blogs should be net-aware (and where even blogs can be improved ;) )
"For the desktop, my personal vision is to see existing software instrumented to become increasingly web aware. It seems that Apple are doing a good job with this. (What does web aware mean for me? Being able to grok URIs, speaking WebDAV, and using open standard data formats.)" -- Edd Dumbill[via Bitflux Blog]
Rendezvous-like functionality for automatic discovery of and potential synchronization with other instances of the application on other computers. Apple is showing the power of this idea with iChat and iTunes, but it really could be applied in so many other places. For example, if every PIM supported this functionality, we could have the equivalent of "phonester" where you could automatically ask peers for contact information. Of course, that leads to guideline 2.
Another application is discovery of ODBC data sources and database servers. Rendezvous can also simplify security and administration of data sources accessible via either of these standard data access mechanisms. It can also apply to XML databases and data sources exposed by XML databases.
This is the very point I continue to make about Internet Points of Presence being actual data access points; in short, these end points should be served by database server processes. This is the very basis of Virtuoso; the inevitability of this realization remains the underpinning of this product. There are other products out there that have some sense of this vision too, but there is a little snag (at least so far in my research efforts), and that is the tendency to create a dedicated independent server per protocol (an ultimate integration, administration, and maintenance nightmare).
The performance issues are now long forgotten (at least as far as OpenLink Software's contribution to ODBC goes). But the "ODBC Drivers must be FREE as they offer little or no value" problem rages on.
The Usenet posting below pretty much sums up why I decided that OpenLink needed to get into the ODBC Driver business in the first place. We anticipated significant problems in the areas of usability, configurability, and security if all a driver had to offer was query fulfillment in the form of a result set.
The excerpt below shows an all-too-common dilemma with ODBC (should you reach rollout and put ODBC in the hands of information and knowledge workers):
Hi all,
I set up an Excel spreadsheet to our production database through ODBC driver to get a report. Everything was working fine, and life was good until I found a little problem with the SQL tool in Excel.
Normally, to get a report Excel will write a select statement according to criterias that the users input/ choose. It also allows anyone to Edit the select statement it writes in a little box. What I did was changing that select statement to delete/update statement. And it ran.
What surprised me was that it actually ran the statement against the database and delete/update tables accordingly. This is not what we want. I have not been able to find any options to turn this thing off so that the user cannot edit the generated select SQL.
I know all the permissions the user has are defined through the username that is defined in ODBC. We don't want to change all the user permissions on the database side. Is there any other way?
MS Excel 2000, Informix IDS 9.30 UC1, Dynix/ptx V4.5.3
Thanks, N.
The user's ODBC usage requirements are unconventional for a database engine. What do I mean? Well, relational databases fundamentally handle security on a user or role basis, and these security schemes can be applied to tables and rows, but that does nothing for this scenario.
The ODBC Drivers from OpenLink Software were built (in 1993, I might add) with this middleware predicament, and more, in mind. As you might imagine, most ODBC vendors will tell you to sort out the security either at the database end or at the client application end.
Our Drivers (the Multi-Tier variant), on the other hand, enable you to configure a set of rules that enforce read-only access on an application basis, such that in this particular case, when Excel is used, the session is read-only irrespective of what exits MS Query. The rules can even enable read-write or read-only access to Excel (or any other ODBC-compliant application) on the basis of any combination of the following: username, client IP, machine alias, application, LAN subnet, and any user-definable profile (we call these domains).
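As a toy illustration only (the real rules book syntax and attribute set differ; these field names and rules are invented), first-match-wins evaluation of such rules works along these lines:

    # (username, client_ip_prefix, application, mode) -- invented rule format
    RULES = [
        ("*", "*",       "EXCEL.EXE", "read-only"),    # Excel is always read-only
        ("*", "10.0.5.", "*",         "read-write"),   # trusted subnet
        ("*", "*",       "*",         "read-only"),    # default: deny writes
    ]

    def session_mode(username, client_ip, application):
        """First matching rule wins, as in a firewall rule table."""
        for user, ip_prefix, app, mode in RULES:
            if user in ("*", username) \
               and (ip_prefix == "*" or client_ip.startswith(ip_prefix)) \
               and app in ("*", application.upper()):
                return mode
        return "read-only"

    assert session_mode("nancy", "10.0.7.22", "excel.exe") == "read-only"
    assert session_mode("nancy", "10.0.5.9", "msaccess.exe") == "read-write"

The delete/update statement in the Usenet post above would simply be refused at the middleware layer, with no changes to permissions on the database side.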
Additional reading as this is only the tip of the iceberg.
Burlington, MA. 25 June 2003 - OpenLink Software, Inc., industry and technology leader in the development and deployment of secure, high-performance database connectivity drivers for ODBC
Apple's benchmarks under fire ZDNet Jun 25 2003 7:13AM ET
[via Moreover - ZDNet]
OpenLink Software has provided an Open Source benchmark utility that supports Mac OS X, Linux, and UNIX. Thus, if mission-critical, database-oriented performance is what is most relevant to your needs (as opposed to Photoshop), then simply download either one, or both, of the following:
OpenLink ODBC Bench (you can test TPC-A- and TPC-C-like performance of the G5, and compare against other platforms, via ODBC)
OpenLink JDBC Bench (the same thing, using JDBC)
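To give a feel for what such a benchmark exercises, here is a much-simplified TPC-A-style loop over ODBC (pyodbc; the DSN and schema are hypothetical, and the real OpenLink utilities handle schema creation, scaling rules, and timing methodology far more rigorously):

    import random
    import time
    import pyodbc

    conn = pyodbc.connect("DSN=BENCH")   # hypothetical benchmark DSN
    cur = conn.cursor()

    def tpca_transaction(account, teller, branch, delta):
        # The classic TPC-A profile: three balance updates plus a history row.
        cur.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", delta, account)
        cur.execute("UPDATE tellers  SET balance = balance + ? WHERE id = ?", delta, teller)
        cur.execute("UPDATE branches SET balance = balance + ? WHERE id = ?", delta, branch)
        cur.execute("INSERT INTO history (account, teller, branch, delta) "
                    "VALUES (?, ?, ?, ?)", account, teller, branch, delta)

    n, start = 1000, time.time()
    for _ in range(n):
        tpca_transaction(random.randint(1, 100000), random.randint(1, 10),
                         random.randint(1, 10), random.randint(-5000, 5000))
        conn.commit()
    print("%.1f transactions/sec" % (n / (time.time() - start)))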
... that between MS and open source initiatives, there will be no room in the marketplace for small software companies.
This is an interesting article, one that brings up many good points. Overall, I don't really agree with him. I think, for the most part, commercial software companies will still be able to co-exist with MS and open source in the marketplace.
Open source software is great, and I think it's a great source of applications for technologically savvy people (especially developers), but it's still not targeted at novice end users (like my mom). Will this change in the future? Probably to some extent, but I still think open source will lag behind commercial software vendors due to lack of marketing and polish.
On the other hand, I do see a specific niche of the software development market dying out in the next few years due to the open source movement: custom control vendors. These guys have been lucky for the past decade because most development shops don't want to spend the time writing the next cool button bar, but many developers, on their own time, love this kind of development. Just look at Code Project. Anytime we need a control for one of our apps, Code Project is the first place we go. Why? Well, not only do we get great controls, we get the source! This way, if something is screwed up (which it has been), we can fix the bug and move on (which we do [and send it to the control developer]).
BTW, if anyone thinks that a small development shop can't exist when competing against the likes of MS and open source, check out Fog Creek Software. Joel has managed to create a profitable software company that sells a few great products, makes a little bit of money, and provides his developers offices.
One day maybe I'll be fortunate enough to work for Joel!
[via WebLogs @ ASP.NET]
It continues to amaze me that the fundamental implications of corporate data access remain misunderstood by all parties in the ITsphere. How can any organization afford to be ambivalent about where data is stored, and about its ability to transform this data into information and knowledge (the ultimate competitive advantage)? Data is the most valuable company asset (we even had data in the enterprise before computers!).
Mac OS X is attempting to make a serious push into the enterprise, but how can this be taken seriously if solving one of the biggest problems in the enterprise today isn't a flagship item driving the enterprise marketing strategy? The excerpt below simply sums this up:
One of the new, albeit virtually undocumented features included in Jaguar is ODBC, or Open Database Connectivity. ODBC allows programs to connect to databases from different vendors using the same set of connectivity protocols. This allows for simplified database programming as well as database access from programs that normally would not allow such access. For instance, with ODBC you can use Excel to get data from MySQL, or you can use FileMaker to get data from Oracle.
From article titled Open Database Connectivity in Jaguar by Andrew Anderson
Open Database Connectivity is the only mechanism today that will enable any application to connect to any database without compromising choices across the following lines: Operating System, Programming Language, Desktop Productivity Tools, and Database Engine. All alternatives fail in at least one of the listed areas, with the ultimate destination being the painful realization that you are down a technology cul-de-sac (and these cost money, via integration and data access quagmires).
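From a scripting client sitting on top of the same driver manager, the "any application to any database" claim reduces to very little code. A sketch using pyodbc (the DSN name is hypothetical, and pyodbc here is just a convenient stand-in for any ODBC consumer):

    import pyodbc

    # Ask the driver manager (iODBC on Mac OS X) which DSNs are configured.
    print(pyodbc.dataSources())   # e.g. {'MySQL_Prod': 'MySQL ODBC Driver', ...}

    # Any of them can then be opened with the same two lines of code.
    conn = pyodbc.connect("DSN=MySQL_Prod")   # hypothetical DSN
    for row in conn.cursor().execute("SELECT VERSION()"):
        print(row)

Excel, FileMaker, or any other ODBC-compliant tool goes through exactly the same driver manager and DSN, which is the whole point of the excerpt above.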
Easy Data UI Generation. Web Matrix makes it easy to create data bound pages without writing code. Drop SQL/MSDE or Access tables on your page to create data-bound grids, or start with Data Page templates for reports or Master/Detail pages. Code builders help you generate code to select, insert, update and delete SQL/MSDE or Access data.
[via WebLogs @ ASP.NET]
Ingres (technically, Advantage Ingres Enterprise) is, arguably, the forgotten database. There used to be five major databases: Oracle, DB2, Sybase, Informix and Ingres. Then along came Microsoft and, if you listened to most press comment (or the lack of it), you would think that there were only two of these left, plus SQL Server. [From IT-Director]
Oracle, Microsoft, and IBM would certainly like the illusion of a three-horse race, as this is the only way they can induce Ingres, Informix, and Sybase users to jump ship -- and this even though database migrations are by far the most risk-prone and problematic aspects of any IT infrastructure.
Here is the interesting logic from the self-made big three: if you want to take advantage of new paradigms and technologies such as XML, Web Services, and anything else in the pipeline, you have to move all your data out of these databases, and then get all the mission-critical applications re-associated with one of their databases; and, by the way, when you do so, it is advisable that you use native interfaces (so that sometime in the future you have no chance whatsoever of repeating this folly at their expense).
The simple fact of the matter (which the self-made big three do not want you to know) is that you can put ODBC, JDBC, or even platform-specific data access APIs such as OLE DB and ADO.NET, atop any of these databases, and then explore and exploit the benefits of new technologies and paradigms, as long as the tool pool supports one or more of these standards.
Unfortunately, the no-brainer above appears to be the more difficult of the choices before decision makers. In other words, many would rather dig themselves into a deeper hole (unknowingly, I can only presume) that ultimately leads to technology lock-in.
The biggest challenge before any RDBMS-based infrastructure today isn't which of the self-made big three to migrate to wholesale, but rather how to make progressive use of the pool of disparate applications, and application databases, that proliferate across the enterprise.
This is another way of understanding the burgeoning market for Virtual Databases, which in my opinion represent the new frontier in database technology.
Borland, Microsoft Move Closer on Databases The two competitors put differences aside to push a new environment on the .NET Framework. [via Internet Product News]
Ah! Now I get it: let's be database-specific by bundling developer copies of all the supported databases. So we get one really huge product, simply because we don't realize that ADO.NET pretty much ensures some degree of DB independence (ODBC delivers the real thing for SQL, but that's old hat! Not!).
Yes, but I don't see how any of the data access mechanisms listed have anything to do with the problem, as each serves a different application domain.
Well, guess what? You won't have to wait too much longer to see this dream become reality. All major vendors are not only working on this issue seriously, but have already started releasing SDKs for exactly this purpose. The point of commonality between them is XML. Like it or not, XML is becoming a standard part of programmers' lives, because its simple and easy-to-understand structure makes it a suitable format for transferring data between disparate applications and platforms. But the real power of XML lies in its ability not only to extend itself, but also to define itself.
This is all well and good; XML is indeed a unifying technology, but the opening paragraph undermines the point!
XML for Analysis is just another addition to the pool of data access mechanisms: an XML/SOAP-based data access mechanism for tabular and multidimensional data. Without our implementation, XMLA simply offers little to anyone trying to get at tabular data (SQL data), as many of the current implementers fall into the OLAP camp, and there is no standard query language for persistent cubes (MDX from Microsoft is de facto at best). Thus, without Virtuoso, tabular data (where the real data actually resides) is out of the picture.
So we say goodbye to ODBC, JDBC, and OLE DB to get what, exactly? From our perspective, another showcase opportunity; but for the innocent readers of this article, I haven't got the faintest idea.
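For readers wondering what the XMLA exchange actually involves on the wire, here is a hedged sketch: a SOAP Execute request POSTed over HTTP (Python requests; the endpoint URL, statement, and property details are placeholders following the general XMLA envelope shape, not any particular server's contract).

    import requests

    ENVELOPE = """<?xml version="1.0"?>
    <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
      <SOAP-ENV:Body>
        <Execute xmlns="urn:schemas-microsoft-com:xml-analysis">
          <Command>
            <Statement>SELECT * FROM customers</Statement>
          </Command>
          <Properties/>
        </Execute>
      </SOAP-ENV:Body>
    </SOAP-ENV:Envelope>"""

    resp = requests.post(
        "http://example.com/xmla",   # placeholder endpoint
        data=ENVELOPE,
        headers={"Content-Type": "text/xml; charset=utf-8",
                 "SOAPAction": "urn:schemas-microsoft-com:xml-analysis:Execute"},
    )
    print(resp.status_code, resp.text[:200])

Every query and every rowset becomes a SOAP round trip like this one, which is a lot of machinery to replace a single ODBC call.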
http://www.sqlsecurity.com/DesktopDefault.aspx?tabindex=2&tabid=3
Are other SQL Servers (Sybase, Oracle, DB2) subject to SQL injection? Yes, to varying degrees. Here is a site that can get you more details on some of the issues with other SQL Servers: http://www.owasp.org

What is SQL Injection, and why is all this information not included in the regular FAQ? SQL Injection is simply a term describing the act of passing SQL code into an application in a way that was not intended by the developer. Since this topic is not specifically restricted to SQL Server, it is not included in the normal FAQ. In fact, many of the problems that allow SQL injection are not the fault of the database server per se, but rather are due to poor input validation and coding at other code layers. However, due to the serious nature and prevalence of this problem, I feel its inclusion in a thorough discussion of SQL Server security is warranted.

What causes SQL Injection? SQL injection is usually caused by developers who use "string-building" techniques in order to execute SQL code. For example, in a search page, the developer may use the following code to execute a query (VBScript/ASP sample shown):

    Set myRecordset = myConnection.execute("SELECT * FROM myTable WHERE someText ='" & request.form("inputdata") & "'")

The reason this statement is likely to introduce an SQL injection problem is that the developer has made a classic mistake -- poor input validation. We are trusting that the user has not entered something malicious -- something like the innocent-looking single quote ('). Let's consider what would happen if a user entered the following text into the search form:

    ' exec master..xp_cmdshell 'net user test testpass /ADD' --

Then, when the query string is assembled and sent to SQL Server, the server will process the following code:

    SELECT * FROM myTable WHERE someText ='' exec master..xp_cmdshell 'net user test testpass /ADD'--'

Notice: the first single quote entered by the user closed the string, and SQL Server eagerly executes the next SQL statements in the batch, including a command to add a new user to the local accounts database. If this application were running as 'sa' and the MSSQLSERVER service is running with sufficient privileges, we would now have an account with which to access this machine. Also note the use of the comment operator (--) to force the SQL Server to ignore the trailing quote placed by the developer's code.

Very interesting that these are all native-interface-based exploits. So the security issue isn't ODBC, JDBC, ADO.NET, or OLE DB specific (although they certainly increase the potential damage that can be unleashed via metadata analysis en route to that huge Cartesian Product; the mother of all exploits!). Our Session Rules Book was devised in 1993 with many of these issues in mind, and to this date there are no other ODBC/JDBC/OLE DB products out there that even come close to acknowledging this reality.
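The remedy, regardless of which data access API sits in the middle, is parameterized execution rather than string-building. A minimal sketch (pyodbc; the DSN and table are hypothetical):

    import pyodbc

    conn = pyodbc.connect("DSN=AppDB")   # hypothetical DSN
    cur = conn.cursor()

    user_input = "' exec master..xp_cmdshell 'net user test testpass /ADD' --"

    # Unsafe: string-building, exactly the pattern quoted above.
    #   cur.execute("SELECT * FROM myTable WHERE someText = '" + user_input + "'")

    # Safe: the driver binds the value, so the quote is just a character in a
    # string -- no second statement ever reaches the server.
    cur.execute("SELECT * FROM myTable WHERE someText = ?", user_input)
    print(cur.fetchall())   # no matching rows, and no command executed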