In this post I provide a brief re-introduction to this essential aspect of Virtuoso.
This component of Virtuoso is known as the Virtual Database Engine (VDBMS). It provides transparent, high-performance, and secure access to disparate data sources that are external to Virtuoso. It enables federated access to, and integration of, data hosted by any ODBC- or JDBC-accessible RDBMS, RDF Store, XML database, or Document (Free Text)-oriented Content Management System. In addition, it facilitates integration with Web Services (SOAP-based SOA RPCs or RESTfully accessible Web Resources).
In the most basic sense, you shouldn't need to upgrade your existing database engine version simply because your current DBMS and Data Access Driver combo isn't compatible with ODBC-compliant desktop tools such as Microsoft Access, Crystal Reports, BusinessObjects, or Impromptu, or with other ODBC-, JDBC-, ADO.NET-, or OLE DB-compliant applications. Simply place Virtuoso in front of your so-called "legacy database," and let it deliver the compliance levels sought by these tools.
In addition, it's important to note that today's enterprise, through application evolution, company mergers, or acquisitions, is often faced with disparately-structured data residing in any number of line-of-business-oriented data silos. Compounding the problem is the exponential growth of user-generated data via new social media-oriented collaboration tools and platforms. For companies to cost-effectively harness the opportunities accorded by the increasing intersection between line-of-business applications and social media, virtualization of data silos must be achieved, and this virtualization must be delivered in a manner that doesn't prohibitively compromise performance or completely undermine security at either the enterprise or personal level. Again, this is what you get by simply installing Virtuoso.
The VDBMS may be used in a variety of ways, depending on the data access and integration task at hand. Examples include:
You can make a single ODBC, JDBC, ADO.NET, OLE DB, or XMLA connection to multiple ODBC- or JDBC-accessible RDBMS data sources, concurrently, with the ability to perform intelligent distributed joins against externally-hosted database tables. For instance, you can join internal human resources data against internal sales and external stock market data, even when the HR team uses Oracle, the Sales team uses Informix, and the Stock Market figures come from Ingres!
You can construct RDF Model-based Conceptual Views atop Relational Data Sources. This is about generating HTTP-based Entity-Attribute-Value (E-A-V) graphs using data culled "on the fly" from native or external data sources (Relational Tables/Views, XML-based Web Services, or User Defined Types).
You can also derive RDF Model-based Conceptual Views from Web Resource transformations "on the fly": the Virtuoso Sponger (an RDFizing middleware component) enables you to generate RDF Model-based Linked Data via a RESTful Web Service or within the query processing pipeline of the SPARQL engine (i.e., you simply use the URL of a Web Resource in the FROM clause of a SPARQL query).
It's important to note that Views take the form of HTTP links that serve as both Data Source Names and Data Source Addresses. This enables you to query and explore relationships across entities (i.e., People, Places, and other Real World Things) via HTTP clients (e.g., Web Browsers) or directly via SPARQL Query Language constructs transmitted over HTTP.
As an alternative to RDF, Virtuoso can expose ADO.NET Entity Framework-based Conceptual Views over Relational Data Sources. It achieves this by generating Entity Relationship graphs via its native ADO.NET Provider, which exposes all externally attached ODBC- and JDBC-accessible data sources. In addition, the ADO.NET Provider supports direct access to Virtuoso's native RDF database engine, eliminating the need for resource-intensive Entity Framework model transformations.
When trying to understand HTTP-based Linked Data, especially if you're well versed in DBMS technology (as a User, Power User, Architect, Analyst, DBA, or Programmer), think:
Remember, the need for Data Access & Integration technology is the byproduct of the following realities:
The diagram below depicts a top-down view of the data access and data management value chain. The term "apex" simply indicates value primacy, which takes the form of a data access API-based entry point into a DBMS realm, aligned to an underlying data model. Examples of data access APIs include Native Call Level Interfaces (CLIs), ODBC, JDBC, ADO.NET, OLE-DB, XMLA, and Web Services.
See: AVF Pyramid Diagram.
The degree to which ad-hoc views of data managed by a DBMS can be produced and dispatched to relevant data consumers (e.g., people), without compromising concurrency, data durability, and security, collectively determines the "Agility Value Factor" (AVF) of a given DBMS. Remember, agility, as the cornerstone of environmental adaptation, is as old as the concept of evolution and intrinsic to all pursuits of primacy.
In simpler business-oriented terms, look at AVF as the degree to which DBMS technology affects the ability to effectively implement a "Market Leadership Discipline" along one of the following pathways: innovation, operational excellence, or customer intimacy.
Historically, at least since the late '80s, the RDBMS genre of DBMS has consistently offered the highest AVF relative to other DBMS genres, en route to primacy within the value pyramid. Improving on paper reports and spreadsheets is fundamentally what DBMS technology has addressed to date, even though conceptual-level interaction with data has never been its forte.
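The RDF Model-based Conceptual Views listed above boil down to turning relational rows into Entity-Attribute-Value graphs. The Python sketch below shows only that core idea, using the standard-library sqlite3 module as a stand-in data source: each row becomes an entity named by an HTTP URI, and each non-key column becomes an attribute. The base URIs and the employee table are hypothetical, and this is not how Virtuoso itself defines RDF Views; it is just an illustration of the mapping.

```python
import sqlite3

BASE = "http://example.com/hr/employee/"  # hypothetical entity namespace
VOCAB = "http://example.com/schema#"      # hypothetical attribute namespace


def rows_to_triples(conn, table, key):
    """Yield (entity, attribute, value) triples for every row of `table`.

    `table` and `key` are interpolated into the SQL text, so in this sketch
    they must be trusted identifiers, never user input.
    """
    cur = conn.execute(f"SELECT * FROM {table} ORDER BY {key}")
    columns = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        entity = BASE + str(record[key])  # the primary key becomes part of an HTTP URI
        for column, value in record.items():
            if column == key or value is None:
                continue  # key is already in the URI; NULLs produce no triple
            yield (entity, VOCAB + column, value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Alice", "Sales"), (2, "Bob", None)])

triples = list(rows_to_triples(conn, "employee", "id"))
for triple in triples:
    print(triple)
```

Because every entity gets an HTTP URI rather than a bare key, the same identifier can later be dereferenced by a browser or used in a SPARQL query.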
See: RDBMS Primacy Diagram.
For more than 10 years, at the very least, the limitations of the traditional RDBMS in the realm of conceptual-level interaction with data across diverse data sources and schemas (enterprise, Web, and Internet) have been crystal clear to many RDBMS technology practitioners, as indicated by some of the quotes excerpted below:
"Future of Database Research is excellent, but what is the future of data?"
"...it is hard for me to disagree with the conclusions in this report. It captures exactly the right thoughts, and should be a must read for everyone involved in the area of databases and database research in particular." -- Dr. Anant Jhingran, CTO, IBM Information Management Systems, commenting on the 2007 RDBMS technology retreat attended by a number of key DBMS technology pioneers and researchers.
"One size fits all: A concept whose time has come and gone
- They are direct descendants of System R and Ingres and were architected more than 25 years ago
- They are advocating "one size fits all"; i.e. a single engine that solves all DBMS needs.
-- Prof. Michael Stonebraker, one of the founding fathers of the RDBMS industry.
Until now, the requisite confluence of "circumstantial pain" and "open standards"-based technology required to enable an objective "compare and contrast" of RDBMS engine virtues and viable alternatives hasn't occurred. Thus, the RDBMS has maintained its position of primacy, albeit on a "one size fits all" basis.
As mentioned earlier, we are in the midst of an economic crisis that is ultimately about a consistent inability to connect dots across a substrate of interlinked data sources that transcend traditional data access boundaries and exhibit high degrees of schematic heterogeneity. Ironically, in the era of the dot-com, we haven't been able to make meaningful connections between relevant "real-world things" that extend beyond primitive data hosted in database tables and content-management-style document containers. We've struggled to achieve this in the most basic sense, let alone evolve our ability to connect in line with the exponential rate at which the Internet & Web are spawning "universes of discourse" (data spaces) that emanate from user activity (within the enterprise and across the Internet & Web). In a nutshell, we haven't been able to upgrade our interaction with data such that "conceptual models" and resulting "context lenses" (or facets) become concrete. By this I mean real-world entity interaction making its way into the computer realm, as opposed to the impedance we all suffer today when we transition from conceptual model interaction (real-world) to logical model interaction (when dealing with RDBMS-based data access and data management).
Here are some simple examples of what I can best describe as "critical dots unconnected," resulting from an inability to interact with data conceptually:
Government (Globally): Financial regulatory bodies couldn't effectively discern that a Credit Default Swap is an insurance policy in all but name. By not doing so, they allowed the cost of an unregulated insurance policy to lay the foundation for exacerbating the toxicity of fatally flawed mortgage-backed securities. Put simply: a flawed insurance policy was the fallback on a toxic security that financiers found exotic based on superficial packaging.
Enterprises: Banks still don't understand that capital really does exist in tangible and intangible forms, with the intangible being the inherently dynamic variant. For example, a tech company's intellectual capital far exceeds the value of its fixtures, fittings, and buildings, but you'd be amazed to find that in most cases this vital asset has no significant value when banks get down to the nitty-gritty of debt collateral; instead, a buffer of flawed securitization has been built atop a borderline-static asset class covering the aforementioned buildings, fixtures, and fittings.
In the general enterprise arena, IT executives continued to "rip and replace" existing technology without ever effectively addressing the timeless inability to connect data across disparate data silos generated by internal enterprise applications, let alone the broader need to mesh internal data with external data sources. No correlation was made between the growth of buzzwords and the compounding nature of data integration challenges. It's 2009, and only a minuscule number of executives dare fantasize about being anywhere near the "relevant information at your fingertips" vision.
Looking more holistically at data interaction in general, whether you interact with data in the enterprise space (i.e., at work) or on the Internet or Web, you are ultimately delving into a mishmash of disparate computer systems, applications, services (Web or SOA), and databases (of the RDBMS variety in a majority of cases) associated with a plethora of disparate schemas. Yet even today, "rip and replace" is still the norm pushed by most vendors, pitting one monoculture against another, as exemplified by irrelevant distinctions such as FOSS/LAMP vs. Commercial or Web vs. Enterprise, when none of this matters if the data access and integration issues aren't even recognized, let alone addressed (see: Applications are Like Fish and Data Like Wine).
Like the current credit-crunch, exponential growth of data originating from disparate application databases and associated schemas, within shrinking processing time frames, has triggered a rethinking of what defines data access and data management value today en route to an inevitable RDBMS downgrade within the value pyramid.
There have been many attempts to address real-world modeling requirements across the broader DBMS community from Object Databases to Object-Relational Databases, and more recently the emergence of simple Entity-Attribute-Value model DBMS engines. In all cases failure has come down to the existence of one or more of the following deficiencies, across each potential alternative:
A common characteristic shared by all post-relational DBMSs (from Object-Relational to pure Object) is an orientation towards variations of EAV/CR (Entity-Attribute-Value with Classes & Relationships) data models. Unfortunately, all efforts in the EAV/CR realm have typically suffered from at least one of the deficiencies listed above. In addition, the same "one DBMS model fits all" approach that lies at the heart of the RDBMS downgrade also exists in the EAV/CR realm.
The RDBMS is not going away (ever), but its era of primacy -- by virtue of its placement at the apex of the data access and data management value pyramid -- is over! I make this bold claim for the following reasons:
Based on the above, it is crystal clear that a different kind of DBMS -- one with higher AVF relative to the RDBMS -- needs to sit atop today's data access and data management value pyramid. The characteristics of this DBMS must include the following:
Quick recap: I am not saying that RDBMS engine technology is dead or obsolete. I am simply stating that the era of RDBMS primacy within the data access and data management value pyramid is over.
The problem domain (conceptual model views over heterogeneous data sources) at the apex of the aforementioned pyramid has simply evolved beyond the natural capabilities of the RDBMS, which is rooted in "Closed World" assumptions regarding data definition, access, and management. The need for domain-based conceptual interaction with data is now palpable at every echelon within our "Global Village": Internet, Web, Enterprise, Government, etc.
It is my personal view that an EAV/CR model-based DBMS, with support for the seven items enumerated above, can trigger the long-anticipated RDBMS downgrade. Such a DBMS would be inherently multi-model, because it would need to combine the best of RDBMS and EAV/CR model engines in a single product, with built-in support for HTTP and other Internet protocols in order to effectively address data representation and serialization issues.
Examples of contemporary EAV/CR frameworks that provide concrete conceptual layers for data access and data management currently include:
The frameworks above provide the basis for a revised AVF pyramid, as depicted below, that reflects today's data access and management realities, i.e., an Internet & Web driven global village composed of interlinked, distributed data objects compatible with "Open World" assumptions.
See: New EAV/CR Primacy Diagram.
A data access driver/provider that provides conceptual, entity-oriented access to RDBMS data managed by Virtuoso. Naturally, it also uses Virtuoso's built-in virtual/federated database layer to provide access to ODBC- and JDBC-accessible RDBMS engines such as Oracle (7.x to latest), SQL Server (4.2 to latest), Sybase, IBM Informix (5.x to latest), IBM DB2, Ingres (6.x to latest), Progress (7.x to OpenEdge), MySQL, PostgreSQL, Firebird, and others, using our ODBC or JDBC bridge drivers.
It delivers an Entity-Attribute-Value + Classes & Relationships model over disparate data sources that are materialized as .NET Entity Framework objects, which are then consumable via ADO.NET Data Services, LINQ to Entities, and other ADO.NET data consumers.
The provider is fully integrated into Visual Studio 2008 and delivers the same "ease of use" offered by Microsoft's own SQL Server provider, but across Virtuoso, Oracle, Sybase, DB2, Informix, Ingres, Progress (OpenEdge), MySQL, PostgreSQL, Firebird, and others. The same benefits also apply uniformly to Entity Frameworks compatibility.
Bearing in mind that Virtuoso is a multi-model (hybrid) data manager, this also implies that you can use the .NET Entity Framework against all data managed by Virtuoso. Remember, Virtuoso's SQL channel is a conduit to Virtuoso's core; thus, RDF (courtesy of SPASQL, as already implemented in the Jena, Sesame, and Redland providers), XML, and other data forms stored in Virtuoso also become accessible via the .NET Entity Framework.
You can choose which entity-oriented data access model works best for you: RDF Linked Data & SPARQL, or .NET Entity Framework & Entity SQL. Either way, Virtuoso delivers a commercial-grade, high-performance, secure, and scalable solution.
Note: When working with external or 3rd party databases, simply use the Virtuoso Conductor to link the external data source into Virtuoso. Once linked, the remote tables will simply be treated as though they are native Virtuoso tables leaving the virtual database engine to handle the rest. This is similar to the role the Microsoft JET engine played in the early days of ODBC, so if you've ever linked an ODBC data source into Microsoft Access, you are ready to do the same using Virtuoso.
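The linking workflow in the note above, where remote tables behave like local ones and a single connection can join across them, can be approximated in miniature with SQLite's ATTACH DATABASE, which lets one connection query several separate databases. This is an analogy only: both databases below are local in-memory SQLite instances standing in for, say, an HR source and a Sales source, and nothing here reflects Virtuoso's Conductor or its distributed query planning. Table names and data are hypothetical.

```python
import sqlite3

# One connection plays the role of the single client connection to Virtuoso.
conn = sqlite3.connect(":memory:")                   # stands in for the "HR" source
conn.execute("ATTACH DATABASE ':memory:' AS sales")  # a second, separate database ("Sales")

conn.execute("CREATE TABLE main.employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE sales.orders "
             "(order_id INTEGER PRIMARY KEY, rep_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO main.employee VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO sales.orders VALUES (?, ?, ?)",
                 [(10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0)])

# Once both are "linked", a single query joins tables that live in different databases.
rows = conn.execute("""
    SELECT e.name, SUM(o.amount) AS total
    FROM main.employee AS e
    JOIN sales.orders AS o ON o.rep_id = e.id
    GROUP BY e.name
    ORDER BY e.name
""").fetchall()
print(rows)  # [('Alice', 350.0), ('Bob', 75.0)]
```

The client code never needs to know which database holds which table; that is the same property the Microsoft Access/JET comparison in the note describes.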
A pre-installed edition of Virtuoso for Amazon's EC2 Cloud platform.
From the DBMS engine perspective it provides you with one or more pre-configured instances of Virtuoso that enable immediate exploitation of the following services:
From a Middleware perspective it provides:
From the Web Server Platform perspective, it provides an alternative to LAMP stack components such as MySQL and Apache by offering:
From the general System Administrator's perspective it provides:
Higher level user oriented offerings include:
For Web 2.0/3.0 users, developers, and entrepreneurs, it includes Distributed Collaboration Tools & Social Media functionality courtesy of ODS, including:
Like Apache, Virtuoso is a bona-fide Web Application Server for PHP based applications. Unlike Apache, Virtuoso is also the following:
As a result of the above, when you deploy a PHP application using Virtuoso, you inherit the following benefits:
As indicated in prior posts, producing RDF Linked Data from the existing Web, where a lot of content is deployed by PHP-based content managers, should simply come down to RDF Views over the SQL schemas and deployment/publishing of the RDF Views in RDF Linked Data form. In a nutshell, this is what Virtuoso delivers via its PHP runtime hosting and pre-packaged VADs (Virtuoso Application Distribution packages) for popular PHP-based applications such as phpBB3, Drupal, WordPress, and MediaWiki.
In addition to RDF Linked Data deployment, we've also taken the traditional LAMP installation tedium out of the typical PHP application deployment process. For instance, you don't have to rebuild PHP 3.5 (32- or 64-bit) on Windows, Mac OS X, or Linux to get going; simply install Virtuoso, then select a VAD package for the relevant application, and you're set. If your application of choice isn't pre-packaged by us, simply install it as you would when using Apache, which comes down to placing the PHP files in your Web structure under the Web Application's root directory.
At present, I've only provided links to ZIP files containing the Virtuoso installation "silent movies". This is a short-term workaround for my current movie publishing challenges with YouTube and Vimeo, where the compressed output hasn't been of acceptable visual quality. Once that's resolved, I will publish much more "Multimedia Web" friendly movies :-)
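Publishing an RDF View as Linked Data means that each entity URI, when dereferenced, resolves to a description built from the underlying SQL row. The sketch below assumes a hypothetical forum `posts` table and a hypothetical URI pattern; it resolves a URI to its row and returns that row as triples. It illustrates the idea only and is not the mechanism used by Virtuoso's VAD packages.

```python
import re
import sqlite3

BASE = "http://example.com/forum/post/"  # hypothetical URI pattern for forum posts
VOCAB = "http://example.com/forum#"      # hypothetical vocabulary namespace
URI_PATTERN = re.compile(re.escape(BASE) + r"(\d+)")


def describe(conn, uri):
    """Resolve an entity URI to its SQL row and return the row as triples."""
    match = URI_PATTERN.fullmatch(uri)
    if match is None:
        return []  # not a URI published by this view
    row = conn.execute(
        "SELECT title, author FROM posts WHERE id = ?", (int(match.group(1)),)
    ).fetchone()
    if row is None:
        return []  # well-formed URI, but no such post
    title, author = row
    return [(uri, VOCAB + "title", title), (uri, VOCAB + "author", author)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.execute("INSERT INTO posts VALUES (42, 'Hello, Linked Data', 'alice')")

print(describe(conn, BASE + "42"))
```

The SQL schema stays untouched; only a naming rule (URI to primary key) and a column-to-property mapping are layered on top of it.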
From the RWW Top-Down category, which I interpret as technologies that produce RDF from non-RDF data sources, our product portfolio consists of the following: Virtuoso Universal Server, OpenLink Data Spaces, OpenLink Ajax Toolkit, and OpenLink Data Explorer (which includes Ubiquity commands).
Of course, you could have simply looked up OpenLink Software's FOAF-based Profile page (*note the Linked Data Explorer tab*), or passed the FOAF profile page URL to a Linked Data aware client application such as OpenLink Data Explorer, Zitgist Data Viewer, Marbles, or Tabulator, and obtained the same information. Remember, OpenLink Software is an Entity of Type foaf:Organization on the burgeoning Linked Data Web :-)
CrunchBase: When we released the CrunchBase API, you were one of the first developers to step up and quickly release a CrunchBase Sponger Cartridge. Can you explain what a CrunchBase Sponger Cartridge is?
Me: A Sponger Cartridge is a data access driver for Web Resources that plugs into our Virtuoso Universal Server (DBMS and Linked Data Web Server combo amongst other things). It uses the internal structure of a resource and/or a web service associated with a resource, to materialize an RDF based Linked Data graph that essentially describes the resource via its properties (Attributes & Relationships).
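To make the cartridge idea concrete, here is a minimal Python sketch of what a cartridge's core mapping step might look like: parse the structured data that a Web resource (or its associated web service) returns, then emit triples for its attributes and relationships. The function name, the JSON field names, the `#this` naming convention, and the `example.com` relationship vocabulary are all hypothetical; real Sponger cartridges plug into Virtuoso and are not written like this. Only the RDF and FOAF namespace URIs are the real ones.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.com/vocab#"  # hypothetical vocabulary for relationships


def company_cartridge(resource_url, payload):
    """Hypothetical cartridge: turn a JSON company record into RDF-style triples.

    `resource_url` identifies the Web resource being described; `payload` is the
    JSON text returned by a (hypothetical) web service associated with it.
    """
    record = json.loads(payload)
    entity = resource_url + "#this"  # assumed convention for naming the described entity
    triples = [(entity, RDF_TYPE, FOAF + "Organization")]
    # Attributes: copy selected fields onto FOAF properties.
    if record.get("name"):
        triples.append((entity, FOAF + "name", record["name"]))
    if record.get("homepage_url"):
        triples.append((entity, FOAF + "homepage", record["homepage_url"]))
    # Relationships: link to sibling resources described by the same service.
    for competitor in record.get("competitors", []):
        other = resource_url.rsplit("/", 1)[0] + "/" + competitor + "#this"
        triples.append((entity, EX + "competitor", other))
    return triples


payload = json.dumps({"name": "Example Corp",
                      "homepage_url": "http://example.com/",
                      "competitors": ["acme"]})
graph = company_cartridge("http://api.example.com/company/example-corp", payload)
for triple in graph:
    print(triple)
```

Because relationships point at other HTTP URIs, a Linked Data client can "follow its nose" from one company description to the next.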
CrunchBase: And what inspired you to create it?
Me: Bengee built a new space with your data, and we've built a space on the fly from your data, which still resides in your domain. Either solution extols the virtues of Linked Data, i.e., the ability to explore relationships across data items with high degrees of serendipity (colloquially known as the "follow-your-nose" pattern in Semantic Web circles).
Bengee posted a notice to the Linking Open Data Community's public mailing list announcing his effort. Bearing in mind the fact that we've been using middleware to mesh the realms of Web 2.0 and the Linked Data Web for a while, it was a no-brainer to knock something up based on the conceptual similarities between Wikicompany and CrunchBase. In a sense, a quadrant of orthogonality is what immediately came to mind re. Wikicompany, CrunchBase, Bengee's RDFization efforts, and ours.
Bengee created an RDF-based Linked Data warehouse from the data exposed by your API, which is exposed via the Semantic CrunchBase data space. In our case, we've taken the "RDFization on the fly" approach, which produces a transient Linked Data View of the CrunchBase data exposed by your APIs. Our approach is in line with our world view: all resources on the Web are data sources, and the Linked Data Web is about incorporating HTTP into the naming scheme of these data sources so that the conventional URL-based hyperlinking mechanism can be used to access a structured description of a resource, which is then transmitted using one of a range of negotiable representation formats. In addition, because we house and publish a lot of Linked Data on the Web (e.g., DBpedia, PingTheSemanticWeb, and others), we've also automatically meshed CrunchBase data with related data from DBpedia and Wikicompany.
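Serving a structured description "using one of a range of negotiable representation formats" relies on HTTP content negotiation: the client's Accept header states which media types it prefers, and the server picks the best one it can produce. The sketch below is a simplified, hypothetical selector; it honors q-values and wildcards but ignores other media-type parameters, and the list of supported formats and their order of preference are assumptions.

```python
# Server-side preference order, used to break ties between equally weighted types.
SUPPORTED = ["text/html", "application/rdf+xml", "text/turtle", "text/n3"]


def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs; q defaults to 1.0."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue  # tolerate empty entries such as a trailing comma
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def specificity(media_range):
    """Exact types beat type/* wildcards, which beat */*."""
    if media_range == "*/*":
        return 0
    return 1 if media_range.endswith("/*") else 2


def matches(media_range, media_type):
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.split("/")[0] == media_range[:-2]
    return media_range == media_type


def negotiate(accept_header):
    """Return the best supported media type for the header, or None."""
    ranges = parse_accept(accept_header or "*/*")  # no header means "anything"
    best, best_q = None, 0.0
    for media_type in SUPPORTED:
        candidates = [(specificity(r), q) for r, q in ranges if matches(r, media_type)]
        if not candidates:
            continue
        _, q = max(candidates)  # the most specific matching range sets the weight
        if q > best_q:  # strict ">" keeps the server's earlier preference on ties
            best, best_q = media_type, q
    return best


print(negotiate("text/turtle;q=0.9, application/rdf+xml;q=0.5"))  # text/turtle
```

A browser asking for HTML and a Linked Data client asking for RDF can thus dereference the same URL and each receive a representation it understands.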
CrunchBase: Do you know of any apps that are using CrunchBase Cartridge to enhance their functionality?
Me: Yes, the OpenLink Data Explorer which provides CrunchBase site visitors with the option to explore the Linked Data in the CrunchBase data space. It also allows them to "Mesh" (rather than "Mash") CrunchBase data with other Linked Data sources on the Web without writing a single line of code.
CrunchBase: You have been immersed in the Semantic Web movement for a while now. How did you first get interested in the Semantic Web?
Me: We saw the Semantic Web as a vehicle for standardizing conceptual views of heterogeneous data sources via context lenses (URIs). In 1998, as part of our strategy to expand our business beyond the development and deployment of ODBC, JDBC, and OLE-DB data providers, we decided to build a Virtual Database Engine (see: Virtuoso History), and in doing so we sought a standards-based mechanism for the conceptual output of the data virtualization effort. At the time of the seminal unveiling of the Semantic Web in 1998, we were clear about two things in relation to the inflections the Web and Internet were causing in data management infrastructure: 1) existing DBMS technology had reached its limits, and 2) Web Servers would ultimately hit their functional limits. These fundamental realities compelled us to develop Virtuoso with an eye to leveraging the Semantic Web as a vehicle for completing its technical roadmap.
CrunchBase: Can you put into layman's terms exactly what RDF and SPARQL are and why they are important? Do they only matter for developers, or will they extend past developers at some point and be used by website visitors as well?
Me: RDF (Resource Description Framework) is a Graph based Data Model that facilitates resource description using the Subject, Predicate, and Object principle. Associated with the core data model, as part of the overall framework, are a number of markup languages for expressing your descriptions (just as you express presentation markup semantics in HTML or document structure semantics in XML) that include: RDFa (simple extension of HTML markup for embedding descriptions of things in a page), N3 (a human friendly markup for describing resources), RDF/XML (a machine friendly markup for describing resources).
SPARQL is the query language associated with the RDF Data Model, just as SQL is the query language associated with the Relational Database Model. Thus, when you have RDF-based structured and linked data on the Web, you can query the Web using SPARQL just as you would query an Oracle/SQL Server/DB2/Informix/Ingres/MySQL/etc. DBMS using SQL. That's it in a nutshell.
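To make the SQL analogy concrete, here is a toy sketch of how a SPARQL basic graph pattern is evaluated: each triple pattern is matched against the data, and variable bindings are joined pattern by pattern, much like a SQL WHERE clause joining rows. This is an illustrative in-memory matcher over hypothetical data, not how Virtuoso or any production SPARQL engine is implemented.

```python
def match(pattern, triple, binding):
    """Extend binding so pattern matches triple, or return None on conflict.

    A pattern term starting with '?' is a variable; anything else must match exactly.
    """
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if term in new and new[term] != value:
                return None  # variable already bound to a different value
            new[term] = value
        elif term != value:
            return None
    return new


def evaluate(patterns, triples):
    """Join the solutions of each triple pattern in turn (a nested-loop join)."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


data = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "foaf:name", "Bob"),
]
# SPARQL: SELECT ?friend ?name WHERE { ex:alice foaf:knows ?friend . ?friend foaf:name ?name }
print(evaluate([("ex:alice", "foaf:knows", "?friend"), ("?friend", "foaf:name", "?name")], data))
```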
CrunchBase: On your website you wrote about "RDF and SPARQL as productivity boosters in everyday web development". Can you elaborate on why you believe that to be true?
Me: I think the ability to discern a formal description of anything via its discrete properties is of immense value re. productivity, especially when the capability in question results in a graph of Linked Data that isn't confined to a specific host operating system, database engine, application or service, programming language, or development framework. RDF Linked Data is about infrastructure for the true materialization of the "Information at Your Fingertips" vision of yore. Even though it's taken the emergence of RDF Linked Data to make the aforementioned vision tractable, the vision's intrinsic value has been clear for a very long time. Most organizations and individuals are quite familiar with the adage "Knowledge is Power". Well, there isn't any knowledge without accessible Information, and there isn't any accessible Information without accessible Data. The Web has always been grounded in accessibility to data (albeit via compound container documents called Web Pages).
Bottom line, RDF based Linked Data is about Open Data access by reference using URIs (HTTP based Entity IDs / Data Object IDs / Data Source Names), and as I said earlier, the intrinsic value is pretty obvious bearing in mind the costs associated with integrating disparate and heterogeneous data sources -- across intranets, extranets, and the Internet.
CrunchBase: In his definition of Web 3.0, Nova Spivack proposes that the Semantic Web, or Semantic Web technologies, will be the force behind much of the innovation that will occur during Web 3.0. Do you agree with Nova Spivack? What role, if any, do you feel the Semantic Web will play in Web 3.0?
Me: I agree with Nova, but I see Web 3.0 as a phase within the Semantic Web innovation continuum. Web 3.0 exists because Web 2.0 exists. Both of these Web versions express usage and technology focus patterns. Web 2.0 is about the use of Open Source technologies to fashion Web Services that are ultimately used to drive proprietary Software as a Service (SaaS) style solutions. Web 3.0 is about the use of "Smart Data Access" to fashion a new generation of Linked Data aware Web Services and solutions that exploit the federated nature of the Web to maximum effect; proprietary branding will simply be conveyed via quality of data (cleanliness, context fidelity, and comprehension of privacy) exposed by URIs.
Here are some examples of the CrunchBase Linked Data Space, as projected via our CrunchBase Sponger Cartridge:
By coincidence, Glenn and I presented at this month's Cambridge Semantic Web Gathering.
I've provided a dump of Glenn's issues and my responses below:
RDF, which stands for Resource Description Framework, is a Graph-based Data Model. The Metadata angle comes from its Meta Content Framework (MCF) origins. You can express and serialize data based on the RDF Data Model using Turtle, N3, TriX, N-Triples, and RDF/XML.
These are just appeasement:
- old query paradigm: fishing in dark water with superstitiously tied lures; only works well in carefully stocked lakes
- we don't ask questions by defining answer shapes and then hoping they're dredged up whole.
SPARQL, MQL, and Entity-SQL are Graph Model oriented Query Languages. Query Languages always accompany Database Engines. SQL is the Relational Model equivalent.
Noble attempt to ground the abstract, but:
- URI dereferencing/namespace/open-world issues focus too much technical attention on cross-source cases where the human issues dwarf the technical ones anyway
- FOAF query over the people in this room? forget it.
- link asymmetry doesn't scale
- identity doesn't scale
- generating RDF from non-graph sources: more appeasement, right where the win from actually converting could be biggest!
Innovative use of HTTP to deliver "Data Access by Reference" to the Linked Data Web.
When you have a Data Model, Database Engine, and Query Language, the next thing you need is a Data Access mechanism that provides "Data Access by Reference". ODBC and JDBC (amongst others) provide "Data Access by Reference" via Data Source Names. Linked Data is about the same thing (URIs are Data Source Names) with the following differences:
Hugely motivating and powerful idea, worthy of a superhero (Graphius!), but:
- giant and global parts are too hard, and starting global makes every problem harder
- local projects become unmanageable in global context (Cyc, Freebase data-modeling lists...).
And thus my plea, again. Forget "semantic" and "web", let's fix the database tech first:
- node/arc data-model, path-based exploratory query-model
- data-graph applications built easily on top of this common model; building them has to be easy, because if it's hard, they'll be bad
- given good database tech, good web data-publishing tech will be trivial!
- given good tools for graphs, the problems of uniting them will be only as hard as they have to be.
Giant Global Graph is just another moniker for a "Web of Linked Data" or "Linked Data Web".
Multi-Model Database technology that meshes the best of the Graph & Relational Models exists. In a nutshell, this is what Virtuoso is all about, and it's existed for a very long time :-)
Virtuoso is also a Virtual DBMS engine (so you can see Heterogeneous Relational Data via Graph Model Context Lenses). Naturally, it is also a Linked Data Deployment platform (or Linked Data Server).
The issue isn't the "Semantic Web" moniker per se; it's about how Linked Data (the foundation layer of the Semantic Web) gets introduced to users. As I said during the MIT Gathering: "The Web is experienced via Web Browsers primarily, so any enhancement to the Web must be exposed via traditional Web Browsers", which is why we've opted to simply add "View Linked Data Sources" to the existing set of common Browser options that includes:
By exposing the Linked Data Web option as described above, you enable the Web user to knowingly transition from the traditional Rendered (X)HTML page view to the Linked Data View (i.e., the structured data behind the page). This simple "User Interaction" tweak makes the notion of exploiting a Structured Web somewhat clearer.
The Linked Data Web isn't a panacea. It's just an addition to the existing Web that enriches the things you can do with the Web. Its predominance, like that of any application feature, will be subject to the degree to which it delivers tangible value or materializes internal and external opportunity costs.
Note: The Web isn't ubiquitous today because all its users grokked HTML Markup. Its ubiquity is a function of opportunity costs: there simply came a point in the Web bootstrap when nobody could afford the opportunity costs associated with being off the Web. The same thing will play out with Linked Data and the broader Semantic Web vision.
Naturally, we've decided to join the CrunchBase RDFization party, and have just completed a Virtuoso Sponger Cartridge (an RDFizer) for CrunchBase. What we add in our particular cartridge is additional meshing with DBpedia and Wikicompany Linked Data Spaces, plus RDFization of the CrunchBase (X)HTML pages :-)
As I've postulated for a while, Linked Data is about data "Meshing" and "Meshups". This isn't a buzzword play. I am pointing out an important distinction between "Mashups" and "Meshups", which goes as follows: "Mashups" are about code-level joining devoid of structured modelling, hence the revelation of code as opposed to data when you look behind a "Mashup". "Meshups", on the other hand, are about joining disparate structured data sources across the Web. And when you look behind a "Meshup" you see structured data (preferably Linked Data) that enables further "Meshing".
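A minimal sketch of the "Meshup" idea, assuming RDF graphs are modelled as Python sets of (subject, predicate, object) tuples: because both sources use URIs as identifiers, meshing them is a set union rather than custom glue code, and an owl:sameAs link lets a lookup pull in properties from the other source. The example.org URIs and the foundingYear property are hypothetical stand-ins for CrunchBase and DBpedia resources.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical URIs standing in for a CrunchBase company and its DBpedia counterpart
CB_ACME = "http://example.org/crunchbase/company/acme"
DBP_ACME = "http://example.org/dbpedia/resource/Acme"
FOUNDED = "http://example.org/ontology/foundingYear"

crunchbase = {(CB_ACME, FOAF_NAME, "Acme"), (CB_ACME, OWL_SAME_AS, DBP_ACME)}
dbpedia = {(DBP_ACME, FOUNDED, "1999")}


def mesh(*graphs):
    """Graphs that share URIs combine by plain set union."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, uri):
    """Properties of uri, following owl:sameAs links one hop."""
    aliases = {uri} | {o for s, p, o in graph if s == uri and p == OWL_SAME_AS}
    return sorted((p, o) for s, p, o in graph if s in aliases and p != OWL_SAME_AS)


print(describe(mesh(crunchbase, dbpedia), CB_ACME))
```

Looking behind the result, there is only data (the merged triple set), which can itself be meshed again with further sources.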
I truly believe that we are now inches away from critical mass re. Linked Data, and because we are dealing with data, the network effect will be sky-high! I shudder to think about the state of the Linked Data Web in 12 months' time. Yes, I am giving the explosion 12 months (or less). These are very exciting times.
For best experience I encourage you to look at the OpenLink Data Explorer extension for Firefox (2.x - 3.x). This enables you to go to Crunchbase (X)HTML pages (and other sites on the Web of course), and then simply use the "View | Linked Data Sources" main or context menu sequence to unveil the Linked Data Sources associated with any Web Page.
Of course there is much more to come!
My use of "old media" implies: a place that still seeks subscriber data (no OpenID etc.), for the umpteenth time, as the toll fee for discourse development and participation on the Web.
Anyway, here is what I attempted to post as a comment to Dan Grigorovici's post titled: Where is the Semantic Web Killer App?
Dan,
An intriguing post to say the least :-)
"Linked Data" and "Semantic Web" aren't synonymous; they are simply connected, infrastructure DNA-wise. You can have "Semantic Web" style graphs (i.e., RDF Data) and not have "Linked Data" as per Linked Data deployment tenets and best practices, a very important point.
As I've stated repeatedly, the "Linked Data" emphasis has more to do with focusing on a point of crystallization within the larger "Semantic Web" vision, so here is a quick recap:
A term coined by TimBL that describes an application of HTTP to the time-tested process of "Data Access by Reference". "Linked Data" adds vital items to the "Data Access by Reference" pattern that were previously unattainable:
So we have HTTP-based URIs as the Data Source Names for a "Linked Data Web", i.e., a Web of interconnected Data Source Names that de-emphasizes the importance of their host containers (Compound Documents / Information Resources).
The business case or value proposition of "Linked Data" is synonymous with the value proposition of data access technologies such as ODBC, JDBC, ADO.NET, OLE-DB, XMLA, and others (enterprise or consumer) in relation to the Individual and Enterprise pursuit of agility, in a realm where data is growing exponentially and the maximum processing time in a single day remains 24 hours. Data Access & Data Integration are timeless challenges due to the following constants:
Note: The line between the Enterprise & Individuals continues to blur by the second. This is something I covered during my Linked Data Planet keynote, which, like most things I put on the Web (via this blog data space), is a live and practical demonstration of the virtues of Linked Data courtesy of RDFa, the Bibliographic Ontology, and dereferenceable URIs (i.e., HTTP-based Data Source Names for Documents and the Entities they host).
ODBC identifies data sources using Data Source Names (DSNs).
WODBC (Web Open Database Connectivity) delivers open data access to Web Databases / Data Spaces. The Data Source Naming scheme (URI or IRI) is HTTP-based, thereby enabling data access by reference via the Web.
ODBC DSNs bind ODBC client applications to Tables, Views, and Stored Procedures.
WODBC DSNs bind you to a Data Space (e.g., my FOAF-based Profile Page, where you can use the "Explore Data Tab" to look around if you are a human visitor) or to a specific Entity within a Data Space (i.e., the Person Entity "Me").
ODBC Drivers are built using APIs (DBMS Call Level Interfaces) provided by DBMS vendors. Thus, a DBMS vendor can choose not to release an API, or do so selectively, for competitive advantage or market disruption purposes (it's happened!).
WODBC Drivers are also built using APIs (Web Services associated with a Web Data Space). These drivers are also referred to as RDF Middleware or RDFizers. The "Web" component of WODBC ensures openness: you publish Data with URIs from your Linked Data Server and that's it; your data space or specific data entities are live and accessible (by reference) over the Web!
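The ODBC versus WODBC contrast can be sketched in a few lines of Python. An ODBC DSN is just a local name that has to be looked up in client-side configuration (here an odbc.ini-style snippet whose driver path and key names are hypothetical; real keys vary by driver), whereas an HTTP URI carries what is needed to reach the data (scheme, host, path) inside the name itself.

```python
import configparser
from urllib.parse import urlsplit

# Hypothetical odbc.ini-style entry; real key names vary by driver
ODBC_INI = """
[SalesDB]
Driver = /usr/local/lib/hypothetical_driver.so
Host = dbserver.example.com
Database = Sales
"""


def resolve_dsn(ini_text: str, dsn: str) -> dict:
    """An ODBC DSN only means something via a lookup in local configuration."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return dict(parser[dsn])  # configparser lower-cases the keys


def resolve_uri(uri: str) -> dict:
    """An HTTP URI carries its own resolution data in the name itself."""
    parts = urlsplit(uri)
    return {"scheme": parts.scheme, "host": parts.hostname,
            "path": parts.path, "fragment": parts.fragment}


print(resolve_dsn(ODBC_INI, "SalesDB"))
print(resolve_uri("http://example.org/dataspace/person/alice#this"))  # hypothetical URI
```

The DSN lookup fails on any machine that lacks the configuration entry, while the URI resolves the same way for any HTTP client on the Web.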
So we have come full circle (or cycle): the Web is becoming more of a structured database every day! What's new is old, and what's old is new!
Data Access is everything; without "Data" there is no information or knowledge. Without "Data" there's no notion of vitality, purpose, or value.
URIs make or break everything in the Linked Data Web just as ODBC DSNs do within the enterprise.
I've deliberately left JDBC, ADO.NET, and OLE-DB out of this piece due to their programming language and framework specificity. None of these mechanisms matches the platform availability breadth of ODBC.
The Web as a true M-V-C pattern is now crystallizing. The "M" (Model) component of M-V-C is finally rising to the realm of broad attention courtesy of the "Linked Data" meme and "Semantic Web" vision.
By the way, M-V-C lines up nicely with Web 1.0 (Web Forms / Pages), Web 2.0 (Web Services based APIs), and Web 3.0 (Data Web, Web of Data, or Linked Data Web) :-)
The Linked Data Web (aka Linked Data) describes RDF data injected into the Web, where the Data Object Identifiers in an RDF graph (a collection of RDF triples) are HTTP-based URIs. The net effect of this approach to Data Object Identity is that it facilitates "Open Data Access by Reference" on the Web (aka data dereferencing).
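Dereferencing in practice is an ordinary HTTP GET on the entity's URI, with the Accept header telling the server whether the client wants RDF or HTML. The sketch below only builds the request with Python's standard urllib.request and deliberately does not send it; the URI is hypothetical, and the media types shown are common choices rather than anything a particular server is guaranteed to honor.

```python
import urllib.request

# Media types a client might request when dereferencing (assumed, commonly used values)
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"
HTML_ACCEPT = "text/html"


def dereference_request(uri: str, want_rdf: bool = True) -> urllib.request.Request:
    """Build, but do not send, an HTTP GET that dereferences uri with content negotiation."""
    accept = RDF_ACCEPT if want_rdf else HTML_ACCEPT
    return urllib.request.Request(uri, headers={"Accept": accept}, method="GET")


req = dereference_request("http://example.org/dataspace/person/alice")  # hypothetical URI
print(req.get_method(), req.full_url, req.get_header("Accept"))
# urllib.request.urlopen(req) would perform the actual network round trip.
```

The same URI serves both audiences: a browser asking for text/html gets a page, an RDF-aware client gets a structured description.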
If you recall pre-Web ubiquity, in the enterprise realm for instance, Open Database Connectivity (ODBC) emerged as a mechanism for separating Data Access and Data Management in the database-oriented Client-Server model. Although ODBC gave you access to data, the data access entry point took the form of a data access specific naming mechanism called a "Data Source Name" (DSN). ODBC DSNs typically exposed Tables or Views. The same thing applies to JDBC, where a non-HTTP URL scheme (jdbc:) applies.
Zip forward to where we are today on the Web; the Web is evolving from a Document centric Database to a Distributed Object Database, and you should see that in Linked Data we are now truly looking at the best of all worlds: Web Open Database Connectivity (WODBC) with the following advantages:
To conclude, we now have "Semantics Inside" (RDF or non RDF), "Semantic Web" (RDF graphs with Object Identifiers that may or may not be HTTP based), and "Linked Data Web" (RDF graphs with Object Identifiers that must be HTTP based and dereferencable) oriented applications, in the emerging landscape associated with the "Semantics" moniker.
As per usual, this post is a record in my Blog oriented Data Space on the Web. The permalink of this post is a URI constructed with Giant Global Graph enrichment in mind :-)
Daniel simplifies my post by using diagrams to depict the different paths for PHP based applications exposing Linked Data - especially those that already provide a significant amount of the content that drives Web 2.0.
If all the content in Web 2.0 information resources is distillable into discrete data objects endowed with HTTP-based IDs (URIs), with zero "RDF handcrafting Tax", what do we end up with? A Giant Global Graph of Linked Data; the Web as a Database.
So, what used to apply exclusively within enterprise settings re. Oracle, DB2, Informix, Ingres, Sybase, Microsoft SQL Server, MySQL, PostgreSQL, Progress OpenEdge, Firebird, and others now applies to the Web. The Web becomes the "Distributed Database Bus" that connects database records across disparate databases (or Data Spaces). These databases manage and expose records that are remotely accessible "by reference" via HTTP.
As I've stated at every opportunity in the past, Web 2.0 is the greatest thing that ever happened to the Semantic Web vision :-) Without the "Web 2.0 Data Silo Conundrum" we wouldn't have the cry for "Data Portability" that brings a lot of clarity to some fundamental Web 2.0 limitations that end-users ultimately find unacceptable.
In the late '80s, the SQL Access Group (now part of X/Open) addressed a similar problem with RDBMS silos within the enterprise, which led to the SAG CLI that exists today as Open Database Connectivity.
In a sense we now have WODBC (Web Open Database Connectivity), comprised of Web Services based CLIs and/or traditional back-end DBMS CLIs (ODBC, JDBC, ADO.NET, OLE-DB, or Native), Query Language (SPARQL Query Language), and a Wire Protocol (HTTP based SPARQL Protocol) delivering Web infrastructure equivalents of SQL and RDA, but much better, and with much broader scope for delivering profound value due to the Web's inherent openness. Today's PHP, Python, Ruby, Tcl, Perl, ASP.NET developer is the enterprise 4GL developer of yore, without enterprise confinement. We could even be talking about 5GL development once the Linked Data interaction is meshed with dynamic languages (delivering higher levels of abstraction at the language and data interaction levels). Even the underlying schemas and basic design will evolve from Closed World (solely) to a mesh of Closed & Open World view schemas.
In the form above (the norm), WordPress data can be injected into the Linked Data Web via RDFization middleware such as the Virtuoso Sponger (built into all Virtuoso instances) and Triplr. The downside of this approach is that the blog owner doesn't necessarily possess full control over their contributions to the emerging Giant Global Graph of Linked Data.
Another route to Linked Data exposure is via Virtuoso's Metaschema Language for producing RDF Views over ODBC/JDBC accessible Data Sources, that enables the following setup:
Alternatively, you can also exploit Virtuoso as the SQL DBMS, RDF DBMS, Application Server, and Linked Data Deployment platform:
How Do I map the WordPress SQL Schema to RDF using Virtuoso?
Read the Meta Schema Language guide or simply apply our "WordPress SQL Schema to RDF" script to your Virtuoso hosted instance. Of course, there are other mappings that cover other PHP applications deployed via Virtuoso:
Now I can simply state the following using Linked Data (hyperdata) links:
OpenLink Software's product portfolio is comprised of the following product families:
We no longer have to explain (repeatedly) why our drivers exist in Express, Lite, and Multi-Tier Edition formats, or why you ultimately need Multi-Tier Drivers over Single Tier Drivers (Express or Lite Editions), since high performance, data encryption, and policy-based security are needed across each of the data access driver formats.
How Do I create the missing Bitmap Indexes?
Go to the HTML based Virtuoso Conductor, iSQL command line interface, or an ODBC / JDBC / ADO.NET / OLE DB client and execute:
CREATE BITMAP INDEX RDF_QUAD_POGS ON DB.DBA.RDF_QUAD (P,O,G,S);
CREATE BITMAP INDEX RDF_QUAD_PSOG ON DB.DBA.RDF_QUAD (P,S,O,G);
CREATE BITMAP INDEX RDF_QUAD_SOPG ON DB.DBA.RDF_QUAD (S,O,P,G);
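Why three differently ordered indexes? A simplified model: a composite index can serve a lookup efficiently only on the leading columns that a triple pattern actually binds. The sketch below scores the three indexes above by that bound-prefix length and picks the best one. It is a teaching aid under that assumption; Virtuoso's actual optimizer and bitmap index handling are considerably more involved.

```python
# Column orders of the three indexes above (P=predicate, O=object, G=graph, S=subject)
INDEXES = {
    "RDF_QUAD_POGS": "POGS",
    "RDF_QUAD_PSOG": "PSOG",
    "RDF_QUAD_SOPG": "SOPG",
}


def bound_prefix(order: str, bound: set) -> int:
    """Count leading index columns whose values the triple pattern supplies."""
    n = 0
    for column in order:
        if column not in bound:
            break
        n += 1
    return n


def best_index(bound: set) -> str:
    """Pick the index with the longest usable prefix (the first listed wins ties)."""
    return max(INDEXES, key=lambda name: bound_prefix(INDEXES[name], bound))


# { ?s <some-predicate> <some-object> } binds P and O
print(best_index({"P", "O"}))  # RDF_QUAD_POGS
```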
Writing a JDBC Driver for SPARQL is a little overkill. OpenOffice.org simply needs to make XML or Web Data (HTML, XHTML, and XML) bona fide data sources within its "Pivot Table" functionality realm. Then all that would be required is a SPARQL SELECT Query transported via the SPARQL Protocol with results sent back using the SPARQL XML results serialization format (all part of a single SPARQL Protocol URL).
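The "single SPARQL Protocol URL" mentioned above is just the endpoint with the query URL-encoded into a query parameter, optionally with default-graph-uri. A minimal Python sketch follows; the endpoint is hypothetical, and the XML results format is normally requested through the Accept header using the application/sparql-results+xml media type.

```python
from urllib.parse import urlencode

# Media type for SPARQL XML results, normally sent in the Accept request header
SPARQL_RESULTS_XML = "application/sparql-results+xml"


def sparql_get_url(endpoint, query, default_graph_uri=None):
    """Encode a SPARQL query as a SPARQL Protocol HTTP GET URL."""
    params = [("query", query)]
    if default_graph_uri:
        params.append(("default-graph-uri", default_graph_uri))
    return endpoint + "?" + urlencode(params)


url = sparql_get_url(
    "http://example.org/sparql",  # hypothetical endpoint
    "SELECT ?name WHERE { ?s <http://xmlns.com/foaf/0.1/name> ?name }",
)
print(url)
```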
Excel successfully consumes the following information resource URI: http://tinyurl.com/yvoccj (a tiny url for a SPARQL SELECT against my FOAF file).
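On the consuming side, the sketch below parses a SPARQL XML results document (namespace http://www.w3.org/2005/sparql-results#) into rows and writes them as CSV, roughly the tabular shape a spreadsheet or pivot table needs. The sample document is made up for illustration; it is not the output of the tinyurl query above.

```python
import csv
import io
import xml.etree.ElementTree as ET

NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Made-up SPARQL XML results document, for illustration only
SAMPLE = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="name"/><variable name="mbox"/></head>
  <results>
    <result>
      <binding name="name"><literal>Alice</literal></binding>
      <binding name="mbox"><uri>mailto:alice@example.org</uri></binding>
    </result>
    <result>
      <binding name="name"><literal>Bob</literal></binding>
    </result>
  </results>
</sparql>"""


def results_to_rows(xml_text):
    """Return (variables, rows); unbound variables become empty strings."""
    root = ET.fromstring(xml_text)
    variables = [v.get("name") for v in root.findall("sr:head/sr:variable", NS)]
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = dict.fromkeys(variables, "")
        for binding in result.findall("sr:binding", NS):
            value = binding[0]  # the single <uri>, <literal>, or <bnode> child
            row[binding.get("name")] = value.text or ""
        rows.append(row)
    return variables, rows


def rows_to_csv(variables, rows):
    """Serialize rows as CSV with a header line of variable names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=variables, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(rows_to_csv(*results_to_rows(SAMPLE)))
```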
Alternatively, and currently achievable, you could simply use SPASQL (SPARQL within SQL) using a DBMS engine that supports SQL, SPARQL, and SPASQL, e.g., Virtuoso.
Virtuoso SPASQL support is exposed via its ODBC and/or JDBC Drivers. Thus you can do things such as:
BTW - My New Year's Resolution: get my act together and shrink the ever-increasing list of "simple & practical Virtuoso use case demos" on my todo list, which now spans all the way back to 2006 :-(
On a different, but related, thread, Mike Bergman recently penned a post titled: What is the Structured Web?. Both of these public contributions shed light on the "Information BUS" essence of the World Wide Web by describing the evolving nature of the payload shuttled by the BUS.
Middleware infrastructure for shuttling "Information" between endpoints using a messaging protocol.
The Web is the dominant Information BUS within the Network Computer we know as the "Internet". It uses HTTP to shuttle information payloads between "Data Sources" and "Information Consumers", which is what happens when we interact with the Web via User Agents / Clients (e.g., Browsers).
HTTP-transported streams of contextualized data. Hence the terms "Information Resource" and "Non Information" resource when reading material related to http-range-14 and Web Architecture. For example, an (X)HTML document is a specific data context (representation) that enables us to perceive, or comprehend, a data stream originating from a Web Server as a Web Page. On the other hand, if the payload lacks contextualized data, a fundamental Web requirement, then the resource is referred to as a "Non Information" resource. Of course, there is really no such thing as a "Non Information" resource, but with regard to Web Architecture, it's the short way of saying: "the Web transmits Information only". That said, I prefer to refer to these "Non Information" resources as "Data Sources", a term well understood in the world of Data Access Middleware (ODBC, JDBC, OLEDB, ADO.NET, etc.) and Database Management Systems (Relational, Object-Relational, Object, etc.).
Examples of Information Resource and Data Source URIs:
Explanation: The Information Resource is a conduit to the Entity identified by the Data Source URI (an entity in my RDF Data Space that is the Subject or Object of one or more triple-based statements). The triples in question can be represented as an RDF resource when transmitted over the Web via an Information Resource that takes the form of a SPARQL REST Service URL or a Physical RDF-based Information Resource URL.
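One common way to keep the Data Source (Entity) URI distinct from the Information Resource that describes it is a hash URI: HTTP clients never send the fragment, so dereferencing the entity URI fetches the document URI. The Python sketch below, using a hypothetical URI, shows that split; slash-style entity URIs instead rely on the server answering with a 303 See Other redirect, per the http-range-14 resolution.

```python
from urllib.parse import urldefrag


def split_entity_uri(entity_uri):
    """Return (information resource URI, fragment) for a hash-style entity URI.

    The fragment is resolved client-side, so an HTTP GET only ever asks for
    the document part; the fragment then identifies the entity inside it.
    """
    document, fragment = urldefrag(entity_uri)
    return document, fragment


print(split_entity_uri("http://example.org/dataspace/person/alice#this"))  # hypothetical URI
```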
Prior to the emergence of the Semantic Data Web, the payloads shuttled across the Web Information BUS consisted primarily of the following:
The Semantic Data Web simply adds RDF to the payload formats shuttled across the Web Information BUS. RDF addresses formal data structure, which XML doesn't cover since it is semi-structured (distinct data entities aren't formally discernible). In a nutshell, an RDF payload is basically a conceptual model database packaged as an Information Resource. It's comprised of granular data items called "Entities" that expose fine-grained property values, individual and/or group characteristics (attributes), and relationships (associations) with other Entities.
The Web is in the final stages of the 3rd phase of its evolution, a phase characterized by the shuttling of structured data payloads (RDF) alongside less data-oriented payloads (HTML, XHTML, XML, etc.). As you can see, Linked Data and Structured Data are both terms used to describe the addition of more data-centric payloads to the Web. Thus, you could view the process of creating a Structured Web of Linked Data as follows:
The Semantic Data Web is an evolution of the current Web (an Information Space) that adds structured data payloads (RDF) to current, less data oriented, structured payloads (HTML, XHTML, XML, and others).
The Semantic Data Web is increasingly seen as an inevitability because it's rapidly reaching the point of critical mass (i.e., network effect kick-in). As a result, Data Web emphasis is moving away from "What is the Semantic Data Web?" to "How will the Semantic Data Web make our globally interconnected village an even better place?", relative to the contributions accrued from the Web thus far. Remember, the initial "Document Web" (Web 1.0) bootstrapped because of the benefits it delivered to blurb-style content publishing (remember the term electronic brochure-ware?). Likewise, in the case of the "Services Web" (Web 2.0), the bootstrap occurred because it delivered platform independence to Web Application Developers, enabling them to expose application logic behind Web Services. It is my expectation that the Data Integration prowess of the Data Web will create a value exchange realm for data architects and other practitioners from the database and data access realms.
A vital component of the new Virtuoso release is the finalization of our SQL to RDF mapping functionality -- enabling the declarative mapping of SQL Data to RDF. Additional technical insight covering other new features (delivered and pending) is provided by Orri Erling, as part of a series of post-Banff posts.
A majority of the world's data (especially in the enterprise realm) resides in SQL Databases. In addition, Open Access to the data residing in said databases remains the biggest challenge to enterprises for the following reasons:
Enterprises have known from the beginning of modern corporate times that data access, discovery, and manipulation capabilities are inextricably linked to the "Real-time Enterprise" nirvana (hence my use of 0.0 before this becomes 3.0).
In my experience, as someone who has operated in the data access and data integration realms since the late '80s, I've painfully observed enterprises pursue, but fail to attain, full control over enterprise data (the prized asset of any organization) such that data-, information-, and knowledge-workers are just a click away from commencing coherent platform and database independent data drill-downs and/or discovery that transcend intranet, internet, and extranet boundaries -- serendipitous interaction with relevant data, without compromise!
Okay, situation analysis done; we move on...
At our most recent (12th June) monthly Semantic Web Gathering, I unveiled to TimBL and a host of other attendees a simple, but powerful, demonstration of how Linked Data, as an aspect of the Semantic Data Web, can be applied to enterprise data integration challenges.
The vision of data, information, or knowledge at your fingertips is nigh! Thanks to the infrastructure provided by the Semantic Data Web (URIs, RDF Data Model, variety of RDF Serialization Formats[1][2][3], and Shared Data Dictionaries / Schemas / Ontologies [1][2][3][4][5]) it's now possible to Virtualize enterprise data from the Physical Storage Level, through the Logical Data Management Levels (Relational), up to a Concrete Conceptual Model (Graph) without operating system, development environment or framework, or database engine lock-in.
We produce a shared ontology for the CRM and Business Reporting Domains. I hope this experiment clarifies how this is quite achievable by converting XML Schemas to RDF Data Dictionaries (RDF Schemas or Ontologies). Stay tuned :-)
Also watch TimBL amplify and articulate Linked Data value in a recent interview.
To deliver a mechanism that facilitates the crystallization of this reality is a contribution of boundless magnitude (as we shall all see in due course). Thus, it is easy to understand why even "her majesty", the queen of England, simply had to get in on the act and appoint TimBL to the "British Order of Merit" :-)
Note: All of the demos above now work with IE & Safari (a "remember what Virtuoso is epiphany") by simply putting Virtuoso's DBMS hosted XSLT engine to use :-) This also applies to my earlier collection of demos from the Hello Data Web and other Data Web & Linked Data related demo style posts.
OAT: OpenAjax Alliance Compliant Toolkit:
Ondrej Zara and his team at OpenLink Software have created an OpenLink Software JS Toolkit, known as OAT. It is a full-blown JS framework, suitable for developing rich applications with a special focus on data access.
OAT works standalone, offers a vast number of widgets, and has some rarely seen features, such as on-demand library loading (which reduces the total amount of downloaded JS code).
OAT is one of the first JS toolkits which show full OpenAjax Alliance conformance: see the appropriate wiki page and conformance test page.
There is a lot to see with this toolkit:
You can see some of the widgets in a Kitchen sink application
Sample data access applications:
OAT is Open Source and GPL'ed over at SourceForge, and the team has recently managed to incorporate our OAT data access layer as a module to the dojo datastore.
(Via Ajaxian Blog.)
This is a corrected version of the initial post. Unfortunately, the initial post was inadvertently littered with invalid links :-( Also, since the original post we have released OAT 1.2 that includes integration of our iSPARQL QBE into the OAT Form Designer application.
Re. Data Access, it is important to note that OAT's Ajax Database Connectivity layer supports data binding to the following data source types:
OAT also includes a number of prototype applications that are completely developed using OAT Controls and Libraries:
Note: Pick "Local DSN" from the page initialization dialog's drop-down list control when prompted.
A declarative language adapted from SPARQL's graph pattern language (N3/Turtle) for mapping SQL Data to RDF Ontologies. We currently refer to this as a Graph Pattern based RDF VIEW Definition Language.
It provides an effective mechanism for exposing existing SQL Data as virtual RDF Data Sets (Graphs), avoiding the data duplication associated with generating physical RDF Graphs from SQL Data en route to persistence in a dedicated Triple Store.
Enterprise applications (traditional and web based) and most Web Applications (Web 1.0 and Web 2.0) sit atop relational databases, implying that SQL/RDF model and data integration is an essential element of the burgeoning "Data Web" (Semantic Web - Layer 1) comprehension and adoption process.
In a nutshell, this is a quick route for non disruptive exposure of existing SQL Data to SPARQL supporting RDF Tools and Development Environments.
CREATE GRAPH IRI("http://myopenlink.net/dataspace")
CREATE IRI CLASS odsWeblog:feed_iri "http://myopenlink.net/dataspace/%U/weblog/%U" ( in memb varchar not null, in inst varchar not null)
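Conceptually, the IRI class above maps two column values (memb, inst) into an IRI, and an RDF View then computes triples from rows on demand instead of copying them into a triple store. The Python sketch below mimics that idea with an in-memory SQLite table. The weblog table, its columns, the template, and the use of Dublin Core title as the predicate are all hypothetical, and this is not how Virtuoso's Meta Schema Language is implemented.

```python
import sqlite3
from urllib.parse import quote

DC_TITLE = "http://purl.org/dc/elements/1.1/title"
# Hypothetical two-slot template, loosely mirroring the feed IRI class above
FEED_IRI_TEMPLATE = "http://myopenlink.net/dataspace/{memb}/weblog/{inst}"


def feed_iri(memb: str, inst: str) -> str:
    """Percent-encode each column value and substitute it into the template."""
    return FEED_IRI_TEMPLATE.format(memb=quote(memb, safe=""), inst=quote(inst, safe=""))


def virtual_triples(conn):
    """Yield triples computed from SQL rows on demand; nothing is copied to a triple store."""
    rows = conn.execute("SELECT memb, inst, title FROM weblog ORDER BY memb, inst")
    for memb, inst, title in rows:
        yield (feed_iri(memb, inst), DC_TITLE, title)


# Hypothetical table standing in for an application's existing SQL data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weblog (memb TEXT NOT NULL, inst TEXT NOT NULL, title TEXT)")
conn.execute("INSERT INTO weblog VALUES (?, ?, ?)", ("alice", "MyFeeds", "Alice's weblog"))

for triple in virtual_triples(conn):
    print(triple)
```

Because the triples are generated while iterating, updates to the SQL table show up in the RDF view immediately, with no re-export step.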
The screencasts covered the following functionality realms:
To bring additional clarity to the screencast demos and OAT in general, I have saved a number of documents that are by-products of the activities in the screencasts:
Notes:
You can see a full collection of saved documents at the following locations:
Virtuoso extends its SQL3 implementation with syntax for integrating SPARQL into queries and subqueries. Thus, as part of a SQL SELECT query or subquery, one can write the SPARQL keyword followed by a SPARQL query, and the combined text is processed by Virtuoso's SQL Query Processor.
Using Virtuoso's command-line ISQL utility or the Web-based ISQL utility, type in the following (note: "SQL>" is the command-line prompt for the native ISQL utility):
SQL> sparql select distinct ?p where { graph ?g { ?s ?p ?o } };
Which will return the following:
p
VARCHAR
----------
http://example.org/ns#b
http://example.org/ns#d
http://xmlns.com/foaf/0.1/name
http://xmlns.com/foaf/0.1/mbox
...
SQL> select distinct subseq (p, strchr (p, '#')) as fragment from (sparql select distinct ?p where { graph ?g { ?s ?p ?o } } ) as all_predicates where p like '%#%' ;
fragment
VARCHAR
----------
#query
#data
#name
#comment
...
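For readers less familiar with Virtuoso's string functions, the following Python sketch reproduces what the derived-table query above computes, assuming strchr returns the zero-based position of the first '#' and subseq returns the substring from that position onward. The sample IRIs are the predicates from the first query's output.

```python
from typing import Iterable, List


def fragments(predicates: Iterable[str]) -> List[str]:
    """Return the distinct '#...' fragments of the given predicate IRIs."""
    result: List[str] = []
    for p in predicates:
        pos = p.find("#")            # strchr (p, '#'): position of the first '#'
        if pos == -1:                # where p like '%#%': skip IRIs without '#'
            continue
        fragment = p[pos:]           # subseq (p, pos): from '#' to the end
        if fragment not in result:   # select distinct
            result.append(fragment)
    return result


if __name__ == "__main__":
    sample = [
        "http://example.org/ns#b",
        "http://example.org/ns#d",
        "http://xmlns.com/foaf/0.1/name",
        "http://xmlns.com/foaf/0.1/mbox",
    ]
    print(fragments(sample))  # ['#b', '#d']
```

Unlike SQL DISTINCT, this keeps first-seen order, which makes the output predictable but is not something the SQL query promises.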
You can pass parameters to a SPARQL query using a Virtuoso-specific syntax extension: '??' or '$?' indicates a positional parameter, similar to '?' in standard SQL. '??' can be used in graph patterns or anywhere else a SPARQL variable is accepted. The value of a parameter should be passed in SQL form, i.e., as a number or an untyped string. An IRI ID cannot be passed, but an absolute IRI can. Using this notation, a dynamic SQL-capable client (ODBC, JDBC, ADO.NET, OLE DB, XMLA, or others) can execute parameterized SPARQL queries using the parameter binding concepts that are commonplace in dynamic SQL. This implies that existing SQL applications and development environments (PHP, Ruby, Python, Perl, VB, C#, Java, etc.) can issue SPARQL queries against RDF Data stored in Virtuoso via their existing SQL data access channels.
Note: This is the Virtuoso equivalent of a recently published example using Jena (a Java-based RDF Triple Store).
Create a Virtuoso function by executing the following:
SQL> create function param_passing_demo ()
{
  declare stat, msg varchar;
  declare mdata, rset any;
  exec ('sparql select ?s where { graph ?g { ?s ?? ?? }}',
        stat, msg,
        vector ('http://www.w3.org/2001/sw/DataAccess/tests/data/Sorting/sort-0#int1', 4), -- Vector of two parameters
        10,     -- Max. result-set rows
        mdata,  -- Variable for handling result-set metadata
        rset    -- Variable for handling query result-set
       );
  return rset[0][0];
};
Test the new "param_passing_demo" function by executing the following:
SQL> select param_passing_demo ();
Which returns:
callret
VARCHAR
_______________________________________________________________________________
http://www.w3.org/2001/sw/DataAccess/tests/data/Sorting/sort-0#four

1 Rows. -- 00000 msec.
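Because '??' behaves like a dynamic SQL parameter marker, a client can bind these values itself instead of building a vector in Virtuoso PL. The Python sketch below assumes a DB-API 2.0 cursor obtained from an ODBC connection to Virtuoso (pyodbc is one such module) and assumes the Virtuoso ODBC driver reports each '??' as a bindable parameter. FakeCursor is a hypothetical stand-in so the sketch runs without a server.

```python
from typing import Any, List, Optional, Sequence, Tuple

# The SPARQL text from the demo above; each '??' is a positional parameter.
QUERY = "sparql select ?s where { graph ?g { ?s ?? ?? } }"


def param_passing_demo(cursor: Any, predicate_iri: str, value: Any) -> Optional[Any]:
    """Run the parameterized SPARQL query through a DB-API 2.0 cursor.

    Parameters are bound in order, just like '?' markers in dynamic SQL.
    Returns the first column of the first row (like rset[0][0]) or None.
    """
    cursor.execute(QUERY, (predicate_iri, value))
    row = cursor.fetchone()
    return None if row is None else row[0]


class FakeCursor:
    """Hypothetical stand-in cursor: records calls and returns canned rows."""

    def __init__(self, rows: Sequence[Tuple[Any, ...]]):
        self.rows: List[Tuple[Any, ...]] = list(rows)
        self.calls: List[Tuple[str, Tuple[Any, ...]]] = []

    def execute(self, sql: str, params: Sequence[Any] = ()) -> "FakeCursor":
        self.calls.append((sql, tuple(params)))
        return self

    def fetchone(self) -> Optional[Tuple[Any, ...]]:
        return self.rows.pop(0) if self.rows else None


if __name__ == "__main__":
    ns = "http://www.w3.org/2001/sw/DataAccess/tests/data/Sorting/sort-0"
    cur = FakeCursor([(ns + "#four",)])
    print(param_passing_demo(cur, ns + "#int1", 4))
    print(cur.calls)
```

Against a live server you would pass a real cursor instead, for example pyodbc.connect("DSN=...").cursor(), where the DSN is whatever you have configured locally.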
A SPARQL ASK query can be used as an argument of the SQL EXISTS predicate.
create function sparql_ask_demo () returns varchar { if (exists (sparql ask where { graph ?g { ?s ?p 4}})) return 'YES'; else return 'NO'; };
Test by executing:
SQL> select sparql_ask_demo ();
Which returns:
_________________________
YES
I added the missing piece regarding the "Virtuoso Conductor" (the Web-based Admin UI for Virtuoso) to the original post below. I also added a link to our live SPARQL Demo so that anyone interested can start playing around with SPARQL and SPARQL integrated into SQL right away.
Another good thing about this post is the large number of valuable links it contains. To really appreciate this point, simply visit my Linkblog (excuse the current layout :-) ), which appears as a Tab if you come in via the front door of this Data Space (what I used to call My Weblog Home Page).
"Free" Databases: Express vs. Open-Source RDBMSs: "Open-source relational database management systems (RDBMSs) are gaining IT mindshare at a rapid pace. As an example, BusinessWeek's February 6, 2006 'Taking On the Database Giants' article asks 'Can open-source upstarts compete with Oracle, IBM, and Microsoft?' and then provides the answer: 'It's an uphill battle, but customers are starting to look at the alternatives.'
There's no shortage of open-source alternatives to look at. The BusinessWeek article concentrates on MySQL, which BW says 'is trying to be the Ikea of the database world: cheap, needs some assembly, but has a sleek, modern design and does the job.' The article also discusses Postgre[SQL] and Ingres, as well as EnterpriseDB, an Oracle clone created from PostgreSQL code*. Sun includes PostgreSQL with Solaris 10 and, as of April 6, 2006, with Solaris Express.**
*Frank Batten, Jr., the investor who originally funded Red Hat, invested a reported $16 million into Great Bridge with the hope of making a business out of providing paid support to PostgreSQL users. Great Bridge stayed in business only 18 months, having missed an opportunity to sell the business to Red Hat and finding that selling $50,000-per-year support packages for an open-source database wasn't easy. As Batten concluded, 'We could not get customers to pay us big dollars for support contracts.' Perhaps EnterpriseDB will be more successful with a choice of $5,000, $3,000, or $1,000 annual support subscriptions.
**Interestingly, Oracle announced in November 2005 that Solaris 10 is 'its preferred development and deployment platform for most x64 architectures, including x64 (x86, 64-bit) AMD Opteron and Intel Xeon processor-based systems and Sun's UltraSPARC(R)-based systems.'
There is a surfeit of reviews of current MySQL, PostgreSQL and, to a lesser extent, Ingres implementations. These three open-source RDBMSs come with their own or third-party management tools. Where database size and processor count limitations aren't important, these systems compete against free versions of commercial (proprietary) databases: SQL Server 2005 Express Edition (and its MSDE 2000 and 1.0 predecessors), Oracle Database 10g Express Edition, IBM DB2 Express-C, and Sybase ASE Express Edition for Linux. Click here for a summary of recent InfoWorld reviews of the full versions of these four databases plus MySQL, which should be valid for Express editions also. The FTPOnline Special Report article, 'Microsoft SQL Server Turns 17,' that contains the preceding table is here (requires registration).
SQL Server 2005 Express Edition SP-1 Advanced Features
SQL Server 2005 Express Edition with Advanced Features enhances SQL Server 2005 Express Edition (SQL Express or SSX) dramatically, so it deserves special treatment here. SQL Express gains full text indexing and now supports SQL Server Reporting Services (SSRS) on the local SSX instance. The SP-1 with Advanced Features setup package, which Microsoft released on April 18, 2006, installs the release version of SQL Server Management Studio Express (SSMSE) and the full version of Business Intelligence Development Studio (BIDS) for designing and editing SSRS reports. My 'Install SP-1 for SQL Server 2005 and Express' article for FTPOnline's SQL Server Special Report provides detailed, illustrated installation instructions for and related information about the release version of SP-1. SP-1 makes SSX the most capable of all currently available Express editions of commercial RDBMSs for Windows.
OpenLink Software's Virtuoso Open-Source Edition
OpenLink Software announced an open-source version of its Virtuoso Universal Server commercial DBMS on April 11, 2006. On the initial date of this post, May 2, 2006, Virtuoso Open-Source Edition (VOS) was virtually under the radar as an open-source product. According to this press release, the new edition includes:
- SPARQL-compliant RDF Triple Store
- SQL-200n Object-Relational Database Engine (SQL, XML, and Free Text)
- Integrated BPEL Server and Enterprise Service Bus
- WebDAV and Native File Server
- Web Application Server that supports PHP, Perl, Python, ASP.NET, JSP, etc.
- Runtime Hosting for Microsoft .NET, Mono, and Java
VOS only lacks the virtual server and replication features that are offered by the commercial edition. VOS includes a Web-based administration tool called the "Virtuoso Conductor." According to Kingsley Idehen's Weblog, 'The Virtuoso build scripts have been successfully tested on Mac OS X (Universal Binary Target), Linux, FreeBSD, and Solaris (AIX, HP-UX, and Tru64 UNIX will follow soon). A Windows Visual Studio project file is also in the works (ETA some time this week).'
InfoWorld's Jon Udell has tracked Virtuoso's progress since 2002, with an additional article in 2003 and a one-hour podcast with Kingsley Idehen on April 26, 2006. A major talking point for Virtuoso is its support for Atom 0.3 syndication and publication, Atom 1.0 syndication and (forthcoming) publication, and future support for Google's GData protocol, as mentioned in this Idehen post. Yahoo!'s Jeremy Zawodny points out that the 'fingerprints' of Adam Bosworth, Google's VP of Engineering and the primary force behind the development of Microsoft Access, 'are all over GData.' Click here to display a list of all OakLeaf posts that mention Adam Bosworth.
One application for the GData protocol is querying and updating the Google Base database independently of the Google Web client, as mentioned by Jeremy: 'It's not about building an easier onramp to Google Base. ... Well, it is. But, again, that's the small stuff.' Click here for a list of posts about my experiences with Google Base. Watch for a future OakLeaf post on the subject as the GData APIs gain ground.
Open-Source and Free Embedded Database Contenders
Open-source and free embedded SQL databases are gaining importance as the number and types of mobile devices and OSs proliferate. Embedded databases usually consist of Java classes or Windows DLLs that are designed to minimize file size and memory consumption. Embedded databases avoid the installation hassles, heavy resource usage and maintenance cost associated with client/server RDBMSs that run as an operating system service.
Andrew Hudson's December 2005 'Open Source databases rounded up and rodeoed' review for The Enquirer provides brief descriptions of one commercial and eight open source database purveyors/products: Sleepycat, MySQL, PostgreSQL, Ingres, InnoBase, Firebird, IBM Cloudscape (a.k.a. Derby), Genezzo, and Oracle. Oracle Sleepycat* isn't an SQL Database, Oracle InnoDB* is an OEM database engine that's used by MySQL, and Genezzo is a multi-user, multi-server distributed database engine written in Perl. These special-purpose databases are beyond the scope of this post.
* Oracle purchased Sleepycat Software, Inc. in February 2006 and purchased Innobase OY in October 2005. The press release states: 'Oracle intends to continue developing the InnoDB technology and expand our commitment to open source software.'
Derby is an open-source release by the Apache Software Foundation of the Cloudscape Java-based database that IBM acquired when it bought Informix in 2001. IBM offers a commercial release of Derby as IBM Cloudscape 10.1. Derby is a Java class library that has a relatively light footprint (2 MB), which makes it suitable for client/server synchronization with the IBM DB2 Everyplace Sync Server in mobile applications. The IBM DB2 Everyplace Express Edition isn't open source or free*, so it doesn't qualify for this post. The same is true for the corresponding Sybase SQL Anywhere components.**
* IBM DB2 Everyplace Express Edition with synchronization costs $379 per server (up to two processors) and $79 per user. DB2 Everyplace Database Edition (without DB2 synchronization) is $49 per user. (Prices are based on those when IBM announced version 8 in November 2003.)
** Sybase's iAnywhere subsidiary calls SQL Anywhere 'the industry's leading mobile database.' A Sybase SQL Anywhere Personal DB seat license with synchronization to SQL Anywhere Server is $119; the cost without synchronization wasn't available from the Sybase Web site. Sybase SQL Anywhere and IBM DB2 Everyplace perform similar replication functions.
Sun's Java DB, another commercial version of Derby, comes with the Solaris Enterprise Edition, which bundles Solaris 10, the Java Enterprise System, developer tools, desktop infrastructure and N1 management software. A recent Between the Lines blog entry by ZDNet's David Berlind waxes enthusiastic over the use of Java DB embedded in a browser to provide offline persistence. RedMonk analyst James Governor and eWeek's Lisa Vaas wrote about the use of Java DB as a local data store when Tim Bray announced Sun's Derby derivative and Francois Orsini demonstrated Java DB embedded in the Firefox browser at the ApacheCon 2005 conference.
Firebird is derived from Borland's InterBase 6.0 code, the first commercial relational database management system (RDBMS) to be released as open source. Firebird has excellent support for SQL-92 and comes in three versions: Classic, SuperServer and Embedded for Windows, Linux, Solaris, HP-UX, FreeBSD and MacOS X. The embedded version has a 1.4-MB footprint. Release Candidate 1 for Firebird 2.0 became available on March 30, 2006 and is a major improvement over earlier versions. Borland continues to promote InterBase, now at version 7.5, as a small-footprint, embedded database with commercial Server and Client licenses.
SQLite is a featherweight C library for an embedded database that implements most SQL-92 entry- and transitional-level requirements (some through the JDBC driver) and supports transactions within a tiny 250-KB code footprint. Wrappers support a multitude of languages and operating systems, including Windows CE, SmartPhone, Windows Mobile, and Win32. SQLite's primary SQL-92 limitations are lack of nested transactions, inability to alter a table design once committed (other than with RENAME TABLE and ADD COLUMN operations), and lack of foreign-key constraints. SQLite provides read-only views, triggers, and 256-bit encryption of database files. A downside is that the entire database file is locked while a transaction is in progress. SQLite uses file access permissions in lieu of GRANT and REVOKE commands. Using SQLite involves no license; its code is entirely in the public domain.
The Mozilla Foundation's Unified Storage wiki says this about SQLite: 'SQLite will be the back end for the unified store [for Firefox]. Because it implements a SQL engine, we get querying 'for free', without having to invent our own query language or query execution system. Its code-size footprint is moderate (250k), but it will hopefully simplify much existing code so that the net code-size change should be smaller. It has exceptional performance, and supports concurrent access to the database. Finally, it is released into the public domain, meaning that we will have no licensing issues.'
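Because SQLite is embedded, trying it takes no server at all: Python ships a binding in its standard library (sqlite3). The minimal sketch below shows the in-process engine, a transaction that commits, one that rolls back, and the two in-place schema changes mentioned above (ADD COLUMN, and renaming a table, which SQLite spells ALTER TABLE ... RENAME TO). The table and column names are made up for illustration, and current SQLite releases have since relaxed some of the 2006 limitations listed here, such as foreign-key enforcement.

```python
import sqlite3


def demo() -> dict:
    """Exercise an in-process SQLite database and report what survived."""
    # No server: the engine runs inside this process. ":memory:" keeps the
    # database in RAM; a file path would give a single-file database instead.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

    # Used as a context manager, the connection commits on success ...
    with conn:
        conn.execute("INSERT INTO contacts (name) VALUES (?)", ("Ada",))

    # ... and rolls back if the block raises.
    try:
        with conn:
            conn.execute("INSERT INTO contacts (name) VALUES (?)", ("Bob",))
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass

    # The in-place schema changes: add a column, then rename the table.
    conn.execute("ALTER TABLE contacts ADD COLUMN email TEXT")
    conn.execute("ALTER TABLE contacts RENAME TO people")

    names = [row[0] for row in conn.execute("SELECT name FROM people ORDER BY id")]
    columns = [row[1] for row in conn.execute("PRAGMA table_info(people)")]
    conn.close()
    return {"names": names, "columns": columns}


if __name__ == "__main__":
    print(demo())  # {'names': ['Ada'], 'columns': ['id', 'name', 'email']}
```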
Vieka Technology, Inc.'s eSQL 2.11 is a port of SQLite to Windows Mobile (Pocket PC and Smartphone) and Win32, and includes development tools for Windows devices and PCs, as well as a .NET native data provider. A conventional ODBC driver also is available. eSQL for Windows (Win32) is free for personal and commercial use; eSQL for Windows Mobile requires a license for commercial (for-profit or business) use.
HSQLDB isn't on most reviewers' radar, which is surprising because it's the default database for OpenOffice.org (OOo) 2.0's Base suite member. HSQLDB 1.8.0.1 is an open-source (BSD license) Java embedded database engine based on Thomas Mueller's original Hypersonic SQL Project. Using OOo's Base feature requires installing the Java 2.0 Runtime Engine (which is not open-source) or the presence of an alternative open-source engine, such as Kaffe. My prior posts about OOo Base and HSQLDB are here, here and here.
The HSQLDB 1.8.0 documentation on SourceForge states the following regarding SQL-92 and later conformance:
HSQLDB 1.8.0 supports the dialect of SQL defined by SQL standards 92, 99 and 2003. This means where a feature of the standard is supported, e.g. left outer join, the syntax is that specified by the standard text. Many features of SQL92 and 99 up to Advanced Level are supported and there is support for most of SQL 2003 Foundation and several optional features of this standard. However, certain features of the Standards are not supported so no claim is made for full support of any level of the standards.
Other less well-known embedded databases designed for or suited to mobile deployment are Mimer SQL Mobile and VistaDB 2.1. Neither product is open source, and both require paid licensing; VistaDB requires a small up-front payment by developers but offers royalty-free distribution.
Java DB, Firebird embedded, SQLite and eSQL 2.11 are contenders for lightweight PC and mobile device database projects that aren't Windows-only.
SQL Server 2005 Everywhere
If you're a Windows developer, SQL Server Mobile is the logical embedded database choice for mobile applications for Pocket PCs and Smartphones. Microsoft's April 19, 2006 press release delivered the news that SQL Server 2005 Mobile Edition (SQL Mobile or SSM) would gain a big brother: SQL Server 2005 Everywhere Edition (SSE).
Currently, the SSM client is licensed (at no charge) to run in production on devices with Windows CE 5.0, Windows Mobile 2003 for Pocket PC or Windows Mobile 5.0, or on PCs with Windows XP Tablet Edition only. SSM also is licensed for development purposes on PCs running Visual Studio 2005. Smart Device replication with SQL Server 2000 SP3 and later databases has been the most common application so far for SSM.
By the end of 2006, Microsoft will license SSE for use on all PCs running any Win32 version or the preceding device OSs. A version of SQL Server Management Studio Express (SSMSE), updated to support SSE, is expected to be released by the end of the year. These features will qualify SSE as the universal embedded database for Windows client and smart-device applications.
For more details on SSE, read John Galloway's April 11, 2006 blog post and my 'SQL Server 2005 Mobile Goes Everywhere' article for the FTPOnline Special Report on SQL Server." (Via OakLeaf Systems.)
Have RSS feeds killed the email star? silicon.com Feb 28 2005 12:58PM GMT
Anyway, back to cognitive dissonance. Could this be the reason for the following?
And more...
Email As A Platform
It looks like more people are starting to realize that email is more than it seems. Especially given the drastic increase in storage size of web-based email applications, more people are realizing that email is basically a personal database. People simply store information in their email, from contact information that was emailed to them to schedule information to purchase tracking from emailed receipts. Lots of people email messages to themselves, realizing that email is basically the best "permanent" filing system they have. That's part of the reason why good email search is so important. Of course, what the article doesn't discuss is the next stage of this evolution. If you have a database of important information, the next step is to build useful applications on top of it. In other words, people are starting to realize that email, itself, is a platform for personal information management.
How secure is your data? Looking at your information management resources through a would-be intruder's eyes can help you find (and fix) vulnerabilities.
Naturally :-)
When E. F. Codd developed his relational data model in 1970, the business world was a different place. Almost 35 years after his seminal work appeared, RDBMSs that sprang from Codd's ideas are the standard for storing corporate information. And, with government and industry regulations dictating what kinds of information companies have to store, manage, and audit (and for how long), protecting this information is more important than ever. Unfortunately, it's also more challenging.
Even in 1985, when Dr. Codd published 12 guidelines for RDBMSs, there was little concern for data security. In those days, gaining access to a database was so difficult that advanced security features on the database were irrelevant.
Today, RDBMSs carry the lifeblood of every organization. Note the use of the plural: Organizations now have many databases that are decentralized in terms of use and security controls. E-business demands that data access be extended to customers, partners, suppliers, and other parties who were rarely considered in the early data management days. With all this availability, not to mention pressure from an array of government and industry regulations (see the sidebar, "Security and Compliance"), the need to control exactly who can access or modify data is becoming paramount.
These are absolute facts that are still, at best, only partially understood. For instance, we are still in a so-called "Information Age" in which standards-based data access is treated with contempt instead of as an absolute necessity.
There are a number of prevailing myths about standards-based data access that continue to cloak reality:
Even if the above were true (which I strongly refute), what about the general security vulnerabilities that affect both native and standards-compliant data access interfaces?
Aaron's article does a good job of highlighting 6 areas of vulnerability:
What I have been able to do very quickly (thanks to blogging, and the power of a blog engine that supports WebDAV), is write a tabulated response to each of the items (bar Fixpaks) indicating how the OpenLink Multi-Tier Data Access Drivers (for ODBC, JDBC, ADO.NET, and OLEDB) protect corporate databases from each of these vulnerabilities.
To cut a long story short, we are increasingly living a contradiction in which the terms "simple" and "free" are supposed to lead us to products that can adequately handle the challenges of an increasingly sophisticated grid of interconnecting points.
I have been asked on numerous occasions, "How can you build a company and business based on data access technology?" My reply is always the same: "because everything comes down to data." If the data is compromised in any way, then kiss Information, Knowledge, and everything else goodbye!
Databases get a grip on XML
From InfoWorld.
The next iteration of the SQL standard was supposed to arrive in 2003. But SQL standardization has always been a glacially slow process, so nobody should be surprised that SQL:2003, now known as SQL:200n, isn't ready yet. Even so, 2003 was a year in which XML-oriented data management, one of the areas addressed by the forthcoming standard, showed up on more and more developers' radar screens.
This article rounds up products for 2003 in the critical area of Enterprise Database Technology. It certainly provides an apt reflection of how Virtuoso compares with offerings from some of the larger (but certainly slower to implement) database vendors in this space. As usual, Jon Udell's quote pretty much sums this up:
"While the spotlight shone on the heavyweight contenders, a couple of agile innovators made noteworthy advances in 2003. OpenLink Software's Virtuoso 3.0, which we reviewed in March, stole thunder from all three major players. Like Oracle, it offers a WebDAV-accessible XML repository. Like DB2 Information Integrator, it functions as database middleware that can perform federated 'joins' across SQL and XML sources. And like the forthcoming Yukon, it embeds the .Net CLR (Common Language Runtime), or in the case of Linux, Novell/Ximian's Mono."
Albeit still somewhat unknown to the broader industry, we have remained true to our "innovator" discipline, which remains our chosen path to market leadership. Thus, it's worth a quick recap of Virtuoso's release history and features as we get set to up the ante even further in 2004:
1998 - Virtuoso's initial public beta release with functional emphasis on Virtual Database Engine for ODBC and JDBC Data Sources.
1999 - Virtuoso's official commercial release, with emphasis still on Virtual Database functionality for ODBC- and JDBC-accessible SQL Databases.
2000 - Virtuoso 2.0 adds XML Storage, XPath, XML Schema, XQuery, XSL-T, WebDAV, SOAP, UDDI, HTTP, Replication, Free Text Indexing (*feature update*), POP3, and NNTP support.
2002 - Virtuoso 2.7 extends Virtualization prowess beyond data access via enhancements to its Web Services protocol stack implementation by enabling SQL Stored Procedures to be published as Web Services. It also debuts its Object-Relational engine enhancements that include the incorporation of Java and Microsoft .NET Objects into its User Defined Type, User Defined Functions, and Stored Procedure offerings.
2003 - Virtuoso 3.0 extends data and application logic virtualization into the Application Server realm (basically a Virtual Application server too!) by adding runtime hosting for ASP.NET, PHP, and Java Server Pages (making applications built using any of these languages deployable via Virtuoso across all supported platforms).
Collectively, each of these releases has contributed to a very premeditated architecture and vision that will ultimately unveil the inherent power of critical IS infrastructure virtualization along the following lines: data storage, data access, and application logic, via coherent integration of SQL, XML, Web Services, and Persistent Stored Modules (.NET, Java, and other object-based component building blocks).
NETWORK WORLD NEWSLETTER: MARK GIBBS ON WEB APPLICATIONS
Today's focus: A Virtuoso of a server
By Mark Gibbs
One of the bigger drags of Web applications development is that building a system of even modest complexity is a lot like herding cats - you need a database, an applications server, an XML engine, etc., etc. And as they all come from different vendors you are faced with solving the constellation of integration issues that inevitably arise.
If you are lucky, your integration results in a smoothly functioning system. If not, you have a lot of spare parts flying in loose formation with the risk of a crash and burn at any moment.
An alternative is to look for all of these features and services in a single package but you'll find few choices in this arena.
One that is available and looks very promising is OpenLink's Virtuoso (see links below).
Virtuoso is described as a cross platform (runs on Windows, all Unix flavors, Linux, and Mac OS X) universal server that provides databases, XML services, a Web application server and supporting services all in a single package.
OpenLink's list of supported standards is impressive and includes .Net, Mono, J2EE, XML Web Services (Simple Object Access Protocol, Web Services Description Language, WS-Security, Universal Description, Discovery and Integration), XML, XPath, XQuery, XSL-T, WebDAV, HTTP, SMTP, LDAP, POP3, SQL-92, ODBC, JDBC and OLE-DB.
Virtuoso provides an HTTP-compliant Web Server; native XML document creation, storage and management; a Web services platform for creation, hosting and consumption of Web services; content replication and synchronization services; free text index server, mail delivery and storage and an NNTP server.
Another interesting feature is that with Virtuoso you can create Web services from existing SQL Stored Procedures, Java classes, C++ classes, and 'C' functions as well as create dynamic XML documents from ODBC and JDBC data sources.
This is an enormous product and implies a serious commitment on the part of adopters due to its scope and range of services.
Virtuoso is enormous by virtue of its architectural ambitions, but actual disk requirements are
Howl is Rendezvous for Windows and Linux. [via Scripting News]
In the case of UDA, you can configure ODBC- and JDBC-consumable data source names that are hosted on the server. Users can now simply pick DSNs from a combo box and they are ready to make connections to remote databases from any ODBC, JDBC, OLE DB, or ADO.NET application. Another benefit of Zeroconf is that it facilitates centralized server-side configuration, which further enhances our server-side session rules book (which serves all our Multi-Tier data access drivers).
In the case of Virtuoso you are able to bind to pre-configured Virtuoso instances in exactly the same way.
Our Zeroconfig support has beennbspimplemented across Solaris, AIX, Digital UNIX, IRIX, HP-UX amongst others, but this is a project of interest all the same, and we may end up contributing to this effort.
Tim O'Reilly wrote some thoughts about network aware software. Good sumup and nice ideas, why not only blogs should be net-aware (and where even blogs can be improved ;) )
"For the desktop, my personal vision is to see existing software instrumented to become increasingly web aware. It seems that Apple are doing a good job with this. (What does web aware mean for me? Being able to grok URIs, speaking WebDAV, and using open standard data formats.)" -- Edd Dumbill [via Bitflux Blog]
Rendezvous-like functionality for automatic discovery of and potential synchronization with other instances of the application on other computers. Apple is showing the power of this idea with iChat and iTunes, but it really could be applied in so many other places. For example, if every PIM supported this functionality, we could have the equivalent of "phonester" where you could automatically ask peers for contact information. Of course, that leads to guideline 2.
Another application is the discovery of ODBC data sources and database servers. Rendezvous can also simplify security and administration of data sources accessible by either of these standards-based data access mechanisms. It can also apply to XML databases and the data sources they expose.
This is the very point I continue to make about Internet Points of Presence being actual data access points: in short, these end points should be served by database server processes. This is the very basis of Virtuoso, and the inevitability of this realization remains the underpinning of this product. There are other products out there that have some sense of this vision too, but there is a little snag (at least so far in my research efforts), and that is the tendency to create a dedicated, independent server per protocol (an ultimate integration, administration, and maintenance nightmare).
Apple's benchmarks under fire ZDNet Jun 25 2003 7:13AM ET
[via Moreover - ZDNet]
OpenLink Software has provided an Open Source benchmark utility that supports Mac OS X, Linux, and UNIX. Thus, if mission-critical, database-oriented performance is what is most relevant to your needs (as opposed to Photoshop), then simply download either one, or both, of the following:
OpenLink ODBC Bench (you can test TPC-A- and TPC-C-like performance of the G5 via ODBC and compare it against other platforms)
OpenLink JDBC Bench (same thing using JDBC)
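For a feel of what a "TPC-A-like" test exercises, here is a minimal Python sketch of the debit/credit transaction at its core: update an account, a teller, and a branch balance, append a history row, and commit it all atomically. It uses an in-memory SQLite database rather than ODBC or JDBC, a far smaller scale than the TPC-A specification, and uniformly random accounts (TPC-A's rule about mostly choosing accounts from the teller's own branch is ignored), so its throughput figure is illustrative only and not comparable with ODBC Bench or JDBC Bench results.

```python
import random
import sqlite3
import time


def build_db(branches: int = 1, tellers_per_branch: int = 10,
             accounts_per_branch: int = 1000) -> sqlite3.Connection:
    """Create a scaled-down TPC-A-style schema with zero balances."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE branch  (bid INTEGER PRIMARY KEY, balance INTEGER);
        CREATE TABLE teller  (tid INTEGER PRIMARY KEY, bid INTEGER, balance INTEGER);
        CREATE TABLE account (aid INTEGER PRIMARY KEY, bid INTEGER, balance INTEGER);
        CREATE TABLE history (aid INTEGER, tid INTEGER, bid INTEGER,
                              delta INTEGER, ts REAL);
    """)
    with conn:
        conn.executemany("INSERT INTO branch VALUES (?, 0)",
                         [(b,) for b in range(branches)])
        conn.executemany("INSERT INTO teller VALUES (?, ?, 0)",
                         [(t, t // tellers_per_branch)
                          for t in range(branches * tellers_per_branch)])
        conn.executemany("INSERT INTO account VALUES (?, ?, 0)",
                         [(a, a // accounts_per_branch)
                          for a in range(branches * accounts_per_branch)])
    return conn


def debit_credit(conn: sqlite3.Connection, aid: int, tid: int, bid: int,
                 delta: int) -> int:
    """One debit/credit transaction; returns the new account balance."""
    with conn:  # all four changes commit together or not at all
        conn.execute("UPDATE account SET balance = balance + ? WHERE aid = ?", (delta, aid))
        conn.execute("UPDATE teller SET balance = balance + ? WHERE tid = ?", (delta, tid))
        conn.execute("UPDATE branch SET balance = balance + ? WHERE bid = ?", (delta, bid))
        conn.execute("INSERT INTO history VALUES (?, ?, ?, ?, ?)",
                     (aid, tid, bid, delta, time.time()))
        (balance,) = conn.execute("SELECT balance FROM account WHERE aid = ?",
                                  (aid,)).fetchone()
    return balance


def run(conn: sqlite3.Connection, n: int = 1000, seed: int = 42) -> float:
    """Execute n random transactions and return transactions per second."""
    rng = random.Random(seed)
    tellers = conn.execute("SELECT tid, bid FROM teller").fetchall()
    accounts = [aid for (aid,) in conn.execute("SELECT aid FROM account")]
    start = time.perf_counter()
    for _ in range(n):
        tid, bid = rng.choice(tellers)  # the teller determines the branch
        debit_credit(conn, rng.choice(accounts), tid, bid,
                     rng.randint(-999999, 999999))
    elapsed = time.perf_counter() - start
    return n / elapsed if elapsed > 0 else float("inf")


def consistent(conn: sqlite3.Connection) -> bool:
    """Each delta hits one account, teller and branch, so all sums must match."""
    sums = conn.execute("""
        SELECT (SELECT SUM(balance) FROM account), (SELECT SUM(balance) FROM teller),
               (SELECT SUM(balance) FROM branch),  (SELECT SUM(delta) FROM history)
    """).fetchone()
    return len(set(sums)) == 1


if __name__ == "__main__":
    db = build_db()
    print(f"{run(db):.0f} tps, consistent: {consistent(db)}")
```

The consistent check is the useful part when comparing drivers or platforms: a fast run that breaks the balance invariant means a transaction was not applied atomically.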
Ingres (technically, Advantage Ingres Enterprise) is, arguably, the forgotten database. There used to be five major databases: Oracle, DB2, Sybase, Informix and Ingres. Then along came Microsoft and, if you listened to most press comment (or the lack of it), you would think that there were only two of these left, plus SQL Server. [From IT-Director]
Oracle, Microsoft, and IBM would certainly like to create the illusion of a three-horse race, as this is the only way they can induce Ingres, Informix, and Sybase users to jump ship, even though database migrations are by far the most risk-prone and problematic aspects of any IT infrastructure.
Here is the interesting logic from the self-made big three: if you want to take advantage of new paradigms and technologies such as XML, Web Services, and anything else in the pipeline, you have to move all your data out of these databases and then re-associate all your mission-critical applications with one of their databases; and, by the way, when you do so it is advisable that you use native interfaces (so that sometime in the future you have no chance whatsoever of repeating this folly at their expense).
The simple fact of the matter (which the self-made big three do not want you to know) is that you can put ODBC, JDBC, or even platform-specific data access APIs such as OLE DB and ADO.NET atop any of these databases, and then explore and exploit the benefits of new technologies and paradigms, as long as the tool pool supports one or more of these standards.
Unfortunately the no-brainer above appears to be the more difficult of the choices before decision makers. In other words, many would rather dig themselves into a deeper hole (unknowingly i can only presume) that ultimately leads to technology lock-in.
The biggest challenge before any RDBMS based infrastructure today isn't which of the self-made big three to migrate to wholesale, rather, how to make progressive use of the pool of disparate applications, and application databases that proliferate the enterprise.
This is another way of understanding the burgeoning market for Virtual Databases, which in my opiion present the new frontier in database technology.
Yes, but I don't see how any of the data access mechanisms listed have anything to do with the problem, as each serves a different application domain.
Well, guess what? You won't have to wait too much longer to see this dream become reality. All major vendors are not only working on this issue seriously but have already started releasing SDKs for exactly this purpose. The point of commonality between them is XML. Like it or not, XML is becoming a standard part of programmers' lives because its simple, easy-to-understand structure makes it a suitable format for transferring data between disparate applications and platforms. But the real power of XML lies in its ability not only to extend itself, but also to define itself.
This is all well and good, and XML is, yes, a unifying technology, but the opening paragraph undermines the point!
XML for Analysis (XMLA) is just another addition to the pool of data access mechanisms: an XML/SOAP-based data access mechanism for tabular and multidimensional data. Without our implementation, XMLA offers little to anyone trying to get at tabular (SQL) data, because most current implementers fall into the OLAP camp, and there is no standard query language for persistent cubes (Microsoft's MDX is a de facto standard at best). Thus, without Virtuoso, tabular data (where the real data actually resides) is out of the picture.
So we say goodbye to ODBC, JDBC, and OLE DB to get what, exactly? From our perspective, it's another showcase opportunity, but for the innocent readers of this article, I haven't got the faintest idea.
 http://www.sqlsecurity.com/DesktopDefault.aspx?tabindex=2&;tabid=3
Are other SQL Servers (Sybase, Oracle, DB2) subject to SQL injection? Yes, to varying degrees. Here is a site that can give you more details on some of the issues with other SQL Servers: http://www.owasp.org
What is SQL Injection, and why is all this information not included in the regular FAQ? SQL Injection is simply a term describing the act of passing SQL code into an application that was not intended by the developer. Since this topic is not specifically restricted to SQL Server, it is not included in the normal FAQ. In fact, many of the problems that allow SQL injection are not the fault of the database server per se, but rather are due to poor input validation and coding at other code layers. However, due to the serious nature and prevalence of this problem, I feel its inclusion in a thorough discussion of SQL Server security is warranted.
What causes SQL Injection? SQL injection is usually caused by developers who use "string-building" techniques in order to execute SQL code. For example, in a search page, the developer may use the following code to execute a query (VBScript/ASP sample shown):
Set myRecordset = myConnection.execute("SELECT * FROM myTable WHERE someText ='" & request.form("inputdata") & "'")
The reason this statement is likely to introduce an SQL injection problem is that the developer has made a classic mistake: poor input validation. We are trusting that the user has not entered something malicious, something like the innocent-looking single quote (').
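To make the string-building problem concrete, here is a minimal sketch in Python. It assumes an in-memory SQLite database standing in for SQL Server (SQLite has no xp_cmdshell and does not run batches the same way), and `build_query` is a hypothetical port of the ASP concatenation above. It shows that a single quote in otherwise innocent input ends the string literal early and changes how the statement parses.

```python
import sqlite3


def build_query(input_data: str) -> str:
    # Hypothetical port of the ASP "string-building" pattern:
    # user input is pasted directly between single quotes.
    return "SELECT * FROM myTable WHERE someText ='" + input_data + "'"


# SQLite stands in for SQL Server purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (someText TEXT)")
conn.execute("INSERT INTO myTable VALUES ('hello')")

# Benign input: the assembled statement is well-formed.
ok_rows = conn.execute(build_query("hello")).fetchall()
print(ok_rows)  # [('hello',)]

# An innocent-looking single quote closes the literal after 'O',
# leaving "Brien" and a stray quote outside it, so the text no
# longer means what the developer intended.
broken = build_query("O'Brien")
try:
    conn.execute(broken)
    rejected = False
except sqlite3.OperationalError as exc:
    rejected = True
    print("rejected:", exc)
```

Here the damage is only a syntax error, but the same mechanism lets a carefully crafted input append statements of its own.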
Let's consider what would happen if a user entered the following text into the search form:
' exec master..xp_cmdshell 'net user test testpass /ADD' --
Then, when the query string is assembled and sent to SQL Server, the server will process the following code:
SELECT * FROM myTable WHERE someText ='' exec master..xp_cmdshell 'net user test testpass /ADD' --'
Notice that the first single quote entered by the user closed the string, and SQL Server eagerly executes the next SQL statements in the batch, including a command to add a new user to the local accounts database. If this application were running as 'sa' and the MSSQLSERVER service were running with sufficient privileges, we would now have an account with which to access this machine. Also note the use of the comment operator (--) to force SQL Server to ignore the trailing quote placed by the developer's code.
Very interesting that these are all native-interface-based exploits. So the security issue isn't ODBC, JDBC, ADO.NET, or OLE DB specific (although these APIs certainly increase the potential damage that can be unleashed via metadata analysis en route to that huge Cartesian product, the mother of all exploits!). Our Session Rules Book was devised in 1993 with many of these issues in mind, and to this date there are no other ODBC/JDBC/OLE DB products out there that even come close to acknowledging this reality.
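The FAQ attributes the problem to poor input validation. A widely used complementary defense, not described in the FAQ itself, is parameter binding: the SQL text stays fixed and the user's input is sent separately as a value. The sketch below assumes SQLite (via Python's standard `sqlite3` module, whose `?` placeholders perform the binding) as a stand-in for SQL Server; `build_query` is a hypothetical port of the ASP string-building code. It first shows how concatenation splices the FAQ's attack text into the statement, then shows the same text handled safely as plain data.

```python
import sqlite3

# The attack string from the FAQ example.
PAYLOAD = "' exec master..xp_cmdshell 'net user test testpass /ADD' --"


def build_query(input_data: str) -> str:
    # Vulnerable pattern: input concatenated into the SQL text itself.
    return "SELECT * FROM myTable WHERE someText ='" + input_data + "'"


# 1. String-building: the payload becomes part of the statement.
injected = build_query(PAYLOAD)
print(injected)

# 2. Parameter binding: the statement text never changes, and the
#    payload's quotes and "--" are treated as ordinary characters.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (someText TEXT)")
conn.execute("INSERT INTO myTable VALUES ('hello')")

rows = conn.execute(
    "SELECT * FROM myTable WHERE someText = ?", (PAYLOAD,)
).fetchall()
print(rows)  # [] because no row contains that literal text

# The payload can even be stored and read back verbatim.
conn.execute("INSERT INTO myTable VALUES (?)", (PAYLOAD,))
stored = conn.execute(
    "SELECT someText FROM myTable WHERE someText = ?", (PAYLOAD,)
).fetchone()[0]
print(stored == PAYLOAD)  # True
```

Binding does not remove the need for sensible validation or for running the service with least privilege; it simply ensures that the input can no longer alter the structure of the SQL statement.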