Kingsley Idehen's Blog Data Space
http://www.openlinksw.com/weblog/public/search.vspx?blogid=127&q=xquery&type=text&output=html
Fri, 29 Mar 2024 09:21:18 GMT | Kingsley Uyi Idehen <kidehen@openlinksw.com>

A very interesting exchange that came through my RSS feeds this morning, starting with Jon's piece

No Remote XQuery Concerns Here
http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/451
Thu, 22 Jun 2006 12:56:58 GMT | Kingsley Uyi Idehen <kidehen@openlinksw.com>
for $o in document("day1.opml")//outline
return
  <tr>
    <td>{string($o/@text)}</td>
    <td><a href="{string($o/@url)}">{string($o/@url)}</a></td>
  </tr>

Dynamic (XQuery Based) BloggerCon Attendee List
http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/377
Thu, 22 Jun 2006 12:56:58 GMT | Kingsley Uyi Idehen <kidehen@openlinksw.com>

When an industry standard emerges, one of the very first things I do (instinctively) is commence a quest to understand the essence of the standard's value proposition, and then unravel the implementation challenges as they affect existing IT infrastructure. The quest comprises the following steps:

  1. What is this standard?
  2. Why is it important?
  3. What are the implementation challenges?

When XQuery first came across my radar (in the late '90s, even before "XQuery" became the moniker for an XML query language), I arrived at the following conclusions using the steps listed above:

  1. What is XQuery about? It's about querying XML documents (at the time, real or virtual) in a repository. Basically, it's the SQL equivalent for the XML-based Infobase;
  2. Why is it important? Because we will need to access, repurpose, and disseminate the contents of the Infobase for a myriad of reasons which ultimately culminate in knowledge creation;
  3. What are the implementation challenges? Where do I start? Anyway, here are a few:
    • Content Creation - we need to create the Infobase; for an XML based Infobase
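To make point 1 concrete: the kind of traversal an XQuery FLWOR expression performs over an XML document can be sketched with any XPath-capable toolkit. Here is a minimal illustration using Python's standard library; the outline content below is invented for the example:

```python
# Query an OPML-style outline, the same traversal an XQuery
# expression like document("day1.opml")//outline would perform.
import xml.etree.ElementTree as ET

OPML = """<opml><body>
  <outline text="Jon Udell" url="http://weblog.infoworld.com/udell/"/>
  <outline text="Kingsley Idehen" url="http://www.openlinksw.com/blog/~kidehen/"/>
</body></opml>"""

root = ET.fromstring(OPML)

# For each <outline> element, project its attributes into an HTML row.
rows = [
    "<tr><td>{}</td><td><a href=\"{}\">{}</a></td></tr>".format(
        o.get("text"), o.get("url"), o.get("url"))
    for o in root.iter("outline")
]

for r in rows:
    print(r)
```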
XQuery: Almost Here? Should You Care?
http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/664
Thu, 22 Jun 2006 12:56:58 GMT | Kingsley Uyi Idehen <kidehen@openlinksw.com>

This is an input form that will post to Syncato and then bring you back to the Virtuoso-based XQuery post (assuming you spot the comment post I made earlier) re. BloggerCon.


PostID:

Your Name:

Blog or Web URL:

Comments:
Variation on The Syncato Theme
http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/378
Thu, 22 Jun 2006 12:56:58 GMT | Kingsley Uyi Idehen <kidehen@openlinksw.com>
I have embellished a number of weblogs that I oversee (Personal, Virtuoso, and UDA) as part of an OpenLink technology "dog-fooding" effort. We now have SQL-XML based RSS 2.0 feeds that make an array of content available for RSS Aggregators as well as ad hoc XQuery and XPath queries over HTTP/WebDAV.


Feed: Virtuoso Documentation
Description: Product documentation available as a collection of RSS feeds, one per chapter, with a feed catalog in an OPML file.

Feed: Data Access Driver Suite Documentation
Using SQL-XML Based RSS Feeds to Syndicate Documentation, Tutorials, and Demos
http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/392
Thu, 22 Jun 2006 12:56:58 GMT | Kingsley Uyi Idehen <kidehen@openlinksw.com>
When Virtuoso first unleashed support for XML (in-built XSL, Native XML Storage, a Validating XML Parser, XPath, and XQuery), the core message was the delivery of a single-server solution that would address the challenges of creating XML data.

In the year 2000 the question of the shape and form of XML data was unclear to many, and reading the article below basically took me back in time to when we released Virtuoso 2.0 (we are now at release 3.0 commercially with a 3.2 beta dropping any minute).

RSS is a great XML application, and it does a great job of demonstrating how XML --the new data access foundation layer-- will galvanize the next generation Web (I refer to this as Web 2.0.).

RSS: INJAN (It's not just about news)

RSS is not just about news, according to Ian Davis on rss-dev.
He presents a nice list of alternatives, which I reproduce here (and to which I'd add, of course, bibliography management):

  • Sitemaps: one of the S's in RSS stands for summary. A sitemap is a summary of the content on a site; the items are pages or content areas. This is clearly a non-chronological ordering of items. Is a hierarchy of RSS sitemaps implied here -- how would the linking between them work? How hard would it be to hack a web browser to pick up the RSS sitemap and display it in a sidebar when you visit the site?
  • Small ads: also known as classifieds. These expire, so there's some kind of dynamic going on here, but the ordering of items isn't necessarily chronological. How to describe the location of the seller, or the condition of the item, or even the price? Not every ad is selling something -- perhaps it's to rent out a room.
  • Personals: similar model to the small ads. No prices though (I hope). Comes with a ready-made vocabulary of terms that could be converted to an RDF schema. Probably should do that just for the hell of it anyway -- gsoh.
  • Weather reports: how about a week's worth of weather in an RSS channel? If an item is dated in the future, should an aggregator display it before time? Alternate representations include maps of temperature and pressure, etc.
  • Auctions: again, related to small ads, but these are much more time-limited since there is a hard cutoff after which the auction is closed. The sequence of bids could be interesting -- would it make sense to thread them like a discussion so you can see the tactics?
  • TV listings: this is definitely chronological but with a twist -- the items have durations. They also have other metadata such as cast lists, classification ratings, widescreen, stereo, program type. Some types have additional information such as director and production year.
  • Top ten listings: top ten singles, books, DVDs, richest people, ugliest, rear of the year, etc. Not chronological, but has definite order. May update from day to day or even more often.
  • Sales reporting: imagine if every department of a company reported their sales figures via RSS. Then the divisions aggregate the departmental figures and republish to the regional offices, who aggregate and add value up the chain. The chairman of the company subscribes to one super-aggregate feed.
  • Membership lists / buddy lists: could I publish my buddy list from Jabber or other instant messengers? Maybe as an interchange format, or perhaps it could be used to look for shared contacts. Lots of potential overlap with FOAF here.
  • Mailing lists: or in fact any messaging system, such as Usenet. There are some efforts at doing this already (e.g. yahoogroups) but we need more information -- threads; references; headers; links into archives.
  • Price lists / inventory: the items here are products or services. No particular ordering, but it'd be nice to be able to subscribe to a catalog of products and prices from a company. The aggregator should be able to pick out price rises or bargains given enough history.
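The sales-reporting bullet describes feed aggregation up a chain; here is a minimal sketch of one roll-up step, using Python's standard library. The <sales> element is a made-up extension for illustration, not part of RSS:

```python
# Aggregate departmental "sales" feeds into one regional roll-up,
# as described in the sales-reporting bullet above.
import xml.etree.ElementTree as ET

DEPT_FEEDS = [
    """<rss><channel><item><title>Widgets</title><sales>120</sales></item>
       <item><title>Gadgets</title><sales>80</sales></item></channel></rss>""",
    """<rss><channel><item><title>Services</title><sales>300</sales></item>
       </channel></rss>""",
]

def aggregate(feeds):
    # Merge every <item> from every departmental feed and sum sales.
    items, total = [], 0
    for xml_text in feeds:
        for item in ET.fromstring(xml_text).iter("item"):
            sales = int(item.findtext("sales"))
            items.append((item.findtext("title"), sales))
            total += sales
    return items, total

items, total = aggregate(DEPT_FEEDS)
print(total)  # → 500
```

The regional feed would republish `items` as a new channel, and the next tier up would aggregate those in turn.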

Thus, if we can comprehend RSS (the blog article below does a great job), we should be able to see the fundamental challenges before any organization seeking to exploit the potential of the imminent Web 2.0 inflection: how will you cost-effectively create XML data from existing data sources, without upgrading or switching database engines, operating systems, or programming languages? Put differently, how can you exploit this phenomenon without losing your ever-dwindling technology choices (believe me, choices are dwindling fast, but most are oblivious to this fact)?

 

RSS: INJAN (It's not just about news)
http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/241
Thu, 22 Jun 2006 12:56:58 GMT | Kingsley Uyi Idehen <kidehen@openlinksw.com>
Here is another article titled "IBM Flexes XML Muscle" that covers the same general theme: IBM's appreciation of Unified Storage.

As indicated in an earlier post, IBM is clearly validating what we have done with Virtuoso (as was the case initially with their Virtual/Federated DBMS initiative, à la DB2 Integrator). Here is an excerpt from today's eWeek article supporting this position:

To achieve maximum XML performance, bolstered indexing attributes in the technology will enable advanced search functions and a higher degree of filtering. IBM is also adding support for XPath and XQuery data models. This will allow users to create views that involve SQL and XQuery by sending the protocol through DB2's query optimizer for a unified query plan.

Read on..

Virtuoso has been doing this since 2000; unfortunately a lot of

IBM Flexes XML Muscle
http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/657
Thu, 22 Jun 2006 12:56:58 GMT | Kingsley Uyi Idehen <kidehen@openlinksw.com>
Microsoft to do for Usenet what it did for Email & The Web?

Netscan is an interesting NNTP-based project, and it is pretty much along the same lines as what Virtuoso has provided (albeit with an inferior UI) for NNTP since 1999.

Using Virtuoso, the data presented by Netscan could very easily be presented as XML, which could then be further processed using XPath, XQuery, and XSLT, with the final result being RDF (since this is metadata after all -- another contribution to the Semantic Web).

Microsoft to do for Usenet what it did for Email & The Web?
http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/228
Thu, 22 Jun 2006 12:56:58 GMT | Kingsley Uyi Idehen <kidehen@openlinksw.com>
The first salvo of what we've been hinting about re. server side faceted browsing over Unlimited Data within configurable Interactive Time-frames is now available for experimentation at: http://b3s.openlinksw.com/fct/facet.vsp.

Simple example / demo:

Enter search pattern: Microsoft

You will get the usual result from a full text pattern search i.e., hits and text excerpts with matching patterns in boldface. This first step is akin to throwing your net out to sea while fishing.

Now you have your catch; what next? Basically, this is where traditional text-search value ends, since regex or XPath/XQuery offer little when the structure of literal text is the key to filtering or categorization-based analysis of real-world entities. Naturally, this is where the value of structured querying of linked data starts, as you seek to use entity descriptions (a combination of attribute and relationship properties) to "Find relevant things".

Continuing with the demo.

Click on the "Properties" link within the Navigation section of the browser page, which results in a distillation and aggregation of the properties of the entities associated with the search results. Then use the "Next" link to page through the properties until you find the properties that best match what you seek. Note, this particular step is akin to using the properties of the catch (continuing the fishing analogy) for query filtering, with each subsequent property link click narrowing your selection further.

Using property-based filtering is just one perspective on the data corpus associated with the text search pattern; thus, you can alter perspectives by clicking on the "Class" link so that you can filter your search results by entity type. Of course, in a number of scenarios you would use a combination of entity-type and entity-property filters to locate the entities of interest to you.
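The narrowing workflow above can be illustrated with a toy entity-attribute-value store. The data and property names below are invented for the example; the live service does this over billions of triples:

```python
# Toy faceted filtering over entity-attribute-value triples:
# text search casts the net, then property/class filters narrow it.
TRIPLES = [
    ("e1", "type", "Company"), ("e1", "label", "Microsoft Corp"),
    ("e1", "locatedIn", "Redmond"),
    ("e2", "type", "Person"),  ("e2", "label", "Microsoft researcher"),
    ("e3", "type", "Company"), ("e3", "label", "OpenLink Software"),
]

def text_search(pattern):
    # Step 1: full-text hit list ("throwing the net out to sea").
    return {s for s, p, o in TRIPLES if pattern.lower() in o.lower()}

def facet(entities, prop=None, cls=None):
    # Step 2: narrow by property presence and/or entity class.
    if prop is not None:
        entities = {s for s, p, o in TRIPLES if s in entities and p == prop}
    if cls is not None:
        entities = {s for s, p, o in TRIPLES
                    if s in entities and p == "type" and o == cls}
    return entities

hits = text_search("Microsoft")         # e1 and e2 match the pattern
companies = facet(hits, cls="Company")  # the class filter leaves only e1
```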

A Few Notes about this demo instance of Virtuoso:

  • Lookup Data Size (Local Linked Data Corpus): 2 Billion+ Triples (entity-attribute-value tuples)
  • This is a *temporary* teaser / precursor to the LOD (Linking Open Data Cloud) variant of our Linked Data driven "Search" & "Find" service; we decided to implement this functionality prior to commissioning a larger and more up to date instance based on the entire LOD Cloud
  • The browser is simply using a Virtuoso PL function that also exists in Web Service form for loose binding by 3rd parties that have a UI orientation and focus (our UI is deliberately bare boned).
  • The properties and entity types (classes) links expose formal definitions and dictionary provenance information materialized in an HTML page (of course, your browser or any other HTTP user agent can negotiate alternative representations of this descriptive information)
  • UMBEL-based inference rules are enabled, giving you a live and simple demonstration of the virtues of Linked Data Dictionaries. For example: click on the description link of any property or class from the foaf (friend-of-a-friend vocabulary), sioc (semantically-interlinked-online-communities ontology), mo (music ontology), or bibo (bibliographic data ontology) namespaces to see how the data in these lower-level vocabularies or ontologies is meshed with OpenCyc's upper-level ontology.

Related

A Linked Data Web Approach To Semantic "Search" & "Find" (Updated)
http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1517
Sat, 10 Jan 2009 18:55:56 GMT | Kingsley Uyi Idehen <kidehen@openlinksw.com>
Note: an updated version of a previously unpublished blog post.

Continuing from our recent Podcast conversation, Jon Udell sheds further insight into the essence of our conversation via a “Strategic Developer” column article titled: Accessing the web of databases.

Below I present an initial dump of a DataSpace FAQ that hopefully sheds light on the DataSpace vision espoused during my podcast conversation with Jon.

What is a DataSpace?

A moniker for Web-accessible atomic containers that manage and expose Data, Information, Services, Processes, and Knowledge.

What would you typically find in a Data Space? Examples include:

  • Raw Data - SQL, HTML, XML (raw), XHTML, RDF etc.

  • Information (Data In Context) - XHTML (various microformats), Blog Posts (in RSS, Atom, RSS-RDF formats), Subscription Lists (OPML, OCS, etc), Social Networks (FOAF, XFN etc.), and many other forms of applied XML.
  • Web Services (Application/Service Logic) - REST or SOAP based invocation of application logic for context sensitive and controlled data access and manipulation.
  • Persisted Knowledge - Information in actionable context that is also available in transient or persistent forms expressed using a Graph Data Model. A modern knowledgebase would more than likely have RDF as its Data Language, RDFS as its Schema Language, and OWL as its Domain Definition (Ontology) Language. Actual Domain, Schema, and Instance Data would be serialized using formats such as RDF/XML, N3, Turtle, etc.

How do Data Spaces and Databases differ?
Data Spaces are fundamentally problem-domain-specific database applications. They offer functionality that you would instinctively expect of a database (e.g., ACID data management) with the additional benefit of being data model and query language agnostic. Data Spaces are for the most part DBMS Engine and Data Access Middleware hybrids, in the sense that ownership and control of data is inherently loosely-coupled.

How do Data Spaces and Content Management Systems differ?
Data Spaces are inherently more flexible, they support multiple data models and data representation formats. Content management systems do not possess the same degree of data model and data representation dexterity.

How do Data Spaces and Knowledgebases differ?
A Data Space cannot dictate the perception of its content. For instance, what I may consider knowledge relative to my Data Space may not be so to a remote client that interacts with it from a distance. Thus, defining my Data Space purely as a Knowledgebase introduces constraints that reduce its broader effectiveness to third-party clients (applications, services, users, etc.). A Knowledgebase is based on a Graph Data Model, resulting in significant impedance for clients that are built around alternative models. To reiterate, Data Spaces support multiple data models.

What Architectural Components make up a Data Space?

  • ORDBMS Engine - for Data Modeling agility (via complex purpose specific data types and data access methods), Data Atomicity, Data Concurrency, Transaction Isolation, and Durability (aka ACID).

  • Virtual Database Engine - for creating a single view of, and access point to, heterogeneous SQL, XML, Free Text, and other data. This is all about Virtualization at the Data Access Level.
  • Web Services Platform - enabling controlled access and manipulation (via application, service, or protocol logic) of Virtualized or Disparate Data. This layer handles the decoupling of functionality from monolithic wholes for function specific invocation via Web Services using either the SOAP or REST approach.

Where do Data Spaces fit into the Web's rapid evolution?
They are an essential part of the burgeoning Data Web / Semantic Web. In short, they will take us from data “Mash-ups” (combining web accessible data that exists without integration and repurposing in mind) to “Mesh-ups” (combining web accessible data that exists with integration and repurposing in mind).

Where can I see a DataSpace along the lines described, in action?

Just look at my blog, and take the journey as follows:

What about other Data Spaces?

There are several and I will attempt to categorize along the lines of query method available:
Type 1 (Free Text Search over HTTP):
Google, MSN, Yahoo!, Amazon, eBay, and most Web 2.0 plays.

Type 2 (Free Text Search and XQuery/XPath over HTTP)
A few blogs and Wikis (Jon Udell's and a few others)

Type 3 (RDF Data Sets and SPARQL Queryable):
Type 4 (Generic Free Text Search, OpenSearch, GData, XQuery/XPath, and SPARQL):
Points of Semantic Web presence such as the Data Spaces at:

What About Data Space aware tools?

  •    OpenLink Ajax Toolkit - provides Javascript Control level binding to Query Services such as XMLA for SQL, GData for Free Text, OpenSearch for Free Text, SPARQL for RDF, in addition to service specific Web Services (Web 2.0 hosted solutions that expose service specific APIs)
  •    Semantic Radar - a Firefox Extension
  •    PingTheSemanticWeb - the Semantic Web's equivalent of Web 2.0's weblogs.com
  •    PiggyBank - a Firefox Extension

Data Spaces and Web of Databases
http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1030
Mon, 04 Sep 2006 22:58:56 GMT | Kingsley Uyi Idehen <kidehen@openlinksw.com>
"Structured data is boring and useless." This article provides insight into a serious point of confusion about what exactly is structured vs. unstructured data. Here is a key excerpt:
"We all know that structured data is boring and useless; while unstructured data is sexy and chock full of value. Well, only up to a point, Lord Copper. Genuinely unstructured data can be a real nuisance - imagine extracting the return address from an unstructured letter, without letterhead and any of the formatting usually applied to letters. A letter may be thought of as unstructured data, but most business letters are, in fact, highly-structured." ....
Duncan Pauly, founder and chief technology officer of CopperEye, adds eloquent insight to the conversation:
"The labels "structured data" and "unstructured data" are often used ambiguously by different interest groups; and often used lazily to cover multiple distinct aspects of the issue. In reality, there are at least three orthogonal aspects to structure:
    * The structure of the data itself.
    * The structure of the container that hosts the data.
    * The structure of the access method used to access the data.
These three dimensions are largely independent and one does not need to imply another. For example, it is absolutely feasible and reasonable to store unstructured data in a structured database container and access it by unstructured search mechanisms."
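Pauly's third point is easy to demonstrate: unstructured text in a structured (relational) container, accessed via an unstructured search mechanism. A minimal sketch using SQLite (the table layout and sample letters are made up for illustration):

```python
# Unstructured text stored in a structured (relational) container,
# accessed by an unstructured mechanism: substring search.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE letters (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO letters (body) VALUES (?)",
    [("Dear Sir, please find enclosed the Q3 report...",),
     ("Return address: 10 Burlington Mall Road, Burlington MA",)],
)

# Structured container, unstructured query: find letters mentioning
# a return address without any schema for addresses.
rows = conn.execute(
    "SELECT id FROM letters WHERE body LIKE ?", ("%Return address%",)
).fetchall()
print(rows)
```

The container and access method each have their own degree of structure, independent of how structured the letters themselves are.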

Data understanding and appreciation is dwindling at a time when the reverse should be happening. We are supposed to be in the throes of the "Information Age", but for some reason this appears to have no correlation with data and "data access" in the minds of many -- as reflected in the broad contradictory positions taken re. unstructured vs. structured data: structured is boring and useless, while unstructured is useful and sexy.

The difference between "Structured Containers" and "Structured Data" is clearly misunderstood by most (an unfortunate fact).

For instance, all DBMS products are "Structured Containers" aligned to one or more data models (typically one). These products have been limited by proprietary data access APIs and underlying data-model specificity when used in the "Open-world" model that is at the core of the World Wide Web. This confusion also carries over to the misconception that Web 2.0 and the Semantic/Data Web are mutually exclusive.

But things are changing fast, and the concept of multi-model DBMS products is beginning to crystallize. On our part, we have finally released the long-promised "OpenLink Data Spaces" application layer, developed using our Virtuoso Universal Server. We have structured unified storage containment exposed to the Data Web cloud via endpoints for querying or accessing data using a variety of mechanisms that include GData, OpenSearch, SPARQL, XQuery/XPath, SQL, etc.

To be continued....

Structured Data vs. Unstructured Data
http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/991
Tue, 27 Jun 2006 05:39:09 GMT | Kingsley Uyi Idehen <kidehen@openlinksw.com>
I am pleased to unveil (officially) the fact that Virtuoso is now available in Open Source form.

What Is Virtuoso?

A powerful next generation server product that implements otherwise distinct server functionality within a single server product. Think of Virtuoso as the server software analog of a dual core processor where each core represents a traditional server functionality realm.

Where did it come from?

The Virtuoso History page tells the whole story.

What Functionality Does It Provide?

The following:
    1. Object-Relational DBMS Engine (ORDBMS like PostgreSQL and DBMS engine like MySQL)
    2. XML Data Management (with support for XQuery, XPath, XSLT, and XML Schema)
    3. RDF Triple Store (or Database) that supports SPARQL (Query Language, Transport Protocol, and XML Results Serialization format)
    4. Service Oriented Architecture (it combines a BPEL Engine with an ESB)
    5. Web Application Server (supports HTTP/WebDAV)
    6. NNTP compliant Discussion Server
And more. (see: Virtuoso Web Site)

90% of the aforementioned functionality has been available in Virtuoso since 2000 with the RDF Triple Store being the only 2006 item.
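For the curious, the SPARQL transport protocol mentioned in item 3 is plain HTTP: the query travels URL-encoded in a GET request, and the Accept header selects the results serialization. A sketch that builds (but does not send) such a request; the endpoint URL below is illustrative, not a real service:

```python
# Build a SPARQL Protocol GET request against a triple store.
# The request is constructed but not sent, so the sketch is
# self-contained; the endpoint URL is hypothetical.
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

params = urlencode({"query": QUERY})
req = Request(
    ENDPOINT + "?" + params,
    # Ask for the SPARQL XML results serialization.
    headers={"Accept": "application/sparql-results+xml"},
)
print(req.full_url)
```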

What Platforms are Supported

The Virtuoso build scripts have been successfully tested on Mac OS X (Universal Binary Target), Linux, FreeBSD, and Solaris (AIX, HP-UX, and Tru64 UNIX will follow soon). A Windows Visual Studio project file is also in the works (ETA: some time this week).

Why Open Source?

Simple: there is no value in a product of this magnitude remaining the "best kept secret". That status works well for our competitors, but absolutely works against the legions of new-generation developers, systems integrators, and knowledge workers that need to be aware of what is actually achievable today with the right server architecture.

What Open Source License is it under?

GPL version 2.

What's the business model?

Dual licensing.

The Open Source version of Virtuoso includes all of the functionality listed above, while the Virtual Database (distributed heterogeneous join engine) and Replication Engine (across heterogeneous data sources) functionality will only be available in the commercial version.

Where is the Project Hosted?

On SourceForge.

Is there a product Blog?

Of course!

Up until this point, the Virtuoso Product Blog has been a covert live demonstration of some aspects of Virtuoso (Content Management). My Personal Blog and the Virtuoso Product Blog are actual Virtuoso instances, and have been so since I started blogging in 2003.

Is There a product Wiki?

Sure! The Virtuoso Product Wiki is also an instance of Virtuoso demonstrating another aspect of the Content Management prowess of Virtuoso.

What About Online Documentation?

Yep! Virtuoso Online Documentation is hosted via yet another Virtuoso instance. This particular instance also attempts to demonstrate Free Text search combined with the ability to repurpose well formed content in a myriad of forms (Atom, RSS, RDF, OPML, and OCS).

What about Tutorials and Demos?

The Virtuoso Online Tutorial Site has operated as a live demonstration and tutorial portal for a number of years. During the same timeframe (circa 2001) we also assembled a few Screencast-style demos (their look and feel certainly shows their age; updates are in the works).

BTW - We have also updated the Virtuoso FAQ and also released a number of missing Virtuoso White Papers (amongst many long overdue action items).

Virtuoso is Officially Open Source!
http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/951
Fri, 21 Jul 2006 11:22:20 GMT | Kingsley Uyi Idehen <kidehen@openlinksw.com>
Here is an excerpt (from the patent filing):
Techniques for presenting and managing syndication XML (feeds) are disclosed. In one embodiment, a user can modify how a feed is displayed, such as which content (and how much) is displayed, in what order, and how it is formatted. In another embodiment, a modification regarding how a feed is displayed is stored so that it can be used again at a later time. In yet another embodiment, a user can create a custom feed through aggregation and/or filtering of existing feeds. Aggregation includes, for example, merging the articles of multiple feeds to form a new feed. Filtering includes, for example, selecting a subset of articles of a feed based on whether they satisfy a search query. In yet another embodiment, a user can find articles by entering a search query into a search engine that searches feeds, which will identify one or more articles that satisfy the query.
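Filtering a feed's articles by a search query, one of the claimed behaviors, is a few lines over any XML toolkit. A minimal sketch with Python's standard library (the feed content is made up):

```python
# Filter a feed's items by a search query: the "custom feed through
# filtering" behavior described in the patent excerpt, as prior art.
import xml.etree.ElementTree as ET

FEED = """<rss><channel>
  <item><title>XQuery ships in Virtuoso</title></item>
  <item><title>Weather for Boston</title></item>
  <item><title>XQuery and XPath tutorials</title></item>
</channel></rss>"""

def filter_feed(feed_xml, query):
    # Keep only items whose title matches the query (case-insensitive).
    root = ET.fromstring(feed_xml)
    return [item.findtext("title")
            for item in root.iter("item")
            if query.lower() in item.findtext("title").lower()]

matches = filter_feed(FEED, "xquery")
print(matches)
```

Aggregation is the same idea in reverse: merge the `<item>` elements of several parsed feeds into one channel.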

Clearly Apple doesn't seem to understand the world of XML, so let me give them a quick recap:

    1. XML enables separation of Data and Formatting
    2. It facilitates Data Representation, Transformation (XSLT), Exchange (syndication and subscription), and Modeling (languages, protocols, data models, amongst other things)
    3. It is inherently open
    4. You can't patent its essence through the back door!

The Blogosphere is a Galaxy within Cyberspace comprised of Solar systems of Blogs that revolve around X-list bloggers, Topics, or more recently Tags; through the gravitational pull of links to RSS (today), Atom (in due course), and RDF (the future).

Unfortunately, Apple (a major late-comer to RSS) doesn't seem to understand that "RSS content search, aggregation and transformation" is practically the same thing as "XML search, aggregation and transformation". Subject matter covered extensively by XML based languages such as XSLT, XPath, XPointer, and XQuery.

Without XML there would be no RSS (as we know it today), and without RSS there would be no Blogosphere.

Repurposing Blogosphere content isn't a novel invention at all. Therefore, filing a patent along such lines is simply uncool by Apple's standards (like the inextricable binding of iWeb to .mac that was touted as innovative and open).

Final note: this blog is driven by a database engine that has understood XML for a long time. This blog has been my live demo of this fact since its inception. Here are a few things that it has done for a very long time (talking prior art here):

    - Repurpose content on the fly from SQL and XML data sources to produce all the syndication and subscription gems you see on the Blog Home Page
    - Offer a search feature that enables visitors to query blog archives using Free Text, XQuery, XPath (all transformation technologies alongside XSLT).
Apple Patent Application: News Feed Viewer
http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/937
Thu, 22 Jun 2006 12:56:58 GMT | Kingsley Uyi Idehen <kidehen@openlinksw.com>
I am kinda scratching my head a little re. the "Clone Google APIs" call; especially as Amazon's A9 already provides infrastructure for generic search. A9 is open at both ends; you can consume search services via a RESTian API or plug your search engine into A9 (playing the role of A9 search service provider).

Quick Example using my blog:

    3. Hactivism" regarding this matter. Certainly worth a full-post-scrape for my ongoing content annotation efforts (see Linkblog and BlogSummary).

    Digest the rest of Dare's post:

    Clone the Google APIs: Kill That Noise: "

    Yesterday, in a post about cloning the Google API, Dave Winer wrote:

    Let's make the Google API an open standard. Back in 2002, Google took a bold first step to enable open architecture search engines, by creating an API that allowed developers to build applications on top of their search engine. However, there were severe limits on the capacity of these applications. So we got a good demo of what might be, now three years later, it's time for the real thing.

    and earlier that
    If you didn't get a chance to hear yesterday's podcast, it recommends that Microsoft clone the Google API for search, without the keys, and without the limits. When a developer's application generates a lot of traffic, buy him a plane ticket and dinner, and ask how you both can make some money off their excellent booming application of search. This is something Google can't do, because search is their cash cow. That's why Microsoft should do it. And so should Yahoo. Also, there's no doubt Google will be competing with Apple soon, so they should be also thinking about ways to devalue Google's advantage.

    This doesn't seem like a great idea to me for a wide variety of reasons but first, let's start with a history lesson before I tackle this specific issue

    A Trip Down Memory Lane
    This history lesson used to be in a post entitled The Tragedy of the API by Evan Williams, but seems to be gone now. Anyway, back in the early days of blogging, the folks at Pyra [which eventually got bought by Google] created the Blogger API for their service. Since Blogspot/Blogger was a popular service, the number of applications that used the API quickly grew. At this point Dave Winer decided that since the Blogger API was so popular he should implement it in his weblogging tools, but then he decided that he didn't like some aspects of it, such as application keys (sound familiar?), and did without them in his version of the API. Dave Winer's version of the Blogger API became the MetaWeblog API. These APIs became de facto standards, and a number of other weblogging applications implemented them.

    After a while, the folks at Pyra decided that their API needed to evolve due to various flaws in its design. As Diego Doval put it in his post a review of blogging APIs, "The Blogger API is a joke, and a bad one at that." This led to the creation of the Blogger API 2.0. At this point a heated debate erupted online, where Dave Winer berated the Blogger folks for deviating from an industry standard. The irony of flaming a company for coming up with a v2 of their own API seemed to be lost on many of the people who participated in the debate. Eventually the Blogger API 2.0 went nowhere.

    Today the blogging API world consists of a few de facto standards based on a hacky API created by a startup a few years ago, a number of site-specific APIs (LiveJournal API, MovableType API, etc.), and a number of inconsistently implemented versions of the Atom API.

    On Cloning the Google Search API
    To me the most salient point in the hijacking of the Blogger API from Pyra is that it didn't change the popularity of their service or even make Radio Userland (Dave Winer's product) catch up to them in popularity. This is important to note since this is Dave Winer's key argument for Microsoft cloning the Google API.

    Off the top of my head, here are my top three technical reasons for Microsoft to ignore the calls to clone the Google Search APIs

    1. Difference in Feature Set: The features exposed by the API do not run the entire gamut of features that other search engines may want to expose. Thus even if you implement something that looks a lot like the Google API, you'd have to extend it to add the functionality that it doesn't provide. For example, compare the features provided by the Google API to the features provided by the Yahoo! search API. I can count about half a dozen features in the Yahoo! API that aren't in the Google API.

    2. Difference in Technology Choice: The Google API uses SOAP. This to me is a phenomenally bad technical decision because it raises the bar to performing a basic operation (data retrieval) by using a complex technology. I much prefer Yahoo!'s approach of providing a RESTful API and MSN Windows Live Search's approach of providing RSS search feeds and a SOAP API for the folks who need such overkill.

    3. Unreasonable Demands: A number of Dave Winer's demands seem contradictory. He asks companies to not require application keys but then advises them to contact application developers who've built high traffic applications about revenue sharing. Exactly how are these applications to be identified without some sort of application ID? As for removing the limits on the services? I guess Dave is ignoring the fact that providing services costs money, which I seem to remember is why he sold weblogs.com to Verisign for a few million dollars. I do agree that some of the limits on existing search APIs aren't terribly useful. The Google API limit of 1000 queries a day seems to guarantee that you won't be able to power a popular application with the service.
    4. Lack of Innovation: Copying Google sucks.

    (Via Dare Obasanjo aka Carnage4Life.)
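Dare's second point, the REST-versus-SOAP gap, is easy to see in code. A minimal sketch, assuming a hypothetical RESTful search endpoint and an RSS-style response (the URL and element names below are illustrative, not any real service's API):

```python
from urllib.parse import urlencode
from xml.etree import ElementTree

# A REST search request is nothing more than an HTTP GET with query
# parameters -- no SOAP envelope, no WSDL tooling required.
BASE = "http://search.example.com/v1/search"  # hypothetical endpoint

def build_search_url(query, count=10):
    return BASE + "?" + urlencode({"q": query, "n": count})

url = build_search_url("xquery tutorial")

# And the response can simply be RSS, parseable with any XML library:
sample_response = """<rss><channel>
  <item><title>XQuery Basics</title><link>http://example.com/a</link></item>
  <item><title>XPath vs XQuery</title><link>http://example.com/b</link></item>
</channel></rss>"""
titles = [item.findtext("title")
          for item in ElementTree.fromstring(sample_response).iter("item")]
```

The whole round trip is a URL and a feed parser, which is exactly why RSS search feeds set a lower bar for developers than a SOAP toolchain.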

]]>
Clone the Google APIs: Kill That Noisehttp://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/892Thu, 22 Jun 2006 12:56:58 GMT12006-06-22T08:56:58-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
Uche Ogbuji comments in his blog about the use of WebDAV and SQLX in my blog as part of his commentary about Pyblosxom & WebDAV. To provide some clarity about Virtuoso and Blogging, I have decided to put out this quick step-by-step guide to the workings of my blog (there is a long overdue technical white paper nearing completion that addresses this subject in more detail).

Here goes:

Blog Editing

I can use any editor that supports the following Blog Post APIs:

- Movable Type

- Meta Weblog

- Blogger

Typically I use Virtuoso (which has an unreleased WYSIWYG blog post editor), Newzcrawler, ecto, Zempt, or w.bloggar for my posts. If a post is of interest to me, or relevant to our company or customers I tend to perform one of the following tasks:

- Generate a post using the "Blog This" feature of my blog editor

- Write a new post that was triggered by a previously read post etc.

Either way, the posts end up in our company-wide blog server, which is Virtuoso based (more about this below). The internal blog server automatically categorizes my blog posts and automagically determines which posts to upstream to other public blogs that I author (e.g. http://kidehen.typepad.com) or co-author (e.g. http://www.openlinksw.com/weblogs/uda and http://www.openlinksw.com/weblogs/virtuoso). I write once and my posts are dispatched conditionally to multiple outlets.
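For illustration, posting through any of the APIs above is an XML-RPC call under the hood. A minimal sketch of roughly what a blog editor sends for a MetaWeblog newPost, using Python's standard library (the endpoint URL, blog id, and credentials are placeholders):

```python
import xmlrpc.client

# Content of a post in the shape the MetaWeblog API expects
post = {"title": "Hello from the API", "description": "<p>Post body here.</p>"}

# Marshal a metaWeblog.newPost call; the trailing True means "publish now".
payload = xmlrpc.client.dumps(
    ("blog-id", "user", "password", post, True),
    methodname="metaWeblog.newPost",
)

# Sending it is a single call against the server's XML-RPC endpoint, e.g.:
#   server = xmlrpc.client.ServerProxy("http://example.com/RPC2")
#   post_id = server.metaWeblog.newPost("blog-id", "user", "password", post, True)
```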

RSS/Atom/RDF Aggregation & Reading

I discover, subscribe to, and view blog feeds using Newzcrawler (primarily), and from time to time, for experimentation and evaluation purposes, I use RSS Bandit, FeedDemon, and Bloglines. I am in the process of moving this activity over to Virtuoso completely due to the large number of feeds that I consume on a daily basis (scalability is a bit of a problem with current aggregators).

Blog Publishing

When you visit my blog you are experiencing the soon-to-be-released Virtuoso Blog Publishing engine first hand, which is how WebDAV, SQLX, XQuery/XPath, Free Text, etc. come into the mix.

Each time I create a post internally, or subscribe to an external feed, the data ends up in Virtuoso's SQL engine (this is how we handle some of the obvious scalability challenges associated with large subscription counts). This engine is SQL:200n based, which implies that it can transform SQL to XML on the fly using recent extensions to SQL in the form of SQLX (prior to the emergence of this standard we used the FOR XML SQL syntax extensions for the same result). It also has its own in-built XSLT processor (DB engine resident) and a validating XML parser (with support for XML Schema). Thus, my RSS/RDF/Atom archives, FOAF, BlogRoll, OPML, and OCS blog syndication gems are all live examples of SQLX documents that leverage Virtuoso's WebDAV engine for exposure to blog clients.
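The SQLX idea can be illustrated outside the database. In SQL/XML you would write something like SELECT XMLELEMENT(NAME item, XMLFOREST(title, link)) FROM posts; the sketch below (using an in-memory SQLite table as a stand-in for the blog's post store, with invented rows) produces the same shape of output:

```python
import sqlite3
from xml.etree.ElementTree import Element, SubElement, tostring

# Stand-in for the blog's relational post store; in Virtuoso the same
# transformation happens inside SQL itself via the SQLX functions.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (title TEXT, link TEXT)")
db.executemany("INSERT INTO posts VALUES (?, ?)",
               [("Post one", "http://example.com/1"),
                ("Post two", "http://example.com/2")])

channel = Element("channel")
for title, link in db.execute("SELECT title, link FROM posts ORDER BY title"):
    item = SubElement(channel, "item")       # ~ XMLELEMENT(NAME item, ...)
    SubElement(item, "title").text = title   # ~ XMLFOREST(title, link)
    SubElement(item, "link").text = link

rss = tostring(channel, encoding="unicode")
```

The point of doing this in the engine rather than in application code is that the feed documents (RSS/Atom/OPML, etc.) are always generated from the live rows.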

Blog Search

When you search for blog posts using the basic or advanced search features of my blog, you end up interacting with one of the following methods of querying data hosted in Virtuoso: Free Text search, XPath, or XQuery. The result sets produced by the search feature use SQLX to produce subscription gems (RSS/Atom/RDF). My blog home page exists as a result of Virtuoso's Virtual Domain / Multi-Homing Web Server functionality. The entire site resides in an Object-Relational DBMS, and I can take my DB file across Windows, Solaris, Linux, Mac OS X, FreeBSD, AIX, HP-UX, IRIX, and SCO UnixWare without missing a single beat! All I have to do is instantiate my Virtuoso server and my weblog is live.
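The XPath option, for instance, boils down to evaluating a path expression with a predicate over the stored feed XML. A minimal sketch using Python's ElementTree, which supports a small subset of XPath 1.0 (the feed snippet is made up):

```python
from xml.etree import ElementTree

feed = ElementTree.fromstring("""<channel>
  <item><category>Database</category><title>SQLX in practice</title></item>
  <item><category>Web</category><title>WebDAV basics</title></item>
  <item><category>Database</category><title>XQuery engines</title></item>
</channel>""")

# XPath predicate: select items whose <category> child equals 'Database'
hits = feed.findall(".//item[category='Database']")
titles = [h.findtext("title") for h in hits]
```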

]]>
WebDAV, SQLX, and my Webloghttp://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/810Thu, 22 Jun 2006 12:56:58 GMT12006-06-22T08:56:58-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
I came across Shelley Power's blog via a recent post by Dare Obasanjo that shed light on the issue of "Minority Bloggers". After reading his post I visited every blog URI referenced, and in the process I bumped into a gem of an article titled: Guy's Don't Link.

BTW - I took the time to update my public blog-he-roll and new blog-her-roll; both being tiny snapshots of my actual blog subscription collection, which by the way, is actually so large and diverse that it's part of an internal project covering distributed XQuery and scalability :-)

]]>
AutoLink Hoopla Perspective: Guys Don't Linkhttp://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/780Thu, 22 Jun 2006 12:56:58 GMT12006-06-22T08:56:58-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>

Udell to event promoters on leveraging folksonomy: 'Pick a tag' I'm now trying to figure out why InfoWorld's Jon Udell is a journalist and not a millionaire technologist (or maybe he is). Udell keeps coming up with one brilliant idea after another. The first of these -- which I thought was just plain obvious -- was Udell's idea for vendors ...

[via Berlind's Midnight Oil]
 
I prefer to describe Jon Udell as a Technologist Type 3 (according to Tom Bradford's Technology Types nomenclature) who is also a journalist. His insights, thought stimulation/leadership, and power of articulation defy monetization.
I do know Jon (albeit primarily via emails and phone interviews), he even put me forward for an innovators award in 2003 re. Virtuoso etc.
Full disclosure aside,  you only need to trace back in time to see that he has been a Type 3 Technologist for a very long time. When I read one of Jon's articles I always sense that they are the end product of the following steps:
 
1. Hypothesis Development
2. Hands-on Experimentation
3. Experiment Observation
4. Conclusion Attainment
5. Report / Article Generation
6. Share Findings with Interested Parties
 
On the subject of "sharing his findings", the blogosphere has become a very effective dispatch outlet. He starts conversations about Google Maps, or querying Web data via XQuery/XPath for instance, that stimulate further discussion in the form of related blog posts of varying relationship density (which you might discern from these posts by Tom and myself, for instance).
 
Blog conversation replaces the need for a "Jon, here is our take on this..." or "Jon, here is our implementation of what you demonstrated" phone call or email (you know he sees the discussion threads coalescing around his original post; most of the time the conversation sets up the next batch of experiments).
 
To conclude, Jon is more than likely a tech Thrillionaire  :-) 
 
]]>
Udell to event promoters on leveraging folksonomy: 'Pick a tag'http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/728Thu, 22 Jun 2006 12:56:58 GMT12006-06-22T08:56:58-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
Cognitive dissonance is how Dare Obasanjo aptly describes the emergence of some of the Smart Tags concepts previously introduced by Microsoft and now emulated by the new google toolbar's autolink feature (Greg Linden explains the problem with clarity).

Anyway, back to cognitive dissonance. Could this be the reason for the following?

  1. Open Source products are increasingly database specific even though they could be database independent via Open Source ODBC SDK efforts such as iODBC and unixODBC. We are increasingly narrowing our choices down to database-specific "Closed Source" or database-specific "Open Source" solutions, and somehow deem this to be progress
  2. The prevalent use of free standards compliant data access drivers (ODBC, JDBC, and ADO.NET) or their native counterparts that remain vulnerable to simple password hacks (there are databases behind those dynamic web sites!!) as none of these have any notion of "rules based" authentication and data access policy
  3. The time-tested fallacy that "select * from table" defines a viable RDBMS engine, since Transaction Atomicity, Consistency, Isolation, and Durability (ACID) mean zip! Ditto scrollable cursors, stored procedures, and other presumably useless aspects of any marginally decent RDBMS engine
  4. Failing to comprehend that a Weblog is your property (if you have a personal blog) not the property of the vendor hosting your service (that important issue of separating data ownership and data storage again). You may have heard about, or experienced, total loss of weblog and/or weblog archives arising from weblog engine or blog service provider changeovers
  5. Failing to see the synergy between personal/group/corporate information stores (aka infobase) such as Wikis, Weblogs, and the burgeoning semantic web. Jon Udell for instance, is trying to get the point across via his tireless collection of XQuery/XPath based queries aimed at the blogosphere section of the burgeoning semantic web. Here are some of mine (scoped to this weblog):
    • Security related posts to date (XPath query)
    • Infobase related posts to date (Free Text search)

And more...

]]>
Cognitive Dissonancehttp://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/695Thu, 22 Jun 2006 12:56:58 GMT12006-06-22T08:56:58-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
I put this piece together in response to another stimulating post by Dare Obasanjo titled "Is Google the Next Microsoft or the Next Netscape?". I changed the title of this post to project the fact that Web 2.0 provides the appropriate context (IMHO) for Dare's point re. "Web Site Stickiness".

Stickiness is a defining characteristic of Web 1.0. It's all about eyeballs (site visitors), which ultimately implied that all early Web business models ended up down the advertising route.

I always felt that Web 1.0 was akin to having a crowd of people at your reception area seeking a look at your corporate brochures, and then someone realizes that you could start selling AD space in these brochures in response to the growing crowd size and frequency of congregation. The long-term folly of this approach is now obvious, as many organizations forgot their core value propositions (expressed via product offerings) in the process and wandered blindly down the AD model cul-de-sac, and we all know what happened down there...

Web 2.0 is taking shape (the inflection is in its latter stages), and the defining characteristics of Web 2.0 are:

  1. Fabric of Executable Endpoints
  2. Semantic Content (the RSS/RDF/Atom/FOAF semantic crumbs emerging from the Blogosphere are great examples of things to come re. XQuery queries over HTTP, for instance)
  3. Migration from the Web Site concept (defined by static or dynamic HTML page generation) to that of a "Web Point of Presence" (I don't know if this term will catch on, but the conceptual essence here is factual) that enables an organization to achieve the following:
    • Package/catalog the value proposition (products and services) using RSS/RDF/Atom
    • Provide SOAP compliant Executable Endpoints (Web Services) for consuming the value proposition (as opposed to being distracted by the AD model)
    • Provide Web Services for consummating contracts associated with the core value proposition
  4. Identification of internal efficiencies, and of new products/services that leverage Semantic Content and Web Services and tangibly exploit:
    • Composite Web Services construction from legacy monolithic application pools
    • Standards based (e.g. BPEL) orchestration and integration of disparate composite services (across the Fabric referred to above)

When you factor in all of the above, the real question is whether Google and others are equipped to exploit Web 2.0. "To some degree" is the best answer at the current time, as they have commenced the transition from "content only" web site to web platform (via the many Web Services initiatives that expose SOAP and REST interfaces to various services), but there is much more to this journey, and that's the devil in the "competitive landscape details".

From my obviously biased perspective, I think Virtuoso and Yukon+WinFS provide the server models for driving Web 2.0 points of presence (single server instances that implement multiple protocols). Thus, if Google, Yahoo! et al. aren't exploiting these or similar products, then they will be vulnerable over the long term to the competitive challenges that a Web 2.0 landscape will present.

]]>
Is Google Web 2.0's Netscape?http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/611Thu, 22 Jun 2006 12:56:58 GMT12006-06-22T08:56:58-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
A great post by Dare, especially his bringing into context the essence of this matter, referred to by C.J. Date as "XML the New Database Heresy".

I have little to add to this matter, as our understanding and vision are aptly expressed via the architecture and feature set of Virtuoso (this area was actually addressed circa 1999).

We are heading into an era of multi-model databases: single database engines that are capable of effectively serving the requirements of the Hierarchical, Network, Relational, and Object database models. As we get closer to the unravelling of universal storage, hopefully this will get clearer.

Back to Dare's commentary:

C.J. Date, one of the most influential names in the relational database world, had some harsh words about XML's encroachment into the world of relational databases in a recent article entitled Date defends relational model  that appeared on SearchDatabases.com. Key parts of the article are excerpted below

Date reserved his harshest criticism for the competition, namely object-oriented and XML-based DBMSs. Calling them "the latest fashions in the computer world," Date said he rejects the argument that relational DBMSs are yesterday's news. Fans of object-oriented database systems "see flaws in the relational model because they don't fully understand it," he said.

Date also said that XML enthusiasts have gone overboard.

"XML was invented to solve the problem of data interchange, but having solved that, they now want to take over the world," he said. "With XML, it's like we forget what we are supposed to be doing, and focus instead on how to do it."

Craig S. Mullins, the director of technology planning at BMC Software and a SearchDatabase.com expert, shares Date's opinion of XML. It can be worthwhile, Mullins said, as long as XML is only used as a method of taking data and putting it into a DBMS. But Mullins cautioned that XML data that is stored in relational DBMSs as whole documents will be useless if the data needs to be queried, and he stressed Date's point that XML is not a real data model.

Craig Mullins' points are more straightforward to answer, since his comments don't jibe with the current state of the art in the XML world. He states that you can't query XML documents stored in databases, but this is untrue. Almost three years ago, I was writing articles about querying XML documents stored in relational databases. Storing XML in a relational database doesn't mean it has to be stored as an opaque binary BLOB or as a big bunch of text which cannot effectively be queried. The next version of SQL Server will have extensive capabilities for querying XML data in a relational database and doing joins across relational and XML data; a lot of this functionality is described in the article on XML Support in SQL Server 2005. As for XML not having a data model, I beg to differ. There is a data model for XML that many applications and people adhere to, often without realizing that they are doing so. This data model is the XPath 1.0 data model, which is being updated to handle typed data as the XQuery and XPath 2.0 data model.

Now to tackle the meat of C.J. Date's criticism, which is that XML solves the problem of data interchange but now is showing up in the database. The first point I'd like to make is that there are two broad usage patterns of XML: it is used to represent both rigidly structured tabular data (e.g., relational data or serialized objects) and semi-structured data (e.g., office documents). The latter type of data will only grow now that office productivity software like Microsoft Office has enabled users to save their documents as XML instead of proprietary binary formats. In many cases, these documents cannot simply be shredded into relational tables. Sure, you can shred an Excel spreadsheet written in SpreadsheetML into relational tables, but is the same really feasible for a Word document written in WordprocessingML? Many enterprises would rather have their important business data stored and queried from a unified location, instead of the current situation where some data is in document management systems, some hangs around as random files in people's folders, and some sits in a database management system.

As for stating that critics of the relational model don't understand it, I disagree. One of the major benefits of using XML in relational databases is that it is a lot easier to deal with fluid schemas or data with sparse entries. When the shape of the data tends to change or is not fixed, the relational model is simply not designed to deal with this. Constantly changing your database schema is simply not feasible, and there is no easy way to provide the extensibility of XML, where one can say "after the X element, any element from any namespace can appear". How would one describe the capacity to store “any data” in a traditional relational database without resorting to an opaque blob?

I do tend to agree that some people are going overboard and trying to model their data hierarchically instead of relationally, which experience has taught us is a bad idea. Recently there was a thread on the XML-DEV mailing list entitled Designing XML to Support Information Evolution, in which Roger L. Costello described his travails trying to model, in a hierarchical manner, data that was being transferred as XML. Michael Champion accurately described the process Roger Costello went through as having "rediscovered the relational model". In a response to that thread I wrote that "Hierarchical databases failed for a reason".

Using hierarchy as a primary way to model data is bad for at least the following reasons:

  1. Hierarchies tend to encourage redundancy. Imagine I have a <Customer> element who has one or more <ShippingAddress> elements as children as well as one or more <Order> elements as children as well. Each order was shipped to an address, so if modelled hierarchically each <Order> element also will have a <ShippingAddress> element which leads to a lot of unnecessary duplication of data.
  2. In the real world, there are often multiple groups to which a piece of data belongs which often cannot be modelled with a single hierarchy.  
  3. Data is too tightly coupled. If I delete a <Customer> element, this means I've automatically deleted his entire order history since all the <Order> elements are children of <Customer>. Similarly if I query for a <Customer>, I end up getting all the <Order> information as well.

To put it simply, experience has taught the software world that the relational model is a better way to model data than the hierarchical model. Unfortunately, in the rush to embrace XML, many are repeating the mistakes of decades ago in the new millennium.

[via Dare Obasanjo aka Carnage4Life]
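The redundancy and coupling points in Dare's list are easy to make concrete. A small sketch (the customer, order, and address values are invented) contrasting the hierarchical shape with the normalized relational one:

```python
# Hierarchical shape: every Order repeats the shipping address (point 1),
# and the orders only exist as children of the customer (point 3).
hierarchical = {
    "Customer": {
        "name": "Acme Corp",
        "Orders": [
            {"id": 1, "ShippingAddress": "1 Main St, Springfield"},
            {"id": 2, "ShippingAddress": "1 Main St, Springfield"},  # duplicate
        ],
    }
}
addresses_stored = [o["ShippingAddress"]
                    for o in hierarchical["Customer"]["Orders"]]

# Relational shape: the address is stored once and referenced by key,
# and order rows survive independently of the customer row.
addresses = {100: "1 Main St, Springfield"}
orders = [
    {"id": 1, "customer": "Acme Corp", "address_id": 100},
    {"id": 2, "customer": "Acme Corp", "address_id": 100},
]
```

Deleting the Customer node in the hierarchical shape takes the whole order history with it; in the relational shape the orders remain addressable rows.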
]]>
XML, the New Database Heresyhttp://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/555Thu, 22 Jun 2006 12:56:58 GMT12006-06-22T08:56:58-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
Dare Obasanjo points out that Microsoft Sharepoint offers the "by reference" (as opposed to "by value") mail attachment capability that Jon Udell reviewed in a recent blog post. True! So does Virtuoso, in a number of ways (most importantly, independent of client or server operating system).

This issue really brings WebDAV into scope as this is the protocol that enables this capability (as covered by Jon's piece), and it is one of the many client and server side protocols implemented by OpenLink Virtuoso (the key to how Virtuoso delivers URI based SQL-XML, XQuery, XPath services).

When you install Virtuoso you simply have to start the Virtuoso server instance to get the WebDAV functionality going. All of Virtuoso's services are advertised at ports, and in the case of WebDAV you will find this at port 8890 if you start the demo database.

To exploit the Virtuoso/WebDAV server from any WebDAV client (or point URLs at WebDAV-hosted resources), simply do the following:

  1. Install Virtuoso and depending on your OS do the following:
    • Windows - create a Web Folder that points to a WebDAV server
    • Mac OS X - mount a WebDAV folder
    • Linux - mount a WebDAV directory (also see the Davfs2 Open Source project)
    • You can also make WebDAV client calls from Virtuoso's Stored Procedure Language (Virtuoso PL), or use WebDAV implementations in any development environment of your choice (.NET, Java, etc.).
  2. Place content that you want to reference in your mails in your WebDAV repository via any of the client-side mechanisms described in step 1. You can see the results of this in my earlier blog post; even better, pass the URL on in an email! Or browse the WebDAV folder (there are some nuggets deliberately left in place :-) )
    • You could simply save an Office doc (PowerPoint, Excel, Word, etc.) to this location and then circulate URLs in your mails (this has been standard practice at OpenLink for many years; we even have a full-blown portal server that will soon be available as a public service for sharing anything via DAV and, as usual, some more... stay tuned)
  3. That's it for any platform (Windows, Linux, Mac OS X, FreeBSD, Solaris, AIX, HP-UX etc.) once you install Virtuoso!
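From code, step 2 (placing content in the WebDAV repository) is just an HTTP PUT against the DAV port. A sketch with Python's urllib, assuming the demo database on port 8890 as described above (the path and demo:demo credentials are illustrative):

```python
import base64
import urllib.request

# Illustrative target: Virtuoso's demo database serves WebDAV on port 8890.
url = "http://localhost:8890/DAV/home/demo/note.txt"
body = b"Shared by reference via WebDAV instead of a mail attachment."

req = urllib.request.Request(url, data=body, method="PUT")
req.add_header("Content-Type", "text/plain")
req.add_header("Authorization",
               "Basic " + base64.b64encode(b"demo:demo").decode())

# urllib.request.urlopen(req)  # uncomment to upload; a success response means
# the resource is now addressable by its URL, which is what you mail around.
```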

BTW - This blog is WebDAV based (it's a live instance of Virtuoso doing many things: WebDAV, HTTP, SQL-XML based feed generation for Atom and RSS, Blog Post API support (Movable Type, MetaWeblog, Blogger, Atom), Free Text, XPath, XQuery, and more).

]]>
Collaboration Softwarehttp://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/543Thu, 22 Jun 2006 12:56:58 GMT12006-06-22T08:56:58-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>

Databases get a grip on XML
From InfoWorld.

The next iteration of the SQL standard was supposed to arrive in 2003. But SQL standardization has always been a glacially slow process, so nobody should be surprised that SQL:2003 (now known as SQL:200n) isn't ready yet. Even so, 2003 was a year in which XML-oriented data management, one of the areas addressed by the forthcoming standard, showed up on more and more developers' radar screens.  >> READ MORE

This article rounds up products for 2003 in the critical area of Enterprise Database Technology. It certainly provides an apt reflection of how Virtuoso compares with offerings from some of the larger (but certainly slower to implement) database vendors in this space. As usual, Jon Udell's quote pretty much sums this up:

"While the spotlight shone on the heavyweight contenders, a couple of agile innovators made noteworthy advances in 2003. OpenLink Software's Virtuoso 3.0, which we reviewed in March, stole thunder from all three major players. Like Oracle, it offers a WebDAV-accessible XML repository. Like DB2 Information Integrator, it functions as database middleware that can perform federated 'joins' across SQL and XML sources. And like the forthcoming Yukon, it embeds the .Net CLR (Common Language Runtime), or in the case of Linux, Novell/Ximian's Mono."

Albeit still somewhat unknown to the broader industry, we have remained true to our "innovator" discipline, which remains our chosen path to market leadership. Thus, it's worth a quick recap of Virtuoso's release history and features as we get set to up the ante even further in 2004:

1998 - Virtuoso's initial public beta release with functional emphasis on Virtual Database Engine for ODBC and JDBC Data Sources.

1999 - Virtuoso's official commercial release, with emphasis still on Virtual Database functionality for ODBC, JDBC accessible SQL Databases.

2000 - Virtuoso 2.0 adds XML Storage, XPath, XML Schema, XQuery, XSL-T, WebDAV, SOAP, UDDI, HTTP, Replication, Free Text Indexing (*feature update*), POP3, and NNTP support.

2002 - Virtuoso 2.7 extends Virtualization prowess beyond data access via enhancements to its Web Services protocol stack implementation by enabling SQL Stored Procedures to be published as Web Services. It also debuts its Object-Relational engine enhancements that include the incorporation of Java and Microsoft .NET Objects into its User Defined Type, User Defined Functions, and Stored Procedure offerings.

2003 - Virtuoso 3.0 extends data and application logic virtualization into the Application Server realm (basically a Virtual Application server too!), by adding support for ASP.NET, PHP, Java Server Pages runtime hosting (making applications built using any of these languages deployable using Virtuoso across all supported platforms).

Collectively, these releases have contributed to a very premeditated architecture and vision that will ultimately unveil the inherent power of critical I.S. infrastructure virtualization along the following lines: data storage, data access, and application logic, via coherent integration of SQL, XML, Web Services, and Persistent Stored Modules (.NET, Java, and other object-based component building blocks).

 

]]>
Enterprise Databases get a grip on XMLhttp://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/442Thu, 22 Jun 2006 12:56:58 GMT12006-06-22T08:56:58-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>

NETWORK WORLD NEWSLETTER: MARK GIBBS ON WEB APPLICATIONS

Today's focus: A Virtuoso of a server

By Mark Gibbs

One of the bigger drags of Web applications development is that building a system of even modest complexity is a lot like herding cats - you need a database, an applications server, an XML engine, etc., etc. And as they all come from different vendors you are faced with solving the constellation of integration issues that inevitably arise.

If you are lucky, your integration results in a smoothly functioning system. If not, you have a lot of spare parts flying in loose formation with the risk of a crash and burn at any moment.

An alternative is to look for all of these features and services in a single package but you'll find few choices in this arena.

One that is available and looks very promising is OpenLink's Virtuoso (see links below).

Virtuoso is described as a cross platform (runs on Windows, all Unix flavors, Linux, and Mac OS X) universal server that provides databases, XML services, a Web application server and supporting services all in a single package.

OpenLink's list of supported standards is impressive and includes .Net, Mono, J2EE, XML Web Services (Simple Object Access Protocol, Web Services Description Language, WS-Security, Universal Description, Discovery and Integration), XML, XPath, XQuery, XSL-T, WebDAV, HTTP, SMTP, LDAP, POP3, SQL-92, ODBC, JDBC and OLE-DB.

Virtuoso provides an HTTP-compliant Web Server; native XML document creation, storage and management; a Web services platform for creation, hosting and consumption of Web services; content replication and synchronization services; free text index server, mail delivery and storage and an NNTP server.

Another interesting feature is that with Virtuoso you can create Web services from existing SQL Stored Procedures, Java classes, C++ classes, and 'C' functions, as well as create dynamic XML documents from ODBC and JDBC data sources.

This is an enormous product and implies a serious commitment on the part of adopters due to its scope and range of services.

Virtuoso is enormous by virtue of its architectural ambitions, but actual disk requirements are

]]>
A Virtuoso of a Serverhttp://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/395Thu, 22 Jun 2006 12:56:58 GMT12006-06-22T08:56:58-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
An interesting piece by Michael Carey, architect for Liquid Data at BEA, re. Enterprise Information Integration, from XML Journal.

Key quote.

Since the dawn of the database era more than three decades ago, enterprises have been amassing an ever-increasing volume of information - both current and historical - about their operations. For the past two of those three decades, the database world has struggled with the problem of somehow integrating information that natively resides in multiple database systems or other information sources (Landers and Rosenberg).

This is the root cause of many of the systems integration challenges facing many IT decision makers. They want to exploit new and emerging technologies, but the internal disparity of data and application logic presents many obstacles.

Michael had this to say in his introduction.

The IT world knows this problem today as the enterprise information integration (EII) problem: enterprise applications need to be able to easily access and combine information about a given business entity from a distributed and highly varied collection of information sources. Relevant sources include various relational database systems (RDBMSs); packaged applications from vendors such as Siebel, PeopleSoft, SAP, and others; "homegrown" proprietary systems; and an increasing number of data sources that are starting to speak XML, such as XML files and Web services.

Virtuoso (which coincidentally has been used to build and host this blog) has been developed to address the challenges presented above by providing a Virtual Database Engine for disparate data and application logic (all the GEMs on this page have been generated on the fly using its SQL-XML functionality).

Additional article excerpts:
With XQuery, the solution sketched above can be implemented by viewing the enterprise's different data sources all as virtual XML documents and functions. XQuery can stitch the distributed customer information together into a comprehensive, reusable base view.

A critical issue at this point is how sensitive the XML VIEW is to underlying data source changes. Enterprises are dynamic, so static XML VIEWs are going to be suboptimal in many situations. Applications are only as relevant as the fluidity of the underlying data served up by the data access layer (this issue is data-format agnostic).

Virtuoso addresses this problem through its support of Persistent and Transient forms of XML VIEWs (which are derived from SQL, XML, Web Services, or any combination of these).
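The Persistent vs. Transient distinction can be illustrated with a small caching sketch. This is a hedged illustration only: the class name, producer callback, and interval-based refresh policy are hypothetical stand-ins, not Virtuoso's actual implementation.

```python
import time

# Sketch: a transient view re-runs its producer on every access, while a
# persistent view caches the generated document and refreshes it only
# after a configurable interval (names and policy are illustrative).
class XMLView:
    def __init__(self, producer, persistent=False, max_age=60.0):
        self.producer = producer
        self.persistent = persistent
        self.max_age = max_age
        self._cache = None
        self._stamp = 0.0

    def get(self):
        if self.persistent and self._cache is not None \
           and time.time() - self._stamp < self.max_age:
            return self._cache          # serve the stored document
        doc = self.producer()           # regenerate from the live source
        if self.persistent:
            self._cache, self._stamp = doc, time.time()
        return doc

calls = []
view = XMLView(lambda: calls.append(1) or "<data/>", persistent=True)
view.get()
view.get()
print(len(calls))   # the producer ran once; the second access hit the cache
```

A transient view (`persistent=False`) would run the producer on every `get()`, trading regeneration cost for maximum sensitivity to source changes.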

Final excerpt:
The relational data sources can be exposed using simple default XML Schemas, and the other sources - SAP and the credit-checking Web service - can be exposed to XQuery as callable XQuery functions with appropriate signatures.

Unfortunately XML Schemas aren't easy, so making this a requirement for producing XML VIEWs is somewhat problematic (or should I say challenging). Of course this approach has its merits, but it does put a significant knowledge acquisition burden on the end-user or developer. This is why Virtuoso also supports an approach based on SQL extensions for generating XML from SQL that facilitate the production of Well Formed and/or Valid XML documents on the fly from heterogeneous SQL Data Sources (this syntax is identical to the FOR XML RAW | AUTO | EXPLICIT modes of SQL Server). It can also use its in-built XSL-T engine to further transform other non-SQL XML data sources (and then generate an XML Schema for the final product if required, and validate against this schema using its in-built XML Schema validation engine).
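To make the SQL-to-XML idea concrete, here is a minimal sketch of what a FOR XML AUTO-style serialization produces. The table name, columns, and the sqlite3 stand-in are all illustrative; this is not Virtuoso's (or SQL Server's) actual syntax, just the shape of the output.

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_to_xml(conn, table):
    """Serialize every row of `table` as an XML element whose attributes
    are the column values -- loosely in the spirit of FOR XML AUTO."""
    root = ET.Element(table + "s")
    cur = conn.execute("SELECT * FROM %s" % table)
    cols = [d[0] for d in cur.description]
    for row in cur:
        ET.SubElement(root, table,
                      {c: str(v) for c, v in zip(cols, row)})
    return ET.tostring(root, encoding="unicode")

# Demo with an in-memory database (table and data are illustrative)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex')")
print(rows_to_xml(conn, "customer"))
```

Each row becomes one well-formed element, so the result can be fed straight to an XSL-T transform or validated against a generated schema, as described above.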

This article certainly sheds light on the kinds of problems that EII based technologies such as Virtual Databases are positioned to address.

There is a live XQuery demo of Virtuoso at: http://demo.openlinksw.com:8890/xqdemo

]]>
<a href="http://www.sys-con.com/xml/article2a.cfm?id=652&amp;count=18437&amp;tot=14&amp;page=12">piece</a>http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/276Thu, 22 Jun 2006 12:56:58 GMT12006-06-22T08:56:58-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
Forbes Magazine Article

Net margins and return on equity are popular metrics that investors turn to in an effort to identify the most profitable companies. One less-used measure is return on invested capital, or ROIC.

Definitions of return on capital vary, but they all try to capture the same thing: how much a company has earned on all the capital it has invested, which includes both equity and debt. By including both, return on capital shows how a company uses all of its financial resources.

For our purposes, we define ROIC as earnings before interest, depreciation and amortization divided by invested capital. Invested capital encompasses shareholders' equity, plus all long-term liabilities and short-term debt.

WOW! We now use this metric to assess companies? Times have really changed!

This could be the basis of an XBRL project, the goal being to produce an XQuery to filter for all companies with a positive ROIC. Watch this space, it would be a great Virtuoso demo!

ROIC Industry Leaders
Company (ticker) | Price | Latest 12-Month Sales ($mil) | Return On Invested Capital | 2003 Estimated P/E | 2003 Estimated EPS Growth
Applebee's Int'l (nasdaq: APPB) | $28.73 | $862 | 25.4% | 17 | 15%
AutoZone (nyse: AZO) | 86.60 | 5,407 | 30.7 | 17 | 26
CVS (nyse: CVS) | 27.08 | 24,524 | 16.8 | 14 | 10
Dell Computer (nasdaq: DELL) | 32.45 | 35,404 | 32.9 | 33 | 24
HCA (nyse: HCA) | 32.42 | 20,129 | 18.1 | 11 | 10
McClatchy (nyse: MNI) | 59.95 | 1,087 | 13.4 | 20 | 7
PepsiCo (nyse: PEP) | 43.47 | 25,541 | 25.9 | 20 | 12
Select Medical (nyse: SEM) | 19.68 | 1,167 | 18.8 | 16 | 33
University of Phoenix Online (nasdaq: UOPX) | 45.16 | 418 | 39.1 | 51 | 66
Wal-Mart Stores (nyse: WMT) | 55.49 | 244,524 | 15.6 | 27 | 13
Prices as of May 13 (with XBRL it would be as of the last XQuery). The sources line would read: my Virtuoso DB instance.
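The filter-for-positive-ROIC idea above can be mimicked in miniature. In this sketch, plain tuples stand in for parsed XBRL facts (the figures come from the Forbes sample in the table); a real implementation would query live filings.

```python
# Hypothetical stand-in for parsed XBRL facts: (company, ROIC %) pairs.
companies = [
    ("Applebee's Int'l", 25.4),
    ("AutoZone", 30.7),
    ("Dell Computer", 32.9),
    ("Wal-Mart Stores", 15.6),
]

# Keep companies with positive ROIC, then rank them highest first.
positive = [(name, roic) for name, roic in companies if roic > 0]
ranked = sorted(positive, key=lambda pair: pair[1], reverse=True)
print(ranked[0])   # the highest-ROIC company in the sample
```

With XBRL feeding the data, the same filter-and-rank step would always reflect the latest filings rather than a point-in-time snapshot.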

]]>
<a href="http://www.forbes.com/2003/05/14/cz_tm_0514sf.html">Forbes Magazine Article</a>http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/262Thu, 22 Jun 2006 12:56:58 GMT12006-06-22T08:56:58-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
Amazon RSS Feeds

RSS feeds are everywhere, and they are changing the Web landscape fast. The Web is shifting from a distributed freeform database to a distributed semi-structured database.

Amazon.com RSS Feeds They never got around to it, so we set up 160+ separate RSS channels for darn near every type of product on Amazon.com for you. If you have any feedback for this new (free) service, please let us know immediately! We're looking to make it an outstanding and permanent part to your collection. Enjoy! (Chris) [via Lockergnome's Bits and Bytes]

Your Web Site is gradually becoming a database (what?). Yes, your Web Site needs to be driven by database software that can rapidly create RSS feeds for your organization's non-XML and XML data sources. Your Web site needs to provide direct data access to users, bots, and Web Services.
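As a rough illustration of "database software that can rapidly create RSS feeds", here is a minimal sketch that serializes (title, link) rows as an RSS 2.0 channel. The function name and data are illustrative; a production feed would carry more channel metadata (description, pubDate, etc.).

```python
import xml.etree.ElementTree as ET

def make_rss(channel_title, items):
    """Build a minimal RSS 2.0 document from (title, link) pairs --
    a sketch of serving a data source as a feed."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    for title, link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")

feed = make_rss("Product Feed", [("Widget", "http://example.com/widget")])
print(feed)
```

Point such a generator at each of your non-XML data sources and every one of them becomes a feed endpoint.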

Here is my blog database for instance, you can query the XML data in this database using XQuery, XPath, and Web Services (if I decide to publish any of my XML Query Templates as Web Services).

Note the teaser here: each XML document is zero bytes! This is because these are live Virtuoso SQL-XML documents that produce a variety of XML documents on the fly, which means they retain a high degree of sensitivity to changes in the underlying databases supplying the data. I could have chosen to make these persistent XML docs with interval based synchronization against the backend data sources (but I chose not to, for maximum effect).

As you can see, SQL and XML (Relational and Hierarchical model) engines can co-exist in a single server; ditto Object-Relational (which might be hidden from view but could be used in the SQL that serves the SQL-XML docs); ditto Full Text (see the search feature of this blog); and finally, ditto the directed graph model for accessing my RDF data (more on this as the RDF data pool increases).

]]>
Amazon.com RSS Feedshttp://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/181Thu, 22 Jun 2006 12:56:58 GMT12006-06-22T08:56:58-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
Here is a tabulated "compare and contrast" of Web usage patterns 1.0, 2.0, and 3.0.

Aspect | Web 1.0 | Web 2.0 | Web 3.0
Simple Definition | Interactive / Visual Web | Programmable Web | Linked Data Web
Unit of Presence | Web Page | Web Service Endpoint | Data Space (named structured data enclave)
Unit of Value Exchange | Page URL | Endpoint URL for API | Resource / Entity / Object URI
Data Granularity | Low (HTML) | Medium (XML) | High (RDF)
Defining Services | Search | Community (Blogs to Social Networks) | Find
Participation Quotient | Low | Medium | High
Serendipitous Discovery Quotient | Low | Medium | High
Data Referencability Quotient | Low (Documents) | Medium (Documents) | High (Documents and their constituent Data)
Subjectivity Quotient | High | Medium (from A-list bloggers to select source and partner lists) | Low (everything is discovered via URIs)
Transclusence | Low | Medium (Code driven Mashups) | High (Data driven Meshups)
What You See Is What You Prefer (WYSIWYP) | Low | Medium | High (negotiated representation of resource descriptions)
Open Data Access (Data Accessibility) | Low | Medium (Silos) | High (no Silos)
Identity Issues Handling | Low | Medium (OpenID) | High (FOAF+SSL)
Solution Deployment Model | Centralized | Centralized with sprinklings of Federation | Federated with function specific Centralization (e.g. Lookup hubs like LOD Cloud or DBpedia)
Data Model Orientation | Logical (Tree based DOM) | Logical (Tree based XML) | Conceptual (Graph based RDF)
User Interface Issues | Dynamically generated static interfaces | Dynamically generated interfaces with semi-dynamic interfaces (courtesy of XSLT or XQuery/XPath) | Dynamic Interfaces (pre- and post-generation) courtesy of the self-describing nature of RDF
Data Querying | Full Text Search | Full Text Search | Full Text Search + Structured Graph Pattern Query Language (SPARQL)
What Each Delivers | Democratized Publishing | Democratized Journalism & Commentary (Citizen Journalists & Commentators) | Democratized Analysis (Citizen Data Analysts)
Star Wars Edition Analogy | Star Wars (original fight for decentralization via rebellion) | Empire Strikes Back (centralization and data silos make comeback) | Return of the JEDI (FORCE emerges and facilitates decentralization from "Identity" all the way to "Open Data Access" and "Negotiable Descriptive Data Representation")

Naturally, I am not expecting everyone to agree with me. I am simply making my contribution to what will remain a fascinating discourse for a long time to come :-)

Related

]]>
Simple Compare & Contrast of Web 1.0, 2.0, and 3.0 (Update 1)http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1531Wed, 29 Apr 2009 17:21:25 GMT62009-04-29T13:21:25.000004-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
What is it?

A pre-installed edition of Virtuoso for Amazon's EC2 Cloud platform.

What does it offer?

From a Web Entrepreneur perspective it offers:
  1. Low cost entry point to a game-changing Web 3.0+ (and beyond) platform that combines SQL, RDF, XML, and Web Services functionality
  2. Flexible variable cost model (courtesy of EC2 DevPay) tightly bound to revenue generated by your services
  3. Delivers federated and/or centralized model flexibility for your SaaS based solutions
  4. Simple entry point for developing and deploying sophisticated database driven applications (SQL or RDF Linked Data Web oriented)
  5. Complete framework for exploiting OpenID, OAuth (including Role enhancements) that simplifies exploitation of these vital Identity and Data Access technologies
  6. Easily implement RDF Linked Data based Mail, Blogging, Wikis, Bookmarks, Calendaring, Discussion Forums, Tagging, Social-Networking as Data Space (data containers) features of your application or service offering
  7. Instant alleviation of challenges (e.g. service costs and agility) associated with Data Portability and Open Data Access across Web 2.0 data silos
  8. LDAP integration for Intranet / Extranet style applications.

From the DBMS engine perspective it provides you with one or more pre-configured instances of Virtuoso that enable immediate exploitation of the following services:

  1. RDF Database (a Quad Store with SPARQL & SPARUL Language & Protocol support)
  2. SQL Database (with ODBC, JDBC, OLE-DB, ADO.NET, and XMLA driver access)
  3. XML Database (XML Schema, XQuery/XPath, XSLT, Full Text Indexing)
  4. Full Text Indexing.

From a Middleware perspective it provides:

  1. RDF Views (Wrappers / Semantic Covers) over SQL, XML, and other data sources accessible via SOAP or REST style Web Services
  2. Sponger Service for converting non RDF information resources into RDF Linked Data "on the fly" via a large collection of pre-installed RDFizer Cartridges.

From the Web Server Platform perspective it provides an alternative to LAMP stack components such as MySQL and Apache by offering:

  1. HTTP Web Server
  2. WebDAV Server
  3. Web Application Server (includes PHP runtime hosting)
  4. SOAP or REST style Web Services Deployment
  5. RDF Linked Data Deployment
  6. SPARQL (SPARQL Query Language) and SPARUL (SPARQL Update Language) endpoints
  7. Virtuoso Hosted PHP packages for MediaWiki, Drupal, Wordpress, and phpBB3 (just install the relevant Virtuoso Distro. Package).
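As a sketch of what talking to one of the SPARQL endpoints in item 6 involves, the following builds (but does not send) a SPARQL Protocol GET request. The endpoint URL is a hypothetical example; the `query` parameter name and the JSON results media type follow the SPARQL Protocol.

```python
from urllib.parse import urlencode
from urllib.request import Request

def sparql_request(endpoint, query):
    """Construct (but do not send) a SPARQL Protocol GET request
    asking for JSON-formatted results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

req = sparql_request(
    "http://demo.openlinksw.com/sparql",   # hypothetical endpoint URL
    "SELECT * WHERE { ?s ?p ?o } LIMIT 10",
)
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would return the bindings; any HTTP client in any language can do the same, which is what makes a SPARQL endpoint such a low-friction integration point.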

From the general System Administrator's perspective it provides:

  1. Online Backups (Backup Set dispatched to S3 buckets, FTP, or HTTP/WebDAV server locations)
  2. Synchronized Incremental Backups to Backup Set locations
  3. Backup Restore from Backup Set location (without exiting to EC2 shell).

Higher level user oriented offerings include:

  1. OpenLink Data Explorer front-end for exploring the burgeoning Linked Data Web
  2. Ajax based SPARQL Query Builder (iSPARQL) that enables SPARQL Query construction by Example
  3. Ajax based SQL Query Builder (QBE) that enables SQL Query construction by Example.

For Web 2.0 / 3.0 users, developers, and entrepreneurs, it offers Distributed Collaboration Tools & Social Media realm functionality, courtesy of ODS, which includes:

  1. Point of presence on the Linked Data Web that meshes your Identity and your Data via URIs
  2. System generated Social Network Profile & Contact Data via FOAF
  3. System generated SIOC (Semantically Interconnected Online Community) Data Space (that includes a Social Graph) exposing all your Web data in RDF Linked Data form
  4. System generated OpenID and automatic integration with FOAF
  5. Transparent Data Integration across Facebook, Digg, LinkedIn, FriendFeed, Twitter, and any other Web 2.0 data space equipped with RSS / Atom support and/or REST style Web Services
  6. In-built support for SyncML which enables data synchronization with Mobile Phones.

How Do I Get Going with It?

]]>
Introducing Virtuoso Universal Server (Cloud Edition) for Amazon EC2http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1489Fri, 28 Nov 2008 21:06:02 GMT22008-11-28T16:06:02.000006-05:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
For all the one-way feed consumers and aggregators, and readers of the original post, here is a variant equipped with hyperlinked phrases as opposed to words. As I stated in the prior post, the post (like most of my posts) was part experiment / dog-fooding of the automatic tagging and hyper-linking functionality in OpenLink Data Spaces.

ReadWriteWeb, via Alex Iskold's post, has delivered another iteration of their "Guide to Semantic Technologies".

If you look at the title of this post (and their article), they seem to be accurately providing a guide to Semantic Technologies, so no qualms there. If, on the other hand, this is supposed to be a guide to the "Semantic Web" as prescribed by TimBL, then they are completely missing the essence of the whole subject, and demonstrably so I may add, since the entities "ReadWriteWeb" and "Alex Iskold" are only describable today via the attributes of the documents they publish, i.e. their respective blogs and hosted blog posts.

Preoccupation with Literal objects as described above implies we can only take what "ReadWriteWeb" and "Alex Iskold" say "Literally" (grep, regex, and XPath/XQuery are the only tools for searching deeper in this Literal realm); we have no sense of what makes them tick or where they come from, no history (bar "About Page" blurb), and no data connections beyond anchored text (more pointers to opaque data sources) in posts and blogrolls. The only connection between this post and them is my deliberate use of the same literal text in the title of this post.

TimBL's vision as espoused via the "Semantic Web" vision is about the production, consumption, and sharing of Data Objects via HTTP based Identifiers called URIs/IRIs (Hyperdata Links / Linked Data). It's how we use the Web as a Distributed Database where (as Jim Hendler once stated with immense clarity): I can point to records (entity instances) in your database (aka Data Space) from mine. Which is to say that if we can all point to data entities/objects (not just data entities of type "Document") using these Location, Value, and Structure independent Object Identifiers (courtesy of HTTP) we end up with a much more powerful Web, and one that is closer to the "Federated and Open" nature of the Web.

As I stated in a prior post, if you or your platform of choice aren't producing de-referencable URIs for your data objects, you may be Semantic (this data model predates the Web), but there is no "World Wide Web" in what you are doing.
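De-referencable URIs work through content negotiation: the same URI can yield an RDF description of the entity instead of an HTML page when the client asks for one. A minimal sketch, using a DBpedia resource URI as the example entity (the request is constructed but not sent here):

```python
from urllib.request import Request

# The same URI, asked for RDF/XML via the Accept header. Sending this
# request would return a machine-readable description of the entity
# rather than the human-oriented HTML page.
entity_uri = "http://dbpedia.org/resource/Paris"  # example Linked Data URI
req = Request(entity_uri, headers={"Accept": "application/rdf+xml"})
print(req.get_header("Accept"))
```

This is the mechanical heart of "de-referencable": one identifier, multiple negotiated representations, with the RDF form carrying out-bound links to further data.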

What are the Benefits of the Semantic Web?

    Consumer - "Discovery of relevant things" and being "Discovered by relevant things" (people, places, events, and other things)
    Enterprise - ditto, plus enterprise domain specific things such as market opportunities, product portfolios, human resources, partners, customers, competitors, co-opetitors, acquisition targets, new regulation, etc.

Simple demo:

I am Kingsley Idehen, a Person who authors this weblog. I also share bookmarks gathered over the years across an array of subjects via my bookmark data space. I also subscribe to a number of RSS/Atom/RDF feeds, which I share via my feeds subscription data space. Of course, all of these data sources have Tags, which are collectively exposed via my weblog tag-cloud, feeds subscriptions tag-cloud, and bookmarks tag-cloud data spaces.

As I don't like repeating myself, and I hate wasting my time or the time of others, I simply share my Data Space (a collection of all of my purpose specific data spaces) via the Web so that others (friends, family, employees, partners, customers, project collaborators, competitors, co-opetitors, etc.) can intentionally or serendipitously discover relevant data en route to creating new information (perspectives) that is hopefully exposed to others via the Web.

Bottom-line, the Semantic Web is about adding the missing "Open Data Access & Connectivity" feature to the current Document Web (we have to go beyond regex, grep, XPath, XQuery, full text search, and other literal scraping approaches). The Linked Data Web of de-referencable data object URIs is the critical foundation layer that makes this feasible.

Remember, it's not about "Applications"; it's about Data, and actually freeing Data from the "tyranny of Applications". Unfortunately, applications always inadvertently create silos (esp. on the Web), since entity data modeling, open data access, and other database technology realm matters remain of secondary interest to many application developers.

Final comment: RDF facilitates Linked Data on the Web, but not all RDF is endowed with de-referencable URIs (a major source of confusion and misunderstanding). Thus, you can have RDF Data Source Providers that simply project RDF data silos via Web Services APIs, if the RDF output emanating from a Web Service doesn't provide out-bound pathways to other data via de-referencable URIs. Of course, the same also applies to Widgets that present you with all the things they've discovered without exposing de-referencable URIs for each item.

BTW - my final comments above aren't in any way incongruent with devising successful business models for the Web. As you may or may not know, OpenLink is not only a major platform provider for the Semantic Web (expressed in our UDA, Virtuoso, OpenLink Data Spaces, and OAT products), we are also actively seeding Semantic Web (tribe: Linked Data, of course) startups. For instance, Zitgist, which now has Mike Bergman as its CEO alongside Frederick Giasson as CTO. Of course, I cannot do Zitgist justice via a footnote in a blog post, so I will expand further in a separate post.

Additional information about this blog post:

  1. I didn't spend hours looking for the URIs used in my hyperlinks
  2. The post is best viewed via an RDF Linked Data aware user agent (OpenLink RDF Browser, Zitgist Data Viewer, DISCO Hyperdata Browser, Tabulator).
]]>
Semantic Web Patterns: A Guide to Semantic Technologies (Update 2)http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1329Thu, 17 Jul 2008 01:43:36 GMT42008-07-16T21:43:36-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
ReadWriteWeb, via Alex Iskold, has delivered another iteration of their "Guide to Semantic Technologies".

If you look at the title of this post (and their article), they seem to be accurately providing a guide to Semantic Technologies, so no qualms there. If, on the other hand, this is supposed to be a guide to the "Semantic Web" as prescribed by TimBL, then they are completely missing the essence of the whole subject, and demonstrably so I may add, since the entities "ReadWriteWeb" and "Alex Iskold" are only describable today via the attributes of the documents they publish, i.e. their respective blogs and hosted blog posts.

Preoccupation with Literal objects as described above implies we can only take what "ReadWriteWeb" and "Alex Iskold" say "Literally" (grep, regex, and XPath/XQuery are the only tools for searching deeper in this Literal realm); we have no sense of what makes them tick or where they come from, no history (bar "About Page" blurb), and no data connections beyond anchored text (more pointers to opaque data sources) in posts and blogrolls. The only connection between this post and them is my deliberate use of the same literal text in the title of this post.

TimBL's vision as espoused via the "Semantic Web" vision is about the production, consumption, and sharing of Data Objects via HTTP based Identifiers called URIs/IRIs (Hyperdata Links / Linked Data). It's how we use the Web as a Distributed Database where (as Jim Hendler once stated with immense clarity): I can point to records (entity instances) in your database (aka Data Space) from mine. Which is to say that if we can all point to data entities/objects (not just data entities of type "Document") using these Location, Value, and Structure independent Object Identifiers (courtesy of HTTP) we end up with a much more powerful Web, and one that is closer to the "Federated and Open" nature of the Web.

As I stated in a prior post, if you or your platform of choice aren't producing de-referencable URIs for your data objects, you may be Semantic (this data model predates the Web), but there is no "World Wide Web" in what you are doing.

What are the Benefits of the Semantic Web?

    Consumer - "Discovery of relevant things" and being "Discovered by relevant things" (people, places, events, and other things)
    Enterprise - ditto, plus enterprise domain specific things such as market opportunities, product portfolios, human resources, partners, customers, competitors, co-opetitors, acquisition targets, new regulation, etc.

Simple demo:

I am Kingsley Idehen, a Person who authors this weblog. I also share bookmarks gathered over the years across an array of subjects via my bookmark data space. I also subscribe to a number of RSS/Atom/RDF feeds, which I share via my feeds subscription data space. Of course, all of these data sources have Tags, which are collectively exposed via my weblog tag-cloud, feeds subscriptions tag-cloud, and bookmarks tag-cloud data spaces.

As I don't like repeating myself, and I hate wasting my time or the time of others, I simply share my Data Space (a collection of all of my purpose specific data spaces) via the Web so that others (friends, family, employees, partners, customers, project collaborators, competitors, co-opetitors, etc.) can intentionally or serendipitously discover relevant data en route to creating new information (perspectives) that is hopefully exposed to others via the Web.

Bottom-line, the Semantic Web is about adding the missing "Open Data Access & Connectivity" feature to the current Document Web (we have to go beyond regex, grep, XPath, XQuery, full text search, and other literal scraping approaches). The Linked Data Web of de-referencable data object URIs is the critical foundation layer that makes this feasible.

Remember, it's not about "Applications"; it's about Data, and actually freeing Data from the "tyranny of Applications". Unfortunately, applications always inadvertently create silos (esp. on the Web), since entity data modeling, open data access, and other database technology realm matters remain of secondary interest to many application developers.

Final comment: RDF facilitates Linked Data on the Web, but not all RDF is endowed with de-referencable URIs (a major source of confusion and misunderstanding). Thus, you can have RDF Data Source Providers that simply project RDF data silos via Web Services APIs, if the RDF output emanating from a Web Service doesn't provide out-bound pathways to other data via de-referencable URIs. Of course, the same also applies to Widgets that present you with all the things they've discovered without exposing de-referencable URIs for each item.

BTW - my final comments above aren't in any way incongruent with devising successful business models for the Web. As you may or may not know, OpenLink is not only a major platform provider for the Semantic Web (expressed in our UDA, Virtuoso, OpenLink Data Spaces, and OAT products), we are also actively seeding Semantic Web (tribe: Linked Data, of course) startups. For instance, Zitgist, which now has Mike Bergman as its CEO alongside Frederick Giasson as CTO. Of course, I cannot do Zitgist justice via a footnote in a blog post, so I will expand further in a separate post.

Additional information about this blog post:

  1. I didn't spend hours looking for the URIs used in my hyperlinks
  2. The post is best viewed via an RDF Linked Data aware user agent (OpenLink RDF Browser, Zitgist Data Viewer, DISCO Hyperdata Browser, Tabulator).
]]>
Semantic Web Patterns: A Guide to Semantic Technologies (Update 1)http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1328Thu, 17 Jul 2008 01:43:04 GMT112008-07-16T21:43:04-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
Daniel Lewis has published another post about OpenLink Data Spaces (ODS) functionality, titled: A few new features in OpenLink Data Spaces, which exposes additional features (some hot out of the oven).

OpenLink Data Spaces (ODS) now officially supports:

Which means that OpenLink Data Spaces support all of the main standards being discussed in the DataPortability Interest Group!

APML Example:

All users of ODS automatically get a dynamically created APML file, for example: APML profile for Kingsley Idehen

The URI for an APML profile is: http://myopenlink.net/dataspace/<ods-username>/apml.xml

Meaning of a Tag Example:

All users of ODS automatically have tag cloud information embedded inside their SIOC file, for example: SIOC for Kingsley Idehen on the Myopenlink.net installation of ODS.

But even better, MOAT has been implemented in the ODS Tagging System. This has been demonstrated in a recent test blog post by my colleague Mitko Iliev; the blog post comes up in the tag search: http://myopenlink.net/dataspace/imitko/weblog/Mitko%27s%20Weblog/tag/paris

Which can be put through the OpenLink Data Browser:

OAuth Example:

OAuth Tokens and Secrets can be created for any ODS application. To do this:

  1. you can log in to MyOpenlink.net beta service, the Live Demo ODS installation, an EC2 instance, or your local installation
  2. then go to ‘Settings’
  3. and then you will see ‘OAuth Keys’
  4. you will then be able to choose the applications that you have instantiated and generate the token and secret for that app.

Related Document (Human) Links

Remember (as per my most recent post about ODS), ODS is about the unobtrusive fusion of Web 1.0, 2.0, and 3.0+ usage and interaction patterns. Thanks to a lot of recent standardization in the Semantic Web realm (e.g. SPARQL), we now employ the MOAT, SKOS, and SCOT ontologies as vehicles for Structured Tagging.

Structured Tagging?

This is how we take a key Web 2.0 feature (think 2D in a sense) and bend it over to create a Linked Data Web (Web 3.0) experience unobtrusively (see earlier posts re. Dimensions of the Web). Thus, nobody has to change how they tag or where they tag; just expose ODS to the URLs of your Web 2.0 tagged content and it will produce URIs (Structured Data Object Identifiers) and a linked data graph for your Tags Data Space (nee Tag Cloud). ODS will construct a graph which exposes tag-subject association, tag-concept alignment / intended meaning, and tag frequencies, ultimately delivering "relative disambiguation" of intended Tag Meaning (i.e. you can easily discern the tagger's meaning via the Tag's actual Data Space, which is associated with the tagger). In a nutshell, the dynamics of relevance matching, ranking, and the like change immensely, without futile, endless debates about matters such as:

    What's the Linked Data value proposition?
    What's the Linked Data business model?
    What's the Semantic Web Killer application?

We can just get on with demonstrating Linked Data value using what exists on the Web today. This is the approach we are deliberately taking with ODS.
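The tag-frequency and tag-subject portions of the graph described above can be sketched in a few lines. The tagged items here are illustrative data, and the plain dictionaries are a stand-in for ODS's actual RDF graph structures.

```python
from collections import Counter

# Illustrative (item, tags) data standing in for tagged Web 2.0 content.
tagged_items = [
    ("post-1", ["semanticweb", "rdf"]),
    ("post-2", ["rdf", "sparql"]),
    ("bookmark-1", ["rdf"]),
]

# Tag frequencies across the whole data space.
frequencies = Counter(tag for _, tags in tagged_items for tag in tags)

# Tag-subject association: which items each tag is attached to.
subjects = {}
for item, tags in tagged_items:
    for tag in tags:
        subjects.setdefault(tag, []).append(item)

print(frequencies["rdf"])   # 3
print(subjects["sparql"])   # ['post-2']
```

Layering concept alignment (e.g. MOAT's intended-meaning URIs) on top of these associations is what turns a flat tag cloud into a disambiguating graph.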

Related Items


Tip: This post is best viewed via an RDF aware User Agent (e.g. a Browser or Data Viewer). I say this because the permalink of this post is a URI in a Linked Data Space (My Blog) comprised of more data than meets the eye (i.e. what you see when you read this post via a Document Web Browser) :-)

]]>
Additional OpenLink Data Spaces Featureshttp://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1315Mon, 11 Feb 2008 16:38:03 GMT22008-02-11T11:38:03.000006-05:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
Web Data Spaces

Now that broader understanding of the Semantic Data Web is emerging, I would like to revisit the issue of "Data Spaces".

A Data Space is a place where Data resides. It isn't inherently bound to a specific Data Model (Concept Oriented, Relational, Hierarchical, etc.). Neither is it implicitly an access point to Data, Information, or Knowledge (the perception is purely determined through the experiences of the user agents interacting with the Data Space).

A Web Data Space is a Web accessible Data Space.

Real world example:

Today we increasingly perform one or more of the following tasks as part of our professional and personal interactions on the Web:

  1. Blog via many service providers or personally managed weblog platforms
  2. Create Event Calendars via Upcoming.com and Eventful
  3. Maintain and participate in Social Networks (e.g. Facebook, Orkut, MySpace)
  4. Create and Participate in Discussions (note: when you comment on blogs or wikis for instance, you are participating in, or creating, a conversation)
  5. Track news by subscribing to RSS 1.0, RSS 2.0, or Atom Feeds
  6. Share Bookmarks & Tags via Del.icio.us and other Services
  7. Share Photos via Flickr
  8. Buy, Review, or Search for books via Amazon
  9. Participate in auctions via eBay
  10. Search for data via Google (of course!)

John Breslin has a nice animation depicting the creation of Web Data Spaces that drives home the point.

Web Data Space Silos

Unfortunately, what isn't as obvious to many netizens is the fact that each of the activities above results in the creation of data that is put into some context by you, the user. Even worse, you eventually realize that the service providers aren't particularly willing, or able, to give you unfettered access to your own data. Of course, this isn't always by design, as the infrastructure behind the service can make this a nightmare from security and/or load balancing perspectives. Irrespective of cause, we end up creating our own "Data Spaces" all over the Web without a coherent mechanism for accessing and meshing these "Data Spaces".

What are Semantic Web Data Spaces?

Data Spaces on the Web that provide granular access to RDF Data.

What's OpenLink Data Spaces (ODS) About?

Short History

In anticipation of the "Web Data Silo" challenge (an issue that we had tackled within internal enterprise networks for years), we commenced the development (circa 2001) of a distributed collaborative application suite called OpenLink Data Spaces (ODS). The project was never released to the public, since the problems associated with the deliberate or inadvertent creation of Web Data silos hadn't really materialized (silos only emerged in concrete form after the emergence of the Blogosphere and Web 2.0). In addition, there wasn't a clear standard Query Language for the RDF based Web Data Model (i.e. the SPARQL Query Language didn't exist).

Today, ODS is delivered as a packaged solution (in Open Source and Commercial flavors) that alleviates the pain associated with Data Space silos, whether they exist on the Web or behind corporate firewalls. In either scenario, ODS simply allows you to create Open and Secure Data Spaces (via its suite of applications) that expose data via SQL-, RDF-, and XML-oriented data access and data management technologies. Of course, it also integrates transparently with existing 3rd party data space generators (Blog, Wiki, Shared Bookmark, Discussion, etc. services) by supporting industry standards that cover:

  1. Content Publishing - Atom, Movable Type, MetaWeblog, Blogger protocols
  2. Content Syndication Formats - RSS 1.0, RSS 2.0, Atom, OPML etc.
  3. Data Management - SQL, RDF, XML, Free Text
  4. Data Access - SQL, SPARQL, GData, Web Services (SOAP or REST styles), WebDAV/HTTP
  5. Semantic Data Web Middleware - GRDDL, XSLT, SPARQL, XPath/XQuery, HTTP (Content Negotiation) for producing RDF from non RDF Data ((X)HTML, Microformats, XML, Web Services Response Data etc).

Thus, by installing ODS on your Desktop, Workgroup, Enterprise, or public Web Server, you end up with a very powerful solution for creating an Open Data access oriented presence on the "Semantic Data Web" without incurring any of the typically assumed "RDF Tax".

Naturally, ODS is built atop Virtuoso and of course it exploits Virtuoso's feature-set to the max. It's also beginning to exploit functionality offered by the OpenLink Ajax Toolkit (OAT).

]]>
Semantic Web Data Spaceshttp://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1185Fri, 13 Apr 2007 22:19:29 GMT12007-04-13T18:19:29.000001-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
Virtuoso joins Boca and ARC 2.0 as RDF Quad or Triple Stores with Full Text Index extensions to SPARQL. Here is our example applied to DBpedia:

PREFIX dbpedia: <http://dbpedia.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?name ?birth ?death
FROM <http://dbpedia.org>
WHERE {
   ?person dbpedia:birthplace <http://dbpedia.org/resource/Berlin> .
   ?person dbpedia:birth ?birth .
   ?person foaf:name ?name .
   ?person dbpedia:death ?death .
   FILTER (?birth < "1900-01-01"^^xsd:date && bif:contains (?name, 'otto'))
}
ORDER BY ?name

You can test further using our SPARQL Endpoint for DBpedia or via the DBPedia bound Interactive SPARQL Query Builder or just click *Here* for results courtesy of the SPARQL Protocol (REST based Web Service).
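Behind that last link, the SPARQL Protocol call is just an HTTP GET with the query passed as a URL parameter. A minimal sketch of building such a request URL in Python, assuming the public DBpedia endpoint (the query here is abbreviated, and the `format` parameter is a Virtuoso convention rather than part of the SPARQL Protocol itself):

```python
from urllib.parse import urlencode

endpoint = "http://dbpedia.org/sparql"  # public SPARQL Protocol endpoint

query = """
SELECT ?name WHERE {
  ?person <http://xmlns.com/foaf/0.1/name> ?name .
} LIMIT 10
"""

# SPARQL Protocol: the query travels as the 'query' URL parameter;
# urlencode escapes '?', '<', '>', spaces, and newlines for us.
url = endpoint + "?" + urlencode(
    {"query": query, "format": "application/sparql-results+xml"}
)

print(url)
```

Fetching that URL (e.g. with `urllib.request.urlopen`) returns a SPARQL Results document that any HTTP-capable client can consume.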

Note: This is in-built functionality, as Virtuoso has possessed Full Text Indexing since 1998-99. The capability applies to both physical and virtual graphs managed by Virtuoso.

As per usual, there is more to come, as we now have a nice intersection point for SPARQL and XQuery/XPath, since Triple Objects (of the Literal variety) can take the form of XML Schema based complex types :-) A point I alluded to in my podcast interview with Jon Udell last year (*note: the mechanical-turk-based transcript is bad*). The point I made went something like this: "...you use SPARQL to traverse the typed links and then use XPath/XQuery for further granular access to the data if well-formed..."
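The SPARQL-then-XPath pattern described above can be sketched concretely: suppose a SPARQL query has already returned a triple whose Object is an XML-typed literal; a second, finer-grained pass then navigates inside that literal. The XML content below is invented purely for illustration:

```python
import xml.etree.ElementTree as ET

# Pretend this string arrived as the Object of a triple,
# typed as rdf:XMLLiteral (i.e. SPARQL got us this far).
xml_literal = """
<contact>
  <name>Otto Hahn</name>
  <city>Berlin</city>
</contact>
"""

root = ET.fromstring(xml_literal)

# XPath-style granular access into the literal's own structure.
name = root.findtext("name")
city = root.findtext("city")

print(name, city)
```

A full XQuery engine (as in Virtuoso) would let you run richer path expressions server-side, but the division of labor is the same: SPARQL traverses the typed links, XPath/XQuery drills into the well-formed payload.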

Anyway, the podcast interview led to this InfoWorld article titled: Unified Data Theory.

]]>
SPARQL and Full Text Indexing implementations are growinghttp://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1157Tue, 13 Mar 2007 10:09:43 GMT12007-03-13T06:09:43-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>