A declarative query language from the W3C for querying structured propositional data (in the form of 3-tuple [triples] or 4-tuple [quads] records) stored in a deductive database (colloquially referred to as triple or quad stores in Semantic Web and Linked Data parlance).
SPARQL is inherently platform independent. Like SQL, the query language is distinct from the backend database engine that executes it. Database clients compose SPARQL queries and pass them to compliant backend databases for execution.
Like SQL for relational databases, it provides a powerful mechanism for accessing and joining data across one or more data partitions (named graphs identified by IRIs). The aforementioned capability also enables the construction of sophisticated Views, Reports (HTML or those produced in native form by desktop productivity tools), and data streams for other services.
Unlike SQL, SPARQL includes result serialization formats and an HTTP-based wire protocol. Thus, the ubiquity and sophistication of HTTP are integral to SPARQL; i.e., client-side applications (user agents) only need to be able to perform an HTTP GET against a URL in order to exploit the power of SPARQL.
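To make that concrete, here is a minimal Python sketch of a SPARQL Protocol client that needs nothing beyond the standard library. It assumes a local Virtuoso instance answering at `http://localhost:8890/sparql` (a common default, but check your installation); the query travels URL-encoded in the `query` parameter of a plain GET, and the `Accept` header asks for the SPARQL 1.1 JSON results format. The helper names (`build_request`, `rows`, `run`) are illustrative, not part of any product API.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: a local Virtuoso instance listening on its usual default port.
ENDPOINT = "http://localhost:8890/sparql"

# Example query: a few triples together with the named graph (IRI) holding them.
QUERY = """
SELECT ?g ?s ?p ?o
WHERE { GRAPH ?g { ?s ?p ?o } }
LIMIT 10
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol 'query via GET' request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the W3C SPARQL 1.1 Query Results JSON format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results: dict) -> list:
    """Flatten SPARQL JSON results into {variable: value} dictionaries.

    Unbound variables (e.g. from OPTIONAL) are simply absent from a row.
    """
    return [
        {var: term["value"] for var, term in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run(endpoint: str, query: str, timeout: float = 30.0) -> list:
    """Send the query over a plain HTTP GET and return the result rows."""
    with urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return rows(json.load(resp))
```

Usage, assuming the endpoint is reachable: `for row in run(ENDPOINT, QUERY): print(row)`. Because the protocol is just HTTP, the URL produced by `build_request` can be fetched with any HTTP client.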
What follows is a very simple guide for using SPARQL against your own instance of Virtuoso:
Note: the data source URL doesn't even have to be RDF based -- which is where the Virtuoso Sponger Middleware comes into play (download and install the VAD installer package first) since it delivers the following features to Virtuoso's SPARQL engine:
Public SPARQL endpoints are emerging at an ever-increasing rate. Thus, we've set up a DNS lookup service that provides access to a large number of SPARQL endpoints. Of course, this doesn't cover all existing endpoints, so if your endpoint is missing, please ping me.
Here is a collection of commands for using DNS-SD to discover SPARQL endpoints:
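For orientation, DNS-SD discovery (RFC 6763) comes down to two steps: a PTR lookup on `_<service>._<proto>.<domain>` to enumerate service instances, then SRV and TXT lookups on each instance for its host, port, and `key=value` attributes. The Python sketch below covers only the name construction and the TXT parsing rules, leaving the actual DNS queries to a resolver or a tool such as `dig`. The `_sparql._tcp` service label, the `path` key, and the `/sparql` fallback are assumptions for illustration; they are not taken from the lookup service described above.

```python
from typing import Dict, List, Union

Attrs = Dict[str, Union[str, bool]]


def browse_name(service: str, proto: str, domain: str) -> str:
    """Name to query (DNS PTR) to enumerate instances of a service type (RFC 6763)."""
    return f"_{service}._{proto}.{domain.rstrip('.')}."


def parse_txt(strings: List[bytes]) -> Attrs:
    """Parse DNS-SD TXT record strings ("key=value") into attributes.

    Following RFC 6763: keys are case-insensitive, a string with no '=' is a
    boolean attribute, an empty key is ignored, and only the first occurrence
    of a repeated key counts.
    """
    attrs: Attrs = {}
    for raw in strings:
        key, sep, value = raw.partition(b"=")
        name = key.decode("ascii", errors="replace").lower()
        if not name or name in attrs:
            continue
        attrs[name] = value.decode("utf-8", errors="replace") if sep else True
    return attrs


def endpoint_url(target: str, port: int, attrs: Attrs) -> str:
    """Combine an SRV target/port with a hypothetical 'path' TXT key."""
    path = attrs.get("path")
    if not isinstance(path, str) or not path.startswith("/"):
        path = "/sparql"  # assumed fallback, for illustration only
    return f"http://{target.rstrip('.')}:{port}{path}"
```

With a resolver in hand, one would query PTR records for `browse_name("sparql", "tcp", "example.org")`, then feed each instance's TXT strings to `parse_txt`.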
How Does Linked Data Address This Problem? It provides critical infrastructure for the WebID Protocol that enables an innovative tweak of SSL/TLS.
What about OpenID? The WebID Protocol embraces and extends OpenID (in an open and positive way) via the WebID + OpenID Hybrid variant of the protocol -- the basic effect is that OpenID calls are re-routed to the WebID aspect, which simply removes Username and Password Authentication from the authentication challenge interaction pattern.
I ended up with what I can best describe as the Data 3.0 Manifesto: a manifesto for standards-compliant access to structured data object (or entity) descriptors.
Alex James (Program Manager, Entity Frameworks, at Microsoft) put together something quite similar to this via his Base4 blog (around the Web 2.0 bootstrap time); sadly -- quoting Alex -- that post has gone where discontinued blogs and their host platforms go (deep, deep irony here).
It's also important to note that this manifesto is a variant of TimBL's Linked Data Design Issues meme, but totally decoupled from RDF (the data representation formats aspect) and SPARQL, which -- in my world view -- remain implementation details.
Anyway, Socialtext and Mike 2.0 (they aren't identical, and the juxtaposition isn't meant to imply this) provide nice demonstrations of what socially enhanced collaboration for individuals and/or enterprises is all about:
As is typically the case in this emerging realm, the critical issue of discrete "identifiers" (record keys, in a sense) for data items, data containers, and data creators (individuals and groups) is overlooked, albeit unintentionally.
Rather than using platform constrained identifiers such as:
It enables you to leverage the platform independence of HTTP scheme Identifiers (Generic URIs) such that Identifiers for:
simply become conduits into a mesh of HTTP-referenceable and accessible Linked Data Objects endowed with a high SDQ (Serendipitous Discovery Quotient). For example, my Personal WebID is all anyone needs to know if they want to explore:
Even when you reach a point of equilibrium where your daily activities trigger orchestration of CRUD (Create, Read, Update, Delete) operations against Linked Data Objects within your socially enhanced collaboration network, you still have to deal with the thorny issues of security, which include the following:
FOAF+SSL, an application of HTTP-based Linked Data, enables you to enhance your Personal HTTP scheme based Identifier (or WebID) via the following steps (performed by a FOAF+SSL compliant platform):
Contrary to conventional experiences with all things PKI (Public Key Infrastructure) related, FOAF+SSL compliant platforms typically handle the PKI issues as part of the protocol implementation; thereby protecting you from any administrative tedium without compromising security.
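The core check a FOAF+SSL compliant platform performs can be sketched in a few lines. After the TLS handshake proves the client holds the private key for its (typically self-signed) certificate, the server dereferences the WebID URI found in the certificate's Subject Alternative Name and accepts the identity only if that profile publishes the same public key, i.e., the same RSA modulus and exponent. In this hypothetical Python model the certificate and the profile's keys are plain values; fetching the profile and extracting its keys (for example from `cert:modulus` / `cert:exponent` triples) is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class RSAKey:
    """An RSA public key, reduced to the two values FOAF+SSL compares."""
    modulus: int
    exponent: int


@dataclass(frozen=True)
class ClientCert:
    """The parts of a client certificate this check needs (hypothetical model)."""
    san_uris: Tuple[str, ...]  # WebIDs from the Subject Alternative Name field
    public_key: RSAKey


def key_from_profile(modulus_hex: str, exponent: int) -> RSAKey:
    """Build a key from a profile's hex-encoded modulus (whitespace tolerated)."""
    return RSAKey(int("".join(modulus_hex.split()), 16), exponent)


def verify_webid(cert: ClientCert, profile_keys: Dict[str, Set[RSAKey]]) -> Optional[str]:
    """Return the first WebID whose profile lists the certificate's public key.

    Assumes the TLS handshake has already proven possession of the private key,
    and that profile_keys holds the keys read from each dereferenced profile.
    """
    for webid in cert.san_uris:
        if cert.public_key in profile_keys.get(webid, set()):
            return webid
    return None  # no profile vouches for this key: authentication fails
```

The design point is that no central authority is consulted: the WebID's own profile document is what vouches for the key.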
Understanding how new technology innovations address long-standing problems, or how new solutions inadvertently fail to address old problems, provides time-tested mechanisms for product selection and value proposition comprehension that ultimately save scarce resources such as time and money.
If you want to understand real-world problem solution #1 with regard to HTTP-based Linked Data, look no further than the issue of secure, socially aware, and platform-independent identifiers for data objects that build bridges across erstwhile data silos.
If you want to cost-effectively experience what I've outlined in this post, take a look at OpenLink Data Spaces (ODS), which is a distributed collaboration engine (enterprise or individual) built around the Virtuoso database engine. It simply enhances existing collaboration tools via the following capabilities:
Addition of Social Dimensions via HTTP based Data Object Identifiers for all Data Items (if missing)
In a nutshell, the AWS Cloud infrastructure simplifies the process of generating Federated presence on the Internet and/or World Wide Web. Remember, centralized networking models always end up creating data silos, in some context, ultimately! :-)
Sticking with the TechCrunch layout, here is why all roads simply lead to Linked Data come 2010 and beyond:
As I've stated in the past (across a variety of mediums), you cannot build applications that have long term value without addressing the following issues:
The items above basically showcase the very essence of the HTTP URI abstraction that drives HTTP based Linked Data; which is also the basic payload unit that underlies REST.
I simply hope that the next decade marks a period of broad appreciation and comprehension of Data Access, Integration, and Management issues on the part of application developers, integrators, analysts, end-users, and decision makers. Remember, without structured Data we cannot produce or share Information, and without Information, we cannot produce or share Knowledge.
The primary topic of a meme penned by TimBL in the form of a Design Issues Doc (note: this is how TimBL has shared his thoughts since the Beginning of the Web).
There are a number of dimensions to the meme, but its primary purpose is the reintroduction of the HTTP URI -- a vital component of the Web's core architecture.
They possess an intrinsic duality that combines persistent and unambiguous Data Identity with platform & representation format independent Data Access. Thus, you can use a string of characters that look like a contemporary Web URL to unambiguously achieve the following:
Enabling more productive use of the Web by users and developers alike. All of which is achieved by tweaking the Web's Hyperlinking feature such that it now includes Hypertext and Hyperdata as link types.
Note: Hyperdata Linking is simply what an HTTP URI facilitates.
Example problems solved by injecting Linked Data into the Web:
If all of the above still falls into the technical mumbo-jumbo realm, then simply consider Linked Data as delivering Open Data Access in granular form to Web accessible data -- that goes beyond data containers (documents or files).
The value proposition of Linked Data is inextricably linked to the value proposition of the World Wide Web. This is true because the Linked Data meme is ultimately about an enhancement of the current Web, achieved by reintroducing its architectural essence -- in a new context -- via a new level of link abstraction, courtesy of the Identity and Access duality of HTTP URIs.
As a result of Linked Data, you can now have Links on the Web for a Person, Document, Music, Consumer Electronics, Products & Services, Business Opening & Closing Hours, Personal "WishLists" and "OfferLists", an Idea, etc., in addition to links for Properties (Attributes & Values) of the aforementioned. Ultimately, all of these links will be indexed in a myriad of ways, providing the substrate for the next major period of Internet & Web driven innovation within our larger human-ingenuity driven innovation continuum.
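One way to see the Identity and Access duality in action is HTTP content negotiation: a single URI identifies the entity, while the client's `Accept` header determines which representation of its description comes back. The Python sketch below is a simplified negotiator for a hypothetical server offering HTML, Turtle, and RDF/XML; it honors q-values and the `type/*` and `*/*` wildcards but ignores other media-type parameters, so treat it as an illustration rather than a complete, standards-conformant implementation.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    return ranges


def quality(offer: str, ranges: List[Tuple[str, float]]) -> float:
    """q-value for one available type, taken from the most specific matching range."""
    otype = offer.split("/")[0]
    best_spec, q = -1, 0.0
    for media, mq in ranges:
        if media == offer:
            spec = 2  # exact type/subtype
        elif media == otype + "/*":
            spec = 1  # type/* wildcard
        elif media == "*/*":
            spec = 0  # anything
        else:
            continue
        if spec > best_spec:
            best_spec, q = spec, mq
    return q


def negotiate(accept: str, available: List[str]) -> Optional[str]:
    """Pick the representation of one resource that best suits the client."""
    ranges = parse_accept(accept) if accept.strip() else [("*/*", 1.0)]
    best, best_q = None, 0.0
    for offer in available:  # server order breaks ties
        q = quality(offer.lower(), ranges)
        if q > best_q:
            best, best_q = offer, q
    return best  # None means nothing acceptable (HTTP 406)
```

For example, a browser sending `text/html,...` gets HTML, while a Linked Data client sending `text/turtle` gets Turtle, both from the same entity URI.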
Your Life, Profession, Web, and Internet do not need to become mutually exclusive due to "information overload".
A platform or service that delivers a point of online presence that embodies the fundamental separation of: Identity, Data Access, Data Representation, Data Presentation, by adhering to Web and Internet protocols.
Typical post installation (Local or Cloud) task sequence:
I've just outlined a snippet of the capabilities of the OpenLink Data Spaces platform. A platform built using OpenLink Virtuoso, architected to deliver: open, platform independent, multi-model, data access and data management across heterogeneous data sources.
All you need to remember is your URI when seeking to interact with your data space.
|  | Web 1.0 | Web 2.0 | Web 3.0 |
|---|---|---|---|
| Simple Definition | Interactive / Visual Web | Programmable Web | Linked Data Web |
| Unit of Presence | Web Page | Web Service Endpoint | Data Space (named structured data enclave) |
| Unit of Value Exchange | Page URL | Endpoint URL for API | Resource / Entity / Object URI |
| Data Granularity | Low (HTML) | Medium (XML) | High (RDF) |
| Defining Services | Search | Community (Blogs to Social Networks) | Find |
| Participation Quotient | Low | Medium | High |
| Serendipitous Discovery Quotient | Low | Medium | High |
| Data Referenceability Quotient | Low (Documents) | Medium (Documents) | High (Documents and their constituent Data) |
| Subjectivity Quotient | High | Medium (from A-list bloggers to select source and partner lists) | Low (everything is discovered via URIs) |
| Transclusion | Low | Medium (Code driven Mashups) | High (Data driven Meshups) |
| What You See Is What You Prefer (WYSIWYP) | Low | Medium | High (negotiated representation of resource descriptions) |
| Open Data Access (Data Accessibility) | Low | Medium (Silos) | High (no Silos) |
| Identity Issues Handling | Low | Medium (OpenID) | High (FOAF+SSL) |
| Solution Deployment Model | Centralized | Centralized with sprinklings of Federation | Federated with function specific Centralization (e.g. Lookup hubs like LOD Cloud or DBpedia) |
| Data Model Orientation | Logical (Tree based DOM) | Logical (Tree based XML) | Conceptual (Graph based RDF) |
| User Interface Issues | Dynamically generated static interfaces | Dynamically generated interfaces with semi-dynamic interfaces (courtesy of XSLT or XQuery/XPath) | Dynamic Interfaces (pre- and post-generation) courtesy of self-describing nature of RDF |
| Data Querying | Full Text Search | Full Text Search | Full Text Search + Structured Graph Pattern Query Language (SPARQL) |
| What Each Delivers | Democratized Publishing | Democratized Journalism & Commentary (Citizen Journalists & Commentators) | Democratized Analysis (Citizen Data Analysts) |
| Star Wars Edition Analogy | Star Wars (original fight for decentralization via rebellion) | Empire Strikes Back (centralization and data silos make comeback) | Return of the JEDI (FORCE emerges and facilitates decentralization from "Identity" all the way to "Open Data Access" and "Negotiable Descriptive Data Representation") |
Naturally, I am not expecting everyone to agree with me. I am simply making my contribution to what will remain a fascinating discourse for a long time to come :-)
Depicted below is a top-down view of the data access and data management value chain. The term "apex" simply indicates value primacy, which takes the form of a data access API based entry point into a DBMS realm, aligned to an underlying data model. Examples of data access APIs include: Native Call Level Interfaces (CLIs), ODBC, JDBC, ADO.NET, OLE-DB, XMLA, and Web Services.
See: AVF Pyramid Diagram.
The degree to which ad-hoc views of data managed by a DBMS can be produced and dispatched to relevant data consumers (e.g., people), without compromising concurrency, data durability, and security, collectively determines the "Agility Value Factor" (AVF) of a given DBMS. Remember, agility, as the cornerstone of environmental adaptation, is as old as the concept of evolution and intrinsic to all pursuits of primacy.
In simpler business oriented terms, look at AVF as the degree to which DBMS technology affects the ability to effectively implement "Market Leadership Discipline" along the following pathways: innovation, operational excellence, or customer intimacy.
Historically, at least since the late '80s, the RDBMS genre of DBMS has consistently offered the highest AVF relative to other DBMS genres en route to primacy within the value pyramid. The desire to improve on paper reports and spreadsheets is basically what DBMS technology has fundamentally addressed to date, even though conceptual level interaction with data has never been its forte.
See: RDBMS Primacy Diagram.
For more than 10 years -- at the very least -- the limitations of the traditional RDBMS in the realm of conceptual level interaction with data across diverse data sources and schemas (enterprise, Web, and Internet) have been crystal clear to many RDBMS technology practitioners, as indicated by some of the quotes excerpted below:
"Future of Database Research is excellent, but what is the future of data?"
"...it is hard for me to disagree with the conclusions in this report. It captures exactly the right thoughts, and should be a must-read for everyone involved in the area of databases and database research in particular." -- Dr. Anant Jhingran, CTO, IBM Information Management Systems, commenting on the 2007 RDBMS technology retreat attended by a number of key DBMS technology pioneers and researchers.
"One size fits all: A concept whose time has come and gone
- They are direct descendants of System R and Ingres and were architected more than 25 years ago
- They are advocating "one size fits all"; i.e., a single engine that solves all DBMS needs."
-- Prof. Michael Stonebraker, one of the founding fathers of the RDBMS industry.
Until this point in time, the requisite confluence of "circumstantial pain" and "open standards" based technology required to enable an objective "compare and contrast" of RDBMS engine virtues and viable alternatives hasn't occurred. Thus, the RDBMS has maintained its position of primacy, albeit on a "one size fits all" basis.
As mentioned earlier, we are in the midst of an economic crisis that is ultimately about a consistent inability to connect dots across a substrate of interlinked data sources that transcend traditional data access boundaries with high doses of schematic heterogeneity. Ironically, in the era of the dot-com, we haven't been able to make meaningful connections between relevant "real-world things" that extend beyond primitive data hosted in database tables and content management style document containers; we've struggled to achieve this in the most basic sense, let alone evolve our ability to connect in line with the exponential rate at which the Internet & Web are spawning "universes of discourse" (data spaces) that emanate from user activity (within the enterprise and across the Internet & Web). In a nutshell, we haven't been able to upgrade our interaction with data such that "conceptual models" and resulting "context lenses" (or facets) become concrete; by this I mean real-world entity interaction making its way into the computer realm, as opposed to the impedance we all suffer today when we transition from conceptual model interaction (real-world) to logical model interaction (when dealing with RDBMS based data access and data management).
Here are some simple examples of what I can best describe as "critical dots unconnected", resulting from an inability to interact with data conceptually:
Government (Globally) - Financial regulatory bodies couldn't effectively discern that a Credit Default Swap is an Insurance policy in all but literal name. As a result, the cost of an unregulated insurance policy laid the foundation for exacerbating the toxicity of fatally flawed mortgage-backed securities. Put simply: a flawed insurance policy was the fallback on a toxic security that financiers found exotic based on superficial packaging.
Enterprises - Banks still don't understand that capital really does exist in tangible and intangible forms, with the intangible being the variant that is inherently dynamic. For example, a tech company's intellectual capital far exceeds the value of its fixtures, fittings, and buildings, but you'd be amazed to find that in most cases this vital asset has no significant value when banks get down to the nitty-gritty of debt collateral; instead, a buffer of flawed securitization has occurred atop a borderline static asset class covering the aforementioned buildings, fixtures, and fittings.
In the general enterprise arena, IT executives continued to "rip and replace" existing technology without ever effectively addressing the timeless inability to connect data across disparate data silos generated by internal enterprise applications, let alone the broader need to mesh data from the inside with external data sources. No correlation was made between the growth of buzzwords and the compounding nature of data integration challenges. It's 2009 and only a minuscule number of executives dare fantasize about being anywhere within reach of the "relevant information at your fingertips" vision.
Looking more holistically at data interaction in general, whether you interact with data in the enterprise space (i.e., at work) or on the Internet or Web, you are ultimately delving into a mishmash of disparate computer systems, applications, services (Web or SOA), and databases (of the RDBMS variety in a majority of cases) associated with a plethora of disparate schemas. Yet even today, "rip and replace" is still the norm pushed by most vendors, pitting one monoculture against another as exemplified by irrelevances such as FOSS/LAMP vs. Commercial or Web vs. Enterprise, when none of this matters if the data access and integration issues aren't even recognized, let alone addressed (see: Applications are Like Fish and Data Like Wine).
Like the current credit-crunch, exponential growth of data originating from disparate application databases and associated schemas, within shrinking processing time frames, has triggered a rethinking of what defines data access and data management value today en route to an inevitable RDBMS downgrade within the value pyramid.
There have been many attempts to address real-world modeling requirements across the broader DBMS community from Object Databases to Object-Relational Databases, and more recently the emergence of simple Entity-Attribute-Value model DBMS engines. In all cases failure has come down to the existence of one or more of the following deficiencies, across each potential alternative:
A common characteristic shared by all post-relational database management systems (from Object-Relational to pure Object) is an orientation towards variations of EAV/CR based data models. Unfortunately, all efforts in the EAV/CR realm have typically suffered from at least one of the deficiencies listed above. In addition, the same "one DBMS model fits all" approach that lies at the heart of the RDBMS downgrade also exists in the EAV/CR realm.
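For readers new to the term, the EAV/CR pattern (Entity-Attribute-Value with Classes and Relationships) can be sketched as a toy in-memory store: every fact is an (entity, attribute, value) row, a class is simply the value of a `type` attribute, and a relationship is a value that is itself an entity identifier. The Python below is illustrative only; the identifiers and attribute names are hypothetical, and the sketch says nothing about how any particular DBMS physically stores EAV data.

```python
from collections import defaultdict


class EAVStore:
    """Toy EAV/CR store: every fact is an (entity, attribute, value) triple."""

    def __init__(self):
        # Facts indexed by attribute, so attribute-driven lookups avoid full scans.
        self._by_attr = defaultdict(set)

    def add(self, entity, attribute, value):
        # New attributes need no schema change: they are just new index keys.
        self._by_attr[attribute].add((entity, value))

    def describe(self, entity):
        """All attribute/value pairs recorded for one entity."""
        desc = defaultdict(set)
        for attribute, pairs in self._by_attr.items():
            for e, v in pairs:
                if e == entity:
                    desc[attribute].add(v)
        return dict(desc)

    def instances(self, cls):
        """Classes ('C'): entities whose 'type' attribute names the class."""
        return {e for e, v in self._by_attr.get("type", set()) if v == cls}

    def follow(self, entity, attribute):
        """Relationships ('R'): values of a link attribute, themselves entity ids."""
        return {v for e, v in self._by_attr.get(attribute, set()) if e == entity}
```

Adding a brand-new kind of fact is just another `add` call, which is the schema flexibility that makes EAV/CR attractive for heterogeneous data.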
The RDBMS is not going away (ever), but its era of primacy -- by virtue of its placement at the apex of the data access and data management value pyramid -- is over! I make this bold claim for the following reasons:
Based on the above, it is crystal clear that a different kind of DBMS -- one with higher AVF relative to the RDBMS -- needs to sit atop today's data access and data management value pyramid. The characteristics of this DBMS must include the following:
Quick recap: I am not saying that RDBMS engine technology is dead or obsolete. I am simply stating that the era of RDBMS primacy within the data access and data management value pyramid is over.
The problem domain (conceptual model views over heterogeneous data sources) at the apex of the aforementioned pyramid has simply evolved beyond the natural capabilities of the RDBMS, which is rooted in "Closed World" assumptions regarding data definition, access, and management. The need to maintain domain-based conceptual interaction with data is now palpable at every echelon within our "Global Village": Internet, Web, Enterprise, Government, etc.
It is my personal view that an EAV/CR model based DBMS, with support for the seven items enumerated above, can trigger the long-anticipated RDBMS downgrade. Such a DBMS would be inherently multi-model, because you would need the best of RDBMS and EAV/CR model engines in a single product, with built-in support for HTTP and other Internet protocols in order to effectively address data representation and serialization issues.
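The practical difference between the two assumptions shows up when a fact is missing. Under the Closed World Assumption, roughly how a SQL query treats rows that are not there, absence means false; under the Open World Assumption that RDF and the Web adopt, absence only means unknown, since another source may assert the fact later. A minimal Python sketch over a hypothetical set of facts (the `known_false` set stands in for whatever negative knowledge a system can actually derive):

```python
from enum import Enum


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


def ask_closed(facts, fact):
    """Closed World: whatever is not recorded is taken to be false."""
    return Truth.TRUE if fact in facts else Truth.FALSE


def ask_open(facts, fact, known_false=frozenset()):
    """Open World: silence means unknown; only facts known to be false are false."""
    if fact in facts:
        return Truth.TRUE
    if fact in known_false:
        return Truth.FALSE
    return Truth.UNKNOWN
```

Asking whether a hypothetical `("ex:acme", "locatedIn", "ex:london")` holds, when only a Boston location is recorded, yields `FALSE` under the closed reading but `UNKNOWN` under the open one.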
Examples of contemporary EAV/CR frameworks that provide concrete conceptual layers for data access and data management currently include:
The frameworks above provide the basis for a revised AVF pyramid, as depicted below, that reflects today's data access and management realities i.e., an Internet & Web driven global village comprised of interlinked distributed data objects, compatible with "Open World" assumptions.
See: New EAV/CR Primacy Diagram.
Enter search pattern: Microsoft
You will get the usual result from a full text pattern search i.e., hits and text excerpts with matching patterns in boldface. This first step is akin to throwing your net out to sea while fishing.
Now that you have your catch, what next? Basically, this is where traditional text search value ends, since regex or XPath/XQuery offer little when the structure of literal text is the key to filtering or categorization based analysis of real-world entities. Naturally, this is where the value of structured querying of linked data starts, as you seek to use entity descriptions (combinations of attribute and relationship properties) to "Find relevant things".
Continuing with the demo.
Click on the "Properties" link within the Navigation section of the browser page, which results in a distillation and aggregation of the properties of the entities associated with the search results. Then use the "Next" link to page through the properties until you find the properties that best match what you seek. Note: this particular step is akin to using the properties of the catch (to continue the fishing analogy) for query filtering, with each subsequent property link click narrowing your selection further.
Using property based filtering is just one perspective on the data corpus associated with the text search pattern; thus, you can alter perspectives by clicking on the "Class" link so that you can filter your search results by entity type. Of course, in a number of scenarios you would use a combination of entity type and entity property filters to locate the entities of interest to you.
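The property and class filtering steps above amount to faceted narrowing of a result set. Here is a minimal Python sketch of that idea, assuming a tiny hypothetical result set for the pattern "Microsoft" (the entity URIs, types, and values are made up); it is not how the Virtuoso browser page is implemented.

```python
from collections import Counter

# Hypothetical result set: entity URI -> its types and (property, value) pairs.
RESULTS = {
    "http://example.org/id/microsoft": {
        "types": {"foaf:Organization"},
        "props": {("foaf:name", "Microsoft"),
                  ("foaf:homepage", "http://www.microsoft.com/")},
    },
    "http://example.org/id/post-42": {
        "types": {"sioc:Post"},
        "props": {("dc:title", "Microsoft and Linked Data"),
                  ("sioc:topic", "Microsoft")},
    },
    "http://example.org/id/jane": {
        "types": {"foaf:Person"},
        "props": {("foaf:name", "Jane Doe"),
                  ("foaf:workplaceHomepage", "http://www.microsoft.com/")},
    },
}


def property_facets(entities):
    """Count how many entities use each property (the "Properties" view)."""
    counts = Counter()
    for ent in entities.values():
        counts.update({prop for prop, _ in ent["props"]})
    return counts


def class_facets(entities):
    """Count how many entities belong to each type (the "Class" view)."""
    counts = Counter()
    for ent in entities.values():
        counts.update(ent["types"])
    return counts


def narrow(entities, prop=None, value=None, cls=None):
    """Keep entities that have `prop` (optionally with `value`) and type `cls`."""
    kept = {}
    for uri, ent in entities.items():
        if cls is not None and cls not in ent["types"]:
            continue
        if prop is not None:
            values = {v for p, v in ent["props"] if p == prop}
            if not values or (value is not None and value not in values):
                continue
        kept[uri] = ent
    return kept


# Each click narrows the previous selection: first by property, then by class.
named = narrow(RESULTS, prop="foaf:name")
people = narrow(named, cls="foaf:Person")
print(sorted(people))
```

Because `narrow` takes and returns the same shape, property and class filters can be chained in any order, which mirrors combining entity type and entity property filters.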
A pre-installed edition of Virtuoso for Amazon's EC2 Cloud platform.
From the DBMS engine perspective it provides you with one or more pre-configured instances of Virtuoso that enable immediate exploitation of the following services:
From a Middleware perspective it provides:
From the Web Server Platform perspective it provides an alternative to LAMP stack components such as MySQL and Apache by offering:
From the general System Administrator's perspective it provides:
Higher level user oriented offerings include:
For Web 2.0 / 3.0 users, developers, and entrepreneurs it offers Distributed Collaboration Tools & Social Media realm functionality courtesy of ODS, which includes:
In the RWW Top-Down category, which I interpret as technologies that produce RDF from non-RDF data sources, our product portfolio comprises the following: Virtuoso Universal Server, OpenLink Data Spaces, OpenLink Ajax Toolkit, and OpenLink Data Explorer (which includes ubiquity commands).
Of course you could have simply looked up OpenLink Software's FOAF based Profile page (*note the Linked Data Explorer tab*), or simply passed the FOAF profile page URL to a Linked Data aware client application such as: OpenLink Data Explorer, Zitgist Data Viewer, Marbles, and Tabulator, and obtained information. Remember, OpenLink Software is an Entity of Type: foaf:Organization, on the burgeoning Linked Data Web :-)
Courtesy of Linked Data, we are now able to extend the "document to document" linking mechanism of the Web (Hypertext Linking) to more granular "entity to entity" level linking. And in doing so, we have a layer of abstraction that in one swoop alleviates all of the infrastructure oriented data access impediments of yore. I know this sounds simplistic, but rest assured, imbibing Linked Data's value proposition is really just that simple, once you engage solutions (e.g. Virtuoso) that enable you to deploy Linked Data across your enterprise.
Microsoft ACCESS, SQL Server, and Virtuoso all use the Northwind SQL DB Schema as the basis of the demonstration database shipped with each DBMS product. This schema is comprised of common IS/MIS entities that include: Customers, Contacts, Orders, Products, Employees etc.
What we all really want to do as data, information, and knowledge consumers and/or dispatchers, is be no more than a single "mouse click" away from relevant data/information/knowledge data access and/or exploration. Even better (but not always so obvious), we also want anyone in our network (company, division, department, cube-cluster) to inherit these data access efficiencies.
In this example, the Web Page about the Customer "ALKI" provides me with a myriad of exploration and data access paths, e.g., when I click on the foaf:primaryTopic property value link.
This simple example, via a single Web Page, should put to rest any doubts about the utility of Linked Data. Of course this is an old demo, but this time around the UI is minimalist as my prior attempts skipped a few steps i.e., starting from within a Linked Data explorer/browser.
Important note: I haven't exported SQL into an RDF data warehouse, I am converting the SQL into RDF Linked Data on the fly which has two fundamental benefits:
Enjoy!
By coincidence, Glenn and I presented at this month's Cambridge Semantic Web Gathering.
I've provided a dump of Glenn's issues and my responses below:
RDF is a Graph based Data Model; it stands for Resource Description Framework. The Metadata angle comes from its Meta Content Framework (MCF) origins. You can express and serialize data based on the RDF Data Model using: Turtle, N3, TriX, N-Triples, and RDF/XML.
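To make "express and serialize" concrete, here is a minimal Python sketch that writes subject/predicate/object statements in one of the listed formats, N-Triples. It covers only IRIs and plain string literals (no blank nodes, language tags, or datatypes), and the `IRI`/`Literal` classes and example URIs are hypothetical illustrations, not part of any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


def _escape(text):
    # N-Triples string escapes for backslash, double quote, and line breaks.
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def term(t):
    """Render one RDF term in N-Triples syntax."""
    if isinstance(t, IRI):
        return "<" + t.value + ">"
    if isinstance(t, Literal):
        return '"' + _escape(t.value) + '"'
    raise TypeError(f"unsupported RDF term: {t!r}")


def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples, one statement per line."""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in triples)


FOAF = "http://xmlns.com/foaf/0.1/"
me = IRI("http://example.org/id/jane")
doc = to_ntriples([
    (me, IRI(FOAF + "name"), Literal("Jane Doe")),
    (me, IRI(FOAF + "knows"), IRI("http://example.org/id/john")),
])
print(doc, end="")
```

The same two statements could be written in Turtle or RDF/XML; only the serialization changes, not the graph.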
These are just appeasement:
- old query paradigm: fishing in dark water with superstitiously tied lures; only works well in carefully stocked lakes
- we don't ask questions by defining answer shapes and then hoping they're dredged up whole.
SPARQL, MQL, and Entity-SQL are Graph Model oriented Query Languages. Query Languages always accompany Database Engines. SQL is the Relational Model equivalent.
Noble attempt to ground the abstract, but:
- URI dereferencing/namespace/open-world issues focus too much technical attention on cross-source cases where the human issues dwarf the technical ones anyway
- FOAF query over the people in this room? forget it.
- link asymmetry doesn't scale
- identity doesn't scale
- generating RDF from non-graph sources: more appeasement, right where the win from actually converting could be biggest!
Innovative use of HTTP to deliver "Data Access by Reference" to the Linked Data Web.
When you have a Data Model, Database Engine, and Query Language, the next thing you need is a Data Access mechanism that provides "Data Access by Reference". ODBC and JDBC (amongst others) provide "Data Access by Reference" via Data Source Names. Linked Data is about the same thing (URIs are Data Source Names) with the following differences:
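To make the DSN analogy concrete: where an ODBC client hands a DSN to a driver manager, a Linked Data client hands a URI to HTTP. The Python sketch below (standard library only) prepares a GET whose Accept header asks for an RDF serialization first. It assumes the server honours content negotiation; the example URI and the media-type preference list are illustrative assumptions.

```python
import urllib.request

# Preference order is an assumption: Turtle, then RDF/XML, then HTML.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, text/html;q=0.5"


def build_dereference_request(uri, accept=RDF_ACCEPT):
    """Prepare (but do not send) an HTTP GET for a Linked Data URI."""
    return urllib.request.Request(uri, headers={"Accept": accept}, method="GET")


def dereference(uri, timeout=10):
    """Send the request; return (media type, body bytes). Needs network access."""
    with urllib.request.urlopen(build_dereference_request(uri), timeout=timeout) as resp:
        return resp.headers.get_content_type(), resp.read()


req = build_dereference_request("http://example.org/id/jane")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

`urlopen` follows redirects such as 303 See Other by default, so when a server redirects from an entity URI to a describing document, `dereference` returns that document.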
Hugely motivating and powerful idea, worthy of a superhero (Graphius!), but:
- giant and global parts are too hard, and starting global makes every problem harder
- local projects become unmanageable in global context (Cyc, Freebase data-modeling lists...).
And thus my plea, again. Forget "semantic" and "web", let's fix the database tech first:
- node/arc data-model, path-based exploratory query-model
- data-graph applications built easily on top of this common model; building them has to be easy, because if it's hard, they'll be bad
- given good database tech, good web data-publishing tech will be trivial!
- given good tools for graphs, the problems of uniting them will be only as hard as they have to be.
Giant Global Graph is just another moniker for a "Web of Linked Data" or "Linked Data Web".
Multi-Model Database technology that meshes the best of the Graph & Relational Models exists. In a nutshell, this is what Virtuoso is all about, and it's existed for a very long time :-)
Virtuoso is also a Virtual DBMS engine (so you can see Heterogeneous Relational Data via Graph Model Context Lenses). Naturally, it is also a Linked Data Deployment platform (or Linked Data Server).
The issue isn't the "Semantic Web" moniker per se; it's about how Linked Data (foundation layer of Semantic Web) gets introduced to users. As I said during the MIT Gathering: "The Web is experienced via Web Browsers primarily, so any enhancement to the Web must be exposed via traditional Web Browsers", which is why we've opted to simply add "View Linked Data Sources" to the existing set of common Browser options that includes:
By exposing the Linked Data Web option as described above, you enable the Web user to knowingly transition from the traditional Rendered (X)HTML page view to the Linked Data View (i.e., structured data behind the page). This simple "User Interaction" tweak makes the notion of exploiting a Structured Web somewhat clearer.
The Linked Data Web isn't a panacea. It's just an addition to the existing Web that enriches the things you can do with the Web. Its predominance, like that of any application feature, will be subject to the degree to which it delivers tangible value or materializes internal and external opportunity costs.
Note: The Web isn't ubiquitous today because all its users grokked HTML Markup. Its ubiquity is a function of opportunity costs: there simply came a point in the Web bootstrap when nobody could afford the opportunity costs associated with being off the Web. The same thing will play out with Linked Data and the broader Semantic Web vision.
ODBC identifies data sources using Data Source Names (DSNs).
WODBC (Web Open Database Connectivity) delivers open data access to Web Databases / Data Spaces. The Data Source Naming scheme: URI or IRI, is HTTP based thereby enabling data access by reference via the Web.
ODBC DSNs bind ODBC client applications to Tables, Views, Stored Procedures.
WODBC DSNs bind you to a Data Space (e.g. my FOAF based Profile Page, where you can use the "Explore Data Tab" to look around if you are a human visitor) or a specific Entity within a Data Space (i.e., the Person Entity "Me").
ODBC Drivers are built using APIs (DBMS Call Level Interfaces) provided by DBMS vendors. Thus, a DBMS vendor can choose not to release an API, or do so selectively, for competitive advantage or market disruption purposes (it's happened!).
WODBC Drivers are also built using APIs (Web Services associated with a Web Data Space). These drivers are also referred to as RDF Middleware or RDFizers. The "Web" component of WODBC ensures openness, you publish Data with URIs from your Linked Data Server and that's it; your data space or specific data entities are live and accessible (by reference) over the Web!
So we have come full circle (or cycle), the Web is becoming more of a structured database everyday! What's new is old, and what's old is new!
Data Access is everything; without "Data" there is no information or knowledge. Without "Data" there's no notion of vitality, purpose, or value.
URIs make or break everything in the Linked Data Web just as ODBC DSNs do within the enterprise.
I've deliberately left JDBC, ADO.NET, and OLE-DB out of this piece due to their respective programming languages and frameworks specificity. None of these mechanisms match the platform availability breadth of ODBC.
The Web as a true M-V-C pattern is now crystallizing. The "M" (Model) component of M-V-C is finally rising to the realm of broad attention courtesy of the "Linked Data" meme and "Semantic Web" vision.
By the way, M-V-C lines up nicely with Web 1.0 (Web Forms / Pages), Web 2.0 (Web Services based APIs), and Web 3.0 (Data Web, Web of Data, or Linked Data Web) :-)
Unfortunately, the cost of completing ZDNet's unwieldy signup process simply exceeded the benefits of dropping my comments in their particular space :-( Thus, I'll settle for a trackback ping instead.
What follows is the cut and paste of my intended comment contributions to Paul's post.
Paul,
As discussed earlier this week during our podcast session, commercialization of Semantic Web technology shouldn't be a mercurial matter at this stage in the game :-) It's all about looking at how it provides value :-)
From the Linked Data angle, the ability to produce, dispatch, and exploit "Context" across an array of "Perspectives" from a plethora of disparate data sources on the Web and/or behind corporate firewalls, offers immense commercial value.
Yahoo's Searchmonkey effort will certainly bring clarity to some of the points I made during the podcast re. the role of URIs as "value consumption tickets" (Data Services are exposed via URIs). There has to be a trigger (in user space) that compels Web users to seek broader, or simply varied, perspectives as a response to data encountered on the Web. Yahoo! is about to put this light on in a big way (imho).
The "self annotating" nature of the Web is what ultimately drives the manifestation of the long awaited Semantic Web. I believe I postulated about "Self Annotation & the Semantic Web" in a number of prior posts which, by the way, should be DataRSS compatible right now due to Yahoo's support of OpenSearch Data Providers (which this Blog Space has been for eons).
Today, we have many communities adding structure to the Web (via their respective tools of preference) without explicitly realizing what they are contributing. Every RSS/Atom feed, Tag, Weblog, Shared Bookmark, Wikiword, Microformat, Microformat++ (eRDF or RDFa), GRDDL stylesheet, RDFizer, etc. is a piece of structured data.
Finally, the different communities are all finding ways to work together (thank heavens!) and the results are going to be cataclysmic when it all plays out :-)
Data, Structure, and Extraction are the keys to the Semantic Life! First you get the Data in a container (information resource), then you add Structure to the information resource (RSS, Atom, microformats, RDFa, eRDF, SIOC, FOAF, etc.); once you have Structure, RDFization (i.e., transformation to Linked Data) is a cinch thanks to RDF Middleware (as per earlier RDF middleware posts).
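The Data, Structure, RDFization sequence can be illustrated with a toy RDFizer: an RSS 2.0 feed already has structure, so turning its items into triples is mostly a mapping exercise. This Python sketch is a hypothetical illustration, not how the Virtuoso Sponger works; the choice of sioc:Post and dcterms:title for the mapping, and the sample feed, are assumptions.

```python
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIOC_POST = "http://rdfs.org/sioc/ns#Post"
DCTERMS_TITLE = "http://purl.org/dc/terms/title"


def rss_to_triples(rss_xml):
    """Turn RSS 2.0 <item> elements into (subject, predicate, object) triples.

    The item's <link> URL is used as the subject: a post is a document, so
    its URL can double as its identifier. Objects are plain strings here;
    the IRI/literal distinction is omitted for brevity.
    """
    root = ET.fromstring(rss_xml)
    triples = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link:
            continue  # no stable identifier, so skip the item
        triples.append((link, RDF_TYPE, SIOC_POST))
        title = (item.findtext("title") or "").strip()
        if title:
            triples.append((link, DCTERMS_TITLE, title))
    return triples


SAMPLE = """<rss version="2.0"><channel><title>Demo feed</title>
<item><title>Hello Data Web</title><link>http://example.org/blog/1</link></item>
<item><title>An item without a link</title></item>
</channel></rss>"""

for triple in rss_to_triples(SAMPLE):
    print(triple)
```

The item without a `<link>` is dropped. That matches the point about structure: without an identifier there is nothing stable to hang the description on.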
The great thing about the Linked Data Web is that it's much easier to discover and respond to these points of view before the ink dries :-) Ben certainly needs to take a look at the Semantic Web FAQ before or after assimilating Daniel's response.
As I can't quite remix Videos on the spur of the moment (yet), I would encourage you to watch the video and then click on the link to my FOAF Profile, then follow the "Linked Data" tab to see how Linked Data oriented platforms (in my case OpenLink Data Spaces) that exist today actually deliver what's explained in the video.
"What You Know" (Data & Friend Networks) ultimately trumps "Who You Know" (Friend only Networks). The exploitation power of this reality is enhanced exponentially via the Linked Data Web once the implications of beaming SPARQL queries down specific URIs (entry points to Linked Data graphs) become clearer :-)
In the form above (the norm), WordPress data can be injected into the Linked Data Web via RDFization middleware such as the Virtuoso Sponger (built into all Virtuoso instances) and Triplr. The downside of this approach is that the blog owner doesn't necessarily possess full control over their contributions to the emerging Giant Global Graph of Linked Data.
Another route to Linked Data exposure is via Virtuoso's Metaschema Language for producing RDF Views over ODBC/JDBC accessible Data Sources, that enables the following setup:
Alternatively, you can also exploit Virtuoso as the SQL DBMS, RDF DBMS, Application Server, and Linked Data Deployment platform:
How Do I map the WordPress SQL Schema to RDF using Virtuoso?
Read the Meta Schema Language guide or simply apply our "WordPress SQL Schema to RDF" script to your Virtuoso hosted instance. Of course, there are other mappings that cover other PHP applications deployed via Virtuoso:
Trent Adams, Steve Greenberg, and I also had a podcast chat about Web Data Portability and Accessibility (Linked Data). I also remixed Jon Breslin's "Data Portability & Me" presentation to produce: "Data Accessibility & Me".
The podcast interviews and presentations provide contributions to the broadening discourse about Open Data Access / Connectivity on the Web.
*On* the ubiquitous Web of "Linked Documents", HREF means (by definition and usage): Hypertext Reference to an HTTP accessible Data Object of Type: "Document" (an information resource). Of course we don't make the formal connection of Object Type when dealing with the Web on a daily basis, but whenever you encounter the "resource not found" condition notice the message: HTTP/1.0 404 Object Not Found, from the HTTP Server tasked with retrieving and returning the resource.
*In* the Web of "Linked Data", a complementary addition to the current Web of "Linked Documents", HREF is used to reference Data Objects of a variety of "Types", not just "Documents". The way this is achieved is by using Data Object Identifiers (URIs / IRIs generated by the Linked Data deployment platform) in the strict sense, i.e., Data Identity (URI) is separated from Data Address (URL). Thus, you can reference a Person Data Object (aka an instance of a Person Class) in your HREF, and the HTTP Server returns a Description of the Data Object via a Document (again, an information resource). A document containing the Description of a Data Object typically contains HREFs to other Data Objects that expose the Attributes and Relationships of the initial Person Data Object, and it is this collection of Data Objects that is technically called a "Graph" -- which is what RDF models.
What I describe above is basic stuff for anyone that's familiar with Object Database or Distributed Objects technology and concepts.
The Linked Document Web is a collection of physical resources that traverse the Web Information Bus in palatable format, i.e., documents. Thus, Document Object Identity and Document Object Data Address can be the same thing, i.e., a URL can serve as the ID/URI of a Document Data Object.
The Linked Data Web, on the other hand, is a Distributed Object Database, and each Data Object must be uniquely defined; otherwise we introduce ambiguity that ultimately taints the Database itself (making it incomprehensible to reasoning-challenged machines). Thus we must have unique Object IDs (URIs / IRIs) for People, Places, Events, and other things that aren't Documents. Once we follow the time-tested rules of Identity, People can then be associated with the things they create (blog posts, web pages, bookmarks, wikiwords etc.). RDF is about expressing these graph model relationships, while RDF serialization formats enable information resources laden with these data object links to be transported to requesting User Agents.
Put in more succinct terms, all documents on the Web are in reality compound documents (e.g., most contain at least an image these days). The Linked Data Web is about a Web where Data Object IDs (URIs) enable us to distill source data from the information contained in a compound document.
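One common way to keep Data Identity and Data Address apart is the hash URI pattern: the fragment names the entity, and the URI without the fragment addresses the document that describes it. (The other widespread pattern, slash URIs answered with a 303 See Other redirect, is not shown.) A minimal Python sketch; the example URIs are hypothetical.

```python
from urllib.parse import urldefrag


def is_hash_uri(uri):
    """True when the identifier carries a fragment, e.g. .../jane#me."""
    return bool(urldefrag(uri).fragment)


def document_address(uri):
    """Return the URL an HTTP client would actually fetch for `uri`.

    Fragments are never sent to the server, so for a hash URI the part
    before '#' addresses the document that describes the entity.
    """
    return urldefrag(uri).url


person = "http://example.org/people/jane#me"   # the Person (identity)
page = document_address(person)                # the Document (address)
print(person != page, page)
```

The Person and the page now have distinct identifiers, so statements about one (e.g., who Jane knows) are not confused with statements about the other (e.g., when the page was last modified).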
The degree of unobtrusiveness of new technology, concepts, or new applications of existing technology, is what ultimately determines eventual uptake and meme virulence (network effects). For a while, the Semantic Web meme was mired in confusion and general misunderstanding due to a shortage of practical use case scenario demos.
The emergence of the SPARQL Query Language has provided critical infrastructure for a number of products, projects, and demos that now make the utility of the Semantic Web vision much clearer via the simplicity of Linked Data, as exemplified by the following:
There are quite a few reasons to use OpenLink Data Spaces (ODS). Here are 10 of the reasons why I use ODS:
- Its native support of DataPortability Recommendations such as RSS, Atom, APML, Yadis, OPML, Microformats, FOAF, SIOC, OpenID and OAuth.
- Its native support of Semantic Web Technologies such as: RDF and SPARQL/SPARUL for querying.
- Everything in ODS is an Object with its own URI, this is due to the underlying Object-Relational Architecture provided by Virtuoso.
- It has all the social media components that you could need, including: blogs, wikis, social networks, feed readers, CRM and a calendar.
- It is expandable by installing pre-configured components (called VADs), or by re-configuring a LAMP application to use Virtuoso. Some examples of current VADs include: MediaWiki, Wordpress and Drupal.
- It works with external webservices such as: Facebook, del.icio.us and Flickr.
- Everything within OpenLink Data Spaces is Linked Data, which provides more meaningful information than just plain structural information. This meaningful information could be used for complex inferencing systems, as ODS can be seen as a Knowledge Base.
- ODS builds bridges between the existing static-document based web (aka "Web 1.0"), the more dynamic, services-oriented, social and/or user-orientated webs (aka "Web 2.0") and the web which we are just going into, which is more data-orientated (aka "Web 3.0" or "Linked Data Web").
- It is fully supportive of Cloud Computing, and can be installed on Amazon EC2.
- It's released free under the GNU General Public License (GPL). (Note: it is technically dual licensed, as it sits on top of the Virtuoso Universal Server, which has both Commercial and GPL licensing.)
The features above collectively provide users with a Linked Data Junction Box that may reside within corporate intranets or "out in the clouds" (Internet). You can consume, share, and publish data in a myriad of formats using a plethora of protocols, without any programming. ODS is simply about exposing the data from your Web 1.0, 2.0, 3.0 application interactions in structured form, with Linking, Sharing, and ultimately Meshing (not Mashing) in mind.
Note: Although ODS is equipped with a broad array of Web 2.0 style Applications, you do not need to use native ODS apps in order to exploit its power. It binds to anything that supports the relevant protocols and data formats.
If you want to explore who I know, what I read, and what I've tagged (amongst other things), all you have to do is:
Some Tools that help you comprehend what I am saying:
Jason recently moved to Massachusetts, which led to my pinging him about our earlier blogosphere encounter and the emergence of a Data Portability Community. I also informed him that TimBL, myself, and a number of other Semantic Web technology enthusiasts frequently meet on the 2nd Tuesday of each month at the MIT hosted Cambridge Semantic Web Gatherings, to discuss, demonstrate, and debate all aspects of the Semantic Web. Luckily (for both of us), Jason attended the last event, and we got to meet each other in person.
Following our face to face meeting in Cambridge, a number of follow-on conversations ensued covering, Linked Data and practical applications of the Semantic Web vision. Jason writes about our exchanges a recent post titled: The Semantic Web. His passion for Data Portability enabled me to use OpenID and FOAF integration to connect the Semantic Web and Data Portability via the Linked Data concept.
During our conversations, Jason also alluded to the fact that he had already encountered OpenLink Software while working with our ODBC Drivers (part of our UDA product family) for IBM Informix (Single-Tier or Multi-Tier Editions) a few years ago (interesting random connection).
As I've stated in the past, I've always felt that the Semantic Web vision will materialize by way of a global epiphany. The countdown to this inevitable event started, ironically, at the birth of the blogosphere, and accelerated more recently through the emergence of Web 2.0 and Social Networking, even more ironically :-)
The blogosphere started the process of Data Space coalescence via RSS/Atom based semi-structured data enclaves, Web 2.0 propagated Web Service usage en route to creating service provider controlled data and information silos, Social Networking brought attention to the fact that User Generated Data wasn't actually owned or controlled by the Data Creators, etc.
The emergence of "Data Portability" has created a palatable moniker for a clearly defined, and slightly easier to understand, problem: the meshing of Data and Identity in cyberspace, i.e., individual points of presence in cyberspace in the form of "Personal Data Spaces in the Clouds" (think: doing really powerful stuff with .name domains). In a sense, this is the critical inflection point between the document centric "Web of Linked Documents" and the data centric "Web of Linked Data". There is absolutely no other way to solve this problem in a manner that alleviates the imminent challenges presented by information overload, resulting from the exponential growth of user generated data across the Internet and enterprise Intranets.
A query language for the burgeoning Structured & Linked Data Web (aka Semantic Web / Giant Global Graph). Like SQL, for the Relational Data Model, it provides a query language for the Graph based RDF Data Model.
It's also a REST or SOAP based Web Service that exposes SPARQL access to RDF Data via an endpoint.
In addition, it's also a Query Results Serialization format that includes XML and JSON support.
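The JSON flavour follows the W3C SPARQL Query Results JSON Format: a `head` listing the variables and a list of `bindings`, with each value tagged by its type (uri, literal, bnode). Here is a minimal Python sketch that flattens such a document into rows; the sample results are made up.

```python
import json


def rows_from_sparql_json(text):
    """Flatten SPARQL JSON results into one dict per solution.

    Variables left unbound in a solution (e.g. via OPTIONAL) map to None.
    """
    doc = json.loads(text)
    variables = doc["head"]["vars"]
    return [
        {var: binding[var]["value"] if var in binding else None for var in variables}
        for binding in doc["results"]["bindings"]
    ]


SAMPLE = """{
  "head": {"vars": ["Person", "name"]},
  "results": {"bindings": [
    {"Person": {"type": "uri", "value": "http://example.org/id/alice"},
     "name": {"type": "literal", "value": "Alice"}},
    {"Person": {"type": "uri", "value": "http://example.org/id/bob"}}
  ]}
}"""

for row in rows_from_sparql_json(SAMPLE):
    print(row)
```

Dropping the `type` tags keeps the sketch short; a client that needs to tell IRIs from literals would keep them.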
It brings important clarity to the notion of the "Web as a Database" by transforming existing Web Sites, Portals, and Web Services into bona fide corpus of Mesh-able (rather than Mash-able) Data Sources. For instance, you can perform queries that join one or more of the aforementioned data sources in exactly the same manner (albeit different syntax) as you would one or more SQL Tables.
# SPARQL equivalent of SQL SELECT * against my personal data space hosted FOAF file
SELECT DISTINCT ?s ?p ?o FROM <http://myopenlink.net/dataspace/person/kidehen> WHERE {?s ?p ?o}
# SPARQL against my social network. Note: my SPARQL will be beamed across all of the contacts in the social networks of my contacts, as long as they are all HTTP URI based within each data space
PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT DISTINCT ?Person FROM <http://myopenlink.net/dataspace/person/kidehen> WHERE {?s a foaf:Person; foaf:knows ?Person}
Note: you can use the basic SPARQL Endpoint, SPARQL Query By Example, or SPARQL Query Builder Demo tool to experiment with the demonstration queries above.
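The second query is a join of two triple patterns on `?s`, much like a self-join in SQL. A toy in-memory evaluator in Python shows the mechanics. The two "data spaces" and their URIs are hypothetical, and merging Alice's triples with mine only approximates what happens once her profile has been fetched (for example, by adding a second FROM clause). It is not how a real SPARQL engine such as Virtuoso evaluates queries.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ME, ALICE, BOB, CAROL = (f"http://example.org/id/{n}" for n in ("me", "alice", "bob", "carol"))

# Two hypothetical data spaces: my profile and Alice's profile.
MY_PROFILE = [
    (ME, RDF_TYPE, FOAF + "Person"),
    (ME, FOAF + "knows", ALICE),
    (ME, FOAF + "knows", BOB),
]
ALICE_PROFILE = [
    (ALICE, RDF_TYPE, FOAF + "Person"),
    (ALICE, FOAF + "knows", CAROL),
]


def _unify(pattern_term, value, binding):
    """Match one pattern term against one triple term, binding variables."""
    if not pattern_term.startswith("?"):
        return pattern_term == value           # constant must match exactly
    if pattern_term in binding:
        return binding[pattern_term] == value  # variable already bound
    binding[pattern_term] = value              # bind a fresh variable
    return True


def match(pattern, triples, binding):
    """Yield copies of `binding` extended by each triple that fits `pattern`."""
    for triple in triples:
        extended = dict(binding)
        if all(_unify(p, t, extended) for p, t in zip(pattern, triple)):
            yield extended


def select_distinct(var, patterns, triples):
    """Join the patterns left to right, then project DISTINCT `var` (sorted)."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [s for b in solutions for s in match(pattern, triples, b)]
    return sorted({s[var] for s in solutions})


# ?s a foaf:Person ; foaf:knows ?Person
QUERY = [("?s", RDF_TYPE, FOAF + "Person"), ("?s", FOAF + "knows", "?Person")]
print(select_distinct("?Person", QUERY, MY_PROFILE))
print(select_distinct("?Person", QUERY, MY_PROFILE + ALICE_PROFILE))
```

Against my profile alone the query finds Alice and Bob; once Alice's data space is merged in, Carol appears too, without changing the query.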
SPARQL is implemented by RDF Data Management Systems (Triple or Quad Stores) just as SQL is implemented by Relational Database Management Systems. The aforementioned data management systems will typically expose SPARQL access via a SPARQL endpoint.
A SPARQL implementors' Testimonial page accompanies the SPARQL press release. In addition, there is a growing collection of implementations on the ESW Wiki Page for SPARQL compliant RDF Triple & Quad Stores.
Yes! SPARQL facilitates an unobtrusive manifestation of a Linked Data Web by way of a natural extension of the existing Document Web, i.e., these Web enclaves co-exist in symbiotic fashion.
As DBpedia very clearly demonstrates, Linked Data makes the Semantic Web demonstrable and much easier to comprehend. Without SPARQL there would be no mechanism for Linked Data deployment, and without Linked Data there is no mechanism for Beaming Queries (directly or indirectly) across the Giant Global Graph of data hosted by Social Networks, Shared Bookmark Services, Weblogs, Wikis, RSS/Atom/OPML feeds, Photo Galleries and other Web accessible Data Sources (Data Spaces).
Writing a JDBC Driver for SPARQL is a little overkill. OpenOffice.org simply needs to make XML or Web Data (HTML, XHTML, and XML) bona fide data sources within its "Pivot Table" functionality realm. All that would then be required is a SPARQL SELECT Query transported via the SPARQL Protocol, with results sent back using the SPARQL XML results serialization format (all part of a single SPARQL Protocol URL).
Excel successfully consumes the following information resource URI: http://tinyurl.com/yvoccj (a tiny url for a SPARQL SELECT against my FOAF file).
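Both halves of that round trip are small. The Python sketch below builds a SPARQL Protocol GET URL (the query travels in the `query` parameter) and reads the SPARQL XML results format. The endpoint URL and sample results are hypothetical; a client would typically ask for XML results with an `Accept: application/sparql-results+xml` header.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SRX = "{http://www.w3.org/2005/sparql-results#}"  # results namespace


def protocol_url(endpoint, query):
    """Encode a query as a SPARQL Protocol GET URL (the `query` parameter)."""
    return endpoint + "?" + urlencode({"query": query})


def rows_from_sparql_xml(xml_text):
    """Read <result>/<binding> elements into one dict per solution."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.iter(SRX + "result"):
        row = {}
        for binding in result.findall(SRX + "binding"):
            value = next(iter(binding))  # the <uri>, <literal> or <bnode> child
            row[binding.get("name")] = value.text or ""
        rows.append(row)
    return rows


ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint
QUERY = "SELECT DISTINCT ?s WHERE { ?s ?p ?o } LIMIT 10"
print(protocol_url(ENDPOINT, QUERY))

SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="s"/></head>
  <results>
    <result><binding name="s"><uri>http://example.org/id/jane</uri></binding></result>
  </results>
</sparql>"""
print(rows_from_sparql_xml(SAMPLE))
```

Because the whole request is a URL, it can be shortened (as with the tinyurl above) and handed to any tool that can fetch and parse XML.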
Alternatively, and currently achievable, you could simply use SPASQL (SPARQL within SQL) via a DBMS engine that supports SQL, SPARQL, and SPASQL, e.g., Virtuoso.
Virtuoso SPASQL support is exposed via its ODBC and/or JDBC Drivers. Thus you can do things such as:
BTW - My New Year's Resolution: get my act together and shrink the ever increasing list of "simple & practical Virtuoso use case demos" on my todo list, which now spans all the way back to 2006 :-(
Jon Udell recently penned a post titled: The Fourth Platform. The post arrives at a spookily coincidental time (this happens quite often between Jon and me, as demonstrated last year during our podcast; the "Fourth" in his Innovators Podcast series).
The platform that Jon describes is "Cloud Based" and comprised of Storage and Computation. I would like to add Data Access and Management (native and virtual) under the fourth platform banner with the end product called: "Cloud based Data Spaces".
As I write, we are releasing a Virtuoso AMI (Amazon Image) labeled: virtuoso-dataspace-server. This edition of Virtuoso includes the OpenLink Data Spaces Layer and all of the OAT applications we've been developing for a while.
There's more to come!
Since I am aggressively tracking RDFa developments, I decided to quickly view Ivan's FOAF-in-RDFa file via the OpenLink RDF Browser. The full implications are best understood when you click on each of the Browser's Tabs -- each providing a different perspective on this interesting addition to the Semantic Data Web (note: the Fresnel Tab which demonstrates declarative UI templating using N3).
The OpenLink RDF Browser is a Rich Internet Application built using OAT (OpenLink Ajax Toolkit). In my case, I am deploying the RDF Browser from a Virtuoso instance, which implies that the Browser is able to use the Virtuoso Sponger Middleware (exposed as a REST Service at the Virtuoso instance endpoint: /proxy), which includes an RDFa Cartridge comprised of a metadata extractor and an RDF Schema / OWL Ontology mapper. That's it!
A vital component of the new Virtuoso release is the finalization of our SQL to RDF mapping functionality -- enabling the declarative mapping of SQL Data to RDF. Additional technical insight covering other new features (delivered and pending) is provided by Orri Erling, as part of a series of post-Banff posts.
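The core idea behind declarative SQL to RDF mapping can be illustrated without Virtuoso: each row becomes a subject URI derived from its primary key, and each column becomes a predicate. The Python sketch below uses `sqlite3` and a Northwind-style Customers table. It is not Virtuoso's Meta Schema Language, and the URI scheme and `schema#` predicate names are hypothetical. Generating triples from a live query at read time, rather than exporting them to a warehouse, is the "on the fly" aspect.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/northwind/"  # hypothetical URI space for the mapping


def make_demo_db():
    """In-memory stand-in for a Northwind-style Customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, "
                 "CompanyName TEXT, Country TEXT)")
    conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", [
        ("ALFKI", "Alfreds Futterkiste", "Germany"),
        ("ANATR", "Ana Trujillo Emparedados y helados", "Mexico"),
    ])
    return conn


def customer_triples(conn):
    """Map each Customers row to triples lazily, as rows are read.

    Row -> subject URI built from the primary key; column -> predicate.
    Nothing is copied into a separate triple store.
    """
    rows = conn.execute(
        "SELECT CustomerID, CompanyName, Country FROM Customers ORDER BY CustomerID")
    for customer_id, company, country in rows:
        subject = f"{BASE}Customer/{customer_id}#this"
        yield (subject, RDF_TYPE, BASE + "schema#Customer")
        yield (subject, BASE + "schema#companyName", company)
        yield (subject, BASE + "schema#country", country)


conn = make_demo_db()
for triple in customer_triples(conn):
    print(triple)
```

Since `customer_triples` is a generator over a query cursor, an update to the Customers table shows up in the next set of triples without any re-export step.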
A majority of the world's data (especially in the enterprise realm) resides in SQL Databases. In addition, Open Access to the data residing in said databases remains the biggest challenge to enterprises for the following reasons:
Enterprises have known from the beginning of modern corporate times that data access, discovery, and manipulation capabilities are inextricably linked to the "Real-time Enterprise" nirvana (hence my use of 0.0 before this becomes 3.0).
In my experience, as someone who has operated in the data access and data integration realms since the late '80s, I've painfully observed enterprises pursue, but fail to attain, full control over enterprise data (the prized asset of any organization) such that data-, information-, and knowledge-workers are just a click away from commencing coherent platform and database independent data drill-downs and/or discovery that transcend intranet, internet, and extranet boundaries -- serendipitous interaction with relevant data, without compromise!
Okay, situation analysis done, we move on.
At our most recent (12th June) monthly Semantic Web Gathering, I unveiled to TimBL and a host of other attendees a simple, but powerful, demonstration of how Linked Data, as an aspect of the Semantic Data Web, can be applied to enterprise data integration challenges.
The vision of data, information, or knowledge at your fingertips is nigh! Thanks to the infrastructure provided by the Semantic Data Web (URIs, RDF Data Model, variety of RDF Serialization Formats[1][2][3], and Shared Data Dictionaries / Schemas / Ontologies [1][2][3][4][5]) it's now possible to Virtualize enterprise data from the Physical Storage Level, through the Logical Data Management Levels (Relational), up to a Concrete Conceptual Model (Graph) without operating system, development environment or framework, or database engine lock-in.
We produce a shared ontology for the CRM and Business Reporting Domains. I hope this experiment clarifies how this is quite achievable by converting XML Schemas to RDF Data Dictionaries (RDF Schemas or Ontologies). Stay tuned :-)
Also watch TimBL amplify and articulate Linked Data value in a recent interview.
To deliver a mechanism that facilitates the crystallization of this reality is a contribution of boundless magnitude (as we shall all see in due course). Thus, it is easy to understand why even "her majesty", the queen of England, simply had to get in on the act and appoint TimBL to the "British Order of Merit" :-)
Note: All of the demos above now work with IE & Safari (a "remember what Virtuoso is epiphany") by simply putting Virtuoso's DBMS hosted XSLT engine to use :-) This also applies to my earlier collection of demos from the Hello Data Web and other Data Web & Linked Data related demo style posts.
The items that follow attempt to demonstrate the point by way of SIOC (Semantically-Interlinked Online Communities Ontology) and MO (Music Ontology) domain exploration:
Linked Data or Dynamic Data Web Pages:
Semantic Web Browser Sessions:
Key point: if you are modeling People, Communities, Organizations, Documents, and other entities in the People, Organizations, Documents etc. Data Space, don't forget to FOAF-FOAF-FOAF it Up! :-)
Naturally, this triggered an obvious opportunity to demonstrate the prowess of Linked Data on the Semantic Web. What follows is a quick dump of what I sent to the foaf-dev mailing list:
Here are a variety of FOAF Views built using:
Enabling you to explore the following lines:
Quick Definitions:
Reasons for the distinction:
Examples:
"So what?" you may be thinking.
For starters, I can quite easily Mesh data from Googlebase (which emits RSS 2.0 or Atom) and other data sources with the Mapping Services from Yahoo!
I can achieve this in minutes without writing a single line of code. I can do it because of the Data Model prowess of RDF (self-describing instance-data), the data interchange and transformation power of XML and XSLT respectively, the inherent power of XML based Web Services (REST or SOAP), and of course, having a Hybrid Server product like Virtuoso at my disposal that delivers a cross platform solution for exploiting all of these standards coherently.
I can share the self-describing data source that serves my Meshup. Try reusing the data presented by a Mashup via the same URL that you used to locate the Mashup, and you'll get my drift.
Demo Links:
What does this all mean?
"Context" is the catalyst of the burgeoning Data Web (Semantic Web Layer - 1). It's the emerging appreciation of "Context" that is driving the growing desire to increment Web versions from 2.0 to 3.0. It is also the very same "Context" that has been a preoccupation of the Semantic Web vision since its inception.
The journey towards a more Semantic Web is all inclusive (all "ANDs" and no "ORs" re. participation).
The Semantic Web is self-annotating. Web 2.0 has provided a huge contribution to the self-annotation effort: on the Web we now have Data Spaces for Bookmarks (e.g. del.icio.us), Image Galleries (e.g. Flickr), Discussion Forums (remember those comments associated with blog posts? ditto the pingbacks and trackbacks?), People Profiles (FOAF, XFN, del.icio.us, and those crumbling walled-gardens around many Social Networks), and more.
A Web without granular access to Data is simply not a Web worth having (think about the menace of click-fraud and spam).
PREFIX dbpedia: <http://dbpedia.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?name ?birth ?death
FROM <http://dbpedia.org>
WHERE {
  ?person dbpedia:birthplace <http://dbpedia.org/resource/Berlin> .
  ?person dbpedia:birth ?birth .
  ?person foaf:name ?name .
  ?person dbpedia:death ?death .
  FILTER (?birth < "1900-01-01"^^xsd:date && bif:contains (?name, 'otto'))
}
ORDER BY ?name
You can test further using our SPARQL Endpoint for DBpedia or via the DBpedia-bound Interactive SPARQL Query Builder, or just click *Here* for results courtesy of the SPARQL Protocol (REST based Web Service).
Note: This is in-built functionality as Virtuoso has possessed Full Text Indexing since 1998-99. This capability applies to physical and virtual graphs managed by Virtuoso.
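The SPARQL Protocol link mentioned above comes down to an HTTP GET that carries the URL-encoded query text in a `query` parameter. What follows is a minimal Python sketch that builds such a request using only the standard library. It makes several assumptions: the endpoint URL `http://dbpedia.org/sparql` is hypothetical and should be replaced with whatever endpoint you actually use, the query is a simplified stand-in for the one above, and the results format is requested through the standard `application/sparql-results+json` media type.

```python
from urllib.parse import urlencode, urlparse, parse_qs
from urllib.request import Request

# Hypothetical endpoint URL (an assumption); substitute your own SPARQL endpoint.
ENDPOINT = "http://dbpedia.org/sparql"

# Simplified illustrative query, not the exact query shown above.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE { ?person foaf:name ?name } LIMIT 10
"""

def build_sparql_get(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request.

    The query text travels URL-encoded in the `query` parameter, and the
    Accept header asks for the W3C SPARQL JSON results format.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"},
                   method="GET")

req = build_sparql_get(ENDPOINT, QUERY)
print(req.full_url)
# Sending it would be one more call: urllib.request.urlopen(req).read()
```

Any user agent that can issue an HTTP GET can do the same thing, which is the point the earlier paragraphs make about the protocol.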
As per usual, there is more to come, as we now have a nice intersection point for SPARQL and XQuery/XPath since Triple Objects (the Literal variety) can take the form of XML Schema based Complex Types :-) It's a point I alluded to in my podcast interview with Jon Udell last year (*note: the Mechanical Turk based transcript is bad*). The point I made went something like this: "...you use SPARQL to traverse the typed links and then use XPath/XQuery for further granular access to the data if well-formed..."
Anyway, the podcast interview led to this InfoWorld article titled: Unified Data Theory.
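The two-step pattern in that quote (traverse typed links, then drill into a well-formed XML literal) can be illustrated with a toy Python sketch. This is not Virtuoso's implementation. The triples, the `ex:` identifiers, and the address XML are all hypothetical, and ElementTree supports only a subset of XPath, which is enough here.

```python
import xml.etree.ElementTree as ET

# Toy (subject, predicate, object) triples; all identifiers are hypothetical.
# The last object is an XML literal standing in for a complex-typed value.
TRIPLES = [
    ("ex:order1", "ex:customer", "ex:alice"),
    ("ex:alice", "ex:address",
     "<address><street>1 Main St</street><city>Berlin</city></address>"),
]

def objects(subject, predicate, triples=TRIPLES):
    """Step 1 (the SPARQL part): follow a typed link from a subject."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def xpath_on_literal(literal, path):
    """Step 2 (the XPath part): drill into a well-formed XML literal.

    ElementTree implements only a subset of XPath.
    """
    root = ET.fromstring(literal)
    return [el.text for el in root.findall(path)]

customer = objects("ex:order1", "ex:customer")[0]
address_xml = objects(customer, "ex:address")[0]
print(xpath_on_literal(address_xml, "./city"))  # ['Berlin']
```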
Linking personal posted content across communities: "
With the help of Kingsley, Uldis and I have been looking at how SIOC can be used to link the content that a single person posts to a number of community sites. The picture below shows an example of stuff that I've created on Flickr, YouTube, etc. through my various user identities on those sites (these match some SIOC types that we want to add to a separate module). We can also say that each Web 2.0 content item is a user-contributed post, with some attached or embedded content (e.g. a file or maybe just some metadata). This is part of a new discussion on the sioc-dev mailing list, and we'd value your contributions.
Edit: The inner layer is a person (semantically described in FOAF), the next layer is their user accounts (described in FOAF, SIOC) and the outer layer is the posted content - text, files, associated metadata - on community sites (again described using SIOC).
"(Via John Breslin - Cloudlands.)
The point that John is making about the Data Web and Interlinked Data Spaces exposed via URIs (e.g. Personal URIs) crystallizes a number of very important issues about the Data Web that may remain unclear. I hope that digesting the post excerpt above, in conjunction with the items below, aids the pursuit of clarity and comprehension about the all-important Data Web (Semantic Web - Layer 1):
Examples of some of these principles in practice:
And of course there is more to come such as Grandma's Semantic Web Browser which is coming from Zitgist LLC (pronounced: Zeitgeist) a joint venture of OpenLink Software and Frederick Giasson.
Play Date: What is that thing on the wall?
My Son: Security Alarm
Play Date: How does it work?
My Son: If you click on that top button and then open the door, I will have to enter a code when we come back in or the alarm will go off
Play Date: What is the code?
My Son: I can't tell you that!
Play Date: Why not?
My Son: You might come and steal something from our house!
Play Date: No I won't!
My Son: Well, you might tell someone that might come and steal something from our house! or that person could tell someone who could tell someone that would steal from our house
LOL!! of course! At the same time wondering, how come a majority of adults don't quite see the need for granular access to Web Data in a manner that enables computers and humans to collectively arrive at similar decisions?
Putting Data in context en route to producing actionable knowledge is a transient endeavor that engages a myriad of human senses. We demonstrate comprehension of this fact in our daily existence as social creatures (at a very early age as depicted above). That said, we seem to forget this fact when engaging the Web: If we can't see it then it can't be valuable.
BTW - I just received a ping about the "Sensory Web" (which is just another way of describing a Data Driven Web experience from my vantage point.)
In the popular M-V-C pattern you don't see the "M", but the "M" will kill you if you get it wrong (it is the FORCE)! Come to think of it, the pattern could have been coined V-C-M or C-M-V, but isn't for obvious reasons :-)
RDF is the vehicle that enables us to tap into the Data aspect of the Web. We started off with pages of blurb linked via hypertext (Web 1.0) and then looked to "Keywords" for some kind of data access; we then isolated some "Verbs" and discovered another dimension of Web Interaction (Web 2.0) but looked to these "Verbs" for data access, which left us with Mashups; and now we are starting to extract "Nouns" and "Adjectives" from sentences (Subject, Predicate, Object - Triples) associated with resources on the Web (Data Web / Web 3.0 / Semantic Web Layer 1), which provides a natural data access substrate for Meshups (natural joining of disparate data from a plethora of data sources) while providing the foundation layer for the Semantic Web.
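To make the "natural joining" claim concrete, here is a toy Python sketch. Two hypothetical sources, already expressed as triples, are merged by plain set union and then joined by following a shared identifier. The `ex:` and `geo:` names and all of the values are illustrative assumptions, not data from any real service. A real Meshup would query an RDF store with SPARQL rather than scan Python sets.

```python
# Two hypothetical data sources, each already expressed as
# (subject, predicate, object) triples. Identifiers are illustrative only.
listings = {
    ("ex:apt42", "ex:price", "1200"),
    ("ex:apt42", "ex:locatedIn", "ex:Soho"),
}
geo = {
    ("ex:Soho", "geo:lat", "40.723"),
    ("ex:Soho", "geo:long", "-74.002"),
}

# Merging is a set union: no schema reconciliation step is needed here
# because every record already has the same three-part shape.
graph = listings | geo

def value(s, p):
    """Return one object for (s, p), or None if there is none."""
    return next((o for s2, p2, o in graph if s2 == s and p2 == p), None)

# The join happens by following a shared URI (ex:Soho) across the two sources.
area = value("ex:apt42", "ex:locatedIn")
print(value("ex:apt42", "ex:price"), value(area, "geo:lat"), value(area, "geo:long"))
```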
For those who need use-cases that demonstrate tangible value re. the Semantic Web, here are some projects to note courtesy of the Semantic Web Education and Outreach (SWEO) interest group:
The Data in Fred's post is based on FOAF Ontology instance data generated from a myriad of Data Sources.
A declarative language adapted from SPARQL's graph pattern language (N3/Turtle) for mapping SQL Data to RDF Ontologies. We currently refer to this as a Graph Pattern based RDF VIEW Definition Language.
It provides an effective mechanism for exposing existing SQL Data as virtual RDF Data Sets (Graphs) negating the data duplication associated with generating physical RDF Graphs from SQL Data en route to persistence in a dedicated Triple Store.
Enterprise applications (traditional and web based) and most Web Applications (Web 1.0 and Web 2.0) sit atop relational databases, implying that SQL/RDF model and data integration is an essential element of the burgeoning "Data Web" (Semantic Web - Layer 1) comprehension and adoption process.
In a nutshell, this is a quick route for non-disruptive exposure of existing SQL Data to SPARQL supporting RDF Tools and Development Environments.
CREATE GRAPH IRI("http://myopenlink.net/dataspace")
CREATE IRI CLASS odsWeblog:feed_iri "http://myopenlink.net/dataspace/kidehen/weblog/MyFeeds" ( in memb varchar not null, in inst varchar not null)
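The "virtual graph" idea behind RDF Views (triples derived on demand from live SQL rows, with no copy persisted in a triple store) can be sketched in Python as shown below. This is not Virtuoso's mapping language. The sketch uses an in-memory SQLite table as a stand-in relational source, and the table, its columns, the IRI template, and the predicate names are all hypothetical. The template only plays a role analogous to an IRI CLASS.

```python
import sqlite3

# Stand-in relational source with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, title TEXT)")
conn.executemany("INSERT INTO posts (author, title) VALUES (?, ?)",
                 [("kidehen", "Hello Data Web"), ("kidehen", "RDF Views")])

# Hypothetical IRI template playing the role of an IRI CLASS: it turns
# key columns into IRIs. The URL pattern is an illustrative assumption.
POST_IRI = "http://example.com/dataspace/{author}/weblog/{id}"

def virtual_triples(connection):
    """Yield triples lazily from the current SQL rows.

    Nothing is persisted: each call re-reads the table, so the RDF view
    always reflects the live SQL data and avoids a duplicated triple store.
    """
    rows = connection.execute("SELECT id, author, title FROM posts ORDER BY id")
    for pk, author, title in rows:
        s = POST_IRI.format(author=author, id=pk)
        yield (s, "rdf:type", "sioc:Post")
        yield (s, "dc:title", title)

triples = list(virtual_triples(conn))
print(len(triples))  # 4: two triples for each of the two rows
```

Because the triples are generated from the rows each time, any update to the SQL table shows up in the view without a reload step. That is the duplication-avoiding property described above.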
For additional clarity re. my comments above, you can also look at the SPARQL & SIOC Usecase samples document for our OpenLink Data Spaces platform. Bottom line, the Semantic Web and SPARQL aren't BORING. In fact, quite the contrary, since they are essential ingredients of a more powerful Web than the one we work with today!
Enjoy the rest of John's post:
Creating connections between discussion clouds with SIOC:
(Extract from our forthcoming BlogTalk paper about browsers for SIOC.)
SIOC provides a unified vocabulary for content and interaction description: a semantic layer that can co-exist with existing discussion platforms. Using SIOC, various linkages are created between the aforementioned concepts, which allow new methods of accessing this linked data, including:
- Virtual Forums. These may be a gathering of posts or threads which are distributed across discussion platforms, for example, where a user has found posts from a number of blogs that can be associated with a particular category of interest, or an agent identifies relevant posts across a certain timeframe.
- Distributed Conversations. Trackbacks are commonly used to link blog posts to previous posts on a related topic. By creating links in both directions, not only across blogs but across all types of internet discussions, conversations can be followed regardless of what point or URI fragment a browser enters at.
- Unified Communities. Apart from creating a web page with a number of relevant links to the blogs or forums or people involved in a particular community, there is no standard way to define what makes up an online community (apart from grouping the people who are members of that community using FOAF or OPML). SIOC allows one to simply define what objects are constituent parts of a community, or to say to what community an object belongs (using sioc:has_part / part_of): users, groups, forums, blogs, etc.
- Shared Topics. Technorati (a search engine for blogs) and BoardTracker (for bulletin boards) have been leveraging the free-text tags that people associate with their posts for some time now. SIOC allows the definition of such tags (using the subject property), but also enables hierarchical or non-hierarchical topic definition of posts using sioc:topic when a topic is ambiguous or more information on a topic is required. Combined with other Semantic Web vocabularies, tags and topics can be further described using the SKOS organisation system.
- One Person, Many User Accounts. SIOC also aims to help the issue of multiple identities by allowing users to define that they hold other accounts or that their accounts belong to a particular personal identity (via foaf:holdsOnlineAccount or sioc:account_of). Therefore, all the posts or comments made by a particular person using their various associated user accounts across platforms could be identified.
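The "One Person, Many User Accounts" linkage can be illustrated with a toy Python sketch over in-memory triples. It collects a person's accounts from links in either direction (`foaf:holdsOnlineAccount` or `sioc:account_of`), then gathers everything created through those accounts via `sioc:has_creator`. The person, account, and post identifiers are hypothetical. A real deployment would express the same traversal as a SPARQL query.

```python
# Toy triples using the SIOC/FOAF property names mentioned above; the
# person, account, and post identifiers are hypothetical.
TRIPLES = [
    ("ex:john", "foaf:holdsOnlineAccount", "ex:john_flickr"),
    ("ex:john_youtube", "sioc:account_of", "ex:john"),
    ("ex:photo1", "sioc:has_creator", "ex:john_flickr"),
    ("ex:video7", "sioc:has_creator", "ex:john_youtube"),
    ("ex:post9", "sioc:has_creator", "ex:someone_else"),
]

def accounts_of(person, triples=TRIPLES):
    """Collect accounts linked in either direction to the person."""
    held = {o for s, p, o in triples if s == person and p == "foaf:holdsOnlineAccount"}
    claimed = {s for s, p, o in triples if o == person and p == "sioc:account_of"}
    return held | claimed

def posts_by(person, triples=TRIPLES):
    """All items created through any of the person's accounts, sorted."""
    accounts = accounts_of(person, triples)
    return sorted(s for s, p, o in triples if p == "sioc:has_creator" and o in accounts)

print(posts_by("ex:john"))  # ['ex:photo1', 'ex:video7']
```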
Continuing from our recent Podcast conversation, Jon Udell sheds further insight into the essence of our conversation via a "Strategic Developer" column article titled: Accessing the web of databases.
Below, I present an initial dump of a DataSpace FAQ that hopefully sheds light on the DataSpace vision espoused during my podcast conversation with Jon.
What is a DataSpace?
A moniker for Web-accessible atomic containers that manage and expose Data, Information, Services, Processes, and Knowledge.
What would you typically find in a Data Space? Examples include:
How do Data Spaces and Databases differ?
Data Spaces are fundamentally problem-domain-specific database applications. They offer functionality that you would instinctively expect of a database (e.g. ACID data management) with the additional benefit of being data model and query language agnostic. Data Spaces are for the most part DBMS Engine and Data Access Middleware hybrids in the sense that ownership and control of data is inherently loosely-coupled.
How do Data Spaces and Content Management Systems differ?
Data Spaces are inherently more flexible; they support multiple data models and data representation formats. Content management systems do not possess the same degree of data model and data representation dexterity.
How do Data Spaces and Knowledgebases differ?
A Data Space cannot dictate the perception of its content. For instance, what I may consider as knowledge relative to my Data Space may not be the case for a remote client that interacts with it from a distance. Thus, defining my Data Space purely as a Knowledgebase introduces constraints that reduce its broader effectiveness for third party clients (applications, services, users, etc.). A Knowledgebase is based on a Graph Data Model, resulting in significant impedance for clients that are built around alternative models. To reiterate, Data Spaces support multiple data models.
What Architectural Components make up a Data Space?
Where can I see a DataSpace along the lines described, in action?
Just look at my blog, and take the journey as follows:
What about other Data Spaces?
There are several, and I will attempt to categorize them by the query method available:
Type 1 (Free Text Search over HTTP):
Google, MSN, Yahoo!, Amazon, eBay, and most Web 2.0 plays.
Type 2 (Free Text Search and XQuery/XPath over HTTP):
A few blogs and Wikis (Jon Udell's and a few others)
What About Data Space aware tools?
Standards as social contracts: "Looking at Dave Winer's efforts in evangelizing OPML, I try to draw some rough lines into what makes a de-facto standard. De Facto standards are made and seldom happen on their own. In this entry, I look back at the history of HTML, RSS, the open source movement and try to draw some lines as to what makes a standard.
"(Via Tristan Louis.)
I posted a comment to Tristan Louis' post along the following lines:
Analysis is spot on re. the link between de facto standardization and bootstrapping. Likewise, the clear linkage between bootstrapping and connected communities (a variation of the social networking paradigm).
Dave built a community around an XML content syndication and subscription usecase demo that we know today as the blogosphere. Superficially, one may conclude that the Semantic Web vision has suffered to date from the lack of a similar bootstrap effort. In reality, however, we are dealing with "time and context" issues that are critical to the base understanding upon which a "Dave Winer" style bootstrap for the Semantic Web would occur.
Personally, I see the emergence of Web 2.0 (esp. the mashups phenomenon) as the "time and context" seeds from which the Semantic Web bootstrap will sprout. I see shared ontologies such as FOAF and SIOC leading the way (they are the RSS 2.0's of the Semantic Web IMHO).
Virtuoso extends its SQL3 implementation with syntax for integrating SPARQL into queries and subqueries. Thus, as part of a SQL SELECT query or subquery, one can write the SPARQL keyword followed by a SPARQL query, and Virtuoso's SQL Query Processor processes it as part of the query text.
Using Virtuoso's Command line or the Web Based ISQL utility type in the following (note: "SQL>" is the command line prompt for the native ISQL utility):
SQL> sparql select distinct ?p where { graph ?g { ?s ?p ?o } };
Which will return the following:
p
varchar
----------
http://example.org/ns#b
http://example.org/ns#d
http://xmlns.com/foaf/0.1/name
http://xmlns.com/foaf/0.1/mbox
...
SQL> select distinct subseq (p, strchr (p, '#')) as fragment from (sparql select distinct ?p where { graph ?g { ?s ?p ?o } } ) as all_predicates where p like '%#%' ;
fragment
varchar
----------
#query
#data
#name
#comment
...
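The SQL post-processing in that query can be mirrored in plain Python as a quick way to follow the logic without a Virtuoso instance. It keeps only predicates containing '#' and takes the substring that starts at the first '#'. The input list reuses the sample predicates from the first result set. Sorting is added only for a deterministic printout, since the SQL query does not guarantee an order.

```python
# Predicate IRIs as returned by the inner SPARQL query (sample values taken
# from the first result set; a real result set would be longer).
predicates = [
    "http://example.org/ns#b",
    "http://example.org/ns#d",
    "http://xmlns.com/foaf/0.1/name",
    "http://xmlns.com/foaf/0.1/mbox",
]

def fragments(iris):
    """Mirror of: select distinct subseq (p, strchr (p, '#')) ... where p like '%#%'.

    strchr locates the first '#', and subseq keeps everything from that
    position onward, so the '#' itself is part of each result.
    """
    return sorted({iri[iri.index("#"):] for iri in iris if "#" in iri})

print(fragments(predicates))  # ['#b', '#d']
```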
You can pass parameters to a SPARQL query using a Virtuoso-specific syntax extension. '??' or '$?' indicates a positional parameter similar to '?' in standard SQL. '??' can be used in graph patterns or anywhere else where a SPARQL variable is accepted. The value of a parameter should be passed in SQL form, i.e. this should be a number or an untyped string. An IRI ID cannot be passed, but an absolute IRI can. Using this notation, a dynamic SQL capable client (ODBC, JDBC, ADO.NET, OLEDB, XMLA, or others) can execute parametrized SPARQL queries using parameter binding concepts that are commonplace in dynamic SQL. This implies that existing SQL applications and development environments (PHP, Ruby, Python, Perl, VB, C#, Java, etc.) are capable of issuing SPARQL queries against RDF Data stored in Virtuoso via their existing SQL bound data access channels.
Note: This is the Virtuoso equivalent of a recently published example using Jena (a Java based RDF Triple Store).
Create a Virtuoso Function by executing the following:
SQL> create function param_passing_demo ()
{
  declare stat, msg varchar;
  declare mdata, rset any;
  exec ('sparql select ?s where { graph ?g { ?s ?? ?? }}',
        stat, msg,
        vector ('http://www.w3.org/2001/sw/DataAccess/tests/data/Sorting/sort-0#int1', 4), -- Vector of two parameters
        10,     -- Max. result-set rows
        mdata,  -- Variable for handling result-set metadata
        rset    -- Variable for handling query result-set
       );
  return rset[0][0];  -- First column of the first result row
};
Test the new "param_passing_demo" function by executing the following:
SQL> select param_passing_demo ();
Which returns:
callret
VARCHAR
_______________________________________________________________________________
http://www.w3.org/2001/sw/DataAccess/tests/data/Sorting/sort-0#four

1 Rows. -- 00000 msec.
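The same positional binding can be issued from a client language through an existing SQL channel. Below is a minimal Python sketch written against the generic DB-API 2.0 cursor interface (`execute(sql, params)` and `fetchall()`). It assumes the driver passes the SPARQL text, including Virtuoso's `??` markers, through unchanged and binds the parameters server-side, as described above. The commented pyodbc connection string, including the DSN name and the credentials, is a hypothetical placeholder. The small recording cursor exists only so that the sketch runs without a database.

```python
# SPARQL text with Virtuoso's '??' positional markers; values are bound by
# the driver, just like '?' in a parametrized SQL statement.
PARAM_QUERY = "sparql select ?s where { graph ?g { ?s ?? ?? } }"

def subjects_with(cursor, predicate_iri, value):
    """Run the parametrized SPARQL query through any DB-API 2.0 cursor.

    Parameters are passed in SQL form: an absolute IRI as a plain string,
    and a number as a number.
    """
    cursor.execute(PARAM_QUERY, (predicate_iri, value))
    return [row[0] for row in cursor.fetchall()]

class RecordingCursor:
    """Stand-in cursor (not a real driver) that records calls and returns canned rows."""
    def __init__(self, rows):
        self.rows = rows
        self.calls = []
    def execute(self, sql, params=()):
        self.calls.append((sql, params))
    def fetchall(self):
        return self.rows

SORT0 = "http://www.w3.org/2001/sw/DataAccess/tests/data/Sorting/sort-0"
demo = RecordingCursor([(SORT0 + "#four",)])
print(subjects_with(demo, SORT0 + "#int1", 4))

# Hypothetical real usage (assumes pyodbc and a DSN named "VirtuosoDSN"):
#   import pyodbc
#   conn = pyodbc.connect("DSN=VirtuosoDSN;UID=dba;PWD=dba")
#   print(subjects_with(conn.cursor(), SORT0 + "#int1", 4))
```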
A SPARQL ASK query can be used as an argument of the SQL EXISTS predicate.
create function sparql_ask_demo () returns varchar
{
  if (exists (sparql ask where { graph ?g { ?s ?p 4 } }))
    return 'YES';
  else
    return 'NO';
};
Test by executing:
SQL> select sparql_ask_demo ();
Which returns:
_________________________
YES
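ASK fits naturally inside EXISTS because both answer a yes/no question: does at least one solution exist? The toy Python sketch below mimics that semantics over an in-memory list of quads, using `None` as a wildcard for a variable. The quads are hypothetical, and this is an illustration of the idea rather than how Virtuoso evaluates ASK.

```python
# Toy quad store: (graph, subject, predicate, object). Values are hypothetical.
QUADS = [
    ("g1", "ex:a", "ex:val", 4),
    ("g1", "ex:b", "ex:val", 7),
]

def ask(pattern, quads=QUADS):
    """ASK-style test: True if at least one quad matches; None acts as a variable.

    Only the existence of a match matters, not its bindings, which is why the
    result can feed a boolean test such as SQL EXISTS.
    """
    return any(all(want is None or want == have for want, have in zip(pattern, quad))
               for quad in quads)

# Analogue of: ask where { graph ?g { ?s ?p 4 } }
print("YES" if ask((None, None, None, 4)) else "NO")  # YES
```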
The Dublin Core Metadata Initiative is updating the RDF expression of DC and might add range restrictions to some properties. Mikael Nilsson wondered if we would use the Swoogle Semantic Web search engine to see what types of values are being used with DC properties.
This kind of query is just the ticket for Swoogle. Well, almost. The current web-based interface supports a limited number of query types. Many more can be asked if you use SQL directly to query Swoogle's underlying databases. We don't want to provide a direct SQL query service over the main Swoogle database because it's easy to ask a query that will take a looooooong time to answer and some could even crash the database server. We are planning to put up a second server with a copy of the database and to give Swoogle Power Users (SPUs) access to it.
We ran a simple SQL query to generate some initial data for Mikael showing all of the DC properties. For each one, we list all of the ranges that values were drawn from and the number of separate documents and triples for each combination. For example:
Property | Range | Documents | Triples
--- | --- | --- | ---
dc:creater | rdfs:Literal | 32 | 648
dc:creator | rdfs:Literal | 234655 | 2477665
dc:creator | wn:Person | 2714 | 1138250
dc:creator | cc:Agent | 4090 | 6359
dc:creator | foaf:Person | 2281 | 5969
dc:creator | foaf:Agent | 1723 | 3234
Notice that the first property in this partial table (dc:creater) is an obvious typo. You can see the complete table as a PDF file or as an Excel spreadsheet.
[Tim Finin, UMBC ebiquity lab]
(Via Planet RDF.)
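The Swoogle post does not show its SQL. As a rough illustration of the kind of aggregation behind that table (per property and range, count distinct documents and total triples), here is a minimal Python sketch over hypothetical records. This is a guess at the shape of the computation, not Swoogle's actual query or data. In SQL it would roughly correspond to `COUNT(DISTINCT doc)` and `COUNT(*)` with `GROUP BY property, range`.

```python
from collections import defaultdict

# Hypothetical input (not Swoogle data): one record per triple, giving the
# source document, the property used, and the range (type) of the value.
records = [
    ("doc1.rdf", "dc:creator", "rdfs:Literal"),
    ("doc1.rdf", "dc:creator", "rdfs:Literal"),
    ("doc2.rdf", "dc:creator", "rdfs:Literal"),
    ("doc2.rdf", "dc:creator", "foaf:Person"),
    ("doc3.rdf", "dc:creater", "rdfs:Literal"),
]

def property_range_summary(rows):
    """Map (property, range) to (distinct documents, triples), sorted by key."""
    docs = defaultdict(set)
    triples = defaultdict(int)
    for doc, prop, rng in rows:
        docs[(prop, rng)].add(doc)
        triples[(prop, rng)] += 1
    return {key: (len(docs[key]), triples[key]) for key in sorted(triples)}

for (prop, rng), (n_docs, n_triples) in property_range_summary(records).items():
    print(f"{prop} | {rng} | {n_docs} | {n_triples}")
```

A misspelled property such as dc:creater surfaces as its own row in this kind of summary, which is how the typo in the table above becomes visible.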
"Ok, my first attempt at a round-up (in response to Phil's observation of Planetary damage). Thanks to the conference there's loads more here than there's likely to be subsequent weeks, although it's still only a fairly random sample and some of the links here are to heaps of other resources…
Incidentally, if anyone's got a list/links for SemWeb-related blogs that aren't on Planet RDF, I'd be grateful for a pointer. PS. Ok, I forget… are there any blogs that aren't on Dave's list yet..?
Quote of the week:
In the Semantic Web, it is not the Semantic which is new, it is the Web which is new.
- Chris Welty, IBM (lifted from TimBL's slides)
I just noticed the article from Dan Zambonini "Is Web 2.0 killing the Semantic Web?". From my perspective the article shows a misconception that people seem to have around the Semantic Web: the Semantic Web effort itself is not meant to provide applications (as the Web 2.0 meme suggests); rather, it provides standards to interlink applications.
Blog post title of the week:
Also…a new threat to Semantic Web developers has been discovered: typhoid!, and the key to the Web's full potential is…Tetris."
Anyway, Marc's article is a very refreshing read because it provides a really good insight into the general landscape of a rapidly evolving Web alongside genuine appreciation of our broader timeless pursuit of "Openness".
To really help this document provide additional value, I have scraped the content of the original post and dumped it below so that we can appreciate the value of the links embedded within the article (note: thanks to Virtuoso I only had to paste the content into my blog; the extraction to my Linkblog and Blog Summary Pages is simply a feature of my Virtuoso based Blog Engine):
Breaking the Web Wide Open! (complete story)
Even the web giants like AOL, Google, MSN, and Yahoo need to observe these open standards, or they'll risk becoming the "walled gardens" of the new web and be coolio no more.
Editorial Note: Several months ago, AlwaysOn got a personal invitation from Yahoo founder Jerry Yang "to see and give us feedback on our new social media product, y!360." We were happy to oblige and dutifully showed up, joining a conference room full of hard-core bloggers and new, new media types. The geeks gave Yahoo 360 an overwhelming thumbs down, with comments like, "So the only services I can use within this new network are Yahoo services? What if I don't use Yahoo IM?" In essence, the Yahoo team was booed for being "closed web," and we heartily agreed. With Yahoo 360, Yahoo continues building its own "walled garden" to control its 135 million customers, an accusation also hurled at AOL in the early 1990s, before AOL migrated its private network service onto the web. As the Economist recently noted, "Yahoo, in short, has old media plans for the new-media era."
The irony to our view here is, of course, that today's AO Network is also a "closed web." In the end, Mr. Yang's thoughtful invitation and our ensuing disappointment in his new service led to the assignment of this article. It also confirmed our existing plan to completely revamp the AO Network around open standards. To tie it all together, we recruited the chief architect of our new site, the notorious Marc Canter, to pen this piece. We look forward to our reader feedback.
Breaking the Web Wide Open!
By Marc Canter
For decades, "walled gardens" of proprietary standards and content have been the strategy of dominant players in mainframe computer software, wireless telecommunications services, and the World Wide Web; it was their successful lock-in strategy of keeping their customers theirs. But like it or not, those walls are tumbling down. Open web standards are being adopted so widely, with such value and impact, that the web giants (Amazon, AOL, eBay, Google, Microsoft, and Yahoo) are facing the difficult decision of opening up to what they don't control.
The online world is evolving into a new open web (sometimes called the Web 2.0), which is all about being personalized and customized for each user. Not only open source software, but open standards are becoming an essential component.
Many of the web giants have been using open source software for years. Most of them use at least parts of the LAMP (Linux, Apache, MySQL, Perl/Python/PHP) stack, even if they aren't well-known for giving back to the open source community. For these incumbents that grew big on proprietary web services, the methods, practices, and applications of open source software development are difficult to fully adopt. And the next open source movements, which will be as much about open standards as about code, will be a lot harder for the incumbents to exploit.
While the incumbents use cheap open source software to run their back-end systems, their business models largely depend on proprietary software and algorithms. But in our view, a new slew of open software, open protocols, and open standards will confront the incumbents with the classic Innovator's Dilemma. Should they adopt these tools and standards, painfully cannibalizing their existing revenue for a new unproven concept, or should they stick with their currently lucrative model with the risk that eventually a bunch of upstarts eat their lunch?
Credit should go to several of the web giants who have been making efforts to "open up." Google, Yahoo, eBay, and Amazon all have Open APIs (Application Programming Interfaces) built into their data and systems. Any software developer can access and use them for whatever creative purposes they wish. This means that the API provider becomes an open platform for everyone to use and build on top of. This notion has expanded like wildfire throughout the blogosphere, so nowadays, Open APIs are pretty much required.
Other incumbents also have open strategies. AOL has got the RSS religion, providing a feedreader and RSS search in order to escape the "walled garden of content" stigma. Apple now incorporates podcasts, the "personal radio shows" that are the latest rage in audio narrowcasting, into iTunes. Even Microsoft is supporting open standards, for example by endorsing SIP (Session Initiation Protocol) for internet telephony and conferencing over Skype's proprietary format or one of its own devising.
But new open standards and protocols are in use, under construction, or being proposed every day, pushing the envelope of where we are right now. Many of these standards are coming from startup companies and small groups of developers, not from the giants. Together with the Open APIs, those new standards will contribute to a new, open infrastructure. Tens of thousands of developers will use and improve this open infrastructure to create new kinds of web-based applications and services, to offer web users a highly personalized online experience.
A Brief History of Openness
At this point, I have to admit that I am not just a passive observer, full-time journalist or "just some blogger," but an active evangelist and developer of these standards. It's the vision of "open infrastructure" that's driving my company and the reason why I'm writing this article. This article will give you some of the background on these standards, and what the evolution of the next generation of open standards will look like.
Starting back in the 1980s, establishing a software standard was a key strategy for any software company. My former company, MacroMind (which became Macromedia), achieved this goal early on with Director. As Director evolved into Flash, the world saw that other companies besides Microsoft, Adobe, and Apple could establish true cross-platform, independent media standards.
Then Tim Berners-Lee and Marc Andreessen came along, and changed the rules of the software business and of entrepreneurialism. No matter how entrenched and "standardized" software was, the rug could still get pulled out from under it. Netscape did it to Microsoft, and then Microsoft did it back to Netscape. The web evolved, and lots of standards evolved with it. The leading open source standards (such as the LAMP stack) became widely used alternatives to proprietary closed-source offerings.
Open standards are more than just technology. Open standards mean sharing, empowering, and community support. Someone floats a new idea (or meme) and the community runs with it, with each person making their own contributions to the standard and evolving it without a moment's hesitation about "giving away their intellectual property."
One good example of this was Dave Sifry, who built the Technorati blog-tracking technology inspired by the Blogging Ecosystem, a weekend project by young hacker Phil Pearson. Dave liked what he saw and he ran with it, turning Technorati into what it is today.
Dave Winer has contributed enormously to this area of open standards. He defined and personally created several open standards and protocols, such as RSS, OPML, and XML-RPC. Dave has also helped build the blogosphere through his enthusiasm and passion.
By 2003, hundreds of programmers were working on creating and establishing new standards for almost everything. The best of these new standards have evolved into compelling web services platforms, such as del.icio.us, Webjay, or Flickr. Some have even spun off formal standards, like XSPF (a standard for playlists) or the instant messaging standard XMPP (also known as Jabber).
Today's Open APIs are complemented by standardized Schemas: the structure of the data itself and its associated meta-data. Take for example a podcasting feed. It consists of: a) the radio show itself, b) information on who is on the show, what the show is about and how long the show is (the meta-data) and also c) API calls to retrieve a show (a single feed item) and play it from a specified server.
The combination of Open APIs, standardized schemas for handling meta-data, and an industry which agrees on these standards is breaking the web wide open right now. So what new open standards should the web incumbents (and you) be watching? Keep an eye on the following developments:
Identity
Attention
Open Media
Microcontent Publishing
Open Social Networks
Tags
Pinging
Routing
Open Communications
Device Management and Control
1. Identity
Right now, you don't really control your own online identity. At the core of just about every online piece of software is a membership system. Some systems allow you to browse a site anonymously, but unless you register with the site you can't do things like search for an article, post a comment, buy something, or review it. The problem is that each and every site has its own membership system. So you constantly have to register with new systems, which cannot share data, even if you'd want them to. By establishing a "single sign-on" standard, disparate sites can allow users to freely move from site to site, and let them control the movement of their personal profile data, as well as any other data they've created.
With Passport, Microsoft unsuccessfully attempted to force its proprietary standard on the industry. Instead, a world is evolving where most people assume that users want to control their own data, whether that data is their profile, their blog posts and photos, or some collection of their past interactions, purchases, and recommendations. As long as users can control their digital identity, any kind of service or interaction can be layered on top of it.
Identity 2.0 is all about users controlling their own profile data and becoming their own agents. This way the users themselves, rather than other intermediaries, will profit from their ID info. Once developers start offering single sign-on to their users, and users have trusted places to store their data (places which respect the limits and provide access controls over that data), users will be able to access personalized services which will understand and use their personal data.
Identity 2.0 may seem like some geeky, visionary future standard that isn't defined yet, but by putting each user's digital identity at the core of all their online experiences, Identity 2.0 is becoming the cornerstone of the new open web.
The Initiatives:
Right now, Identity 2.0 is under construction through various efforts from Microsoft (the "InfoCard" component built into the Vista operating system and its "Identity Metasystem"), Sxip Identity, Identity Commons, Liberty Alliance, LID (NetMesh's Lightweight ID), and SixApart's OpenID.
More Movers and Shakers:
Identity Commons and Kaliya Hamlin, Sxip Identity and Dick Hardt, the Identity Gang and Doc Searls, Microsoft's Kim Cameron, Craig Burton, Phil Windley, and Brad Fitzpatrick, to name a few.
2. Attention
How many readers know what their online attention is worth? If you don't, Google and Yahoo do: they make their living off our attention. They know what we're searching for, happily turn it into a keyword, and sell that keyword to advertisers. They make money off our attention. We don't.
Technorati and friends proposed an attention standard, Attention.xml, designed to "help you keep track of what you've read, what you're spending time on, and what you should be paying attention to." AttentionTrust is an effort by Steve Gillmor and Seth Goldstein to standardize on how captured end-user performance, browsing, and interest data are used.
Blogger Peter Caputa gives a good summary of AttentionTrust: "As we use the web, we reveal lots of information about ourselves by what we pay attention to. Imagine if all of that information could be stored in a nice neat little xml file. And when we travel around the web, we can optionally share it with websites or other people. We can make them pay for it, lease it ... we get to decide who has access to it, how long they have access to it, and what we want in return. And they have to tell us what they are going to do with our Attention data."
So when you give your attention to sites that adhere to the AttentionTrust, your attention rights (you own your attention, you can move your attention, you can pay attention and be paid for it, and you can see how your attention is used) are guaranteed. Attention data is crucial to the future of the open web, and Steve and Seth are making sure that no one entity or oligopoly controls it.
Movers and Shakers:
Steve Gillmor, Seth Goldstein, Dave Sifry and the other Attention.xml folks.
3. Open Media
Proprietary media standards (Flash, Windows Media, and QuickTime, to name a few) helped liven up the web. But they are proprietary standards that try to keep us locked in, and they weren't created from scratch to handle today's online content. That's why, for many of us, an Open Media standard has been a holy grail. Yahoo's new Media RSS standard brings us one step closer to achieving open media, as do Ogg Vorbis audio codecs, XSPF playlists, and MusicBrainz. And several sites offer digital creators not only a place to store their content, but also a place to sell it.
Media RSS (being developed by Yahoo with help from the community) extends RSS, combining it with "RSS enclosures" and adding metadata to any media item, to create a comprehensive solution for media "narrowcasters." To gain acceptance for Media RSS, Yahoo knows it has to work with the community. As an active member of this community, I can tell you that we'll create Media RSS equivalents for RDF (an alternative subscription format) and Atom (yet another subscription format), so no one will be able to complain that Yahoo is picking sides in format wars.
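To make the idea concrete, here is a minimal Python sketch of one RSS 2.0 item that carries both a classic enclosure and Media RSS metadata. It assumes the Media RSS namespace URI http://search.yahoo.com/mrss/; the helper name media_rss_item and the sample URL, size, and credit are illustrative only, not part of any published toolkit.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"  # Media RSS namespace URI
ET.register_namespace("media", MEDIA_NS)    # serialize with the conventional "media:" prefix

def media_rss_item(title, url, mime_type, length_bytes, credit=None):
    """Build an RSS 2.0 <item> with a plain enclosure plus Media RSS metadata (hypothetical helper)."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    # Classic RSS 2.0 enclosure: url, length (bytes) and MIME type.
    ET.SubElement(item, "enclosure", url=url, length=str(length_bytes), type=mime_type)
    # Media RSS describes the same media object with richer, namespaced metadata.
    content = ET.SubElement(item, f"{{{MEDIA_NS}}}content",
                            url=url, type=mime_type, fileSize=str(length_bytes))
    if credit:
        ET.SubElement(content, f"{{{MEDIA_NS}}}credit", role="author").text = credit
    return item

item = media_rss_item("Episode 1", "http://example.org/ep1.mp3", "audio/mpeg", 1234567, "Jane Doe")
print(ET.tostring(item, encoding="unicode"))
```

Keeping the plain enclosure alongside media:content means older aggregators that ignore the Media RSS namespace still see a playable attachment.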
When Yahoo announced the purchase of Flickr, Yahoo founder Jerry Yang insinuated that Yahoo is acquiring "open DNA" to turn Yahoo into an open standards player. Yahoo is showing what happens when you take a multi-billion dollar company and make openness one of its core values. So, Google, beware, even if Google does have more research fellows and Ph.D.s.
The open media landscape is far and wide, reaching from game machine hacks and mobile phone downloads to PC-driven bookmarklets, players, and editors, and it includes many other standardization efforts. XSPF is an open standard for playlists, and MusicBrainz is an alternative to the proprietary (and originally effectively stolen) database that Gracenote licenses.
Ourmedia.org is a community front-end to Brewster Kahle's Internet Archive. Brewster has promised free bandwidth and free storage forever to any content creators who choose to share their content via the Internet Archive. Ourmedia.org is providing an easy-to-use interface and community to get content in and out of the Internet Archive, giving ourmedia.org users the ability to share their media anywhere they wish, without being locked into a particular service or tool. Ourmedia plans to offer open APIs and an open media registry that interconnects other open media repositories into a DNS-like registry (just like the www domain system), so folks can browse and discover open content across many open media services. Systems like Brightcove and Odeo support the concept of an open registry, and hope to work with digital creators to sell their work to fulfill the financial aspect of the "Long Tail."
More Movers and Shakers:
Creative Commons, the Open Media Network, Jay Dedman, Ryanne Hodson, Michael Verdi, Eli Chapman, Kenyatta Cheese, Doug Kaye, Brad Horowitz, Lucas Gonze, Robert Kaye, Christopher Allen, Brewster Kahle, JD Lasica, and indeed, Marc Canter, among others.
4. Microcontent Publishing
Unstructured content is cheap to create, but hard to search through. Structured content is expensive to create, but easy to search. Microformats resolve the dilemma with simple structures that are cheap to use and easy to search.
The first kind of widely adopted microcontent is blogging. Every post is an encapsulated idea, addressable via a URL called a permalink. You can syndicate or subscribe to this microcontent using RSS or an RSS equivalent, and news or blog aggregators can then display these feeds in a convenient readable fashion. But a blog post is just a block of unstructured text: not a bad thing, but just a first step for microcontent. When it comes to structured data, such as personal identity profiles, product reviews, or calendar-type event data, RSS was not designed to maintain the integrity of the structures.
Right now, blogging doesn't have the underlying structure necessary for full-fledged microcontent publishing. But that will change. Think of local information services (such as movie listings, event guides, or restaurant reviews) that any college kid can access and use in her weekend programming project to create new services and tools.
Today's blogging tools will evolve into microcontent publishing systems, and will help spread the notion of structured data across the blogosphere. New ways to store, represent and produce microcontent will create new standards, such as Structured Blogging and Microformats. Microformats differ from RSS feeds in that you can't subscribe to them. Instead, Microformats are embedded into webpages and discovered by search engines like Google or Technorati. Microformats are creating common definitions for "What is a review or event? What are the specific fields in the data structure?" They can also specify what we can do with all this information. OPML (Outline Processor Markup Language) is a hierarchical file format for storing microcontent and structured data. It was developed by Dave Winer of RSS and podcast fame.
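As a rough illustration of how a crawler might discover microformatted content embedded in a page, the sketch below reads hCalendar events (an element with class="vevent" containing summary, location and dtstart properties) using only Python's standard library. It is a simplification under stated assumptions: well-formed XHTML, no nested events, and the class name HCalendarParser is hypothetical.

```python
from html.parser import HTMLParser

class HCalendarParser(HTMLParser):
    """Minimal hCalendar reader; assumes well-formed XHTML and no nested events."""

    def __init__(self):
        super().__init__()
        self.events = []      # one dict per class="vevent" element
        self._stack = []      # per open tag: (opened_event, text_property_or_None)
        self._current = None  # event dict currently being filled
        self._prop = None     # property currently collecting character data

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        opened_event = "vevent" in classes
        if opened_event:
            self._current = {}
            self.events.append(self._current)
        prop = None
        if self._current is not None:
            if "dtstart" in classes and "title" in attrs:
                # abbr design pattern: the machine-readable date lives in the title attribute.
                self._current["dtstart"] = attrs["title"]
            else:
                for name in ("summary", "location", "dtstart"):
                    if name in classes:
                        prop = self._prop = name
                        self._current[name] = ""
                        break
        self._stack.append((opened_event, prop))

    def handle_endtag(self, tag):
        if not self._stack:
            return
        opened_event, prop = self._stack.pop()
        if prop is not None:
            self._prop = None      # stop collecting text for this property
        if opened_event:
            self._current = None   # the vevent element is closed

    def handle_data(self, data):
        if self._current is not None and self._prop is not None:
            self._current[self._prop] += data

page = ('<div class="vevent"><abbr class="dtstart" title="2005-10-05">Oct 5</abbr>'
        '<span class="summary">Web 2.0 <b>Conference</b></span>'
        '<span class="location">San Francisco</span></div>')
p = HCalendarParser()
p.feed(page)
print(p.events)
```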
Events are one popular type of microcontent. OpenEvents is already working to create shared databases of standardized events, which would get used by a new generation of event portals, such as Eventful/EVDB, Upcoming.org, and WhizSpark. The idea of OpenEvents is that event-oriented systems and services can work together to establish shared events databases (and associated APIs) that any developer could then use to create and offer their own new service or application. OpenReviews is still in the conceptual stage, but it would make it possible to provide open alternatives to closed systems like Epinions, and establish a shared database of local and global reviews. Its shared open servers would be filled with all sorts of reviews for anyone to access.
Why is this important? Because I predict that in the future, 10 times more people will be writing reviews than maintaining their own blog. The list of possible microcontent standards goes on: OpenJobpostings, OpenRecipes, and even OpenLists. Microsoft recently revealed that it has been working on an important new kind of microcontent: Lists. So OpenLists will attempt to establish standards for the kind of lists we all use, such as lists of Links, lists of To Do Items, lists of People, Wish Lists, etc.
Movers and Shakers:
Tantek Çelik and Kevin Marks of Technorati, Danny Ayers, Eric Meyer, Matt Mullenweg, Rohit Khare, Adam Rifkin, Arnaud Leene, Seb Paquet, Alf Eaton, Phil Pearson, Joe Reger, Bob Wyman among others.
5. Open Social Networks
I'll never forget the first time I met Jonathan Abrams, the founder of Friendster. He was arrogant and brash and he claimed he "owned" all his users, and that he was going to monetize them and make a fortune off them. This attitude robbed Friendster of its momentum, letting MySpace, Facebook, and other social networks take Friendster's place.
Jonathan's notion of social networks as a way to control users is typical of the Web 1.0 business model and its attitude towards users in general. Social networks have become one of the battlegrounds between old and new ways of thinking. Open standards for Social Networking will define those sides very clearly. Since meeting Jonathan, I have been working towards finding and establishing open standards for social networks. Instead of closed, centralized social networks with 10 million people in them, the goal is making it possible to have 10 million social networks that each have 10 people in them.
FOAF (which stands for Friend Of A Friend, and describes people and relationships in a way that computers can parse) is a schema to represent not only your personal profile's meta-data, but your social network as well. Thousands of researchers use the FOAF schema in their "Semantic Web" projects to connect people in all sorts of new ways. XFN is a microformat standard for representing your social network, while hCard (the microformat version of vCard, long familiar to users of contact manager programs like Outlook) contains your profile information. Microformats are baked into any XHTML webpage, which means that any blog, social network page, or any webpage in general can "contain" your social network in it and be used by any compatible tool, service or application.
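Because XFN lives in ordinary rel attributes on links, any tool can read a social network straight out of a page. A minimal Python sketch using the standard-library HTML parser and the XFN 1.1 relationship values might look like this; the class name XFNParser and the sample links are illustrative.

```python
from html.parser import HTMLParser

# The XFN 1.1 relationship values; other rel tokens (e.g. nofollow) are ignored.
XFN_VALUES = {
    "me", "contact", "acquaintance", "friend", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse", "kin",
    "muse", "crush", "date", "sweetheart",
}

class XFNParser(HTMLParser):
    """Map each linked URL to the set of XFN relationships declared on its <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        a = dict(attrs)
        href = a.get("href")
        rels = set((a.get("rel") or "").lower().split()) & XFN_VALUES
        if href and rels:
            self.links.setdefault(href, set()).update(rels)

parser = XFNParser()
parser.feed('<a href="http://alice.example/" rel="friend met">Alice</a> '
            '<a href="http://ads.example/" rel="nofollow">Ad</a> '
            '<a href="http://me.example/" rel="me">My other site</a>')
print(parser.links)
```

Following the rel="me" links from page to page is what lets a tool stitch several profiles into one identity, while the other values describe the person's relationships.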
PeopleAggregator is an earlier project now being integrated into open content management framework Drupal. The PeopleAggregator APIs will make it possible to establish relationships, send messages, create or join groups, and post between different social networks. (Sneak preview: this technology will be available in the upcoming GoingOn Network.)
All of these open social networking standards mean that inter-connected social networks will form a mesh that will parallel the blogosphere. This vibrant, distributed, decentralized world will be driven by open standards: personalized online experiences are what the new open web will be all about, and what could be more personalized than people's networks?
Movers and Shakers:
Eric Sigler, Joel De Gan, Chris Schmidt, Julian Bond, Paul Martino, Mary Hodder, Drummond Reed, Dan Brickley, Randy Farmer, and Kaliya Hamlin, to name a few.
6. Tags
Nowadays, no self-respecting tool or service can ship without tags. Tags are keywords or phrases attached to photos, blog posts, URLs, or even video clips. These user- and creator-generated tags are an open alternative to what used to be the domain of librarians and information scientists: categorizing information and content using taxonomies. Tags are instead creating "folksonomies."
The recently proposed OpenTags concept would be an open, community-owned version of the popular Technorati Tags service. It would aggregate the usage of tags across a wide range of services, sites, and content tools. In addition to Technorati's current tag features, OpenTags would let groups of people share their tags in "TagClouds." Open tagging is likely to include some of the open identity features discussed above, to create a tag system that is resilient to spam, and yet trustable across sites all over the web.
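OpenTags is only a proposal, so the following is a hypothetical sketch of the aggregation step it describes: merging tag usage from several services and weighting each tag for display in a cloud. The service names, the case-insensitive normalization, and the logarithmic five-level scale are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def tag_cloud(sources, levels=5):
    """Merge tag lists from several services and map each tag to a display level 1..levels."""
    counts = Counter()
    for tags in sources.values():
        # Normalization assumption: compare tags case-insensitively and drop blanks.
        counts.update(t.strip().lower() for t in tags if t.strip())
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    cloud = {}
    for tag, n in counts.items():
        if hi == lo:
            cloud[tag] = levels
        else:
            # Log scaling keeps a few very popular tags from flattening everything else.
            scale = (math.log(n) - math.log(lo)) / (math.log(hi) - math.log(lo))
            cloud[tag] = 1 + round(scale * (levels - 1))
    return cloud

# Hypothetical per-service tag usage.
sources = {
    "photos": ["Web2.0", "sf", "web2.0"],
    "bookmarks": ["web2.0", "rss"],
    "blogs": ["RSS", "web2.0"],
}
print(tag_cloud(sources))
```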
OpenTags owes a debt to earlier versions of shared tagging systems, which include Topic Exchange and something called the k-collector (a knowledge management tag aggregator) from Italian company eVectors.
Movers & Shakers:
Phil Pearson, Matt Mower, Paolo Valdemarin, and Mary Hodder and Drummond Reed again, among others.
7. Pinging
Websites used to be mostly static. Search engines that crawled (or "spidered") them every so often did a good enough job to show reasonably current versions of your cousin's homepage or even Time magazine's weekly headlines. But when blogging took off, it became hard for search engines to keep up. (Google has only just managed to offer blog-search functionality, despite buying Blogger back in early 2003.)
To know what was new in the blogosphere, users couldn't depend on services that spidered webpages once in a while. The solution: a way for blogs themselves to automatically notify blog-tracking sites that they'd been updated. Weblogs.com was the first blog "ping service": it displayed the name of a blog whenever that blog was updated. Pinging sites helped the blogosphere grow, and more tools, services, and portals started using pinging in new and different ways. Dozens of pinging services and sites (most of which can't talk to each other) sprang up.
Matt Mullenweg (the creator of open source blogging software WordPress) decided that a one-stop service for pinging was needed. He created Ping-o-Matic, which aggregates ping services and simplifies the pinging process for bloggers and tool developers. With Ping-o-Matic, any developer can alert all of the industry's blogging tools and tracking sites at once. This new kind of open standard, with shared infrastructure, is critical to the scalability of Web 2.0 services.
As Matt said: There are a number of services designed specifically for tracking and connecting blogs. However it would be expensive for all the services to crawl all the blogs in the world all the time. By sending a small ping to each service you let them know you've updated so they can come check you out. They get the freshest data possible, you don't get a thousand robots spidering your site all the time. Everybody wins.
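In the Weblogs.com convention, the "small ping" is an XML-RPC call named weblogUpdates.ping that carries just the blog's name and URL. A minimal sketch with Python's standard xmlrpc.client follows; the blog name and URL are placeholders, and http://rpc.pingomatic.com/ is the endpoint Ping-o-Matic has commonly advertised, which should be verified before use.

```python
import xmlrpc.client

def build_ping(blog_name, blog_url):
    """Serialize a weblogUpdates.ping XML-RPC request body (blog name, blog URL)."""
    return xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")

def send_ping(endpoint, blog_name, blog_url):
    """Send the ping; servers conventionally answer with a struct holding flerror and message."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)

payload = build_ping("Example Blog", "http://example.org/blog/")
print(payload)
# send_ping("http://rpc.pingomatic.com/", "Example Blog", "http://example.org/blog/")  # needs network access
```

One such request to an aggregating service replaces many separate crawls, which is the scalability argument in the quote above.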
Movers and Shakers:
Matt Mullenweg, Jim Winstead, Dave Winer
8. Routing
Bloggers used to have to manually enter the links and content snippets of blog posts or news items they wanted to blog. Today, some RSS aggregators can send a specified post directly into an associated blogging tool: as bloggers browse through the feeds they subscribe to, they can easily specify and send any post they wish to "reblog" from their news aggregator or feed reader into their blogging tool. (This is usually referred to as "BlogThis.") As structured blogging comes into its own (see the section on Microcontent Publishing), it will be increasingly important to maintain the structural integrity of these pieces of microcontent when reblogging them.
The promising standard RedirectThis will combine a "BlogThis"-like capability while maintaining the integrity of the microcontent. RedirectThis will let bloggers and content developers attach a simple "PostThis" button to their posts. Clicking on that button will send that post to the reader/blogger's favorite blogging tool. This favorite tool is specified at the RedirectThis web service, where users register their blogging tool of choice. RedirectThis also helps maintain the integrity and structure of microcontent; then it's just up to the user to prefer a blogging tool that also attains that lofty goal of microcontent integrity.
OutputThis is another nascent web services standard, to let bloggers specify what "destinations" they'd like to have as options in their blogging tool. As new destinations are added to the service, more checkboxes would get added to their blogging tool, allowing them to route their published microcontent to additional destinations.
Movers and Shakers:
Michael Migurski, Lucas Gonze
9. Open Communications
Likely, you've experienced the joys of finding friends on AIM or Yahoo Messenger, or the convenience of Skyping with someone overseas. Not that you're about to throw away your mobile phone or BlackBerry, but for many, also having access to Instant Messaging (IM) and Voice over IP (VoIP) is crucial.
IM and VoIP are mainstream technologies that already enjoy the benefits of open standards. Entire industries are being born, right this second, based around these open standards. Jabber has been an open IM technology for years; in fact, as XMPP, it was officially dubbed a standard by the IETF. Although becoming an official IETF standard is usually the kiss of death, Jabber looks like it'll be around for a while, as entire generations of collaborative, work-group applications and services have been built on top of its messaging protocol. For VoIP, Skype is clearly the leading standard today, though one could argue just how "open" it is (and defenders of the IETF's SIP standard often do). But it is free and user-friendly, so there won't be much argument from users about it being insufficiently open. Yet there may be a cloud on Skype's horizon: web behemoth Google recently released a beta of Google Talk, an IM client committed to open standards. It currently supports XMPP, and will support SIP for VoIP calls.
Movers and Shakers:
Jeremie Miller, Henning Schulzrinne, Jon Peterson, Jeff Pulver
10. Device Management and Control
To access online content, we're using more and more devices: BlackBerrys, iPods, Treos, you name it. As the web evolves, more and more different devices will have to communicate with each other to give us the content we want when and where we want it. No one wants to be dependent on one vendor anymore (like, say, Sony) for their laptop, phone, MP3 player, PDA, and digital camera just so that it all works together. We need fully interoperable devices, and the standards to make that work. And to make full use of online content and innovative web services, those standards need to be open.
MIDI (musical instrument digital interface), one of the very first open standards in music, connected disparate vendors' instruments, post-production equipment, and recording devices. But MIDI is limited, and MIDI II has been very slow to arrive. Now a new standard for controlling musical devices has emerged: OSC (Open SoundControl). This protocol is optimized for modern networking technology and inter-connects music, video and controller devices with "other multimedia devices." OSC is used by a wide range of developers, and is being taken up in the mainstream MIDI marketplace.
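To show what OSC looks like on the wire, here is a Python sketch that encodes a single OSC 1.0 message: a NUL-padded address pattern, a type-tag string beginning with a comma, then big-endian arguments, with every field aligned to 4 bytes. Only int32, float32 and string arguments are covered, and the address /synth/freq is a made-up example rather than any real device's namespace.

```python
import struct

def _osc_string(s):
    """OSC-string: ASCII bytes, at least one NUL, padded with NULs to a multiple of 4."""
    b = s.encode("ascii") + b"\x00"
    return b + b"\x00" * (-len(b) % 4)

def osc_message(address, *args):
    """Encode an OSC 1.0 message with int32 ('i'), float32 ('f') and string ('s') arguments."""
    tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, bool):  # bool is an int subclass; OSC uses separate T/F tags
            raise TypeError("booleans are not handled in this sketch")
        if isinstance(arg, int):
            tags += "i"
            payload += struct.pack(">i", arg)
        elif isinstance(arg, float):
            tags += "f"
            payload += struct.pack(">f", arg)
        elif isinstance(arg, str):
            tags += "s"
            payload += _osc_string(arg)
        else:
            raise TypeError(f"unsupported OSC argument type: {type(arg).__name__}")
    return _osc_string(address) + _osc_string(tags) + payload

msg = osc_message("/synth/freq", 440.0)
print(msg)
```

Such a packet is typically sent as a UDP datagram (for example with socket.sendto(msg, (host, port))); the host and port depend on the receiving device.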
Another open-standards-based device management technology is ZigBee, for building wireless intelligence and network monitoring into all kinds of devices. ZigBee is supported by many networking, consumer electronics, and mobile device companies.
· · · · · ·
The Change to Openness
The rise of open source software and its "architecture of participation" are completely shaking up the old proprietary-web-services-and-standards approach. Sun Microsystems, whose proprietary Java standard helped define Web 1.0, is opening its Solaris OS and has even announced the apparent paradox of an open-source Digital Rights Management system.
Today's incumbents will have to adapt to the new openness of the Web 2.0. If they stick to their proprietary standards, code, and content, they'll become the new walled gardens: places users visit briefly to retrieve data and content from enclosed data silos, but not where users "live." The incumbents' revenue models will have to change. Instead of "owning" their users, users will know they own themselves, and will expect a return on their valuable identity and attention. Instead of being locked into incompatible media formats, users will expect easy access to digital content across many platforms.
Yesterday's web giants and tomorrow's users will need to find a mutually beneficial new balance between open and proprietary, developer and user, hierarchical and horizontal, owned and shared, and compatible and closed.
Marc Canter is an active evangelist and developer of open standards. Early in his career, Marc founded MacroMind, which became Macromedia. These days, he is CEO of Broadband Mechanics, a founding member of the Identity Gang and of ourmedia.org. Broadband Mechanics is currently developing the GoingOn Network (with the AlwaysOn Network), as well as an open platform for social networking called the PeopleAggregator.
A version of the above post appears in the Fall 2005 issue of AlwaysOn's quarterly print blogozine, and ran as a four-part series on the AlwaysOn Network website. (Via Marc's Voice.)
Here goes:
Blog Editing
I can use any editor that supports the following Blog Post APIs:
- Movable Type
- MetaWeblog
- Blogger
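All three of these APIs are XML-RPC based. As a hedged illustration, here is how a client could serialize a MetaWeblog metaWeblog.newPost call (blogid, username, password, post struct, publish flag) with Python's standard xmlrpc.client. The blog id, credentials and post text are placeholders, and actually sending the call would need the blog server's XML-RPC endpoint URL, which is not given here.

```python
import xmlrpc.client

def build_new_post(blog_id, username, password, title, body, publish=True):
    """Serialize a metaWeblog.newPost request: (blogid, username, password, struct, publish)."""
    post = {"title": title, "description": body}  # MetaWeblog uses "description" for the body
    return xmlrpc.client.dumps((blog_id, username, password, post, publish),
                               methodname="metaWeblog.newPost")

payload = build_new_post("1", "editor", "secret", "Hello", "<p>First post</p>")
print(payload)
# Sending it (hypothetical endpoint URL):
# xmlrpc.client.ServerProxy("http://blog.example/RPC2").metaWeblog.newPost("1", "editor", "secret", post, True)
```

Because the post travels as a plain XML-RPC struct, any editor that speaks this call can target any server that implements it, which is what makes mixing editors and servers possible.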
Typically I use Virtuoso (which has an unreleased WYSIWYG blog post editor), Newzcrawler, ecto, Zempt, or w.bloggar for my posts. If a post is of interest to me, or relevant to our company or customers I tend to perform one of the following tasks:
- Generate a post using the "Blog This" feature of my blog editor
- Write a new post that was triggered by a previously read post etc.
Either way, the posts end up in our company-wide blog server, which is Virtuoso based (more about this below). The internal blog server automatically categorizes my blog posts, and automagically determines which posts to upstream to other public blogs that I author (e.g., http://kidehen.typepad.com) or co-author (e.g., http://www.openlinksw.com/weblogs/uda and http://www.openlinksw.com/weblogs/virtuoso). I write once and my posts are dispatched conditionally to multiple outlets.
RSS/Atom/RDF Aggregation & Reading
I discover, subscribe to, and view blog feeds using Newzcrawler (primarily), and from time to time for experimentation and evaluation purposes I use RSS Bandit, FeedDemon, and Bloglines. I am in the process of moving this activity over to Virtuoso completely due to the large number of feeds that I consume on a daily basis (scalability is a bit of a problem with current aggregators).
Blog Publishing
When you visit my blog you are experiencing the soon to be released Virtuoso Blog Publishing engine first hand, which is how WebDAV, SQLX, XQuery/XPath, and Free Text etc. come into the mix.
Each time I create a post internally, or subscribe to an external feed, the data ends up in Virtuoso's SQL Engine (this is how we handle some of the obvious scalability challenges associated with large subscription counts). This engine is SQL200n based, which implies that it can transform SQL to XML on the fly using recent extensions to SQL in the form of SQLX (prior to the emergence of this standard we used the FOR XML SQL syntax extensions for the same result). It also has its own built-in XSLT processor (DB Engine resident) and a validating XML parser (with support for XML Schema). Thus, my RSS/RDF/Atom archives, FOAF, BlogRoll, OPML, and OCS blog syndication gems are all live examples of SQLX documents that leverage Virtuoso's WebDAV engine for exposure to Blog Clients.
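Virtuoso performs this SQL-to-XML step inside the database with SQLX (the SQL/XML standard defines functions such as XMLELEMENT, XMLFOREST and XMLAGG for it). As a rough outside-the-database analogue, not Virtuoso's implementation, here is a Python sketch that turns rows from a hypothetical posts table into an RSS 2.0 document using sqlite3 and ElementTree.

```python
import sqlite3
import xml.etree.ElementTree as ET

def posts_to_rss(conn, channel_title, channel_link):
    """Serialize rows of a hypothetical posts table as an RSS 2.0 document, newest first."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = f"Posts from {channel_title}"
    query = "SELECT title, link, body FROM posts ORDER BY posted DESC"
    for title, link, body in conn.execute(query):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "description").text = body  # ElementTree escapes any markup
    return ET.tostring(rss, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (title TEXT, link TEXT, body TEXT, posted TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?)",
    [("Older post", "http://example.org/1", "first", "2005-01-01"),
     ("Newer post", "http://example.org/2", "second", "2005-02-01")],
)
print(posts_to_rss(conn, "Example Blog", "http://example.org/"))
```

Doing the same transformation in SQL, as SQLX allows, keeps the feed generation next to the data and avoids shipping rows to a separate application tier.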
Blog Search
When you search for blog posts using the basic or advanced search features of my blog, you end up interacting with one of the following methods of querying data hosted in Virtuoso: Free Text Search, XPath, or XQuery. The result sets produced by the search feature use SQLX to produce subscription gems (RSS/Atom/RDF/OpenSearch) and URIs that enable dynamic tracking of my posts using your search keywords.
BTW - the http://www.openlinksw.com/blog/~kidehen blog home page exists as a result of Virtuoso's Virtual Domain / Multi-Homing Web Server functionality. The entire site resides in an Object Relational DBMS, and I can take my DB file across Windows, Solaris, Linux, Mac OS X, FreeBSD, AIX, HP-UX, IRIX, and SCO UnixWare without missing a single beat! All I have to do is instantiate my Virtuoso server and my weblog is live.
The Internet Archive initiative is building up an amazing collection of content that includes this "must watch" movie about the somewhat forgotten HyperCard development environment.
As I watched the HyperCard movie I obtained clear reassurance that my vision of Web 2.0 as critical infrastructure for a future Semantic Web isn't unfounded. The solution building methodology espoused by HyperCard is exactly how Semantic Web applications will be built, and this will be done by orchestrating the componentry of Web 2.0.
When watching this clip make the following mental adjustments:
Web 2.0 is a reflection of the web taking its first major step out of the technology stone age (certainly the case relative to the HyperCard movie and "pre-web" application development in general).
Stickiness is a defining characteristic of Web 1.0. It's all about eyeballs (site visitors), which ultimately meant that all early Web business models ended up down the advertising route.
I always felt that Web 1.0 was akin to having a crowd of people at your reception area seeking a look at your corporate brochures, and then someone realizes that you could start selling ad space in these brochures in response to the growing crowd size and frequency of congregation. The long-term folly of this approach is now obvious, as many organizations forgot their core value propositions (expressed via product offerings) in the process and wandered blindly down the ad model cul-de-sac, and we all know what happened down there.
Web 2.0 is taking shape (the inflection is in its latter stages), and the defining characteristics of Web 2.0 are:
When you factor in all of the above, the real question is whether Google and others are equipped to exploit Web 2.0. "To some degree" is the best answer at the current time, as they have commenced the transition from "content only" web site to web platform (via the many Web Services initiatives that expose SOAP and REST interfaces to various services), but there is much more to this journey, and that's the devil in the "competitive landscape details".
From my obviously biased perspective, I think Virtuoso and Yukon+WinFS provide the server models for driving Web 2.0 points of presence (single server instances that implement multiple protocols). Thus, if Google, Yahoo! et al. aren't exploiting these or similar products, then they will be vulnerable over the long term to the competitive challenges that a Web 2.0 landscape will present.
The thing that most surprised me today in the SoftEdge panel on Social Software was the reaction to RSS. I should be clear that I am an RSS true believer. It seems to me that metadata as a byproduct of social software engines (be it blogging or social networking or whatever) is not only enviable, it is inevitable. RSS and FOAF and other yet-to-be-determined social software data protocols will become standards because it simply makes good sense for them to be standardized. Anyone paying attention to the unbelievable development and adoption curve of wireless can appreciate the immense value driven by standards, and, in particular, standards that are truly standard. So it came as a bit of a shock to me that when I questioned the panelists on the implications of RSS and the Semantic Web, they were less sold on the inevitability of it all.
When asked the question of whether the proliferation of RSS and FOAF might make it possible for reader technology to be the next killer application in knowledge management, I got very strong reactions from both Reid Hoffman and Meg Hourihan. Reid stated that he did not believe that RSS was sufficiently robust to provide significant value at any level. Meg followed up with a general indictment of the semantic web, which she views merely as a geek utopia. I will admit that I'm a fan of Candide (particularly at the hands of Bernstein), but I hardly view myself as Pangloss. One need look no further than, for example, the tools that Oddpost has incorporated into its web email client to allow an integrated email and blog experience. Better yet, through a relatively simple web service, Oddpost can deliver an RSS feed of a particular Google News search so that you can keep track of keywords that are of interest to you without having to visit Google repeatedly to find out if your company or candidate or favorite band has been mentioned in today's news. The same is true of watch lists on Technorati. Rather than periodically check to see if someone has linked to your blog, Technorati will do the work for you and deliver the info to your inbox only when there is information to be delivered. These examples are just the tip of the iceberg, but they demonstrate the nascent power of RSS and related standards. I'll have to wait for another panel to have that argument with Reid and Meg.
Q: Amazon.com now runs sites and on-line operations for retailers such as Target and Toys 'R' Us. What's the future for that services business?
A: It's a rapidly growing part of our business. And that goes from [large] companies that are customers of that all the way down to individuals using our Web services to tap into the fundamental platform that is Amazon.com. They can build their own applications very effectively. It's almost closer to an ecosystem.
Q: So Amazon is becoming a kind of software platform a bit like Microsoft (MSFT)?
A: People are building stuff that surprises us. That's what's so interesting about this. We've built this big base of technology to serve ourselves, and now we're opening it up and letting people access it. They're taking these fundamental pieces and building completely new things that not only would we have never gotten around to but in some cases maybe never even have thought of. There are thousands of developers who are building applications using Amazon Web services. The sky's the limit on their creativity.
Q: What arises from all those efforts?
A: People will be able to build very powerful applications by hooking together a whole bunch of Web services from a whole bunch of different companies.
Q: What benefit is Amazon.com getting from this?
A: It's too early to say. It's certainly not a major source of revenue for us. But when people use our Web services, they give us credit for that. That turns out to be very helpful.
A few years ago the race was on to simply have a Web Site, then this requirement evolved into a requirement for a database driven site. Today we are seeing the final stages of the Web 2.0 inflection which will inevitably change the focus toward the need for a Point of Presence on the Web for exposing or invoking Web Services and/or Syndicating or Subscribing to XML based content.
Back to the article. This is an essay by George Gregorio, who is so into autodiscovery that he deliberately stuffed his contact details in a FOAF file that you need to autodiscover using a FOAF autodiscovery aware client (e.g., FOAFnaut, or the human brain for instance :-)). Anyway, here is an excerpt from his essay (a very good read).
Over a month ago Paul Ford published a great essay entitled How Google beat Amazon and Ebay to the Semantic Web. After reading it the first time I thought it was a great introduction to the Semantic Web, an idea I had been trying to wrap my head around ever since encountering RDF as it is baked into RSS 1.0. I had seen the light and bought into the promise of the Semantic Web.
Time passes...
With Dave Winer's floating of the idea of RSS 2.0, discussions ensue about the RDF in RSS 1.0. After spending some time badgering poor Bill Kearney for a concrete benefit of having RDF in RSS 1.0 and not getting a really satisfactory answer, I went back and read Paul Ford's essay again. I wanted to get that old religious feeling back again. It didn't work. The magic was gone.
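The FOAF autodiscovery mentioned above usually relies on a link element in the page head, conventionally rel="meta" with type="application/rdf+xml" (often with title="FOAF"). A hedged Python sketch of an autodiscovery-aware client's first step follows; it only locates candidate FOAF URLs and does not fetch or parse the RDF, and the class name and sample markup are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class FOAFLinkFinder(HTMLParser):
    """Collect FOAF autodiscovery links of the form <link rel="meta" type="application/rdf+xml">."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.foaf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        mime = (a.get("type") or "").lower()
        href = a.get("href")
        if "meta" in rels and mime == "application/rdf+xml" and href:
            # Resolve relative hrefs against the page URL.
            self.foaf_urls.append(urljoin(self.base_url, href))

page = ('<html><head><title>Home</title>'
        '<link rel="meta" type="application/rdf+xml" title="FOAF" href="/foaf.rdf">'
        '<link rel="stylesheet" href="style.css"></head><body></body></html>')
finder = FOAFLinkFinder("http://example.org/blog/")
finder.feed(page)
print(finder.foaf_urls)
```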
In the year 2000 the shape and form of XML data were unclear to many, and reading the article below took me back to when we released Virtuoso 2.0 (we are now at release 3.0 commercially, with a 3.2 beta dropping any minute).
RSS is a great XML application, and it does a great job of demonstrating how XML, the new data access foundation layer, will galvanize the next-generation Web (I refer to this as Web 2.0).
RSS: INJAN (It's not just about news)
RSS is not just about news, according to Ian Davis on rss-dev.
He presents a nice list of alternatives, which I reproduce here (and to which I'd add, of course, bibliography management):
- Sitemaps: one of the S's in RSS stands for summary. A sitemap is a summary of the content on a site, the items are pages or content areas. This is clearly a non-chronological ordering of items. Is a hierarchy of RSS sitemaps implied here? How would the linking between them work? How hard would it be to hack a web browser to pick up the RSS sitemap and display it in a sidebar when you visit the site?
- Small ads: also known as classifieds. These expire so there's some kind of dynamic going on here but the ordering of items isn't necessarily chronological. How to describe the location of the seller, or the condition of the item or even the price. Not every ad is selling something; perhaps it's to rent out a room.
- Personals: similar model to the small ads. No prices though (I hope). Comes with a ready made vocabulary of terms that could be converted to an RDF schema. Probably should do that just for the hell of it anyway: gsoh
- Weather reports: how about a week's worth of weather in an RSS channel. If an item is dated in the future, should an aggregator display it before time? Alternate representations include maps of temperature and pressure etc.
- Auctions: again, related to small ads, but these are much more time limited since there is a hard cutoff after which the auction is closed. The sequence of bids could be interesting: would it make sense to thread them like a discussion so you can see the tactics?
- TV listings: this is definitely chronological but with a twist: the items have durations. They also have other metadata such as cast lists, classification ratings, widescreen, stereo, program type. Some types have additional information such as director and production year.
- Top ten listings: top ten singles, books, DVDs, richest people, ugliest, rear of the year etc. Not chronological, but has a definite order. May update from day to day or even more often.
- Sales reporting: imagine if every department of a company reported their sales figures via RSS. Then the divisions aggregate the departmental figures and republish to the regional offices, who aggregate and add value up the chain. The chairman of the company subscribes to one super-aggregate feed.
- Membership lists / buddy lists: could I publish my buddy list from Jabber or other instant messengers? Maybe as an interchange format or perhaps could be used to look for shared contacts. Lots of potential overlap with FOAF here.
- Mailing lists: or in fact any messaging system such as usenet. There are some efforts at doing this already (e.g. yahoogroups) but we need more information: threads; references; headers; links into archives.
- Price lists / inventory: the items here are products or services. No particular ordering but it'd be nice to be able to subscribe to a catalog of products and prices from a company. The aggregator should be able to pick out price rises or bargains given enough history.
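The last use case implies a simple aggregator-side algorithm: keep successive snapshots of a catalog feed and diff the prices item by item. Below is a minimal Python sketch using only the standard library, assuming each item carries a stable `<guid>` and a price in a hypothetical extension element (`p:amount` in the made-up namespace `http://example.org/ns/price#`), since RSS 2.0 itself defines no price element. The helper names `prices_from_feed` and `price_changes` are illustrative only.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Hypothetical extension namespace: RSS 2.0 defines no price element, so a
# real catalog feed would need some agreed, namespaced vocabulary like this.
PRICE_NS = "http://example.org/ns/price#"


def prices_from_feed(rss_text):
    """Map each item's <guid> to the Decimal price in its p:amount element."""
    root = ET.fromstring(rss_text)
    prices = {}
    for item in root.iter("item"):
        guid = item.findtext("guid")
        amount = item.findtext(f"{{{PRICE_NS}}}amount")
        if guid is not None and amount is not None:
            prices[guid] = Decimal(amount.strip())
    return prices


def price_changes(old_rss, new_rss):
    """Diff two snapshots of the same feed.

    Returns (rises, drops), each a dict of guid -> (old_price, new_price).
    Items present in only one snapshot are ignored.
    """
    old, new = prices_from_feed(old_rss), prices_from_feed(new_rss)
    rises, drops = {}, {}
    for guid in old.keys() & new.keys():
        if new[guid] > old[guid]:
            rises[guid] = (old[guid], new[guid])
        elif new[guid] < old[guid]:
            drops[guid] = (old[guid], new[guid])
    return rises, drops


# Two hypothetical snapshots of the same catalog feed.
OLD = """<rss version="2.0" xmlns:p="http://example.org/ns/price#"><channel>
<title>Example catalog</title>
<item><title>Widget</title><guid>sku-1</guid><p:amount>9.99</p:amount></item>
<item><title>Gadget</title><guid>sku-2</guid><p:amount>24.00</p:amount></item>
</channel></rss>"""

NEW = """<rss version="2.0" xmlns:p="http://example.org/ns/price#"><channel>
<title>Example catalog</title>
<item><title>Widget</title><guid>sku-1</guid><p:amount>11.49</p:amount></item>
<item><title>Gadget</title><guid>sku-2</guid><p:amount>19.50</p:amount></item>
</channel></rss>"""

rises, drops = price_changes(OLD, NEW)
print("rises:", rises)
print("drops:", drops)
```

Keying on `<guid>` rather than `<title>` is a deliberate choice here: titles change for cosmetic reasons, while a stable identifier lets the aggregator build the "enough history" the item calls for. `Decimal` avoids floating-point rounding surprises when comparing prices.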
Thus, if we can comprehend RSS (the blog article below does a great job), we should be able to see the fundamental challenges facing any organization seeking to exploit the potential of the imminent Web 2.0 inflection: how will you cost-effectively create XML data from existing data sources, without upgrading or switching database engines, operating systems, or programming languages? Put differently, how can you exploit this phenomenon without losing your ever-dwindling technology choices (believe me, choices are dwindling fast, but most are oblivious to this fact)?
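One way to picture "creating XML data from existing data sources" without switching engines is a small read-only bridge: query the existing store as-is and serialize the result set as RSS 2.0. The sketch below uses Python's standard-library `sqlite3` and `xml.etree.ElementTree`; the `products` table, its columns, the example.com URLs and the `rows_to_rss` helper are all hypothetical, and this is an illustration of the idea, not how Virtuoso does it.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_rss(conn, title, link, description, query):
    """Render the rows returned by `query` as an RSS 2.0 document string.

    Assumes (for this sketch) that the query yields (title, link, description)
    columns; the source tables themselves are only read, never modified.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item_title, item_link, item_desc in conn.execute(query):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
        ET.SubElement(item, "description").text = item_desc
    return ET.tostring(rss, encoding="unicode")


# An in-memory table stands in for an existing, non-XML data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, url TEXT, blurb TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("Widget", "http://example.com/p/widget", "A small widget"),
        ("Gadget", "http://example.com/p/gadget", "A handy gadget"),
    ],
)

feed = rows_to_rss(
    conn,
    "Product list",
    "http://example.com/products",
    "Products exposed as RSS without changing the database",
    "SELECT name, url, blurb FROM products ORDER BY name",
)
print(feed)
```

Because the bridge only reads, the schema and the database engine stay exactly as they are; in a real deployment the in-memory table would be replaced by a connection through whatever driver the existing database already has.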