A declarative query language from the W3C for querying structured propositional data (in the form of 3-tuple [triples] or 4-tuple [quads] records) stored in a deductive database (colloquially referred to as triple or quad stores in Semantic Web and Linked Data parlance).
SPARQL is inherently platform independent. Like SQL, the query language and the backend database engine are distinct. Database clients capture SPARQL queries which are then passed on to compliant backend databases.
Like SQL for relational databases, it provides a powerful mechanism for accessing and joining data across one or more data partitions (named graphs identified by IRIs). The aforementioned capability also enables the construction of sophisticated Views, Reports (HTML or those produced in native form by desktop productivity tools), and data streams for other services.
Unlike SQL, SPARQL includes result serialization formats and an HTTP based wire protocol. Thus, the ubiquity and sophistication of HTTP is integral to SPARQL i.e., client side applications (user agents) only need to be able to perform an HTTP GET against a URL en route to exploiting the power of SPARQL.
What follows is a very simple guide for using SPARQL against your own instance of Virtuoso:
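For a flavor of what such a query looks like, here is a minimal sketch (the named graph IRI is illustrative; any SPARQL-compliant Virtuoso endpoint will accept it):

# List ten statements from an illustrative named graph (the data source URL)
SELECT ?s ?p ?o
FROM <http://example.com/mydata>
WHERE { ?s ?p ?o }
LIMIT 10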
Note: the data source URL doesn't even have to be RDF based -- which is where the Virtuoso Sponger Middleware comes into play (download and install the VAD installer package first) since it delivers the following features to Virtuoso's SPARQL engine:
Public SPARQL endpoints are emerging at an ever-increasing rate. Thus, we've set up a DNS lookup service that provides access to a large number of SPARQL endpoints. Of course, this doesn't cover all existing endpoints, so if your endpoint is missing please ping me.
Here are a collection of commands for using DNS-SD to discover SPARQL endpoints:
The problems typically take the following form:
To start addressing these problems, here is a simple guide for generating and publishing Linked Data using Virtuoso.
Existing RDF data can be added to the Virtuoso RDF Quad Store via a variety of built-in data loader utilities.
Many options allow you to easily and quickly generate RDF data from other data sources:
Install the Faceted Browser VAD package (fct_dav.vad), which delivers the following:
Three simple steps allow you, your enterprise, and your customers to consume and exploit your newly deployed Linked Data --
http://<cname>[:<port>]/describe/?uri=<entity-uri>

where:

- <cname>[:<port>] gets replaced by the host and port of your Virtuoso instance
- <entity-uri> gets replaced by the URI you want to see described -- for instance, the URI of one of the resources you let the Sponger handle.
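For example, assuming a hypothetical Virtuoso instance at demo.example.com on port 8890, a description of the DBpedia entity for Paris would be requested via:

http://demo.example.com:8890/describe/?uri=http://dbpedia.org/resource/Paris

(Depending on the client, the value of the uri parameter may need to be URL-encoded.)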
I ended up with what I can best describe as the Data 3.0 Manifesto: a manifesto for standards-compliant access to structured data object (or entity) descriptors.
Alex James (Program Manager, Entity Frameworks, at Microsoft) put together something quite similar to this via his Base4 blog (around the Web 2.0 bootstrap time); sadly -- quoting Alex -- that post has gone where discontinued blogs and their host platforms go (deep, deep irony here).
It's also important to note that this manifesto is a variant of TimBL's Linked Data Design Issues meme re. Linked Data, but totally decoupled from RDF (the data representation formats aspect) and SPARQL, which -- in my world view -- remain implementation details.
At the end of the marathon session, it was clear to me that a blog post was required for future reference, at the very least :-)
"Data Access by Reference" mechanism for Data Objects (or Entities) on HTTP networks. It enables you to Identify a Data Object and Access its structured Data Representation via a single Generic HTTP scheme based Identifier (HTTP URI). Data Object representation formats may vary; but in all cases, they are hypermedia oriented, fully structured, and negotiable within the context of a client-server message exchange.
Information makes the world tick!
Information doesn't exist without data to contextualize.
Information is inaccessible without a projection (presentation) medium.
All information (without exception, when produced by humans) is subjective. Thus, to truly maximize the innate heterogeneity of collective human intelligence, loose coupling of our information and associated data sources is imperative.
Linked Data is exposed to HTTP networks (e.g. World Wide Web) via hypermedia resources bearing structured representations of data object descriptions. Remember, you have a single Identifier abstraction (generic HTTP URI) that embodies: Data Object Name and Data Representation Location (aka URL).
A structured representation of data exists when an Entity (Datum), its Attributes, and its Attribute Values are clearly discernible. In the case of a Linked Data Object, structured descriptions take the form of a hypermedia based Entity-Attribute-Value (EAV) graph pictorial -- where each Entity, its Attributes, and its Attribute Values (optionally) are identified using Generic HTTP URIs.
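To make the EAV shape concrete, here is a minimal sketch of a single such statement rendered as a SPARQL insert (the person URIs are hypothetical; foaf:knows is a term from the real FOAF vocabulary):

# A single Entity-Attribute-Value statement, each part named by an HTTP URI
INSERT DATA {
  <http://example.com/people/alice#this>    # Entity
    <http://xmlns.com/foaf/0.1/knows>       # Attribute
      <http://example.com/people/bob#this>  # Attribute Value (also a URI)
  .
}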
Examples of structured data representation formats (content types) associated with Linked Data Objects include:
You mark up resources by expressing distinct entity-attribute-value statements (basically, these are 3-tuple records) using a variety of notations:
You can achieve this task using any of the following approaches:
Our data access middleware heritage (which spans 16+ years) has enabled us to assemble a rich portfolio of coherently integrated products that enable cost-effective evaluation and utilization of Linked Data, without writing a single line of code, or exposing you to the hidden, but extensive admin and configuration costs. Post installation, the benefits of Linked Data simply materialize (along the lines described above).
Our main Linked Data oriented products include:
Anyway, Socialtext and Mike 2.0 (they aren't identical, and the juxtaposition isn't meant to imply that they are) provide nice demonstrations of what socially enhanced collaboration for individuals and/or enterprises is all about:
As is typically the case in this emerging realm, the critical issue of discrete "identifiers" (record keys, in a sense) for data items, data containers, and data creators (individuals and groups) is overlooked, albeit unintentionally.
Rather than using platform constrained identifiers such as:
It enables you to leverage the platform independence of HTTP scheme Identifiers (Generic URIs) such that Identifiers for:
simply become conduits into a mesh of HTTP -- referencable and accessible -- Linked Data Objects endowed with High SDQ (Serendipitous Discovery Quotient). For example, my Personal WebID is all anyone needs to know if they want to explore:
Even when you reach a point of equilibrium where your daily activities trigger orchestration of CRUD (Create, Read, Update, Delete) operations against Linked Data Objects within your socially enhanced collaboration network, you still have to deal with the thorny issues of security, which include the following:
FOAF+SSL, an application of HTTP based Linked Data, enables you to enhance your Personal HTTP scheme based Identifier (or WebID) via the following steps (performed by a FOAF+SSL compliant platform):
Contrary to conventional experiences with all things PKI (Public Key Infrastructure) related, FOAF+SSL compliant platforms typically handle the PKI issues as part of the protocol implementation; thereby protecting you from any administrative tedium without compromising security.
Understanding how new technology innovations address long standing problems, or understanding how new solutions inadvertently fail to address old problems, provides time tested mechanisms for product selection and value proposition comprehension that ultimately save scarce resources such as time and money.
If you want to understand real world problem solution #1 with regards to HTTP based Linked Data, look no further than the issues of secure, socially aware, and platform independent identifiers for data objects that build bridges across erstwhile data silos.
If you want to cost-effectively experience what I've outlined in this post, take a look at OpenLink Data Spaces (ODS), which is a distributed collaboration engine (enterprise or individual) built around the Virtuoso database engine. It simply enhances existing collaboration tools via the following capabilities:
Addition of Social Dimensions via HTTP based Data Object Identifiers for all Data Items (if missing)
Since the beginning of the modern IT era, each period of innovation has inadvertently introduced its fair share of Data Silos. The driving force behind this anomaly remains an overemphasis on the role of applications when selecting problem solutions. Unfortunately, most solution selecting decision makers remain oblivious to the fact that most applications are architecturally monolithic; i.e., they fail to separate the following five layers that are critical to all solutions:
The rise of the Internet, and its exponentially-growing user-friendly enclave known as the World Wide Web, is bringing the intrinsic costs of the monolithic application architecture anomaly to bear -- in manners unanticipated by many. For example, the emergence of network-oriented solutions across the realms of Enterprise 2.0-based Collaboration and Web 2.0-based Software-as-a-Service (SaaS), combined with the overarching influence of Social Media, are producing more heterogeneously-structured and disparately-located data sources than people can effectively process.
As is often the case, a variety of problem and product monikers have emerged for the data access and integration challenges outlined above. Contemporary examples include Enterprise Information Integration, Master Data Management, and Data Virtualization. Labeling aside, the fundamental issues of the unresolved Data Integration challenge boil down to the following:
Effectively solving today's data integration challenges requires a move away from monolithic application architecture to loosely-coupled, network-centric application architectures. Basically, we need a ubiquitous network-centric application protocol that lends itself to loosely-coupled across-the-wire orchestration of data interactions. In short, this will be what revitalizes the art of application development and deployment.
The World Wide Web is built around a network application protocol called HTTP. This protocol intrinsically separates the five layers listed earlier, thereby enabling:
Virtuoso is uniquely designed to address today's escalating Data Access and Integration challenges without compromising performance, security, or platform independence. At its core lies an unrivaled commitment to industry standards combined with unique technology innovation that transcends erstwhile distinct realms such as:
When Virtuoso is installed and running, HTTP-based Data Objects are automatically created as a by-product of its powerful data virtualization, transcending data sources and data representation formats. The benefits of such power extend across profiles such as:
Bottom line, Virtuoso delivers unrivaled flexibility and scalability, without compromising performance or security.
In this post I provide a brief re-introduction to this essential aspect of Virtuoso.
This component of Virtuoso is known as the Virtual Database Engine (VDBMS). It provides transparent high-performance and secure access to disparate data sources that are external to Virtuoso. It enables federated access and integration of data hosted by any ODBC- or JDBC-accessible RDBMS, RDF Store, XML database, or Document (Free Text)-oriented Content Management System. In addition, it facilitates integration with Web Services (SOAP-based SOA RPCs or REST-fully accessible Web Resources).
In the most basic sense, you shouldn't need to upgrade your existing database engine version simply because your current DBMS and Data Access Driver combo isn't compatible with ODBC-compliant desktop tools such as Microsoft Access, Crystal Reports, BusinessObjects, Impromptu, or other ODBC-, JDBC-, ADO.NET-, or OLE DB-compliant applications. Simply place Virtuoso in front of your so-called "legacy database," and let it deliver the compliance levels sought by these tools.
In addition, it's important to note that today's enterprise, through application evolution, company mergers, or acquisitions, is often faced with disparately-structured data residing in any number of line-of-business-oriented data silos. Compounding the problem is the exponential growth of user-generated data via new social media-oriented collaboration tools and platforms. For companies to cost-effectively harness the opportunities accorded by the increasing intersection between line-of-business applications and social media, virtualization of data silos must be achieved, and this virtualization must be delivered in a manner that doesn't prohibitively compromise performance or completely undermine security at either the enterprise or personal level. Again, this is what you get by simply installing Virtuoso.
The VDBMS may be used in a variety of ways, depending on the data access and integration task at hand. Examples include:
You can make a single ODBC, JDBC, ADO.NET, OLE DB, or XMLA connection to multiple ODBC- or JDBC-accessible RDBMS data sources, concurrently, with the ability to perform intelligent distributed joins against externally-hosted database tables. For instance, you can join internal human resources data against internal sales and external stock market data, even when the HR team uses Oracle, the Sales team uses Informix, and the Stock Market figures come from Ingres!
You can construct RDF Model-based Conceptual Views atop Relational Data Sources. This is about generating HTTP-based Entity-Attribute-Value (E-A-V) graphs using data culled "on the fly" from native or external data sources (Relational Tables/Views, XML-based Web Services, or User Defined Types).
You can also derive RDF Model-based Conceptual Views from Web Resource transformations "on the fly" -- the Virtuoso Sponger (RDFizing middleware component) enables you to generate RDF Model Linked Data via a RESTful Web Service or within the process pipeline of the SPARQL query engine (i.e., you simply use the URL of a Web Resource in the FROM clause of a SPARQL query).
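A minimal sketch of this in action (the resource URL is illustrative, and the query assumes a Virtuoso instance with the Sponger middleware enabled):

# Sponge a Web resource on the fly and query the resulting Linked Data graph
DEFINE get:soft "replace"
SELECT ?s ?p ?o
FROM <http://www.crunchbase.com/company/openlink-software>
WHERE { ?s ?p ?o }
LIMIT 25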
It's important to note that these Views take the form of HTTP links that serve as both Data Source Names and Data Source Addresses. This enables you to query and explore relationships across entities (i.e., People, Places, and other Real World Things) via HTTP clients (e.g., Web Browsers) or directly via SPARQL Query Language constructs transmitted over HTTP.
As an alternative to RDF, Virtuoso can expose ADO.NET Entity Frameworks-based Conceptual Views over Relational Data Sources. It achieves this by generating Entity Relationship graphs via its native ADO.NET Provider, exposing all externally attached ODBC- and JDBC-accessible data sources. In addition, the ADO.NET Provider supports direct access to Virtuoso's native RDF database engine, eliminating the need for resource intensive Entity Frameworks model transformations.
In a nutshell, the AWS Cloud infrastructure simplifies the process of generating Federated presence on the Internet and/or World Wide Web. Remember, centralized networking models always end up creating data silos, in some context, ultimately! :-)
Enjoy!
| | Web 1.0 | Web 2.0 | Web 3.0 |
|---|---|---|---|
| Simple Definition | Interactive / Visual Web | Programmable Web | Linked Data Web |
| Unit of Presence | Web Page | Web Service Endpoint | Data Space (named structured data enclave) |
| Unit of Value Exchange | Page URL | Endpoint URL for API | Resource / Entity / Object URI |
| Data Granularity | Low (HTML) | Medium (XML) | High (RDF) |
| Defining Services | Search | Community (Blogs to Social Networks) | Find |
| Participation Quotient | Low | Medium | High |
| Serendipitous Discovery Quotient | Low | Medium | High |
| Data Referencability Quotient | Low (Documents) | Medium (Documents) | High (Documents and their constituent Data) |
| Subjectivity Quotient | High | Medium (from A-list bloggers to select source and partner lists) | Low (everything is discovered via URIs) |
| Transclusence | Low | Medium (Code driven Mashups) | High (Data driven Meshups) |
| What You See Is What You Prefer (WYSIWYP) | Low | Medium | High (negotiated representation of resource descriptions) |
| Open Data Access (Data Accessibility) | Low | Medium (Silos) | High (no Silos) |
| Identity Issues Handling | Low | Medium (OpenID) | High (FOAF+SSL) |
| Solution Deployment Model | Centralized | Centralized with sprinklings of Federation | Federated with function specific Centralization (e.g. Lookup hubs like LOD Cloud or DBpedia) |
| Data Model Orientation | Logical (Tree based DOM) | Logical (Tree based XML) | Conceptual (Graph based RDF) |
| User Interface Issues | Dynamically generated static interfaces | Dynamically generated interfaces with semi-dynamic interfaces (courtesy of XSLT or XQuery/XPath) | Dynamic Interfaces (pre- and post-generation) courtesy of the self-describing nature of RDF |
| Data Querying | Full Text Search | Full Text Search | Full Text Search + Structured Graph Pattern Query Language (SPARQL) |
| What Each Delivers | Democratized Publishing | Democratized Journalism & Commentary (Citizen Journalists & Commentators) | Democratized Analysis (Citizen Data Analysts) |
| Star Wars Edition Analogy | Star Wars (original fight for decentralization via rebellion) | Empire Strikes Back (centralization and data silos make a comeback) | Return of the Jedi (FORCE emerges and facilitates decentralization from "Identity" all the way to "Open Data Access" and "Negotiable Descriptive Data Representation") |
Naturally, I am not expecting everyone to agree with me. I am simply making my contribution to what will remain fascinating discourse for a long time to come :-)
Robin:
Web 3.0 is fundamentally about the World Wide Web becoming a structured database equipped with a formal data model (RDF, which is a moniker for an Entity-Attribute-Value with Classes & Relationships based Graph Model), a query language, and a protocol for handling diverse data representational requirements via negotiation.
Web 3.0 is about a Web that facilitates serendipitous discovery of relevant things; thereby making serendipitous discovery quotient (SDQ), rather than search engine optimization (SEO), the critical success factor that drives how resources get published on the Web.
Personally, I believe we are on the cusp of a major industry inflection re. how we interact with data hosted in computing spaces. In a nutshell, the conceptual model interaction based on real-world entities such as people, places, and other things (including abstract subject matter) will usurp traditional logical model interaction based on rows and columns of typed and/or untyped literal values exemplified by relational data access and management systems.
Labels such as "Web 3.0", "Linked Data", and "Semantic Web" are simply about the aforementioned model transition playing out on the World Wide Web and across private Linked Data Webs such as Intranets & Extranets, as exemplified by the emergence of the "Master Data Management" label/buzzword.
As was the case with Web Services re. Web 2.0, there is a critical piece of infrastructure driving the evolution in question, and in this case it comes down to the evolution of Hyperlinking.
We now have a new and complementary variant of Hyperlinking, commonly referred to as "Hyperdata", that now sits alongside "Hypertext". Hyperdata, when used in conjunction with HTTP based URIs as Data Source Names (or Identifiers), delivers a potent and granular data access mechanism scoped down to the datum (object or record) level; which is much different from the document (record or entity container) level linkage that Hypertext accords.
In addition, the incorporation of HTTP into this new and enhanced granular Data Source Naming mechanism also addresses past challenges relating to separation of data, data representation, and data transmission protocols -- remember XDR woes familiar to all sockets level programmers -- courtesy of in-built content negotiation. Hence, via a simple HTTP GET -- against a Data Source Name exposed by a Hyperdata link -- I can negotiate (from client or server sides) the exact representation of the description (entity-attribute-value graph) of an Entity / Data Object / Resource, dispatched by a data server.
For example, this is how a description of entity "Me" ends up being available in (X)HTML or RDF document representations (as you will observe when you click on that link to my Personal URI).
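The SPARQL protocol offers the same by-reference access; for instance (a sketch only -- the entity URI shown is hypothetical), a DESCRIBE query pulls down the full entity-attribute-value graph behind an Identifier, in whichever representation is negotiated:

# Fetch the negotiable description of an entity named by an HTTP URI
DESCRIBE <http://example.com/dataspace/person/me#this>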
The foundation of what I describe above comes from:
Some live examples from DBpedia:
A pre-installed edition of Virtuoso for Amazon's EC2 Cloud platform.
From the DBMS engine perspective it provides you with one or more pre-configured instances of Virtuoso that enable immediate exploitation of the following services:
From a Middleware perspective it provides:
From the Web Server Platform perspective it provides an alternative to LAMP stack components such as MySQL and Apache by offering:
From the general System Administrator's perspective it provides:
Higher level user oriented offerings include:
For Web 2.0 / 3.0 users, developers, and entrepreneurs it offers Distributed Collaboration Tools & Social Media realm functionality, courtesy of ODS, that includes:
Like Apache, Virtuoso is a bona-fide Web Application Server for PHP based applications. Unlike Apache, Virtuoso is also the following:
As a result of the above, when you deploy a PHP application using Virtuoso, you inherit the following benefits:
As indicated in prior posts, producing RDF Linked Data from the existing Web, where a lot of content is deployed by PHP based content managers, should simply come down to RDF Views over the SQL Schemas and deployment / publishing of the RDF Views in RDF Linked Data form. In a nutshell, this is what Virtuoso delivers via its PHP runtime hosting and pre-packaged VADs (Virtuoso Application Distribution packages) for popular PHP based applications such as: phpBB3, Drupal, WordPress, and MediaWiki.
In addition to the RDF Linked Data deployment, we've also taken the traditional LAMP installation tedium out of the typical PHP application deployment process. For instance, you don't have to rebuild PHP 3.5 (32 or 64 Bit) on Windows, Mac OS X, or Linux to get going; simply install Virtuoso, then select a VAD package for the relevant application, and you're set. If the application of choice isn't pre-packaged by us, simply install it as you would when using Apache, which comes down to situating the PHP files in your Web structure under the Web Application's root directory.
At the current time, I've only provided links to ZIP files containing the Virtuoso installation "silent movies". This approach is a short-term solution to some of my current movie publishing challenges re. YouTube and Vimeo -- where the compressed output hasn't been of acceptable visual quality. Once resolved, I will publish much more "Multimedia Web" friendly movies :-)
Typically, Orri's posts are targeted at the hard core RDF and SQL DBMS audiences, but in this particular post, he shoots straight at the business community, revealing "Opportunity Cost" containment as the invisible driver behind the business aspects of any market inflection.
Remember, the Web isn't ubiquitous because its users mastered the mechanics and virtues of HTML and/or HTTP. Web ubiquity is a function of the opportunity cost of not being on the Web, courtesy of the network effects of hyperlinked documents -- i.e., the instant gratification of traversing documents on the Web via a single click action. In similar fashion, the Linked Data Web's ubiquity will simply come down to the opportunity cost of not being "inside the Web", courtesy of the network effects of hyperlinked entities (documents, people, music, books, and other "Things").
Here are some excerpts from Orri's post:
Every time there is a major shift in technology, this shift needs to be motivated by addressing a new class of problem. This means doing something that could not be done before. The last time this happened was when the relational database became the dominant IT technology. At that time, the questions involved putting the enterprise in the database and building a cluster of line of business applications around the database. The argument for the RDBMS was that you did not have to constrain the set of queries that might later be made, when designing the database. In other words, it was making things more ad hoc. This was opposed then on grounds of being less efficient than the hierarchical and network databases which the relational eventually replaced. Today, the point of the Data Web is that you do not have to constrain what your data can join or integrate with, when you design your database. The counter-argument is that this is slow and geeky and not scalable. See the similarity? A difference is that we are not specifically aiming at replacing the RDBMS. In fact, if you know exactly what you will query and have a well defined workload, a relational representation optimized for the workload will give you about 10x the performance of the equivalent RDF warehouse. OLTP remains a relational-only domain. However, when we are talking about doing queries and analytics against the Web, or even against more than a handful of relational systems, the things which make RDBMS good become problematic.
If we think about Web 1.0 as a period where the distinguishing noun was "Author", and Web 2.0 the noun "Journalist", we should be able to see that what comes next is the noun "Analyst". This new generation analyst would be equipped with a de-referencable Web Identity courtesy of their Person Entity URI. The analyst's URI would also be the critical component of a Web based, low cost attribution ecosystem; one that ultimately turns the URI into the analyst's brand emblem / imprint.
You will control your data in the Web 3.0 realm. If somehow this remains somewhat incomprehensible and nebulous (as is typical in this emerging realm) then simply think about this as: The Magic of You!
Remember, "You" was Time's Person of the Year as an acknowledgement of the Web 2.0 phenomenon, and maybe this time next year it would simply be the "Magic of Being You" that's the person of the year :-)
Web 3.0 brings databasing to the Web (as a feature). The single most important action item at this stage is the act of creating a record for yourself, in this new distributed database held together by an HTTP based Network (e.g., the World Wide Web).
From the RWW Top-Down category, which I interpret as: technologies that produce RDF from non-RDF data sources. Our product portfolio is comprised of the following: Virtuoso Universal Server, OpenLink Data Spaces, OpenLink Ajax Toolkit, and OpenLink Data Explorer (which includes Ubiquity commands).
Of course you could have simply looked up OpenLink Software's FOAF based Profile page (*note the Linked Data Explorer tab*), or simply passed the FOAF profile page URL to a Linked Data aware client application such as: OpenLink Data Explorer, Zitgist Data Viewer, Marbles, and Tabulator, and obtained information. Remember, OpenLink Software is an Entity of Type: foaf:Organization, on the burgeoning Linked Data Web :-)
Understanding potential Linked Data Web business models, relative to other Web based market segments, is best pursued via a BCG Matrix diagram, such as the one I've constructed below:
To conclude, the Linked Data Web's market opportunities are all about the evolution of the Web into a powerful substrate that offers a unique intersection of "Link Density" and "Relevance", exploitable across horizontal and vertical market segments to solutions providers. Put differently, SDQ is how you take "The Ad" out of "Advertising" when matching Web users to relevant things :-)
If your Web presence goes beyond (X)HTML pages, via the addition of REST or SOAP based Web Services, then you're participating in Web usage dimension 2.0.
If your Web presence includes all of the above, with the addition of structured data interlinked with structured data across other points of presence on the Web, then you are participating in Web usage dimension 3.0, i.e., the "Linked Data Web" or "Web of Data" or "Data Web".
BTW - If you've already done all of the above, and you have started building intelligent agents that exploit the aforementioned structured interlinked data substrate, then you are already in Web usage dimension 4.0.
A while back I watched Kevin Kelly's 5,000 days presentation at TED. During the presentation, I kept on scratching my head, wondering why phrases like "Linked Data", "Semantic Web", "Web of Data", "Data Web" were so unnaturally disconnected from his session narrative.
Yesterday I watched IMINDI's TechCrunch 50 presentation, and once again I saw the aforementioned pattern repeat itself. This time around, the poor founders of this "Linked Data Web" oriented company (which is what they are in reality) took a totally undeserved pasting from a bunch of panelists incapable of seeing beyond today (Web 2.0) and yesterday (initial Web bootstrap).
Anyway, thanks to the Web, this post will make a small contribution towards re-connecting the missing phrases to these "Linked Data Web" presentations.
Ubiquity from Mozilla Labs provides an alternative entry point for experiencing the "Controller" aspect of the Web's natural compatibility with the MVC development pattern. As I've noted (in various posts), Web Services, as practiced by the REST oriented Web 2.0 community or the SOAP oriented SOA community within the enterprise, is fundamentally about the "Controller" aspect of MVC.
Ubiquity provides a command-line interface for direct invocation of Web Services. For instance, in our case, we can expose Virtuoso's in-built RDF Middleware ("Sponger") and Linked Data deployment services via a single command of the form: describe-resource <url>
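For example (a sketch; the resource URL is arbitrary), typing the following at the Ubiquity command line dispatches the page to the Sponger and renders its Linked Data description:

describe-resource http://dbpedia.org/page/Paris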
To experience this neat addition to Firefox you need to do the following:
Enjoy!
Jana: What are the benefits you see to the business community in adopting semantic technology?
Me: Exposure and exploitation of an untapped treasure trove of interlinked data, information, and knowledge across disparate IT infrastructure, via conceptual entry points (Entity IDs / URIs / Data Source Names) that I refer to as "Context Lenses".
Jana: Do you think these benefits are great enough for businesses to adopt the changes?
Me: Yes, infrastructural heterogeneity is a fact of corporate life (growth, mergers, acquisitions etc). Any technology that addresses these challenges is extremely important and valuable. Put differently, the opportunity costs associated with IT infrastructural heterogeneity remains high!
Jana: How large do you think this impact will actually be?
Me: Huge. Enterprises have been aware of their data, information, and knowledge treasure troves for eons. Tapping into these via a materialization of the "information at your fingertips" vision is something they've simply been waiting to pursue, without any platform lock-in, for as long as I've been in this industry.
Jana: I've heard, from contacts in the Bay Area, that they are skeptical of how large this impact of semantic technology will actually be on the web itself, but that the best uses of the technology are for fields such as medical information, or as you mentioned, geo-spatial data.
Me: Unfortunately, those people aren't connecting the Semantic Web with open access to heterogeneous data sources, or with the intrinsic value of holistic exploration of entity based data networks (aka Linked Data).
Jana: Are semantic technologies going to be part of the web because of people championing the cause or because it is actually a necessary step?
Me: Linked Data technology on the Web is a vital extension of the current Web. Semantic Technology without the "Web" component, or what I refer to as "Semantics Inside only" solutions, simply offer little or no value as Web enhancements based on their incongruence with the essence of the Web i.e., "Open Linkage" and no Silos! A nice looking Silo is still a Silo.
Jana: In the early days of the web, there was an explosion of new websites, due to the ease of learning HTML, from a business to a person to some crackpot talking about aliens. Even today, CSS and XHTML are not so difficult to learn that a determined person can't learn them from W3C or other tutorials easily. If OWL becomes the norm for websites, what do you think the effects will be on the web? Do you think it is easy enough to learn that it will be readily adopted as part of the standard toolkit for web developers for businesses?
Me: Correction, learning HTML had nothing to do with the Web's success. The value proposition of the Web simply reached critical mass and you simply couldn't afford to not be part of it. The easiest route to joining the Web juggernaut was a Web Page hosted on a Web Site. The question right now is: what's the equivalent driver for the Linked Data Web bearing in mind the initial Web bootstrap. My answer is simply this: Open Data Access i.e., getting beyond the data silos that have inadvertently emerged from Web 2.0.
Jana: Following the same theme, do you think this will lead to an internet full of corporate-controlled websites, with sites only written by developers rather than individuals?
Me: Not at all, we will have an Internet owned by its participants, i.e., You and the agents that work on your behalf.
Jana: So, you are imagining technologies such as Drupal or Wordpress, that allow users to manage sites without a great deal of knowledge of the nuts and bolts of current web technologies?
Me: Not at all! I envisage simple forms that provide conduits to powerful meshes of interlinked data spaces associated with Web users.
Jana: Given all of the buzz, and my own familiarity with ontology, I am just very curious if the semantic web is truly necessary?
Me: This question is no different than saying: I hear the Web is becoming a Database, and I wonder if a Data Dictionary is necessary, or even if access to structured data is necessary. It's also akin to saying: I accept "Search" as my only mechanism for Web interaction even though in reality, I really want to be able to "Find" and "Process" relevant things at a quicker rate than I do today, relative to the amount of information, and information processing time, at my disposal.
Jana: Will it be worth it to most people to go away from the web in its current form, with keyword searches on sites like Google, to a richer and more interconnected internet with potentially better search technology?
Me: As stated above, we need to add "Find" to the portfolio of functions we seek to perform against the Web. "Finding" and "Searching" are mutually inclusive pursuits at different ends of an activity spectrum.
Jana: For our more technical readers, I have a few additional questions: If no standardization comes about for mapping relational databases to domain ontologies, how do you see that as influencing the decisions about adoption of semantic technology by businesses? After all, the success of technology often lives or dies on its ease of adoption.
Me: Standardization of RDBMS to RDF Mapping is not the critical success factor here (of course it would be nice). As stated earlier, the issue of data integration that arises from IT infrastructural heterogeneity has been with decision makers in the enterprise forever. The problem is now seeping into the broader consumer realm via Web ubiquity. The mistakes made in the enterprise realm are now playing out in the consumer Web realm. In both realms the critical success factors are:
CrunchBase: When we released the CrunchBase API, you were one of the first developers to step up and quickly released a CrunchBase Sponger Cartridge. Can you explain what a CrunchBase Sponger Cartridge is?
Me: A Sponger Cartridge is a data access driver for Web Resources that plugs into our Virtuoso Universal Server (DBMS and Linked Data Web Server combo amongst other things). It uses the internal structure of a resource and/or a web service associated with a resource, to materialize an RDF based Linked Data graph that essentially describes the resource via its properties (Attributes & Relationships).
CrunchBase: And what inspired you to create it?
Me: Bengee built a new space with your data, and we've built a space on the fly from your data which still resides in your domain. Either solution extols the virtues of Linked Data i.e. the ability to explore relationships across data items with high degrees of serendipity (also colloquially known as: following-your-nose pattern in Semantic Web circles).
Bengee posted a notice to the Linking Open Data Community's public mailing list announcing his effort. Bearing in mind the fact that we've been using middleware to mesh the realms of Web 2.0 and the Linked Data Web for a while, it was a no-brainer to knock something up based on the conceptual similarities between Wikicompany and CrunchBase. In a sense, a quadrant of orthogonality is what immediately came to mind re. Wikicompany, CrunchBase, Bengee's RDFization efforts, and ours.
Bengee created an RDF based Linked Data warehouse based on the data exposed by your API, which is exposed via the Semantic CrunchBase data space. In our case we've taken the "RDFization on the fly" approach, which produces a transient Linked Data View of the CrunchBase data exposed by your APIs. Our approach is in line with our world view: all resources on the Web are data sources, and the Linked Data Web is about incorporating HTTP into the naming scheme of these data sources so that the conventional URL based hyperlinking mechanism can be used to access a structured description of a resource, which is then transmitted using a range of negotiable representation formats. In addition, based on the fact that we house and publish a lot of Linked Data on the Web (e.g. DBpedia, PingTheSemanticWeb, and others), we've also automatically meshed CrunchBase data with related data in DBpedia and Wikicompany data.
CrunchBase: Do you know of any apps that are using CrunchBase Cartridge to enhance their functionality?
Me: Yes, the OpenLink Data Explorer which provides CrunchBase site visitors with the option to explore the Linked Data in the CrunchBase data space. It also allows them to "Mesh" (rather than "Mash") CrunchBase data with other Linked Data sources on the Web without writing a single line of code.
CrunchBase: You have been immersed in the Semantic Web movement for a while now. How did you first get interested in the Semantic Web?
Me: We saw the Semantic Web as a vehicle for standardizing conceptual views of heterogeneous data sources via context lenses (URIs). In 1998, as part of our strategy to expand our business beyond the development and deployment of ODBC, JDBC, and OLE-DB data providers, we decided to build a Virtual Database Engine (see: Virtuoso History), and in doing so we sought a standards based mechanism for the conceptual output of the data virtualization effort. As of the time of the seminal unveiling of the Semantic Web in 1998, we were clear about two things in relation to the effects of the Web and Internet data management infrastructure inflections: 1) existing DBMS technology had reached its limits; 2) Web Servers would ultimately hit their functional limits. These fundamental realities compelled us to develop Virtuoso with an eye to leveraging the Semantic Web as a vehicle for completing its technical roadmap.
CrunchBase: Can you put into layman's terms exactly what RDF and SPARQL are and why they are important? Do they only matter for developers or will they extend past developers at some point and be used by website visitors as well?
Me: RDF (Resource Description Framework) is a Graph based Data Model that facilitates resource description using the Subject, Predicate, and Object principle. Associated with the core data model, as part of the overall framework, are a number of markup languages for expressing your descriptions (just as you express presentation markup semantics in HTML or document structure semantics in XML) that include: RDFa (simple extension of HTML markup for embedding descriptions of things in a page), N3 (a human friendly markup for describing resources), RDF/XML (a machine friendly markup for describing resources).
SPARQL is the query language associated with the RDF Data Model, just as SQL is a query language associated with the Relational Database Model. Thus, when you have RDF based structured and linked data on the Web, you can query against the Web using SPARQL just as you would against an Oracle/SQL Server/DB2/Informix/Ingres/MySQL/etc. DBMS using SQL. That's it in a nutshell.
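To make that concrete, here is a minimal sketch (the vocabulary choice is illustrative) of a SPARQL query that plays the role a SELECT statement would play in SQL:

# Find the names of all things described as Organizations
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?company ?name
WHERE {
  ?company a foaf:Organization ;
           foaf:name ?name .
}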
CrunchBase: On your website you wrote that "RDF and SPARQL as productivity boosters in everyday web development". Can you elaborate on why you believe that to be true?
Me: I think the ability to discern a formal description of anything via its discrete properties is of immense value re. productivity, especially when the capability in question results in a graph of Linked Data that isn't confined to a specific host operating system, database engine, application or service, programming language, or development framework. RDF Linked Data is about infrastructure for the true materialization of the "Information at Your Fingertips" vision of yore. Even though it's taken the emergence of RDF Linked Data to make the aforementioned vision tractable, the comprehension of the vision's intrinsic value has been clear for a very long time. Most organizations and/or individuals are quite familiar with the adage: Knowledge is Power; well, there isn't any knowledge without accessible Information, and there isn't any accessible Information without accessible Data. The Web has always been grounded in accessibility to data (albeit via compound container documents called Web Pages).
Bottom line, RDF based Linked Data is about Open Data access by reference using URIs (HTTP based Entity IDs / Data Object IDs / Data Source Names), and as I said earlier, the intrinsic value is pretty obvious bearing in mind the costs associated with integrating disparate and heterogeneous data sources -- across intranets, extranets, and the Internet.
CrunchBase: In his definition of Web 3.0, Nova Spivack proposes that the Semantic Web, or Semantic Web technologies, will be the force behind much of the innovation that will occur during Web 3.0. Do you agree with Nova Spivack? What role, if any, do you feel the Semantic Web will play in Web 3.0?
Me: I agree with Nova. But I see Web 3.0 as a phase within the Semantic Web innovation continuum. Web 3.0 exists because Web 2.0 exists. Both of these Web versions express usage and technology focus patterns. Web 2.0 is about the use of Open Source technologies to fashion Web Services that are ultimately used to drive proprietary Software as Service (SaaS) style solutions. Web 3.0 is about the use of "Smart Data Access" to fashion a new generation of Linked Data aware Web Services and solutions that exploit the federated nature of the Web to maximum effect; proprietary branding will simply be conveyed via quality of data (cleanliness, context fidelity, and comprehension of privacy) exposed by URIs.
Here are some examples of the CrunchBase Linked Data Space, as projected via our CrunchBase Sponger Cartridge:
My contribution to the developing discourse takes the form of a Q&A session. I've taken the questions posed and provided answers that express my particular points of view:
Q: Is the desktop of the future going to just be a web-hosted version of the same old-fashioned desktop metaphors we have today?
A: No, it's going to be a more Web Architecture aware and compliant variant exposed by appropriate metaphors.
Q: The desktop of the future is going to be a hosted web service
A: A vessel for exploiting the virtues of the Linked Data Web.
Q: The Browser is Going to Swallow Up the Desktop
A: Literally, of course not! Metaphorically, of course! And then the Browser metaphor will decompose into function specific bits of Web interaction amenable to orchestration by its users.
Q: The focus of the desktop will shift from information to attention
A: No! Knowledge, Information, and Data sharing courtesy of Hyperdata & Hypertext Linking.
Q: Users are going to shift from acting as librarians to acting as daytraders
A: They were Librarians in Web 1.0, Journalists in Web 2.0, Analysts in Web 3.0 (i.e., they analyze structured and interlinked data), and CEOs in Web 4.0 (i.e., they get Agents to do stuff intelligently en route to making decisions).
Q: The Webtop will be more social and will leverage and integrate collective intelligence
A: The Linked Data Web vessel will only require you to fill in your profile (once) and then serendipitous discovery and meshing of relevant data will simply happen (the serendipity quotient will grow in line with Linked Data Web density).
Q: The desktop of the future is going to have powerful semantic search and social search capabilities built-in
A: It is going to be able to "Find" rather than "Search" for stuff courtesy of the Linked Data Web.
Q: Interactive shared spaces will replace folders
A: Data Spaces and their URIs (Data Source Names) replace everything. You simply choose the exploration metaphor that best suits your space interaction needs.
Q: The Portable Desktop
A: Ubiquitous Desktop i.e. do the same thing (all answers above) on any device connected to the Web.
Q: The Smart Desktop
A: Vessels with access to Smart Data (Linked Data + Action driven Context sprinklings).
Q: Federated, open policies and permissions
A: More federation for sure, XMPP will become a lot more important, and OAuth will enable resurgence of the federated aspects of the Web and Internet.
Q: The personal cloud
A: Personal Data Spaces plugged into Clouds (Intranet, Extranet, Internet).
Q: The WebOS
A: An operating system endowed with traditional Database and Host Operating system functionality such as: RDF Data Model, SPARQL Query Language, URI based Pointer mechanism, and HTTP based message Bus.
Q: Who is most likely to own the future desktop?
A: You! And all you need is a URI (an ID or Data Source Name for "Entity You") and a Profile Page (a place where "Entity You" is Described by You).
You can get a feel for the future desktop by downloading and then installing the OpenLink Data Explorer plugin for Firefox, which allows you to switch viewing modes between Web Page and Linked Data behind the page. :-)
When the DBpedia & Yago integration took place last year (around WWW2007, Banff), there was a little, but costly, omission that occurred: nobody sought to load the Yago Class Hierarchy into Virtuoso's Inference Engine :-(
Anyway, the Class Hierarchy has now been loaded into Virtuoso's inference engine (as Virtuoso Inference Rules) and the following queries are now feasible using the live Virtuoso based DBpedia instance hosted by OpenLink Software:
-- Find all Fiction Books associated with a property "dbpedia:name" that has the literal value "The Lord of the Rings".
DEFINE input:inference "http://dbpedia.org/resource/inference/rules/yago#"
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/property/>
PREFIX yago: <http://dbpedia.org/class/yago/>
SELECT DISTINCT ?s
WHERE {
  ?s a yago:Fiction106367107 .
  ?s dbpedia:name "The Lord of the Rings" .
}
-- Variant of query with Virtuoso's Full Text Index extension via the bif:contains function/magic predicate
DEFINE input:inference "http://dbpedia.org/resource/inference/rules/yago#"
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/property/>
PREFIX yago: <http://dbpedia.org/class/yago/>
SELECT DISTINCT ?s ?n
FROM <http://dbpedia.org>
WHERE {
?s a yago:Fiction106367107 .
?s dbpedia:name ?n .
?n bif:contains 'Lord and Rings'
}
-- Retrieve all individual instances of the Fiction Class, which should include all Books.
DEFINE input:inference "http://dbpedia.org/resource/inference/rules/yago#"
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/property/>
PREFIX yago: <http://dbpedia.org/class/yago/>
SELECT DISTINCT ?s
WHERE {
  ?s a yago:Fiction106367107 .
}
Note: you can also move the inference pragmas to the Virtuoso Server side, i.e., place the inference rules in a server instance config file, thereby negating the need to place the "define input:inference 'http://dbpedia.org/resource/inference/rules/yago#'" pragma directly in your SPARQL queries.
When tagging a document, the semantic tagging service passes the content of a target document through a processing pipeline (a distillation process of sorts) that results in automagic extraction of the following:
Once the extraction phase is completed, a user is presented with a list of "suggested tags" using a variety of user interaction techniques. The literal values of selected Tags are then associated with one or more Tag and Tag Meaning Data Objects, with each Object type endowed with a unique Identifier.
Broad acceptance that "Context is king" is gradually taking shape. That said, "Context" landlocked within Literal values offers little, long term, over what we have right now (e.g., at Del.icio.us or Technorati). By this I mean: if the end product of semantically enhanced tagging leaves us with Literal Tag values only, Tags associated with Tag Data Objects endowed with platform specific Identifiers, or Tag Data Objects with any other Identity scheme that excludes HTTP, the ability of Web users to discern or derive multiple perspectives from the base Context (exposed by semantically enhanced Tags) will be lost, or severely impeded at best.
The shape, form, and quality of the lookup substrate that underlies semantic tagging services, ultimately affects "context fidelity" matters such as Entity Disambiguation. The importance of quality lookup infrastructure on the burgeoning Linked Data Web is the reason why OpenLink Software is intimately involved with the DBpedia and UMBEL projects.
I am immensely happy to see that the Web 2.0 and Semantic Web communities are beginning to coalesce around the issue of "Context". This was the case at the WWW2008 Linked Data Workshop, and I am feeling a similar vibe emerging from the Semantic Web Technologies conference currently nearing completion in San Jose. Of course, I will be talking about, and demonstrating the practical utility of, all of this at the upcoming Linked Data Planet conference.
ODBC identifies data sources using Data Source Names (DSNs).
WODBC (Web Open Database Connectivity) delivers open data access to Web Databases / Data Spaces. The Data Source Naming scheme: URI or IRI, is HTTP based thereby enabling data access by reference via the Web.
ODBC DSNs bind ODBC client applications to Tables, Views, and Stored Procedures.
WODBC DSNs bind you to a Data Space (e.g. my FOAF based Profile Page, where you can use the "Explore Data Tab" to look around if you are a human visitor) or a specific Entity within a Data Space (i.e., the Person Entity "Me").
ODBC Drivers are built using APIs (DBMS Call Level Interfaces) provided by DBMS vendors. Thus, a DBMS vendor can choose not to release an API, or do so selectively, for competitive advantage or market disruption purposes (it's happened!).
WODBC Drivers are also built using APIs (Web Services associated with a Web Data Space). These drivers are also referred to as RDF Middleware or RDFizers. The "Web" component of WODBC ensures openness: you publish Data with URIs from your Linked Data Server and that's it; your data space or specific data entities are live and accessible (by reference) over the Web!
So we have come full circle (or cycle), the Web is becoming more of a structured database everyday! What's new is old, and what's old is new!
Data Access is everything; without "Data" there is no information or knowledge. Without "Data" there's no notion of vitality, purpose, or value.
URIs make or break everything in the Linked Data Web just as ODBC DSNs do within the enterprise.
I've deliberately left JDBC, ADO.NET, and OLE-DB out of this piece due to their respective programming languages and frameworks specificity. None of these mechanisms match the platform availability breadth of ODBC.
The Web as a true M-V-C pattern is now crystallizing. The "M" (Model) component of M-V-C is finally rising to the realm of broad attention courtesy of the "Linked Data" meme and "Semantic Web" vision.
By the way, M-V-C lines up nicely with Web 1.0 (Web Forms / Pages), Web 2.0 (Web Services based APIs), and Web 3.0 (Data Web, Web of Data, or Linked Data Web) :-)
Evoluation is evolution devoid of the randomness of mutation. A state of being in which it is possible to evaluate and choose evolutionary paths.
Evoluation actually describes where we are today in relation to the World Wide Web; to the Linking Open Data community (LOD), it's taking the path towards becoming a Giant Global Graph of Linked Data; to the Web 2.0 community, it's simply a collection of Web Services and associated APIs; and to many others, it remains an opaque collection of interlinked documents.
The great thing about the Web is that it allows netizens to explore a plethora of paths without adversely affecting the paths of others. That said, controlling one's path may take mutation out of evolution, but we are still left with the requirement to adapt and eventually survive in a competitive environment. Thus, although we can evaluate and choose from the many paths the Web's evolution offers us, the path that delivers the most benefits ultimately dominates. :-)
Daniel simplifies my post by using diagrams to depict the different paths for PHP based applications exposing Linked Data - especially those that already provide a significant amount of the content that drives Web 2.0.
If all the content in Web 2.0 information resources are distillable into discrete data objects endowed with HTTP based IDs (URIs), with zero "RDF handcrafting Tax", what do we end up with? A Giant Global Graph of Linked Data; the Web as a Database.
So, what used to apply exclusively, within enterprise settings, re. Oracle, DB2, Informix, Ingres, Sybase, Microsoft SQL Server, MySQL, PostgreSQL, Progress OpenEdge, Firebird, and others, now applies to the Web. The Web becomes the "Distributed Database Bus" that connects database records across disparate databases (or Data Spaces). These databases manage and expose records that are remotely accessible "by reference" via HTTP.
As I've stated at every opportunity in the past, Web 2.0 is the greatest thing that ever happened to the Semantic Web vision :-) Without the "Web 2.0 Data Silo Conundrum" we wouldn't have the cry for "Data Portability" that brings a lot of clarity to some fundamental Web 2.0 limitations that end-users ultimately find unacceptable.
In the late '80s, the SQL Access Group (now part of X/Open) addressed a similar problem with RDBMS silos within the enterprise, which led to the SAG CLI that exists today as Open Database Connectivity.
In a sense we now have WODBC (Web Open Database Connectivity), comprised of Web Services based CLIs and/or traditional back-end DBMS CLIs (ODBC, JDBC, ADO.NET, OLE-DB, or Native), Query Language (SPARQL Query Language), and a Wire Protocol (HTTP based SPARQL Protocol) delivering Web infrastructure equivalents of SQL and RDA, but much better, and with much broader scope for delivering profound value due to the Web's inherent openness. Today's PHP, Python, Ruby, Tcl, Perl, ASP.NET developer is the enterprise 4GL developer of yore, without enterprise confinement. We could even be talking about 5GL development once the Linked Data interaction is meshed with dynamic languages (delivering higher levels of abstraction at the language and data interaction levels). Even the underlying schemas and basic design will evolve from Closed World (solely) to a mesh of Closed & Open World view schemas.
ReadWriteWeb, via Alex Iskold's post, has delivered another iteration of their "Guide to Semantic Technologies".
If you look at the title of this post (and their article) they seem to be accurately providing a guide to Semantic Technologies, so no qualms there. If, on the other hand, this is supposed to be a guide to the "Semantic Web" as prescribed by TimBL, then they are completely missing the essence of the whole subject -- and demonstrably so, I may add -- since the entities "ReadWriteWeb" and "Alex Iskold" are only describable today via the attributes of the documents they publish, i.e., their respective blogs and hosted blog posts.
Preoccupation with Literal objects as described above implies we can only take what "ReadWriteWeb" and "Alex Iskold" say "Literally" (grep, regex, and XPath/XQuery are the only tools for searching deeper in this Literal realm); we have no sense of what makes them tick or where they come from, no history (bar "About Page" blurb), no data connections beyond anchored text (more pointers to opaque data sources) in posts and blogrolls. The only connection between this post and them is my deliberate use of the same literal text in the Title of this post.
TimBL's vision as espoused via the "Semantic Web" vision is about the production, consumption, and sharing of Data Objects via HTTP based Identifiers called URIs/IRIs (Hyperdata Links / Linked Data). It's how we use the Web as a Distributed Database where (as Jim Hendler once stated with immense clarity): I can point to records (entity instances) in your database (aka Data Space) from mine. Which is to say that if we can all point to data entities/objects (not just data entities of type "Document") using these Location, Value, and Structure independent Object Identifiers (courtesy of HTTP) we end up with a much more powerful Web, and one that is closer to the "Federated and Open" nature of the Web.
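As a hedged sketch of what "pointing at records in your database from mine" looks like in practice (both graph IRIs below are illustrative), a single SPARQL query can draw on two independently hosted Data Spaces at once:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?person
FROM <http://example.org/dataspace/person/me>
FROM <http://example.net/dataspace/person/you>
WHERE { ?person a foaf:Person }

The two FROM clauses merge the remote graphs into a single default graph, so the query spans both Data Spaces exactly as a SQL join would span two tables.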
As I stated in a prior post, if you or your platform of choice aren't producing de-referencable URIs for your data objects, you may be Semantic (this data model predates the Web), but there is no "World Wide Web" in what you are doing.
I am Kingsley Idehen, a Person who authors this weblog. I also share bookmarks gathered over the years across an array of subjects via my bookmark data space. I also subscribe to a number of RSS/Atom/RDF feeds, which I share via my feeds subscription data space. Of course, all of these data sources have Tags which are collectively exposed via my weblog tag-cloud, feeds subscriptions tag-cloud, and bookmarks tag-cloud data spaces.
As I don't like repeating myself, and I hate wasting my time or the time of others, I simply share my Data Space (a collection of all of my purpose specific data spaces) via the Web so that others (friends, family, employees, partners, customers, project collaborators, competitors, co-opetitors, etc.) can intentionally or serendipitously discover relevant data en route to creating new information (perspectives) that is hopefully exposed to others via the Web.
Bottom-line: the Semantic Web is about adding the missing "Open Data Access & Connectivity" feature to the current Document Web (we have to go beyond regex, grep, XPath, XQuery, full text search, and other literal scraping approaches). The Linked Data Web of de-referencable data object URIs is the critical foundation layer that makes this feasible.
Remember, it's not about "Applications"; it's about Data, and actually freeing Data from the "tyranny of Applications". Unfortunately, applications almost always create silos, albeit inadvertently (esp. on the Web), since entity data modeling, open data access, and other database technology realm matters remain of secondary interest to many application developers.
Final comment: RDF facilitates Linked Data on the Web, but not all RDF is endowed with de-referencable URIs (a major source of confusion and misunderstanding). Thus, you can have RDF Data Source Providers that simply project RDF data silos via Web Services APIs, if the RDF output emanating from a Web Service doesn't provide out-bound pathways to other data via de-referencable URIs. Of course, the same also applies to Widgets that present you with all the things they've discovered without exposing de-referencable URIs for each item.
BTW - my final comments above aren't in any way incongruent with devising successful business models for the Web. As you may or may not know, OpenLink is not only a major platform provider for the Semantic Web (expressed in our UDA, Virtuoso, OpenLink Data Spaces, and OAT products); we are also actively seeding Semantic Web (tribe: Linked Data, of course) startups. For instance, Zitgist, which now has Mike Bergman as its CEO alongside Frederick Giasson as CTO. Of course, I cannot do Zitgist justice via a footnote in a blog post, so I will expand further in a separate post.
Yes, integration is hard, but I do profoundly believe that what's been happening on the Web over the last 10 or so years also applies to the Enterprise, and by this I absolutely do not mean "Enterprise 2.0" since "2.0" and productive agility do not compute in my realm of discourse.
Large collections of RSS feeds, Wikiwords, Shared Bookmarks, Discussion Forums, etc., when disconnected at the data level (i.e. hosted in pages with no access to the "data behind"), simply offer information deluge and inertia (there are only so many hours for processing opaque information sources in a given day).
Enterprises fundamentally need to process information efficiently as part of a perpetual assessment of their relative competitive Strengths, Weaknesses, Opportunities, and Threats (SWOT), in existing and/or future markets. Historically, IT acquisitions have run counter to the aforementioned quest for "Ability", due to the predominance of the "rip and replace" approach to technology acquisition, which repeatedly creates and perpetuates information silos across Application, Database, Operating System, and Development Environment boundaries. The sequence of events typically occurs as follows:
In the early to mid '90s (pre ubiquitous Web), programming language, operating system, and development framework independence inside the enterprise was technically achievable via ODBC (due to its platform independence). That said, DBMS specific ODBC channels alone couldn't address the holistic requirements associated with Conceptual Views of disparate data sources; hence the need for Data Access Virtualization via Virtual Database Engine technology.
Just as is the case on the Web today, with the emergence of the "Linked Data" meme, enterprises now have a powerful mechanism for exploiting the Data Integration benefits associated with generating Data Objects from disparate data sources, endowed with HTTP based IDs (URIs).
Conceptualizing access to data exposed via Database APIs, SOA based Web Services (SOAP style Web Services), Web 2.0 APIs (REST style Web Services), XML Views of SQL Data (SQLX), pure XML, etc., is a problem area addressed by RDF aware middleware (RDFizers, e.g., the Virtuoso Sponger).
Here are examples of what SQL Rows exposed as RDF Data Objects (identified using HTTP based URIs) would look like outside or behind a corporate firewall:
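As a hedged sketch of the pattern (all identifiers hypothetical): a row keyed CUSTOMER_ID = 1001 in a CUSTOMERS table, once mapped, surfaces as an entity URI whose entire description is retrievable with a one-line query:

# The former SQL row, now a Data Object with an HTTP based ID
# e.g. <http://crm.example.com/customer/1001#this>
DESCRIBE <http://crm.example.com/customer/1001#this>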
What's Good for the Web Goose (Personal Data Space URIs) is good for the Enterprise Gander (Enterprise Data Space URIs).
*On* the ubiquitous Web of "Linked Documents", HREF means (by definition and usage): Hypertext Reference to an HTTP accessible Data Object of Type "Document" (an information resource). Of course, we don't make the formal connection of Object Type when dealing with the Web on a daily basis, but whenever you encounter the "resource not found" condition, notice the message -- HTTP/1.0 404 Object Not Found -- from the HTTP Server tasked with retrieving and returning the resource.
*In* the Web of "Linked Data", a complementary addition to the current Web of "Linked Documents", HREF is used to reference Data Objects of a variety of "Types", not just "Documents". This is achieved by using Data Object Identifiers (URIs / IRIs generated by the Linked Data deployment platform) in the strict sense, i.e., Data Identity (URI) is separated from Data Address (URL). Thus, you can reference a Person Data Object (aka an instance of a Person Class) in your HREF, and the HTTP Server returns a Description of the Data Object via a Document (again, an information resource). A document containing the Description of a Data Object typically contains HREFs to other Data Objects that expose the Attributes and Relationships of the initial Person Data Object, and it is this collection of Data Objects that is technically called a "Graph" -- which is what RDF models.
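As an illustrative sketch (URIs hypothetical): the Person's Identity might be <http://example.org/dataspace/person/jane#this>, while the describing document lives at <http://example.org/dataspace/person/jane>; querying the former surfaces her Attributes and Relationships as predicates:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?attribute ?value
WHERE { <http://example.org/dataspace/person/jane#this> ?attribute ?value }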
What I describe above is basic stuff for anyone that's familiar with Object Database or Distributed Objects technology and concepts.
The Linked Document Web is a collection of physical resources that traverse the Web Information Bus in palatable format, i.e., documents. Thus, Document Object Identity and Document Object Data Address can be the same thing, i.e., a URL can serve as the ID/URI of a Document Data Object.
The Linked Data Web, on the other hand, is a Distributed Object Database, and each Data Object must be uniquely identified, otherwise we introduce ambiguity that ultimately taints the Database itself (making it incomprehensible to reasoning-challenged machines). Thus we must have unique Object IDs (URIs / IRIs) for People, Places, Events, and other things that aren't Documents. Once we follow the time tested rules of Identity, People can then be associated with the things they create (blog posts, web pages, bookmarks, wikiwords, etc.). RDF is about expressing these graph model relationships, while RDF serialization formats enable the transport of these data-object-link-laden information resources to requesting User Agents.
Put in more succinct terms, all documents on the Web are compound documents in reality (e.g., most contain at least an image these days). The Linked Data Web is about a Web where Data Object IDs (URIs) enable us to distill source data from the information contained in a compound document.
The degree of unobtrusiveness of new technology, concepts, or new applications of existing technology, is what ultimately determines eventual uptake and meme virulence (network effects). For a while, the Semantic Web meme was mired in confusion and general misunderstanding due to a shortage of practical use case scenario demos.
The emergence of the SPARQL Query Language has provided critical infrastructure for a number of products, projects, and demos that now make the utility of the Semantic Web vision much clearer, via the simplicity of Linked Data, as exemplified by the following:
BTW - Benjamin Nowack penned an interesting post titled: Semantic Web Aliases, that covers a variety of labels used to describe the Semantic Web. The great thing about this post is that it provides yet another demonstration-in-the-making for the virtues of Linked Data :-)
Labels are harmless when their sole purpose is the creation of routes of comprehension for concepts. Unfortunately, Labels aren't always constructed with concept comprehension in mind, most of the time they are artificial inflectors and deflectors servicing marketing communications goals.
Anyway, irrespective of actual intent, I've endowed all of the labels from Bengee's post with URIs, as my contribution to the important disambiguation effort re. the Semantic Web:
As per usual, this post is best appreciated when processed via a Linked Data aware user agent.
OpenLink Data Spaces (ODS) now officially supports:
- Attention Profiling Markup Language (APML).
- Meaning of a Tag (MOAT) in conjunction with Simple Knowledge Organisation System (SKOS) and Social-Semantic Cloud of Tags (SCOT).
- OAuth - an Open Authentication Protocol
Which means that OpenLink Data Spaces supports all of the main standards being discussed in the DataPortability Interest Group!
APML Example:
All users of ODS automatically get a dynamically created APML file, for example: APML profile for Kingsley Idehen
The URI for an APML profile is: http://myopenlink.net/dataspace/<ods-username>/apml.xml
Meaning of a Tag Example:
All users of ODS automatically have tag cloud information embedded inside their SIOC file, for example: SIOC for Kingsley Idehen on the Myopenlink.net installation of ODS.
But even better, MOAT has been implemented in the ODS Tagging System. This has been demonstrated in a recent test blog post by my colleague Mitko Iliev, the blog post comes up on the tag search: http://myopenlink.net/dataspace/imitko/weblog/Mitko%27s%20Weblog/tag/paris
Which can be put through the OpenLink Data Browser:
OAuth Example:
OAuth Tokens and Secrets can be created for any ODS application. To do this:
- you can log in to MyOpenlink.net beta service, the Live Demo ODS installation, an EC2 instance, or your local installation
- then go to "Settings"
- and then you will see "OAuth Keys"
- you will then be able to choose the applications that you have instantiated and generate the token and secret for that app.
Related Document (Human) Links
- OpenLink Data Spaces Official Page
- OpenLink Software Page
- OpenLink Data Spaces Wikipedia Page
- Attention Profiling Markup Language Project Website
- Meaning of a Tag Project Website
- Simple Knowledge Organisation Systems Project Website
- Social-Semantic Cloud of Tags Project Website
- OAuth Protocol Website
- DataPortability.org Website
- Semantically Interlinked Online Communities Project Website
Remember (as per my most recent post about ODS), ODS is about unobtrusive fusion of Web 1.0, 2.0, and 3.0+ usage and interaction patterns. Thanks to a lot of recent standardization in the Semantic Web realm (e.g., SPARQL), we now employ the MOAT, SKOS, and SCOT ontologies as vehicles for Structured Tagging.
This is how we take a key Web 2.0 feature (think 2D in a sense) and bend it to create a Linked Data Web (Web 3.0) experience, unobtrusively (see earlier posts re. Dimensions of the Web). Thus, nobody has to change how they tag or where they tag; just expose ODS to the URLs of your Web 2.0 tagged content and it will produce URIs (Structured Data Object Identifiers) and a linked data graph for your Tags Data Space (née Tag Cloud). ODS will construct a graph which exposes tag-subject association, tag-concept alignment / intended meaning, and tag frequencies, which ultimately deliver "relative disambiguation" of intended Tag Meaning (i.e., you can easily discern the tagger's meaning via the Tag's actual Data Space, which is associated with the tagger). In a nutshell, the dynamics of relevance matching, ranking, and the like change immensely, without futile, timeless debates about matters such as:
We can just get on with demonstrating Linked Data value using what exists on the Web today. This is the approach we are deliberately taking with ODS.
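As a sketch of what interrogating such a Tags Data Space might look like (the tag URI is hypothetical, and I am assuming MOAT's hasMeaning property as the link from a tag to its intended meaning):

PREFIX moat: <http://moat-project.org/ns#>
# What did the tagger actually mean by this tag?
SELECT ?meaning
WHERE { <http://example.org/dataspace/tag/paris> moat:hasMeaning ?meaning }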
Tip: This post is best viewed via an RDF aware User Agent (e.g. a Browser or Data Viewer). I say this because the permalink of this post is a URI in a Linked Data Space (My Blog) comprised of more data than meets the eye (i.e. what you see when you read this post via a Document Web Browser) :-)
There are quite a few reasons to use OpenLink Data Spaces (ODS). Here are 10 of the reasons why I use ODS:
- Its native support of DataPortability Recommendations such as RSS, Atom, APML, Yadis, OPML, Microformats, FOAF, SIOC, OpenID and OAuth.
- Its native support of Semantic Web Technologies such as: RDF and SPARQL/SPARUL for querying.
- Everything in ODS is an Object with its own URI, this is due to the underlying Object-Relational Architecture provided by Virtuoso.
- It has all the social media components that you could need, including: blogs, wikis, social networks, feed readers, CRM and a calendar.
- It is expandable by installing pre-configured components (called VADs), or by re-configuring a LAMP application to use Virtuoso. Some examples of current VADs include: MediaWiki, Wordpress and Drupal.
- It works with external webservices such as: Facebook, del.icio.us and Flickr.
- Everything within OpenLink Data Spaces is Linked Data, which provides more meaningful information than just plain structural information. This meaningful information could be used for complex inferencing systems, as ODS can be seen as a Knowledge Base.
- ODS builds bridges between the existing static-document based web (aka "Web 1.0"), the more dynamic, services-oriented, social and/or user-orientated webs (aka "Web 2.0"), and the web which we are just going into, which is more data-orientated (aka "Web 3.0" or "Linked Data Web").
- It is fully supportive of Cloud Computing, and can be installed on Amazon EC2.
- It's released free under the GNU General Public License (GPL). (Note: it is technically dual-licensed, as it sits on top of the Virtuoso Universal Server, which has both Commercial and GPL licensing.)
The features above collectively provide users with a Linked Data Junction Box that may reside within corporate intranets or "out in the clouds" (Internet). You can consume, share, and publish data in a myriad of formats using a plethora of protocols, without any programming. ODS is simply about exposing the data from your Web 1.0, 2.0, and 3.0 application interactions in structured form, with Linking, Sharing, and ultimately Meshing (not Mashing) in mind.
Note: Although ODS is equipped with a broad array of Web 2.0 style Applications, you do not need to use native ODS apps in order to exploit its power. It binds to anything that supports the relevant protocols and data formats.
Jason recently moved to Massachusetts, which led to me pinging him about our earlier blogosphere encounter and the emergence of a Data Portability Community. I also informed him about the fact that TimBL, myself, and a number of other Semantic Web technology enthusiasts frequently meet on the 2nd Tuesday of each month at the MIT hosted Cambridge Semantic Web Gatherings, to discuss, demonstrate, and debate all aspects of the Semantic Web. Luckily (for both of us), Jason attended the last event, and we got to meet each other in person.
Following our face to face meeting in Cambridge, a number of follow-on conversations ensued covering Linked Data and practical applications of the Semantic Web vision. Jason writes about our exchanges in a recent post titled: The Semantic Web. His passion for Data Portability enabled me to use OpenID and FOAF integration to connect the Semantic Web and Data Portability via the Linked Data concept.
During our conversations, Jason also alluded to the fact that he had already encountered OpenLink Software while working with our ODBC Drivers (part of our UDA product family) for IBM Informix (Single-Tier or Multi-Tier Editions) a few years ago (interesting random connection).
As I've stated in the past, I've always felt that the Semantic Web vision will materialize by way of a global epiphany. The count down to this inevitable event started at the birth of the blogosphere, ironically. And accelerated more recently, through the emergence of Web 2.0 and Social Networking, even more ironically :-)
The blogosphere started the process of Data Space coalescence via RSS/Atom based semi-structured data enclaves; Web 2.0 propagated Web Service usage en route to creating service provider controlled data and information silos; Social Networking brought attention to the fact that User Generated Data wasn't actually owned or controlled by the Data Creators; etc.
The emergence of "Data Portability" has created a palatable moniker for a clearly defined, and slightly easier to understand, problem: the meshing of Data and Identity in cyberspace i.e. individual points of presence in cyberspace, in the form of "Personal Data Spaces in the Clouds" (think: doing really powerful stuff with .name domains). In a sense, this is the critical inflection point between the document centric "Web of Linked Documents" and the data centric "Web or Linked Data". There is absolutely no other way solve this problem in a manner that alleviates the imminent challenges presented by information overload -- resulting from the exponential growth of user generated data across the Internet and enterprise Intranets.
A query language for the burgeoning Structured & Linked Data Web (aka Semantic Web / Giant Global Graph). Like SQL for the Relational Data Model, it provides a query language for the Graph based RDF Data Model.
It's also a REST or SOAP based Web Service that exposes SPARQL access to RDF Data via an endpoint.
In addition, it's also a Query Results Serialization format that includes XML and JSON support.
It brings important clarity to the notion of the "Web as a Database" by transforming existing Web Sites, Portals, and Web Services into a bona fide corpus of Mesh-able (rather than Mash-able) Data Sources. For instance, you can perform queries that join one or more of the aforementioned data sources in exactly the same manner (albeit with different syntax) as you would one or more SQL Tables.
-- SPARQL equivalent of SQL SELECT * against my personal data space hosted FOAF file
SELECT DISTINCT ?s ?p ?o FROM <http://myopenlink.net/dataspace/person/kidehen> WHERE {?s ?p ?o}
-- SPARQL against my social network -- Note: my SPARQL query will be beamed across all of the contacts in the social networks of my contacts, as long as they are all HTTP URI based within each data space
PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT DISTINCT ?Person FROM <http://myopenlink.net/dataspace/person/kidehen> WHERE {?s a foaf:Person; foaf:knows ?Person}
Note: you can use the basic SPARQL Endpoint, SPARQL Query By Example, or SPARQL Query Builder Demo tool to experiment with the demonstration queries above.
SPARQL is implemented by RDF Data Management Systems (Triple or Quad Stores) just as SQL is implemented by Relational Database Management Systems. The aforementioned data management systems will typically expose SPARQL access via a SPARQL endpoint.
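Access via the SPARQL Protocol is a plain HTTP GET against the endpoint, with the query passed as a URL-encoded parameter. A minimal sketch (endpoint URL hypothetical):

# The query...
SELECT ?s WHERE { ?s ?p ?o } LIMIT 10
# ...travels over the wire as:
# http://example.org/sparql?query=SELECT%20%3Fs%20WHERE%20%7B%20%3Fs%20%3Fp%20%3Fo%20%7D%20LIMIT%2010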
A SPARQL implementors' Testimonial page accompanies the SPARQL press release. In addition, there is a growing collection of implementations on the ESW Wiki Page for SPARQL compliant RDF Triple & Quad Stores.
Yes! SPARQL facilitates an unobtrusive manifestation of a Linked Data Web by way of a natural extension of the existing Document Web, i.e., these Web enclaves co-exist in symbiotic fashion.
As DBpedia very clearly demonstrates, Linked Data makes the Semantic Web demonstrable and much easier to comprehend. Without SPARQL there would be no mechanism for Linked Data deployment, and without Linked Data there is no mechanism for Beaming Queries (directly or indirectly) across the Giant Global Graph of data hosted by Social Networks, Shared Bookmark Services, Weblogs, Wikis, RSS/Atom/OPML feeds, Photo Galleries, and other Web accessible Data Sources (Data Spaces).
Information overload and Data Portability are two of the most pressing and imminent challenges affecting every individual connected to the global village exposed by the Internet and World Wide Web. I wrote an earlier post titled: Why We Need Linked Data that shed light on frequently overlooked realities about the Document Web.
The real Killer application of the Semantic Web (imho) is Linked Data (or Hyperdata), just as the killer application of the Document Web was Linked Documents (Hyperlinks). Linked Data enables human users (indirectly) and software agents (directly in response to human instruction) to traverse Web Data Spaces (Linked Data enclaves within the Giant Global Graph).
Semantic Web applications (conduits between humans and agents) that take advantage of Linked Data include:
DBpedia - General Knowledge sourced from Wikipedia and a host of other Linked Data Spaces.
Various Linked Data Browsers: Zitgist Data Viewer, OpenLink RDF Browser, DISCO Browser, and TimBL's Tabulator.
zLinks - Linked Data Lookup technology for Web Content Publishing systems (note: more to come on this in a future post).
OpenLink Data Spaces - a solution for Data Portability via a Linked Data Junction Box for Web 1.0 ((X)HTML Document Webs), 2.0 (XML Web Services based Content Publishing, Content Syndication, and Aggregation), and 3.0 (Linked Data) Data Spaces. Thus, via my URI (when viewed through a Linked Data Browser/Viewer) you can traverse my Data Space (i.e my Linked Data Graph) generated by the following activities:
Virtuoso - a Universal Server Platform that includes RDF Data Management, RDFization Middleware, SQL-RDF Mapping, RDF Linked Data Deployment, alongside a hybrid/multi-model, virtual/federated data service in a single product offering.
BTW - There is a Linked Data Workshop at this year's World Wide Web conference. Also note the Healthcare & Life Science Workshop, which is a related Linked Data technology and Semantic Web best practices realm.

So, unlike Scoble, I am able to make my Facebook Data portable without violating Facebook rules (no data caching outside the Facebook realm) by doing the following:
In a nutshell, my Linked Data Space enables you to reference data in my data space via Object Identifiers (URIs), and in some cases the Object IDs and Graphs are constructed on the fly via RDFization middleware.
Here are my URIs that provide different paths to my Facebook Data Space:
To conclude, 2008 is clearly the inflection year during which we will finally unshackle Data and Identity from the confines of "Web Data Silos", by leveraging the HTTP, SPARQL, and RDF induced virtues of Linked Data.
Related Posts:
"The phrase Open Social implies portability of personal and social data. That would be exciting but there are entirely different protocols underway to deal with those ideas. As some people have told me tonight, it may have been more accurate to call this "OpenWidget" - though the press wouldn't have been as good. We've been waiting for data and identity portability - is this all we get?"
[Source: Read/Write Web's Commentary & Analysis of Google's OpenSocial API]
..Perhaps the world will read the terms of use of the API, and realize this is not an open API; this is a free API, owned and controlled by one company only: Google. Hopefully, the world will remember another time when Google offered a free API and then pulled it. Maybe the world will also take a deeper look and realize that the functionality is dependent on Google hosted technology, which has its own terms of service (including adding ads at the discretion of Google), and that building an OpenSocial application ties Google into your application, and Google into every social networking site that buys into the Dream. Hopefully the world will remember. Unlikely, though, as such memories are typically filtered in the Great Noise....
[Source: Poignant commentary excerpt from Shelly Power's Blog (as always)]
The "Semantic Data Web" vision has always been about "Data & Identity" portability across the Web. Its been that and more from day one.
In a nutshell, we continue to exhibit varying degrees of Cognitive Dissonance re the following realities:
The Data Web is about Presence over Eyeballs due to the following realities:
This is why we need to inject a mesh of Linked Data into the existing Web. This is what the often misunderstood vision of the "Semantic Data Web" or "Web of Data" or "Web of Structured Data" is all about.
As stated earlier (point 10 above), "Data is forever" and there is only more of it to come! Sociality and associated Social Networking oriented solutions are at best a speck in the Web's ocean of data once you comprehend this reality.
Note: I am writing this post as an early implementor of GData and an implementor of RDF Linked Data technology and a "Web Purist".
OpenSocial implementation and support across our relevant product families -- Virtuoso (i.e., the Sponger Middleware for RDF component), OpenLink Data Spaces (Data Space Controller / Services), and the OpenLink Ajax Toolkit (i.e., OAT Widgets and Libraries) -- is a triviality now that the OpenSocial APIs are public.
The concern I have, and the problem that remains mangled in the vast realms of Web Architecture incomprehension, is the fact that GData and GData based APIs cannot deliver Structured Linked Data in line with the essence of the Web without introducing "lock-in" that ultimately compromises the "Open Purity" of the Web. Facebook and Google's OpenSocial response to the Facebook juggernaut (i.e. open variant of the Facebook Activity Dashboard and Social Network functionality realms, primarily), are at best icebergs in the ocean we know as the "World Wide Web". The nice and predictable thing about icebergs is that they ultimately melt into the larger ocean :-)
On a related note, I had the pleasure of attending the W3C's RDF and DBMS Integration Workshop last week. The event was well attended by organizations with knowledge, experience, and a vested interest in addressing the issues associated with exposing non-RDF data (e.g. SQL) as RDF, and the imminence of data and/or information overload, covered in different ways via the workshop presentations.

Jon Udell recently penned a post titled: The Fourth Platform. The post arrives at a spookily coincidental time (this happens quite often between Jon and me, as demonstrated last year during our podcast; the "Fourth" in his Innovators Podcast series).
The platform that Jon describes is "Cloud Based", comprised of Storage and Computation. I would like to add Data Access and Management (native and virtual) under the fourth platform banner, with the end product called "Cloud based Data Spaces".
As I write, we are releasing a Virtuoso AMI (Amazon Machine Image) labeled: virtuoso-dataspace-server. This edition of Virtuoso includes the OpenLink Data Spaces layer and all of the OAT applications we've been developing for a while.
There's more to come!
First off, I am going to focus on the Semantic Data Web aspect of the overall Semantic Web vision (a continuum) as this is what we have now. I am also writing this post as a deliberate contribution to the discourse swirling around the real topic: Semantic Web Value Proposition.
We are in the early stages of the long anticipated Knowledge Economy. That being the case, it would be safe to assume that information access, processing, and dissemination are of utmost importance to individuals and organizations alike. You don't produce knowledge in a vacuum! Likewise, you can't produce Information in a vacuum; you need Data.
Increasingly, Blogs, Wikis, Shared Bookmarks, Photo Galleries, Discussion Forums, Shared Calendars, and the like have become invaluable tools for individual and organizational participation in Web enabled global discourse (where a lot of knowledge is discovered). These tools are typically associated with Web 2.0, implying Read-Write access via Web Services, centralized application hosting, and data lock-in (silos).
The reality expressed above is a recipe for "Information Overload" and complete annihilation of one's effective pursuit and exploitation of knowledge, due to "Time Scarcity" (note: disconnecting is not an option). Information abundance is inversely related to available processing time (for humans in particular). In my case, for instance, I was actively subscribed to over 500 RSS feeds in 2003. As of today, I've simply stopped counting, and that's just my Weblog Data Space. Then add to that all of the Discussions I track across Blogs, Wikis, message boards, mailing lists, traditional Usenet discussion forums, and the like, and I think you get the picture.
Beyond information overload, Web 2.0 data is "Semi-Structured" by way of its dominant data containers ((X)HTML, RSS, and Atom documents and data streams, etc.) lacking semantics that formally expose individual data items as distinct entities, endowed with unambiguous naming / identification, descriptive attributes (a type of property/predicate), and relationships (a type of property/predicate).
Solution: Devise a standard for Structured Data Semantics that is compatible with the Web Information BUS.
Produce structured data (entities, entity types, entity relationships) from Web 1.0 and Web 2.0 resources that already exist on the Web, such that individual entities, their attributes, and relationships are accessible and discernible to software agents (machines).
Once the entities are individually exposed, the next requirement is a mechanism for selective access to these entities i.e. a query language.
Semantic Data Web Technologies that facilitate the solution described above include:
Structured Data Standards: Use of URIs or IRIs for uniquely identifying physical things (HTML Documents, Image Files, Multimedia Files, etc.) and abstract things (People, Places, Music, etc.).
Entity Access & Querying: SPARQL Query Language -- the SQL analog of the Semantic Data Web that enables query constructs targeting named entities, entity attributes, and entity relationships.
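For instance (graph and entity URIs hypothetical), a single query can target a named entity, one of its attributes, and one of its relationships:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?acquaintance
FROM <http://example.org/dataspace/person/jane>
WHERE { <http://example.org/dataspace/person/jane#this> foaf:name ?name ;
                                                        foaf:knows ?acquaintance }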
Organizations are rife with a plethora of business systems that are built atop a myriad of database engines, sourced from a variety of DBMS vendors. A typical organization would have a different database engine, from a specific DBMS vendor, underlying critical business applications such as: Human Resource Management (HR), Customer Relationship Management (CRM), Accounting, Supply Chain Management etc. In a nutshell, you have DBMS Engines, and DBMS Schema heterogeneity permeating the IT infrastructure of organizations on a global scale, making Data & Information Integration the biggest headache across all IT driven organizations.
Solution: Alleviation of the pain (costs) associated with Data & Information Integration.
Semantic Data Web offerings: A dexterous data model (RDF) that enables the construction of conceptual views of disparate data sources across an organization, based on existing web architecture components such as HTTP and URIs.
Existing middleware solutions that facilitate the exposure of SQL DBMS data as RDF based Structured Data include:
BTW - There is an upcoming W3C Workshop covering the integration of SQL and RDF data.
The Semantic Data Web is here; its value delivery vehicle is the URI. The URI is a conduit to Interlinked Structured Data (RDF based Linked Data) derived from existing data sources on the World Wide Web, alongside data continuously injected into the Web by organizations worldwide. Ironically, the Semantic Data Web is the only platform that crystallizes the "Information at Your Fingertips" vision without development environment, operating system, application, or database lock-in. You simply click on a Linked Data URI, and the serendipitous exploration and discovery of data commences.
The unobtrusive emergence of the Semantic Data Web is a reflection of the soundness of the underlying Semantic Web vision.
If you are excited about Mash-ups, then you are a Semantic Web enthusiast and beneficiary in the making, because you only "Mash" (brute force data extraction and interlinking) because you can't "Mesh" (natural data extraction and interlinking). Likewise, if you are a social-networking, open social-graph, or portable social-network enthusiast, then you are also a Semantic Data Web beneficiary and enthusiast, because your "values" (yes, the values associated with the properties that define you, e.g., your interests) are the fundamental basis for portable, open social-networking -- which is what the Semantic Data Web hands to you on a platter without compromise (i.e., data lock-in or loss of data ownership).
Some practical examples of Semantic Data Web prowess:

On a different, but related, thread, Mike Bergman recently penned a post titled: What is the Structured Web?. Both of these public contributions shed light on the "Information BUS" essence of the World Wide Web by describing the evolving nature of the payload shuttled by the BUS.
Middleware infrastructure for shuttling "Information" between endpoints using a messaging protocol.
The Web is the dominant Information BUS within the Network Computer we know as the "Internet". It uses HTTP to shuttle information payloads between "Data Sources" and "Information Consumers" - what happens when we interact with Web via User Agents / Clients (e.g Browsers).
HTTP transported streams of contextualized data. Hence the terms "Information Resource" and "Non Information Resource" encountered when reading material related to http-range-14 and Web Architecture. For example, an (X)HTML document is a specific data context (representation) that enables us to perceive, or comprehend, a data stream originating from a Web Server as a Web Page. On the other hand, if the payload lacks contextualized data, a fundamental Web requirement, then the resource is referred to as a "Non Information" resource. Of course, there is really no such thing as a "Non Information" resource, but with regards to Web Architecture, it's the short way of saying: "the Web transmits Information only". That said, I prefer to refer to these "Non Information" resources as "Data Sources", a term well understood in the world of Data Access Middleware (ODBC, JDBC, OLE-DB, ADO.NET, etc.) and Database Management Systems (Relational, Object-Relational, Object, etc.).
Examples of Information Resource and Data Source URIs:
Explanation: The Information Resource is a conduit to the Entity identified by the Data Source URI (an entity in my RDF Data Space that is the Subject or Object of one or more Triple based Statements; the triples in question can be represented as an RDF resource when transmitted over the Web via an Information Resource that takes the form of a SPARQL REST Service URL or a physical RDF based Information Resource URL).
Prior to the emergence of the Semantic Data Web, the payloads shuttled across the Web Information BUS comprised primarily of the following:
The Semantic Data Web simply adds RDF to the payload formats that shuttle across the Web Information BUS. RDF addresses formal data structure, which XML doesn't cover, since XML is semi-structured (distinct data entities aren't formally discernible). In a nutshell, an RDF payload is basically a conceptual model database packaged as an Information Resource. It's comprised of granular data items called "Entities" that expose fine grained property values, individual and/or group characteristics (attributes), and relationships (associations) with other Entities.
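As a sketch of how such a payload gets produced (graph IRI hypothetical): a SPARQL CONSTRUCT query returns an RDF graph -- a miniature conceptual model database -- as the Information Resource payload:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { ?person foaf:name ?name }
FROM <http://example.org/dataspace/people>
WHERE { ?person a foaf:Person ; foaf:name ?name }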
The Web is in the final stages of the 3rd phase of its evolution -- a phase characterized by the shuttling of structured data payloads (RDF) alongside less data oriented payloads (HTML, XHTML, XML, etc.). As you can see, Linked Data and Structured Data are both terms used to describe the addition of more data centric payloads to the Web. Thus, you could view the process of creating a Structured Web of Linked Data as follows:
The Semantic Data Web is an evolution of the current Web (an Information Space) that adds structured data payloads (RDF) to current, less data oriented, structured payloads (HTML, XHTML, XML, and others).
The Semantic Data Web is increasingly seen as an inevitability because it's rapidly reaching the point of critical mass (i.e. network effect kick-in). As a result, Data Web emphasis is moving away from: "What is the Semantic Data Web?" To: "How will Semantic Data Web make our globally interconnected village an even better place?", relative to the contributions accrued from the Web thus far. Remember, the initial "Document Web" (Web 1.0) bootstrapped because of the benefits it delivered to blurb-style content publishing (remember the term electronic brochure-ware?). Likewise, in the case of the "Services Web" (Web 2.0), the bootstrap occurred because it delivered platform independence to Web Application Developers - enabling them to expose application logic behind Web Services. It is my expectation that the Data Integration prowess of the Data Web will create a value exchange realm for data architects and other practitioners from the database and data access realms.
Now that broader understanding of the Semantic Data Web is emerging, I would like to revisit the issue of "Data Spaces".
A Data Space is a place where Data resides. It isn't inherently bound to a specific Data Model (Concept Oriented, Relational, Hierarchical, etc.). Neither is it implicitly an access point to Data, Information, or Knowledge (the perception is purely determined through the experiences of the user agents interacting with the Data Space).
A Web Data Space is a Web accessible Data Space.
Real world example:
Today we increasingly perform one or more of the following tasks as part of our professional and personal interactions on the Web:
John Breslin has a nice animation depicting the creation of Web Data Spaces that drives home the point.
Web Data Space Silos
Unfortunately, what isn't as obvious to many netizens is the fact that each of the activities above results in the creation of data that is put into some context by you, the user. Even worse, you eventually realize that the service providers aren't particularly willing, or capable of, giving you unfettered access to your own data. Of course, this isn't always by design, as the infrastructure behind the service can make this a nightmare from security and/or load balancing perspectives. Irrespective of cause, we end up creating our own "Data Spaces" all over the Web, without a coherent mechanism for accessing and meshing these "Data Spaces".
What are Semantic Web Data Spaces?
Data Spaces on the Web that provide granular access to RDF Data.
What's OpenLink Data Spaces (ODS) About?
Short History
In anticipation of the "Web Data Silo" challenge (an issue that we had tackled within internal enterprise networks for years), we commenced the development (circa 2001) of a distributed collaborative application suite called OpenLink Data Spaces (ODS). The project was never released to the public, since the problems associated with the deliberate or inadvertent creation of Web Data silos hadn't really materialized (silos only emerged in concrete form after the emergence of the Blogosphere and Web 2.0). In addition, there wasn't a clear standard Query Language for the RDF based Web Data Model (i.e., the SPARQL Query Language didn't exist).
Today, ODS is delivered as a packaged solution (in Open Source and Commercial flavors) that alleviates the pain associated with Data Space Silos that exist on the Web and/or behind corporate firewalls. In either scenario, ODS simply allows you to create Open and Secure Data Spaces (via its suite of applications) that expose data via SQL, RDF, and XML oriented data access and data management technologies. Of course, it also enables you to integrate transparently with existing 3rd party data space generators (Blogs, Wikis, Shared Bookmarks, Discussion, etc. services) by supporting industry standards that cover:
Thus, by installing ODS on your Desktop, Workgroup, Enterprise, or public Web Server, you end up with a very powerful solution for creating an Open Data access oriented presence on the "Semantic Data Web", without incurring any of the typically assumed "RDF Tax".
Naturally, ODS is built atop Virtuoso and of course it exploits Virtuoso's feature-set to the max. It's also beginning to exploit functionality offered by the OpenLink Ajax Toolkit (OAT).
From my perspective, I prefer to align my articulation of the changes occurring across our industry (courtesy of the Internet Inflection) to the MVC pattern.
Re. the Web Versions (or Dimensions of Interaction):
The same applies to evolution of Openness:
In the (C)ontroller realm, where the focal point is Application Logic, data access issues aren't obvious (*I recall my battles with Richard Stallman re. the appropriate Open Source License variant for iODBC during the embryonic years of database and data access technology on Linux*). Data is an enigma in this realm, unfortunately. This implies that "Data Lock-in" occurs deliberately, but in most cases inadvertently, when we make Application Logic the focal point of everything. Another example is Web 2.0, in which the norm (unfortunately) is to suck in your data and then refuse to give you complete ownership over how it is used (including the fact that you may want to share it elsewhere).
Open Data is a really big deal, which is why the SWEO supported Linking Open Data Project is a very big deal. The good news is that this movement is gathering momentum at an exponential rate :-)

Some Definitions (as per usual):
RDF Middleware (as defined in this context) is about producing RDF from non RDF Data Sources. This implies that you can use non RDF Data Sources (e.g., (X)HTML Web Pages, (X)HTML Web Pages hosting Microformats, and even Web Services such as those from Google, Del.icio.us, Flickr, etc.) as Semantic Web Data Source URIs (pointers to RDF Data).
In this post, I would like to provide a similar perspective on this ability to treat non RDF data as RDF, from an RDF Browser perspective.
First off, what's an RDF Browser?
An RDF Browser is a piece of technology that enables you to browse RDF Data Sources by way of Data Link Traversal. The key difference between this approach and traditional browsing is that Data Links are typed (they possess inherent meaning and context), whereas traditional links are untyped (although, universally, we have been trained to treat them as links to blurb in the form of (X)HTML pages, or what is popularly called "Web Content").
There are a number of RDF Browsers that I am aware of (note: pop me a message directly, or by way of a comment to this post, if you have a browser that I am unaware of), and they include (in order of creation and availability):
Each of the browsers above can consume the services of Triplr or the Virtuoso Sponger en route to unveiling RDF Data that is traversable via URI dereferencing (HTTP GETing the data exposed by the Data Pointer). Thus you can cut & paste the following into each of the aforementioned RDF Browsers:
Since we are all time challenged (naturally!) you can also just click on these permalinks for the OAT RDF Browser demos:
Quick Definitions:
Reasons for the distinction:
Examples:
So what? You may be thinking.
For starters, I can quite easily Mesh data from Googlebase (which emits RSS 2.0 or Atom) and other data sources with the Mapping Services from Yahoo!
I can achieve this in minutes without writing a single line of code. I can do it because of the Data Model prowess of RDF (self-describing instance-data), the data interchange and transformation power of XML and XSLT respectively, the inherent power of XML based Web Services (REST or SOAP), and of course, having a Hybrid Server product like Virtuoso at my disposal that delivers a cross platform solution for exploiting all of these standards coherently.
I can share the self-describing data source that serves my Meshup. Try reusing the data presented by a Mashup via the same URL that you used to locate the Mashup, to get my drift.
Demo Links:
What does this all mean?
"Context" is the catalyst of the burgeoning Data Web (Semantic Web Layer - 1). It's the emerging appreciation of "Context" that is driving the growing desire to increment Web versions from 2.0 to 3.0. It also the the very same "Context" that has been a preoccupation of Semantic Web vision since its inception.
The journey towards a more Semantic Web is all inclusive (all "ANDs" and no "ORs" re. participation).
The Semantic Web is self-annotating. Web 2.0 has provided a huge contribution to the self annotation effort: on the Web we now have Data Spaces for Bookmarks (e.g., del.icio.us), Image Galleries (e.g., Flickr), Discussion Forums (remember those comments associated with blog posts? ditto the pingbacks and trackbacks?), People Profiles (FOAF, XFN, del.icio.us, and those crumbling walled-gardens around many Social Networks), and more.
A Web without granular access to Data is simply not a Web worth having (think about the menace of click-fraud and spam).
(Via Read/Write Web.)
Web 3.0: When Web Sites Become Web Services: "
.....As more and more of the Web is becoming remixable, the entire system is turning into both a platform and the database. Yet, such transformations are never smooth. For one, scalability is a big issue. And of course legal aspects are never simple.'
But it is not a question of if web sites become web services, but when and how. APIs are a more controlled, cleaner and altogether preferred way of becoming a web service. However, when APIs are not avaliable or sufficient, scraping is bound to continue and expand. As always, time will be best judge; but in the meanwhile we turn to you for feedback and stories about how your businesses are preparing for 'web 3.0'.
We are hitting a little problem re. Web 3.0 and Web 2.0, naturally :-) Web 2.0 is one of several (present and future) Dimensions of Web Interaction that turn Web Sites into Web Services Endpoints; a point I've made repeatedly [1] [2] [3] [4] across the blogosphere, in addition to my early, futile attempts to make Wikipedia's Web 2.0 article meaningful (circa 2005), as per the Wikipedia Web 2.0 Talk Page excerpt below:
Web 2.0 is a web of executable endpoints and well formed content. The executable endpoints and well formed content are accessible via URIs. Put differently, Web 2.0 is a web defined by URIs for invoking Web Services and/or consuming or syndicating well formed content.
Hopefully, someone with more time on their hands will expand on this ( I am kinda busy)
BTW - Web 2.0 being a platform doesn't distinguish it in any way from Web 1.0. They are both platforms; the difference comes down to platform focus and mode of experience.
Web 3.0 is about Data Spaces: Points of Semantic Web Presence that provide granular access to Data, Information, and Knowledge via Conceptual Data Model oriented Query Languages and/or APIs.
The common denominator across all the current and future Web Interaction Dimensions is HTTP. While their differences are as follows:
Examples of Web 3.0 Infrastructure:
Web 3.0 is not purely about Web Sites becoming Web Services endpoints. It is about the "M" (Data Model) taking its place in the MVC pattern as applied to the Web Platform.
I will repeat myself yet again:
The Devil is in the Details of the Data Model. Data Models make or break everything. You ignore data at your own peril. No amount of money in the bank will protect you from Data Ignorance! A bad Data Model will bring down any venture or enterprise, the only variable is time (where time is directly related to your increasing need to obtain, analyze, and then act on data, over repetitive operational cycles, that have ever decreasing intervals).
This applies to the Real-time enterprise of Information and/or knowledge workers and Real-time Web Users alike.
BTW - Data Makes Shifts Happen (spotter: Sam Sethi).
Web 2.0 commentators such as Mike Arrington and, as mentioned above, Tim O'Reilly both blogged about the imminent release of Freebase earlier today. Although I haven't looked at this database yet, it is crystal clear to me that it is one of many Web Databases to come. Others that I am personally familiar with, and involved in, include: DBpedia (Wikipedia as a true Database) and Zitgist (soon to be unveiled).
All of these databases mark the crystallization of the "Data Web" and the imminence of what is increasingly referred to as Web 3.0.
I certainly hope that all web 3.0 Database Providers keep the data Open, adhere to Web Best Practice recipes for sharing and publishing data, and generally make the process of data, information, and knowledge discovery via the Web much easier.
Linking personal posted content across communities: "
With the help of Kingsley, Uldis and I have been looking at how SIOC can be used to link the content that a single person posts to a number of community sites. The picture below shows an example of stuff that I've created on Flickr, YouTube, etc. through my various user identities on those sites (these match some SIOC types that we want to add to a separate module). We can also say that each Web 2.0 content item is a user-contributed post, with some attached or embedded content (e.g. a file or maybe just some metadata). This is part of a new discussion on the sioc-dev mailing list, and we'd value your contributions.
Edit: The inner layer is a person (semantically described in FOAF), the next layer is their user accounts (described in FOAF, SIOC) and the outer layer is the posted content - text, files, associated metadata - on community sites (again described using SIOC).
No Tags"(Via John Breslin - Cloudlands.)
The point that John is making about the Data Web, and Interlinked Data Spaces exposed via URIs (e.g., Personal URIs), crystallizes a number of very important issues about the Data Web that may remain unclear. I am hoping that digesting the post excerpt above, in conjunction with the items below, aids the pursuit of clarity and comprehension about the all important Data Web (Semantic Web - Layer 1):
Examples of some of these principles in practice:
And of course there is more to come, such as Grandma's Semantic Web Browser, which is coming from Zitgist LLC (pronounced: Zeitgeist), a joint venture of OpenLink Software and Frederick Giasson.
In this third take on my introduction to the Data Web, I would like to share a link with you (a Dynamic Start Page in Web 2.0 parlance) with a Data Web twist: you do not have to preset the Start Page Data Sources (this is a small-big thing, if you get my drift, hopefully!).
Here are some Data Web based Dynamic Start Pages that I have built for some key players from the Semantic Web realm (in random order):
"These are RDF prepped Data Sources....", you might be thinking, right? Well here is the reminder: The Data Web is a Global Data Generation and Integration Effort. Participation may be active (Semantic Web & Microformats Community), or passive (web sites, weblogs, wikis, shared bookmarks, feed subscription, discussion forums, mailing lists etc..). Irrespective of participation mode, RDF instance can be generated from close to anything (I say this because I plan to add binary files holding metadata to this mix shortly). Here are examples of Dynamic Start Pages for non RDF Data Sources:
What about Microformats, you may be wondering? Here goes:
Let's carry on.
How about some traditional Web Sites? Here goes:
And before I forget, here is My Data Web Start Page.
Due to the use of Ajax in the Data Web Start Pages, IE6 and Safari will not work. For Mac OS X users, Webkit works fine. Ditto re. IE7 on Windows.
Play Date: What is that thing on the Wall?
My Son: Security Alarm.
Play Date: How does it work?
My Son: If you click on that top button and then open the door, I will have to enter a code when we come back in or the alarm will go off.
Play Date: What is the code?
My Son: I can't tell you that!
Play Date: Why not?
My Son: You might come and steal something from our house!
Play Date: No I won't!
My Son: Well, you might tell someone that might come and steal something from our house! Or that person could tell someone who could tell someone that would steal from our house.
LOL!! Of course! At the same time, I am left wondering: how come a majority of adults don't quite see the need for granular access to Web Data in a manner that enables computers and humans to collectively arrive at similar decisions?
Putting Data in context en route to producing actionable knowledge is a transient endeavor that engages a myriad of human senses. We demonstrate comprehension of this fact in our daily existence as social creatures (at a very early age as depicted above). That said, we seem to forget this fact when engaging the Web: If we can't see it then it can't be valuable.
BTW - I just received a ping about the "Sensory Web" (which is just another way of describing a Data Driven Web experience from my vantage point.)
In the popular M-V-C pattern you don't see the "M", but the "M" will kill you if you get it wrong (it is the FORCE)! Come to think of it, the pattern could have been coined V-C-M or C-M-V, but isn't, for obvious reasons :-)
RDF is the vehicle that enables us to tap into the Data aspect of the Web. We started off with pages of blurb linked via hypertext (Web 1.0) and then looked to "Keywords" for some kind of data access; we then isolated some "Verbs" and discovered another dimension of Web Interaction (Web 2.0), but looked to these "Verbs" for data access, which left us with Mashups; and now we are starting to extract "Nouns" and "Adjectives" from sentences (Subject, Predicate, Object - Triples) associated with resources on the Web (Data Web / Web 3.0 / Semantic Web Layer 1), which provides a natural data access substrate for Meshups (natural joining of disparate data from a plethora of data sources) while providing the foundation layer for the Semantic Web.
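To make the Subject-Predicate-Object point concrete, here is a minimal sketch of a SPARQL query over such "sentence" data; the graph IRI and the FOAF-based patterns are illustrative assumptions rather than a real endpoint:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Each matched pattern is a "sentence": Subject (?person),
# Predicate (foaf:name, foaf:knows), Object (?name, ?friend).
SELECT ?person ?name ?friend
FROM <http://example.org/dataweb>   # hypothetical graph IRI
WHERE
{
  ?person a foaf:Person ;
          foaf:name ?name ;
          foaf:knows ?friend .
}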
For those who need use-cases that demonstrate tangible value re. the Semantic Web, here are some projects to note courtesy of the Semantic Web Education and Outreach (SWEO) interest group:
OAT: OpenAjax Alliance Compliant Toolkit: "
Ondrej Zara and his team at OpenLink Software have created an OpenLink Software JS Toolkit, known as OAT. It is a full-blown JS framework suitable for developing rich applications with a special focus on data access.
OAT works standalone, offers a vast number of widgets, and has some rarely seen features, such as on-demand library loading (which reduces the total amount of downloaded JS code).
OAT is one of the first JS toolkits to show full OpenAjax Alliance conformance: see the appropriate wiki page and conformance test page.
There is a lot to see with this toolkit:
You can see some of the widgets in a Kitchen sink application
Sample data access applications:
OAT is Open Source and GPL'ed over at SourceForge, and the team has recently managed to incorporate our OAT data access layer as a module to the Dojo datastore.
(Via Ajaxian Blog.)
This is a corrected version of the initial post. Unfortunately, the initial post was inadvertently littered with invalid links :-( Also, since the original post we have released OAT 1.2 that includes integration of our iSPARQL QBE into the OAT Form Designer application.
Re. Data Access, it is important to note that OAT's Ajax Database Connectivity layer supports data binding to the following data source types:
OAT also includes a number of prototype applications that are completely developed using OAT Controls and Libraries:
Note: Pick "Local DSN" from page initialization dialog's drop-down list control when prompted
Dare:
I have been through the Wikipedia fires a few times. You may recall that I actually triggered the early Web 2.0 Wikipedia article, along the following lines:
As annoying as the experience above was, I didn't find this inconsistent with the spirit of Wikipedia (i.e., open contribution and discourse). I felt, at the time, that a lot of historical data was being left in place for future reference etc.. In addition, the ultimate aim of creating an evolving Web 2.0 document did commence, albeit some distance from "modern man" re. accuracy and meaningfulness as of my last read (today).
Even closer to home, I repeated the process above re. Virtuoso Universal Server. This basically ended up being a live case study in how you handle the Wikipedia NPOV conundrum. Just look at the Virtuoso Universal Server Talk Pages to see how the process evolved (the key was Virtuoso's lineage and its proximity to the very DBMS platform upon which Wikipedia runs, i.e., MySQL).
Bearing in mind the size and magnitude of Microsoft, there should be no reason why Microsoft's "Microsoft Digital Caucus" (legions of staff, MSDN members, integrators, and other partners) can't simply go into Wikipedia and participate in the edit and discourse process.
Truth cannot be suppressed! At best, it can only be temporarily delayed :-) Even more so on the Web!
SPARQL (query language for the Semantic Web) basically enables me to query a collection of typed links (predicates/properties/attributes) in my Data Space (ODS based of course) without breaking my existing local bookmarks database or the one I maintain at del.icio.us.
I am also demonstrating how Web 2.0 concepts such as Tagging mesh nicely with the more formal concept of Topics in the Semantic Web realm. The key to all of this is the ability to generate RDF Data Model instance data based on shared ontologies such as SIOC (from DERI's SIOC Project) and SKOS (again showing that Ontologies and Folksonomies are complementary).
This demo also shows that Ajax works well in the Semantic Web realm (or web dimension of interaction 3.0), especially when you have a toolkit with data-aware controls (for SQL, RDF, and XML) such as OAT (OpenLink Ajax Toolkit). For instance, we've successfully used this to build a Visual Query Building Tool for SPARQL (alpha) that really takes a lot of the pain out of constructing SPARQL queries (there is much more to come on this front re. handling of DISTINCT, FILTER, ORDER BY, etc.).
For now, take a look at the SPARQL Query dump generated by this SIOC & SKOS SPARQL QBE Canvas Screenshot.
You can cut and paste the queries that follow into the Query Builder or use the screenshot to build your variation of this query sample. Alternatively, you can simply click on *This* SPARQL Protocol URL to see the query results in a basic HTML Table. And one last thing, you can grab the SPARQL Query File saved into my ODS-Briefcase (the WebDAV repository aspect of my Data Space).
Note the following SPARQL Protocol Endpoints:
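(The original endpoint list appears to have been lost in transit. For illustration, a SPARQL Protocol invocation is simply an HTTP GET against an endpoint URL; the host below is a placeholder, the query parameter carries the URL-encoded SPARQL text, and the format parameter is a Virtuoso convenience rather than part of the protocol proper:)

http://<host>/sparql?default-graph-uri=http%3A%2F%2Fmyopenlink.net%2Fdataspace&query=<url-encoded-query>&format=text%2Fhtml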
My beautified version of the SPARQL generated by the QBE (you can cut and paste it into the "Advanced Query" section of the QBE) is presented below:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX dct: <http://purl.org/dc/elements/1.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?forum_name ?owner ?post ?title ?link ?url ?tag
FROM <http://myopenlink.net/dataspace>
WHERE
{
  ?forum a sioc:Forum ;
         sioc:type "bookmark" ;
         sioc:id ?forum_name ;
         sioc:has_member ?owner .
  ?owner sioc:id "kidehen" .
  ?forum sioc:container_of ?post .
  ?post dct:title ?title .
  OPTIONAL { ?post sioc:link ?link } .
  OPTIONAL { ?post sioc:links_to ?url } .
  OPTIONAL { ?post sioc:topic ?topic .
             ?topic a skos:Concept ;
                    skos:prefLabel ?tag } .
}
Unmodified dump from the QBE (this will be beautified automatically in due course by the QBE):
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dct: <http://purl.org/dc/elements/1.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?var8 ?var9 ?var13 ?var14 ?var24 ?var27 ?var29 ?var54 ?var56
WHERE
{
  GRAPH ?graph
  {
    ?var8 rdf:type sioc:Forum .
    ?var8 sioc:container_of ?var9 .
    ?var8 sioc:type "bookmark" .
    ?var8 sioc:id ?var54 .
    ?var8 sioc:has_member ?var56 .
    ?var9 rdf:type sioc:Post .
    OPTIONAL { ?var9 dc:title ?var13 } .
    OPTIONAL { ?var9 sioc:links_to ?var14 } .
    OPTIONAL { ?var9 sioc:link ?var29 } .
    ?var9 sioc:has_creator ?var37 .
    OPTIONAL { ?var9 sioc:topic ?var24 } .
    ?var24 rdf:type skos:Concept .
    OPTIONAL { ?var24 skos:prefLabel ?var27 } .
    ?var56 rdf:type sioc:User .
    ?var56 sioc:id "kidehen" .
  }
}
Current missing items re. Visual QBE for SPARQL are:
Quick Query Builder Tip: you will need to import the following (using the Import Button in the Ontologies & Schemas side-bar):
Browser Support: The SPARQL QBE is SVG based and currently works fine with the following browsers; Firefox 1.5/2.0, Camino (Cocoa variant of Firefox for Mac OS X), Webkit (Safari pre-release / advanced sibling), Opera 9.x. We are evaluating the use of the Adobe SVG plugin re. IE 6/7 support.
Of course this should be a screencast, but I am in the middle of a plethora of things right now :-)
I came across a pretty deep comments trail about the aforementioned items on Fred Wilson's blog (aptly titled: A VC) under the subject heading: Web 3.0 Is The Semantic Web.
Below are my contributions to the general Semantic Web discourse, by way of responses to valuable questions and commentary from a Semantic Web skeptic (Ed Addison, who may be this Ed Addison according to Google):
Ed, responses to your points re. Semantic Web Materialization: << 1) ontologies can be created and maintained by text extractors and crawlers >>
Ontologies will be developed by Humans. This process has already commenced, and far more landscape has been covered than you may be aware of. For instance, there is an Ontology for Online Communities with Semantics factored in. More importantly, most Blogs, Wikis, and other "points of presence" on the Web are already capable of generating instance data for this Ontology by way of the underlying platforms that drive these things. The Ontology is called SIOC (Semantically-Interlinked Online Communities).
<< 2) the entire web can be marked up, semantically indexed, and maintained by spiders without human assistance >>
Most of it can, and already is :-) Human assistance should, and would, be on an "exception basis" -- a preferred use of human time (IMHO). We do not need to annotate the Web manually when this labor-intensive process can be automated (see my earlier comments).
<< 3) inference over the semantic web does not require an extremely deep heuristic search down multiple, redundant, cyclical pathways with many islands that are disconnected >>
When you have a foundation layer of RDF Data (generated in the manner I've discussed above), you then have a substrate that's far more palatable to Intelligent Reasoning. Note, the Semantic Web is made of many layers. The critical layer at this juncture is the Data-Web (Web of RDF Data). Note, when I refer to RDF I am not referring to RDF/XML the serialization format, I am referring to the Data Model (a Graph).
<< 4) the web becomes smart enough to eliminate websites or data elements that are incorrect, misleading, false, or just plain lousy >>
The Semantic Web vision is not about eliminating Web Sites (The Hypertext-Document-Web). It is simply about adding another dimension of interaction to the Web. This is just like the Services-Web dimension as delivered by Web 2.0.
We are simply evolving within an innovation continuum. There is no mutual exclusivity about any of the Web Dimensions since they collectively provide us with a more powerful infrastructure for building and exploiting "collective wisdom".
As for the Data-Web experiment part of this post, I would expect to see this post exposed as another contribution to the Data-Web via the PingTheSemanticWeb notification service :-) This implies that all the relevant parts of this conversation are in a format (instance data for the SIOC Ontology) that is available for further use in a myriad of forms.
Web Me2.0 -- Exploding the Myth of Web 2.0: "Many people have told me this week that they think 'Web 2.0' has not been very impressive so far and that they really hope for a next-generation of the Web with some more significant innovation under the hood -- regardless of what it's called. A lot of people found the Web 2.0 conference in San Francisco to be underwhelming -- there was a lot of self-congratulation by the top few brands and the companies they have recently bought, but not much else happening. Where was all the innovation? Where was the focus on what's next? It seemed to be a conference mainly about what happened in the last year, not about what will happen in the coming year. But what happened last year is already so 'last year.' And frankly Web 2.0 still leaves a lot to be desired. The reason Tim Berners-Lee proposed the Semantic Web in the first place is that it will finally deliver on the real potential and vision of the Web. Not that today's Web 2.0 sucks completely -- it only sort of sucks. It's definitely useful and there are some nice bells and whistles we didn't have before. But it could still suck so much less!"
Web 2.0 is (not was) a piece of the overall Web puzzle. The Data Web (so-called Web 3.0) is another critical piece of this puzzle, especially as it provides the foundation layer (Layer 1) of the Semantic Web.
Web 2.0 was never about "Open Data Access", "Flexible Data Models", or "Open World" meshing of disparate data sources built atop disparate data schemas (see: Web 2.0's Open Data Access Conundrum). It was simply about "Execution and APIs". I have already written about "Web Interaction Dimensions", but you can also look at the relationship of the currently perceived dimensions through the M-V-C programming pattern:
Another point to note: Social Networking is hot, but nearly every social network that I know (and I know and use most of them) suffers from an impedance mismatch between the service(s) they provide (social networks) and their underlying data models (in many cases Relational as opposed to Graph). Networks are about Relationships (N-ary), and you cannot effectively exploit the deep potential of "Network Effects" (Wisdom of Crowds, Viral Marketing, etc.) without a complementary data model; you simply can't.
Finally, the Data Web is already here, I promised a long time ago (Internet Time) that the manifestation of the Semantic Web would occur unobtrusively, meaning, we will wake up one day and realize we are using critical portions of the Semantic Web (i.e. Data-Web) without even knowing it. Guess what? It's already happening. Simple case in point, you may have started to notice the emergence of SIOC gems in the same way you may have observed those RSS 2.0 gems at the dawn of Web 2.0. What I am implying here is that the real question we should be asking is: Where is the Semantic Web Data? And how easy or difficult will it be to generate? And where are the tools? My answers are presented below:
Next stop: less writing, more demos; these are long overdue! At least from my side of the fence :-) I need to produce some little step-by-step guide oriented screencasts that demonstrate how Web 2.0 meshes nicely with the Data-Web.
Here are some (not so end-user friendly) examples of how you can use SPARQL (Data-Web's Query Language) to query Web 2.0 Instance Data projected through the SIOC Ontology:
Note: You can use the online SPARQL Query Interface at: http://demo.openlinksw.com/isparql.
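(The example links themselves did not survive this repost, so here is a hedged stand-in for the kind of query you could paste into that interface; the graph IRI follows the bookmark queries above and is an assumption:)

PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX dct: <http://purl.org/dc/elements/1.1/>

# A sample of SIOC-described posts and their titles from the store.
SELECT ?forum ?post ?title
FROM <http://myopenlink.net/dataspace>   # assumed graph IRI, per the queries above
WHERE
{
  ?forum a sioc:Forum ;
         sioc:container_of ?post .
  ?post dct:title ?title .
}
LIMIT 20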
Other Data-Web Technology usage demos include:
Amongst the numerous comments about this subject, I felt most compelled to respond to the commentary from Tim O'Reilly (based on his proximity to Web 2.0, etc.), in relation to his view that the NYT's Web 3.0 is simply the Collective Intelligence Harnessing aspect of his Web 2.0 meme.
My response is dumped semi-verbatim below:
Tim,
A few things:
- We are in an innovation continuum
- The Web as a medium of innovation will evolve forever
- Different commentators have different views about monikers associated with these innovations
- To say Web 3.0 (aka the Data Web or Semantic Web - Layer 1) is what Web 2.0's collective intelligence is all about is a little inaccurate (IMHO); Web 2.0 doesn't provide "Open Data Access"
- Web 2.0 is a "Web of Services" primarily, a dimension of "Web Interaction" defined by interaction with Services
- Web 3.0 ("Data Web" or "Web of Databases" or "Semantic Web - Layer 1") is a Web dimension that provides "Open Data Access" that will be exemplified by the transition from "Mash-ups" (brute force data joining) to "Mesh-ups" (natural data joining)
The original "Web of Hypertext" or "Interactive Web", the current "Web of Services", and the emerging "Data Web" or "Web of Databases" collectively provide dimensions of interaction in the innovation continuum called the Web.
There are many more dimensions to come. Monikers come and go, but the retrospective "Long Shadow" of Innovation is ultimately timeless.
"Mutual Inclusivity" is a critical requirement for truly perceiving these "Web Interaction Dimensions" ("Participation" if I recall). "Mutual Exclusivity" on the other hand, simpy leads to obscuring reality with Versionitis as exemplified by the ongoing: Web 1.0 vs 2.0 vs 3.0 debates.
BTW - I enjoyed reading Nick Carr's take on the Web 3.0 meme, especially his "tongue in cheek" power-grab for the rights to all "Web 3.0" Conferences etc. :-)
Likewise, it would be nice if there were some Mash-ups, Service Endpoints, or Syndication Feeds that exposed relevant data from the Web 2.0 Summit (beyond the usual selective, best-of, type blog commentary and traditional Speakers List).
From the ISWC event (#swig) I just located a presentation by TimBL titled: Semantic Web & Web 2.0
Key Excerpt: Web 2.0 and the Semantic Web work "Well Apart" and "Great Together".
Geonames announced the release of its Geonames ontology v1.2. The new ontology has a few enhancements. It introduced the notion of linked data and made a clear distinction between URIs intended for linking documents and URIs intended for linking ontology concepts.
Different types of geospatial data are of different spatial granularity. Data of different spatial granularity may relate to each other by the containment relation. For example, countries contain states, states contain cities, and so on. Some geospatial data are of similar spatial granularity (e.g., two cities that are near each other, or two countries that neighbor each other). To support the knowledge representation of these relationships, the ontology introduced three new properties: childrenFeatures, nearbyFeatures and neighbouringFeatures.
In the Semantic Web, both ontology concepts and physical web documents are linked by URIs. Sometimes in applications, it's useful to make clear whether the use of a URI is intended for linking documents or for linking ontology concepts. The new Geonames ontology introduced a URI convention for identifying the intended usage of a URI. This convention also simplifies the discovery of geospatial data using Geonames web services.
Here is an example:
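(The example itself appears to have been lost in this repost. As I understand the convention, a trailing slash denotes the ontology concept, while a document path denotes its RDF description; the feature id below is illustrative:)

http://sws.geonames.org/2950159/           -- URI for the concept (the feature itself)
http://sws.geonames.org/2950159/about.rdf  -- URI for the RDF document describing that feature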
Other interesting ontology properties include wikipediaArticle and locationMap. The former links a Feature instance to a Web article on Wikipedia, and the latter links a Feature instance to a digital map Web page.
For additional information about Geonames ontology v1.2, see Marc’s post at the Geonames blog.
"(Via Geospatial Semantic Web Blog.)
A declarative language adapted from SPARQL's graph pattern language (N3/Turtle) for mapping SQL Data to RDF Ontologies. We currently refer to this as a Graph Pattern based RDF VIEW Definition Language.
It provides an effective mechanism for exposing existing SQL Data as virtual RDF Data Sets (Graphs), avoiding the data duplication associated with generating physical RDF Graphs from SQL Data en route to persistence in a dedicated Triple Store.
Enterprise applications (traditional and web based) and most Web Applications (Web 1.0 and Web 2.0) sit atop relational databases, implying that SQL/RDF model and data integration is an essential element of the burgeoning "Data Web" (Semantic Web - Layer 1) comprehension and adoption process.
In a nutshell, this is a quick route for non-disruptive exposure of existing SQL Data to SPARQL-supporting RDF Tools and Development Environments.
CREATE GRAPH IRI ("http://myopenlink.net/dataspace")

CREATE IRI CLASS odsWeblog:feed_iri "http://myopenlink.net/dataspace/%s/weblog/%s"
  (in memb varchar not null, in inst varchar not null)
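For a fuller flavor, here is a rough sketch of how such declarations might combine into a graph-pattern view definition. The source table (ODS.DBA.FEEDS), its columns, and the map name are invented for illustration, and the exact Virtuoso syntax may differ from this outline:

# Sketch only: table, columns, and map name are assumptions
ALTER QUAD STORAGE virtrdf:DefaultQuadStorage
FROM ODS.DBA.FEEDS AS feeds
{
  CREATE odsWeblog:FeedsMap AS GRAPH IRI ("http://myopenlink.net/dataspace")
  {
    # Each SQL row surfaces as a sioc:Forum whose IRI is minted by the
    # odsWeblog:feed_iri class declared above
    odsWeblog:feed_iri (feeds.MEMB, feeds.INST) a sioc:Forum ;
      sioc:id feeds.INST .
  }
}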
Hopefully, I can expand further :-)
For additional clarity re. my comments above, you can also look at the SPARQL & SIOC use-case samples document for our OpenLink Data Spaces platform. Bottom line: the Semantic Web and SPARQL aren't BORING. In fact, quite the contrary, since they are essential ingredients of a more powerful Web than the one we work with today!
Enjoy the rest of John's post:
Creating connections between discussion clouds with SIOC:
(Extract from our forthcoming BlogTalk paper about browsers for SIOC.)
SIOC provides a unified vocabulary for content and interaction description: a semantic layer that can co-exist with existing discussion platforms. Using SIOC, various linkages are created between the aforementioned concepts, which allow new methods of accessing this linked data, including:
- Virtual Forums. These may be a gathering of posts or threads which are distributed across discussion platforms, for example, where a user has found posts from a number of blogs that can be associated with a particular category of interest, or an agent identifies relevant posts across a certain timeframe.
- Distributed Conversations. Trackbacks are commonly used to link blog posts to previous posts on a related topic. By creating links in both directions, not only across blogs but across all types of internet discussions, conversations can be followed regardless of what point or URI fragment a browser enters at.
- Unified Communities. Apart from creating a web page with a number of relevant links to the blogs or forums or people involved in a particular community, there is no standard way to define what makes up an online community (apart from grouping the people who are members of that community using FOAF or OPML). SIOC allows one to simply define what objects are constituent parts of a community, or to say to what community an object belongs (using sioc:has_part / part_of): users, groups, forums, blogs, etc.
- Shared Topics. Technorati (a search engine for blogs) and BoardTracker (for bulletin boards) have been leveraging the free-text tags that people associate with their posts for some time now. SIOC allows the definition of such tags (using the subject property), but also enables hierarchical or non-hierarchical topic definition of posts using sioc:topic when a topic is ambiguous or more information on a topic is required. Combined with other Semantic Web vocabularies, tags and topics can be further described using the SKOS organisation system.
- One Person, Many User Accounts. SIOC also aims to help the issue of multiple identities by allowing users to define that they hold other accounts or that their accounts belong to a particular personal identity (via foaf:holdsOnlineAccount or sioc:account_of). Therefore, all the posts or comments made by a particular person using their various associated user accounts across platforms could be identified.
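The last item above lends itself to a compact SPARQL sketch: find every post made through any account held by one person. The person URI is a placeholder, and foaf:holdsAccount stands in for the account-linking property mentioned in that item:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX sioc: <http://rdfs.org/sioc/ns#>

# All posts created via any user account held by a single person,
# regardless of which community site hosts the account.
SELECT ?account ?post
WHERE
{
  <http://example.org/people/jane#me> foaf:holdsAccount ?account .   # placeholder person URI
  ?post sioc:has_creator ?account .
}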
A phase in the evolution of web usage patterns that emphasizes Web Services based interaction between "Web Users" and "Points of Web Presence" over traditional interaction between "Web Users" and "Web Sites". Basically, a transition from visual site interaction to presence based interaction.
BTW - Dare Obasanjo also commented about Web usage patterns in his post titled The Two Webs, where he concluded that we had a dichotomy along the lines of HTTP-for-APIs (2.0) and HTTP-for-Browsers (1.0), which Jon Udell evolved into HTTP-Services-Web and HTTP-Interactive-Web during our recent podcast conversation.
With definitions in place, I will resume my quest to unveil the aforementioned Web 2.0 Data Access Conundrum:
As you can see from the above, Open Data access isn't genuinely compatible with Web 2.0.
We can also look at the same issue by way of the popular M-V-C (Model View Controller) pattern. Web 2.0 is all about the "V" and "C" with a modicum of "M" at best (data access, open data access, and flexible open data access are completely separate things). The "C" items represent application logic exposed by SOAP or REST style web services etc. I'll return to this later in this post.
What about Social Networking, you must be thinking? Isn't this a Web 2.0 manifestation? Not at all (IMHO). The Web was developed/invented by Tim Berners-Lee to leverage the "Network Effects" potential of the Internet for connecting People and Data. Social Networking, on the other hand, is simply one of several ways by which we construct network connections. I am sure we all accept the fact that connections are built for many other reasons beyond social interaction. That said, we also know that through social interactions we actually develop some of our most valuable relationships (we are social creatures after all).
The Web 2.0 Open Data Access impedance reality is ultimately going to be the greatest piece of tutorial and usecase material for the Semantic Web. I take this position because it is human nature to seek Freedom (in unadulterated form) which implies the following:
Web 2.0 by definition and use case scenarios is inherently incompatible with the above due to the lack of Flexible and Open Data Access.
If we take the definition of Web 2.0 (above) and rework it with an appreciation of Flexible and Open Data Access, you would arrive at something like this:
A phase in the evolution of the web that emphasizes interaction between "Web Users" and "Web Data", facilitated by Web Services based APIs and an Open & Flexible Data Access Model.
In more succinct form:
A pervasive network of people connected by data or data connected by people.
Returning to M-V-C and looking at the definition above, you now have a complete "M", which is enigmatic in Web 2.0 and the essence of the Semantic Web (Data and Context).
To make all of this possible a palatable Data Model is required. The model of choice is the Graph based RDF Data Model - not to be mistaken for the RDF/XML serialization which is just that, a data serialization that conforms to the aforementioned RDF data model.
The Enterprise Challenge
Web 2.0 cannot and will not make valuable inroads into the enterprise, because enterprises live and die by their ability to exploit data. Weblogs, Wikis, Shared Bookmarking Systems, and other Web 2.0 distributed collaborative application profiles are only valuable if the data is available to the enterprise for meshing (not mashing).
A good example of how enterprises will exploit data by leveraging networks of people and data (social networks in this case) is shown in this nice presentation by Accenture's Institute for High Performance Business titled: Visualizing Organizational Change.
Web 2.0 commentators (for the most part) continue to ponder the use of Web 2.0 within the enterprise while forgetting the congruency between enterprise agility and exploitation of people & data networks (The very issue emphasized in this original Web vision document by Tim Berners-Lee). Even worse, they remain challenged or spooked by the Semantic Web vision because they do not understand that Web 2.0 is fundamentally a Semantic Web precursor due to Open Data Access challenges. Web 2.0 is one of the greatest demonstrations of why we need the Semantic Web at the current time.
Finally, juxtapose the items below and you may even get a clearer view of what I am attempting to convey about the virtues of Open Data Access and the inflective role it plays as we move beyond Web 2.0:
Information Management Proposal - Tim Berners-Lee
Visualizing Organizational Change - Accenture Institute of High Performance Business
This is fundamentally an animation demonstrating Semantic Web exploitation in the classic: picture speaks a thousand words manner. It also illustrates (yet again) the important Data Space(s) aspect of creating Semantic Web presence.
Finally, the Web 2.0 usage pattern tries to espouse what's demonstrated in this animation via data-context-challenged interactions (due to its "Walled Garden" and "Data Silo" approach to Data Access etc..). The Semantic Web (as per numerous posts on the subject) on the other hand achieves this via data-context-aware interactions (as will be exemplified via meshups).
One of the great things about the moderate “open data access” that we have today (courtesy of the blogosphere) is the fact that you can observe the crystallization of new thinking, and/or new appreciation of emerging ideas, in near real-time. Of course, when we really hit the tracks with the Semantic Web this will be in “conditional real-time” (i.e. you choose and control your scope and sensitivity to data changes etc..).
For instance, by way of feed subscriptions, I stumbled upon a series of posts by Jason Kolb that basically articulate what I (and others who believe in the Semantic Web vision) have been attempting to convey in a myriad of ways via posts and commentary etc..
Here are the links to the 4 part series by Jason:
Continuing from our recent Podcast conversation, Jon Udell sheds further insight into the essence of our conversation via a "Strategic Developer" column article titled: Accessing the web of databases.
I present an initial dump of a DataSpace FAQ below that hopefully sheds light on the DataSpace vision espoused during my podcast conversation with Jon.
What is a DataSpace?
A moniker for Web-accessible atomic containers that manage and expose Data, Information, Services, Processes, and Knowledge.
What would you typically find in a Data Space? Examples include:
How do Data Spaces and Databases differ?
Data Spaces are fundamentally problem-domain-specific database applications. They offer functionality that you would instinctively expect of a database (e.g., ACID data management) with the additional benefit of being data model and query language agnostic. Data Spaces are, for the most part, DBMS Engine and Data Access Middleware hybrids, in the sense that ownership and control of data is inherently loosely coupled.
How do Data Spaces and Content Management Systems differ?
Data Spaces are inherently more flexible: they support multiple data models and data representation formats. Content management systems do not possess the same degree of data model and data representation dexterity.
How do Data Spaces and Knowledgebases differ?
A Data Space cannot dictate the perception of its content. For instance, what I may consider knowledge relative to my Data Space may not be the case to a remote client that interacts with it from a distance. Thus, defining my Data Space purely as a Knowledgebase introduces constraints that reduce its broader effectiveness to third-party clients (applications, services, users, etc.). A Knowledgebase is based on a Graph Data Model, resulting in significant impedance for clients that are built around alternative models. To reiterate, Data Spaces support multiple data models.
What Architectural Components make up a Data Space?
Where can I see a DataSpace along the lines described, in action?
Just look at my blog, and take the journey as follows:
What about other Data Spaces?
There are several, and I will attempt to categorize them along the lines of the query methods available:
Type 1 (Free Text Search over HTTP):
Google, MSN, Yahoo!, Amazon, eBay, and most Web 2.0 plays.
Type 2 (Free Text Search and XQuery/XPath over HTTP):
A few blogs and Wikis (Jon Udell's and a few others)
What About Data Space aware tools?
BTW - I am just as perplexed as Dare about Paul Graham being blind-sided by the integration of Calendaring and Email by Google.
Paul Graham was Surprised by Google Calendar?: "
I was just reading Paul Graham's post entitled The Kiko Affair which talks about the recent failure of Kiko, an AJAX web-calendaring application. I was quite surprised to see the following sentence in Paul Graham's post
"The killer, unforeseen by the Kikos and by us, was Google Calendar's integration with Gmail. The Kikos can't very well write their own Gmail to compete."

Integrating a calendaring application with an email application seems pretty obvious to me, especially since the most popular usage of calendaring applications is using Outlook/Exchange to schedule meetings in corporate environments. What's surprising to me is how surprised people are that an idea that failed in the 1990s will turn out any differently now because you sprinkle the AJAX magic pixie dust on it.
Kiko was a feature, not a full-fledged online destination let alone a viable business. There'll be a lot more entrants into the TechCrunch deadpool that are features masquerading as companies before the 'Web 2.0' hype cycle runs its course.
"
OAT offers a broad JavaScript-based, browser-independent widget set for building data-source-independent rich internet applications that are usable across a wide range of Ajax-capable web browsers.
OAT supports binding to the following data sources via its Ajax Database Connectivity Layer:
SQL Data via XML for Analysis (XMLA)
Web Data via SPARQL, GData, and OpenSearch Query Services
Web Services specific Data via service specific binding to SOAP and REST style web services
The toolkit includes a collection of powerful rich internet application prototypes, including SQL Query By Example, Visual Database Modeling, and a data-bound Web Form Designer.
Project homepage on sourceforge.net:
http://sourceforge.net/projects/oat
Source Code:
http://sourceforge.net/projects/oat/files
Live demonstration:
http://www.openlinksw.com/oat/
Google vs Semantic Web: "Google exec challenges Berners-Lee: 'At the end of the keynote, however, things took a different turn. Google Director of Search and AAAI Fellow Peter Norvig was the first to the microphone during the Q&A session, and he took the opportunity to raise a few points.
'What I get a lot is: 'Why are you against the Semantic Web?' I am not against the Semantic Web. But from Google's point of view, there are a few things you need to overcome, incompetence being the first,' Norvig said. Norvig clarified that it was not Berners-Lee or his group that he was referring to as incompetent, but the general user.'
Related: Google Base -- summing up."
(Via More News.)
When will we drop the ill conceived notion that end-users are incompetent?
Has it ever occurred to software developers and technology vendors that "incompetent", "dumb", and other contemptuous end-user adjectives simply reflect the inability of most technology products to surmount end-user "Interest Activation Thresholds"?
Interest Activation Threshold (IAT)? What's That?
I have a fundamental personal belief that all human beings are intelligent. Our ability to demonstrate intelligence, or be perceived as intelligent, is directly proportional to our interest level in a given context. In short, we have "Ambivalence Quotients" (AQs) just as we have "Intelligence Quotients" (IQs).
An interested human being is an inherently intelligent entity. The abstract nature of human intelligence also makes locating the IQ and AQ on/off buttons a mercurial quest at the best of times.
Technology end-users exhibit high AQs most of the time, due to the inability of most technology products to truly engage, and ultimately stimulate genuine interest, by surmounting IAT and reducing AQ.
Ironically, when a technology vendor is lagging behind its competitors in the "features arms race", it is commonplace to use the familiar excuse: "our end-users aren't asking for this feature".
Note To Google:
Ambivalence isn't incompetence. If end-users were genuinely incompetent, how is it that they run rings around your page rank algorithms by producing Google-friendly content at the expense of valuable context? What about the deteriorating value of AdSense due to click fraud? Likewise, the continued erosion of the value of your once exemplary "keyword based search" service? As we all know, necessity is the mother of invention, so when users develop high AQs because there is nothing better, we end up with a forced breach of "IAT"; which is why the issues that I mention remain long-term challenges for you. Ironically, the so-called "incompetents" are already outsmarting you, and you don't seem to comprehend this reality or its inevitable consequences.
Finally, how are you going to improve value without integrating the Semantic Web vision into your R&D roadmap? I can tell you categorically that you have little or no wiggle room re. this matter, especially if you want to remain true to your "don't be evil" mantra. My guess is that you will incorporate Semantic Web technologies sooner rather than later (Google Co-op is a big clue). I would even go as far as predicting a Google-hosted SPARQL Query Endpoint alongside your GData endpoints within the next 6-12 months (if even that long). I believe that your GData protocol (like the rest of Web 2.0) will ultimately accelerate your appreciation of the data model dexterity that RDF brings to the loosely coupled knowledge networks espoused by the Semantic Web vision.
Google & Semantic Web Paradox
The Semantic Web vision has the RDF graph data model at its core (and for good reason), but even more confusing for me, as I process Google sentiments about the Semantic Web, is the fact that RDF's actual creator (Ramanathan Guha aka. Guha) currently works at Google. There's a strange disconnect here IMHO.
If I recall correctly, Google wants to organize the world's data and information, leaving the knowledge organization to someone else, which is absolutely fine. What is increasingly irksome is the current tendency to use corporate stature to generate Fear, Uncertainty, and Doubt when the subject matter is the "Semantic Web".
BTW - I've just read Frederick Giasson's perspective on the Google Semantic Web paradox which ultimately leads to the same conclusions regarding Google's FUD stance when dealing with matters relating to the Semantic Web.
I wonder if anyone is tracking the Google hits for "fud google semantic web"?
I shopped for everything except food on eBay. When working with foreign-language documents, I used translations from Babel Fish. (This worked only so well. After a Babel Fish round-trip through Italian, the preceding sentence reads, 'That one has only worked therefore well.') Why use up space storing files on my own hard drive when, thanks to certain free utilities, I can store them on Gmail's servers? I saved, sorted, and browsed photos I uploaded to Flickr. I used Skype for my phone calls, decided on books using Amazon's recommendations rather than 'expert' reviews, killed time with videos at YouTube, and listened to music through customizable sites like Pandora and Musicmatch. I kept my schedule on Google Calendar, my to-do list on Voo2do, and my outlines on iOutliner. I voyeured my neighborhood's home values via Zillow. I even used an online service for each stage of the production of this article, culminating in my typing right now in Writely rather than Word. (Being only so confident that Writely wouldn't somehow lose my work -- or as Babel Fish might put it, 'only confident therefore' -- I backed it up into Gmail files.) Interesting article; Tim O'Reilly's response is here"
(Via Valentin Zacharias (Student).)
Tim O'Reilly's response provides the following hierarchy for Web 2.0, based on what he calls "Web 2.0-ness":
Level 3: The application could ONLY exist on the net, and draws its essential power from the network and the connections it makes possible between people or applications. These are applications that harness network effects to get better the more people use them. EBay, craigslist, Wikipedia, del.icio.us, Skype, (and yes, Dodgeball) meet this test. They are fundamentally driven by shared online activity. The web itself has this character, which Google and other search engines have then leveraged. (You can search on the desktop, but without link activity, many of the techniques that make web search work so well are not available to you.) Web crawling is one of the fundamental Web 2.0 activities, and search applications like AdSense for Content also clearly have Web 2.0 at their heart. I had a conversation with Eric Schmidt, the CEO of Google, the other day, and he summed up his philosophy and strategy as "Don't fight the internet." In the hierarchy of Web 2.0 applications, the highest level is to embrace the network, to understand what creates network effects, and then to harness them in everything you do.
Level 2: The application could exist offline, but it is uniquely advantaged by being online. Flickr is a great example. You can have a local photo management application (like iPhoto) but the application gains remarkable power by leveraging an online community. In fact, the shared photo database, the online community, and the artifacts it creates (like the tag database) is central to what distinguishes Flickr from its offline counterparts. And its fuller embrace of the internet (for example, that the default state of uploaded photos is "public") is what distinguishes it from its online predecessors.
Level 1: The application can and does exist successfully offline, but it gains additional features by being online. Writely is a great example. If you want to do collaborative editing, its online component is terrific, but if you want to write alone, as Fallows did, it gives you little benefit (other than availability from computers other than your own.)
Level 0: The application has primarily taken hold online, but it would work just as well offline if you had all the data in a local cache. MapQuest, Yahoo! Local, and Google Maps are all in this category (but mashups like housingmaps.com are at Level 3.) To the extent that online mapping applications harness user contributions, they jump to Level 2.
So, in a sense, we have near-conclusive confirmation that Web 2.0 is simply about APIs (typically service-specific Data Silos or Walled Gardens) with little concern, understanding, or interest in truly open data access across the burgeoning "Web of Databases" -- or the Web of "Databases and Programs" that I prefer to describe as "Data Spaces".
Thus, we can truly begin to conclude that Web 3.0 (Data Web) is the addition of Flexible and Open Data Access to Web 2.0, where the Open Data Access is achieved by leveraging Semantic Web deliverables such as the RDF Data Model and the SPARQL Query Language :-)
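As a closing sketch of that transition from Mash-ups to Mesh-ups, here is a hedged example of "natural data joining" across two shared ontologies (FOAF and SIOC); the patterns are illustrative and assume the data sits in the default graph:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX dct: <http://purl.org/dc/elements/1.1/>

# Mesh, not mash: people (FOAF) join to their posts (SIOC)
# purely through shared URIs and shared ontologies.
SELECT ?name ?title
WHERE
{
  ?person a foaf:Person ;
          foaf:name ?name ;
          foaf:holdsAccount ?account .
  ?post sioc:has_creator ?account ;
        dct:title ?title .
}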
Standards as social contracts: "Looking at Dave Winer's efforts in evangelizing OPML, I try to draw some rough lines into what makes a de facto standard. De facto standards are made and seldom happen on their own. In this entry, I look back at the history of HTML, RSS, and the open source movement and try to draw some lines as to what makes a standard.
"(Via Tristan Louis.)
I posted a comment to Tristan Louis' post along the following lines:
The analysis is spot on re. the link between de facto standardization and bootstrapping. Likewise, the clear linkage between bootstrapping and connected communities (a variation of the social networking paradigm).
Dave built a community around an XML content syndication and subscription use-case demo that we know today as the blogosphere. Superficially, one may conclude that the Semantic Web vision has suffered to date from the lack of a similar bootstrap effort. Whereas in reality, we are dealing with "time and context" issues that are critical to the base understanding upon which a "Dave Winer" style bootstrap for the Semantic Web would occur.
Personally, I see the emergence of Web 2.0 (esp. the mashups phenomenon) as the "time and context" seeds from which the Semantic Web bootstrap will sprout. I see shared ontologies such as FOAF and SIOC leading the way (they are the RSS 2.0's of the Semantic Web IMHO).
This particular inflection and, ultimately, transition is going to occur at Warp Speed!
]]>"We all know that structured data is boring and useless; while unstructured data is sexy and chock full of value. Well, only up to a point, Lord Copper. Genuinely unstructured data can be a real nuisance - imagine extracting the return address from an unstructured letter, without letterhead and any of the formatting usually applied to letters. A letter may be thought of as unstructured data, but most business letters are, in fact, highly-structured." ....Duncan Pauly, founder and chief technology officer of Coppereye add's eloquent insight to the conversation:
"The labels "structured data" and "unstructured data" are often used ambiguously by different interest groups; and often used lazily to cover multiple distinct aspects of the issue. In reality, there are at least three orthogonal aspects to structure:* The structure of the data itself.
* The structure of the container that hosts the data.
* The structure of the access method used to access the data.
These three dimensions are largely independent and one does not need to imply another. For example, it is absolutely feasible and reasonable to store unstructured data in a structured database container and access it by unstructured search mechanisms."
Data understanding and appreciation is dwindling at a time when the reverse should be happening. We are supposed to be in the throes of the "Information Age", but for some reason this appears to have no correlation with data and "data access" in the minds of many -- as reflected in the broad contradictory positions taken re. unstructured vs. structured data: structured is boring and useless while unstructured is useful and sexy....
The difference between "Structured Containers" and "Structured Data" is clearly misunderstood by most (an unfortunate fact).
For instance all DBMS products are "Structured Containers" aligned to one or more data models (typically one). These products have been limited by proprietary data access APIs and underlying data model specificity when used in the "Open-world" model that is at the core of the World Wide Web. This confusion also carries over to the misconception that Web 2.0 and the Semantic/Data Web are mutually exclusive.
But things are changing fast, and the concept of multi-model DBMS products is beginning to crystallize. On our part, we have finally released the long-promised "OpenLink Data Spaces" application layer, developed using our Virtuoso Universal Server. We have structured unified storage containment exposed to the Data Web cloud via endpoints for querying or accessing data using a variety of mechanisms that include GData, OpenSearch, SPARQL, XQuery/XPath, SQL, etc.
To be continued....
Despite page ranking and other techniques, the scale of the Internet is straining available commercial search engines to deliver truly relevant content. This observation is not new, but its relevance is growing. Similarly, the integration and interoperability challenges facing enterprises have never been greater. One approach to address these needs, among others, is to adopt semantic Web standards and technologies.
The image is compelling: targeted and unambiguous information from all relevant sources, served in usable bite-sized chunks. It sounds great; why isn't it happening?
There are clues -- actually, reasons -- why semantic Web technology is not being embraced in a broad-scale way. I have spoken elsewhere as to why enterprises or specific organizations will be the initial adopters and promoters of these technologies. I still believe that to be the case. The complexity and lack of a network effect ensure that semantic Web stuff will not initially arise from the public Internet.
Parallels with Knowledge Management
A pre-print from Tim Finin and Li Ding entitled Search Engines for Semantic Web Knowledge [1] presents a thoughtful and experienced overview of the challenges posed to conventional search by semantic Web constructs. The authors base much of their observations on their experience with the Swoogle semantic Web search engine over the past two years. They also used Swoogle, whose index contains information on over 1.3M RDF documents, to generate statistics on the semantic Web's size and growth in the paper.
Among other points, the authors note these key differences and challenges from conventional search engines:
The authors particularly note the challenge of indexing as repositories grow to actual Internet scales.
Though not noted, I would add to this list the challenge of user interfaces. Only a small percentage of users, for example, use Google's more complicated advanced search form. In its full-blown implementation, semantic Web search variations could make the advanced Google form look like child's play.
[1] Tim Finin and Li Ding, "Search Engines for Semantic Web Knowledge," a pre-print to be published in the Proceedings of XTech 2006: Building Web 2.0, May 16, 2006, 19 pp. A PDF of the paper is available for download.
" ]]>]]>Two graphs that explain most IT dysfunction (Part I): "
Inspired by reading about other people's blogging weaknesses, I've decided to finally get this one off the back burner and post it. I'm pretty sure that this isn't original, but I started thinking about this way back in 1996 (pre-social-bookmarking) and I've lost my pointer to whatever influenced it. Anybody who can set me straight -- I'd appreciate it.
So here goes.
There are two graphs which, when seen together, explain a hell of a lot about various forms of dysfunction that you see in the technology world.
In this first graph, X represents relative "technical expertise" and Y represents the "perceived benefit" in the introduction of a new technology:
The summary is that technical neophytes (A) tend to see high potential benefit in new technologies, while people who have a bit of technology experience (B) grow increasingly cynical about technology claims and can rattle off the names of technologies that they have seen over-hyped and that have under-delivered. The interesting thing, though, is that as people become really expert in technology (C), their view of the potential benefits in new technology starts to increase again. At the far right of this scale I'm talking about the real experts -- the alpha-geeks of the world.
In the second graph, X again represents technical expertise, but Y represents the "perceived risk" associated with the introduction of a new technology:
Here the curve is inverted, but the basic pattern is the same. The neophytes (A) are blissfully unaware of the things that can go wrong with the introduction of a new technology. The tech-savvy (B) are battle-scarred and have seen (and possibly caused) countless disasters. The alpha-geeks (C) have also seen their share of problems, but they have also learned from their mistakes and know how to avoid them in the future. The alpha-geeks understand how to manage the risk.
Now things get interesting when you map these two dynamics against each other:
You see that neophytes in group A have essentially the same world view as the alpha-geeks in group C, but for completely different reasons. The trouble starts when you realize that most senior executives, venture capitalists, and members of the popular press are in group A. At the other extreme, most R&D groups, architecture groups, independent consultancies, technology pundits, etc. are in group C. There are a few problems with this:
- People in group A will often talk to and solicit advice from people in group C
- There are relatively few people in group C
- Most of the people who actually have to implement new technologies are in group B.
So you can start to see the problem.
In Part II I'll talk some more about group B, and I'll discuss some of the classic patterns that emerge when A, B, and C try to work with each other.
"
These are my notes from the session eBay Web Services: A Marketplace Platform for Fun and Profit by Adam Trachtenberg.
This session was about the eBay developer program. The talk started by going over the business models for 'Web 2.0' startups. Adam Trachtenberg surmised that so far only two viable models have shown up: (i) get bought by Yahoo!, and (ii) put a lot of Google AdSense ads on your site. The purpose of the talk was to introduce a third option: making money by integrating with eBay's APIs.
Adam Trachtenberg went on to talk about the differences between providing information and providing services. Information is read-only while services are read/write. Services have value because they encourage an 'architecture of participation'.
eBay is a global, online marketplace that facilitates the exchange of goods. The site started off as a place to purchase used collectibles but has now grown to encompass old and new items, auctions and fixed-price sales (fixed-price sales are now a third of all sales), and even sales of used cars. There are currently 78 million items listed at any given time on eBay.
As eBay has grown more popular, they have come to realize that one size doesn't fit all when it comes to the website. It has to be customized to support different languages and markets, as well as running on devices other than the PC. Additionally, they discovered that some companies had started screen scraping their site to give an optimized user experience to some power users. Given how fragile screen scraping is, the eBay team decided to provide a SOAP API that would be more stable and performant than having people screen scrape the website.
The API has grown to over 100 methods, and about 43% of the items on the website are added via the SOAP API. The API enables one to build user experiences for eBay outside the web browser, such as integration with cell phones, Microsoft Office, gadgets & widgets, etc. The API has an affiliate program, so developers can make money from purchases that happen through the API. An example of the kind of mashup one can build to make money from the eBay API is https://www.dudewheresmyusedcar.com. Another example is http://www.ctxbay.com, which provides contextual eBay ads for web publishers.
The aforementioned sites are just a few examples of the kinds of mashups that can be built with the eBay API. Since the API enables buying and listing of items for sale as well as obtaining inventory data from the service, one can build a very diverse set of applications.
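Since the session centered on calling the eBay API over HTTP, here is a hedged sketch of what such a call can look like in Python. It is illustrative only: the endpoint, header names, and the GeteBayOfficialTime operation are assumptions based on eBay's legacy Trading API (not necessarily the exact SOAP interface described in the talk), and the token is a placeholder; consult the developer program documentation for the real contract.

```python
# Hedged sketch of an XML call to eBay's legacy Trading API.
# Endpoint, headers, and operation name are assumptions for illustration.
import requests  # third-party: pip install requests

ENDPOINT = "https://api.ebay.com/ws/api.dll"  # assumed Trading API endpoint

headers = {
    "X-EBAY-API-CALL-NAME": "GeteBayOfficialTime",
    "X-EBAY-API-COMPATIBILITY-LEVEL": "967",      # API version; check the docs
    "X-EBAY-API-SITEID": "0",                     # 0 = eBay US
    "X-EBAY-API-IAF-TOKEN": "<your-auth-token>",  # placeholder credential
    "Content-Type": "text/xml",
}

body = """<?xml version="1.0" encoding="utf-8"?>
<GeteBayOfficialTimeRequest xmlns="urn:ebay:apis:eBLBaseComponents"/>"""

resp = requests.post(ENDPOINT, data=body, headers=headers, timeout=30)
resp.raise_for_status()
print(resp.text[:500])  # raw XML; parse with xml.etree.ElementTree in real code
```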
" ]]>There has been lot of discussion about what Web 2.0 really is, so we thought weâd use the power of Web 2.0 itself to come up with the answer, and here it is:
42.
Just kidding. What we actually did was take a look at all the tag data going back to February 2004 (the month of the first use of Web 2.0 as a tag on del.icio.us), and analyzed all the bookmarks and tags related to the term. We can report that as of October 31, 2005, there have been over 230,000 separate bookmarks and over 7,000 unique tags associated with the term "Web 2.0" by del.icio.us users. So for this exercise, we lopped off the really long tail and normalized some similar terms (e.g. combining blog, blogs, and blogging), and came up with this snapshot of what Web 2.0 REALLY is, at least according to del.icio.us users' most popular tags through the end of October 2005:
| Tag | Share |
| --- | --- |
| ajax | 9.9% |
| blog | 6.1% |
| social | 4.2% |
| tools | 4.1% |
| software | 3.3% |
| tagging | 3.3% |
| javascript | 2.8% |
| internet | 2.6% |
| programming | 2.5% |
| rss | 2.5% |
Other notable tags included rubyonrails (1.8%), del.icio.us (1.6%), folksonomy (1.4%), community (1.1%), wiki (0.9%), flickr (0.8%), free (0.7%), trends (0.6%), flock (0.4%), and googlemaps (0.3%).
So there you have it - interesting, but it still seems to fall short of a definitive answer. Maybe the blinding flash of the obvious is that Web 2.0 is best defined as arguing about what Web 2.0 is really about.
(Via del.icio.us.)
Vinod Khosla: Web 2.0 2005: "Web 2.0 enriches online user experience by facilitating collaboration, participation, and communication. This is exciting investors once more, and new Web 2.0 startups are finding it easy to get funding from venture capitalists. Although Vinod Khosla is a venture capitalist himself, he warns startups to learn the lessons of the failures of Web 1.0 companies, to use the money they raise judiciously, and to remain creative rather than become comfortable with a business plan. [Web 2.0 audio from IT Conversations]"
(Via IT Conversations.)
(Via ProgrammableWeb.com.)
Ajax-S: Ajaxian slideshow software: "The idea came to me because I wanted a lightweight slideshow based on HTML, CSS and JavaScript, but I also wanted to separate the data of each page from the actual code that presents it. Therefore, I decided to move the data into an XML file and then use AJAX to retrieve it. The name AJAX-S is short for AJAX-Slides (or Asynchronous JavaScript and XML Slides, if you want to)."
(Via Ajaxian Blog.)
AJAX is clearly illuminating one of my pet issues: Separation of Application/Service Logic and Data. Even better, the concept of XML instance data is gradually getting much clearer. AJAX has created context for validating the concept of browser hosted Rich Internet Applications (RIA).
AJAX has become a widely accepted framework for the InternetOS that facilitates Rich Internet Application development using Web 2.0 (and beyond) APIs.
]]>"Ok, my first attempt at a round-up (in response to Philâs observation of Planetary damage). Thanks to the conference thereâs loads more here than thereâs likely to be subsequent weeks, although itâs still only a fairly random sample and some of the links here are to heaps of other resourcesâ¦
Incidentally, if anyone's got a list/links for SemWeb-related blogs that aren't on Planet RDF, I'd be grateful for a pointer. PS. Ok, I forget... are there any blogs that aren't on Dave's list yet..?
Quote of the week:
In the Semantic Web, it is not the Semantic which is new, it is the Web which is new.
- Chris Welty, IBM (lifted from TimBL's slides)
I just noticed the article from Dan Zambonini, "Is Web 2.0 killing the Semantic Web?". From my perspective, the article shows a misconception that people seem to have around the Semantic Web: the Semantic Web effort itself is not about providing applications (as the Web 2.0 meme indicates); rather, it provides standards to interlink applications.
Blog post title of the week:
Also... a new threat to Semantic Web developers has been discovered: typhoid! And the key to the Web's full potential is... Tetris."
Solutions to allow XMLHttpRequest to talk to external services: "
Over on XML.com they published Fixing AJAX: XmlHttpRequest Considered Harmful.
This article discusses a few ways to get around the security constraints that we have to live with in browsers these days; in particular, only being able to talk to your own domain via XHR.
The article walks you through three potential solutions:
1. Application proxies: write a web service on your own server that accepts XMLHttpRequests from users, makes the web service call to the remote domain, and sends the data back to users.
2. Apache proxy: configure Apache so that XMLHttpRequests can be invisibly re-routed from your server to the target web service domain.
3. Script tags (no XMLHttpRequest at all): use the HTML script tag to make a request to an application proxy (see #1 above) that returns your data wrapped in JavaScript. This approach is also known as On-Demand JavaScript.

I can't wait for Trusted Relationships within the browser-server infrastructure.
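As a concrete illustration of option #1 above, here is a minimal application-proxy sketch using Python's standard library. The /proxy path and the fixed upstream URL are invented for illustration; a real proxy would whitelist targets, forward query parameters, and add caching.

```python
# Minimal application proxy: the browser XHRs a same-origin /proxy URL,
# and the server fetches the external service on its behalf.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

UPSTREAM = "https://example.com/service"  # hypothetical external web service


class ProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/proxy"):
            with urlopen(UPSTREAM, timeout=10) as upstream:
                payload = upstream.read()
            self.send_response(200)
            self.send_header("Content-Type", "text/xml")
            self.end_headers()
            self.wfile.write(payload)  # now same-origin, so XHR can read it
        else:
            self.send_error(404)


if __name__ == "__main__":
    HTTPServer(("localhost", 8000), ProxyHandler).serve_forever()
```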
With respect to Apache proxies, these things are priceless. I recently talked about them in relation to Migrating data centers with zero downtime.
What do you guys think about this general issue? Have you come up with any interesting solutions? Any ideas on how we can keep security, yet give us the freedom that we want?
(Via Ajaxian Blog.)
Well here is what I think (actually know):
Our Virtuoso Universal Server has been sitting waiting to deliver this for years (for the record see the Virtuoso 2000 Press Release). Virtuoso can proxy for disparate data sources and expose disparate data as Well-Formed XML using an array of vocabularies (you experience this SQL-XML integration on the fly every time you interact with various elements of my public blog).
Virtuoso has always been able to expose Application Logic as SOAP and/or RESTful/RESTian style XML Web Services. This blog's search page is a simple demo of this capability.
Virtuoso is basically a Junction Box / Aggregator / Proxy for disparate Data, Applications, Services, and BPEL compliant business processes. AJAX clients talk to this single multi-purpose server which basically acts as a conduit to content/data, services, and processes (which are composite services).
BTW - there is a lot more, but for now, thou shall have to seek in order to find :-)
Answer: NO.
Comparing Web 2.0 to Windows is like comparing Apples and Oranges!
The Internet displaced Windows (a long time ago!). The effect of this reality is simply working its way through Geoffrey Moore's bell curve, in "left to right" fashion. By the way, there isn't a single thing Microsoft can do about this beyond accepting this reality and gearing itself up to compete as best it can in this new reality.
The Internet is the Operating System for the New Computer - aptly coined: "The Network" by Sun years ago (unfortunately a blind preoccupation with Java has completely obscured Sun's fundamental vision regarding this matter).
Web 2.0 provides the Windows API equivalent for the InternetOS.
The real message in today's well publicized memos from Bill and Ray is a realization on the part of Microsoft that they can no longer bet the house on Windows; Integrated Innovation will no longer imply covert ways of locking unsuspecting customers and partners into Windows. In short, Microsoft is wrestling with its Local Max.

As you can see, Web 2.0 and the Semantic Web are mutually inclusive paradigms, as reemphasized via this additional "mashup" (I don't really like the word "mashup", especially as it isn't different from "repurposing").
Quick Example using my blog:
Digest the rest of Dare's post:
Clone the Google APIs: Kill That Noise: "
Yesterday, in a post about cloning the Google API, Dave Winer wrote:
"Let's make the Google API an open standard. Back in 2002, Google took a bold first step to enable open architecture search engines, by creating an API that allowed developers to build applications on top of their search engine. However, there were severe limits on the capacity of these applications. So we got a good demo of what might be; now, three years later, it's time for the real thing."

and earlier that:

"If you didn't get a chance to hear yesterday's podcast, it recommends that Microsoft clone the Google API for search, without the keys, and without the limits. When a developer's application generates a lot of traffic, buy him a plane ticket and dinner, and ask how you both can make some money off their excellent booming application of search. This is something Google can't do, because search is their cash cow. That's why Microsoft should do it. And so should Yahoo. Also, there's no doubt Google will be competing with Apple soon, so they should be also thinking about ways to devalue Google's advantage."

This doesn't seem like a great idea to me for a wide variety of reasons, but first let's start with a history lesson before I tackle this specific issue.
A Trip Down Memory Lane
This history lesson used to be in a post entitled The Tragedy of the API by Evan Williams, but it seems to be gone now. Anyway, back in the early days of blogging, the folks at Pyra [which eventually got bought by Google] created the Blogger API for their service. Since Blogspot/Blogger was a popular service, the number of applications that used the API quickly grew. At this point Dave Winer decided that since the Blogger API was so popular he should implement it in his weblogging tools, but then he decided that he didn't like some aspects of it, such as application keys (sound familiar?), and did without them in his version of the API. Dave Winer's version of the Blogger API became the MetaWeblog API. These APIs became de facto standards, and a number of other weblogging applications implemented them.

After a while, the folks at Pyra decided that their API needed to evolve due to various flaws in its design. As Diego Doval put it in his post "a review of blogging APIs": "The Blogger API is a joke, and a bad one at that." This led to the creation of the Blogger API 2.0. At this point a heated debate erupted online, where Dave Winer berated the Blogger folks for deviating from an industry standard. The irony of flaming a company for coming up with a v2 of their own API seemed to be lost on many of the people who participated in the debate. Eventually the Blogger API 2.0 went nowhere.
Today the blogging API world consists of a few de facto standards based on a hacky API created by a startup a few years ago, a number of site-specific APIs (LiveJournal API, MovableType API, etc.), and a number of inconsistently implemented versions of the Atom API.
On Cloning the Google Search API
To me, the most salient point in the hijacking of the Blogger API from Pyra is that it didn't change the popularity of their service, or even make Radio Userland (Dave Winer's product) catch up to it in popularity. This is important to note, since this is Dave Winer's key argument for Microsoft cloning the Google API.

Off the top of my head, here are my top three technical reasons for Microsoft to ignore the calls to clone the Google Search APIs:
Difference in Feature Set: The features exposed by the API do not run the entire gamut of features that other search engines may want to expose. Thus even if you implement something that looks a lot like the Google API, you'd have to extend it to add the functionality that it doesn't provide. For example, compare the features provided by the Google API to the features provided by the Yahoo! search API. I can count about half a dozen features in the Yahoo! API that aren't in the Google API.
Difference in Technology Choice: The Google API uses SOAP. This to me is a phenomenally bad technical decision, because it raises the bar to performing a basic operation (data retrieval) by using a complex technology. I much prefer Yahoo!'s approach of providing a RESTful API, and Windows Live Search's approach of providing RSS search feeds plus a SOAP API for the folks who need such overkill.

Unreasonable Demands: A number of Dave Winer's demands seem contradictory. He asks companies not to require application keys, but then advises them to contact application developers who've built high-traffic applications about revenue sharing. Exactly how are these applications to be identified without some sort of application ID? As for removing the limits on the services? I guess Dave is ignoring the fact that providing services costs money, which I seem to remember is why he sold weblogs.com to VeriSign for a few million dollars. I do agree that some of the limits on existing search APIs aren't terribly useful. The Google API limit of 1,000 queries a day seems to guarantee that you won't be able to power a popular application with the service.
Lack of Innovation: Copying Google sucks.
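To make Dare's technology-choice point concrete, here is a hedged side-by-side sketch: the same search expressed as a plain REST-style GET and as a SOAP envelope. Both endpoints, the doSearch operation, and all parameter names are invented for illustration; only the difference in ceremony matters.

```python
# REST vs. SOAP for simple data retrieval. All URLs, parameters, and the
# doSearch operation are invented; the point is the difference in ceremony.
import urllib.parse
import urllib.request

# REST style: one GET with query parameters.
query = urllib.parse.urlencode({"q": "open standards", "results": 10})
rest_req = urllib.request.Request(f"https://search.example.com/v1?{query}")

# SOAP style: build and POST an XML envelope, plus an extra header.
soap_body = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <doSearch xmlns="urn:example:search">
      <q>open standards</q>
      <results>10</results>
    </doSearch>
  </soap:Body>
</soap:Envelope>"""
soap_req = urllib.request.Request(
    "https://search.example.com/soap",
    data=soap_body.encode("utf-8"),
    headers={"Content-Type": "text/xml",
             "SOAPAction": "urn:example:search#doSearch"},
)

for req in (rest_req, soap_req):
    print(req.get_method(), req.full_url)  # GET vs. POST; bodies differ too
```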
Stop whatever you are doing ...: "
... and go and read Tom Coates' explanation of his last project with the BBC. After 21 years working in broadcasting, I reckon this is one of the coolest things to happen for a very, very long time.
The ramifications of this will go very deep indeed."
(Spotted Via The Obvious?.)
Yes, the ramifications are deep! Tom Coates' screencast demonstrates an internal variation of an activity that is taking place on many fronts (concurrently) across the NET. I tend to refer to this effort as "Self Annotation"; the very process that will ultimately take us straight to "Semantic Web". It is going to happen much quicker than anticipated because technology is taking the pain out of metadata annotation (e.g. what you do when you tag everything that is ultimately URI accessible). Technology is basically delivering what Jon Udell calls: "reducing the activation threshold".
Using my comments above for context placement, I suggest you take a look at, or re-read Jon Udell's post titled: Many Meanings of Metadata.
Once again, the Web 2.0 brouhaha (in every sense of the word) is a reaction to a critical inflection that ultimately transitions the "Semantic Web" from "Mirage" to "Nirvana". Put differently (with humor in mind solely!), Web 2.0 is what I tend to call a "John the Baptist" paradigm, and we all know what happened to him :-)
Web 2.0 is a conduit to a far more important destination. The tendency to treat Web 2.0 as a destination rather than a conduit has contributed to the recent spate of Bozo bit flipping posts all over the blogosphere (is this an attempt to behead John, metaphorically speaking?). Humor aside, a really important thing about the Web 2.0 situation is that when we make the quantum evolutionary leap (internet time, mind you) to the "Semantic Web" (or whatever groovy name we dig up for it in due course) we will certainly have a plethora of reference points (I mean Web 2.0 URIs) ensuring that we do not revisit the "Missing Link" evolutionary paradox :-)
BTW - You can see some examples of my contribution to the ongoing annotation process by looking at:
"...Also today I came across the latest project of a man who wants to tear down Tim Berners-Lee's World Wide Web and replace it with his own vision. It used to be known as Xanadu, but has since morphed into Transliterature, A Humanist Design. I am of course referring to Ted Nelson, who invented the term 'hypertext' in 1965 and is generally regarded as a computing pioneer.
Ted Nelson recently wrote an essay about 'Indirect Documents', which got Slashdotted today. In the essay Nelson outlines why (in his opinion) the Xanadu project failed and he explains his new vision for Transliterature. He takes a number of potshots at Tim Berners-Lee's WWW on the way, e.g.:
'Why don't I like the web? I hate its flapping and screeching and emphasis on appearance; its paper-simulation rectangles of Valuable Real Estate, artifically created by the NCSA browser, now hired out to advertisers; its hierarchies exposed and imposed; its untyped one-way links only from inside the document. (The one-way links hidden under text were a regrettable simplification of hypertext which I assented to in '68 on the HES project. But that's another story.) Only trivial links are possible; there is nothing to support careful annotation and study; and, of course, there is no transclusion.'
Ted Nelson is certainly an original and I'm glad he's still around to throw spanners in the works. I've written about him before and I'm sure I will again, Web 2.0 or not.
"(Excerpted From: Read/Write Web.)
My thoughts on the commentary above:
There is nothing fundamentally incompatible between Ted Nelson's pursuits and future incarnations of the Web. None whatsoever -- we are simply working our way through a process. The process in question is what I call "standards driven ubiquity" (becoming de facto at Internet speed). Remember Sun's "The Network is the Computer" vision? Well, without a "Computer" in mind-space you can't think in terms of "Operating Systems". That's all changing, because today we are gradually beginning to accept the imminent reality that "The Internet is the Operating System", and not Windows/UNIX/Mac OS X/others. Ahem! And after the Operating System, what comes next? I think a set of Application Programming Interfaces (APIs), and I think we know what that is (in all of its controversial glory): the very thing we refer to as Web 2.0 (the APIs for the Internet Operating System).
Note: In addition to the Computer, Operating System, and Application Programming Interfaces, we also have those frequently misunderstood and under-appreciated workhorses called "Databases" in place (but we still call them Web Sites for now). And by the way, "Internet Filesystem" has been there forever, but for some reason we can't see WebDAV in all its current and future glory (that will change very soon also!).
Ted and TBL are cool with each other (whether they know it or not)! I see no mutual exclusivity in their collective visions (IMHO) :-)
Anyway, Marc's article is a very refreshing read because it provides a really good insight into the general landscape of a rapidly evolving Web, alongside genuine appreciation of our broader timeless pursuit of "Openness".
To really help this document provide additional value, I have scraped the content of the original post and dumped it below, so that we can appreciate the value of the links embedded within the article (note: thanks to Virtuoso, I only had to paste the content into my blog; the extraction to my Linkblog and Blog Summary pages are simply features of my Virtuoso based blog engine):
Breaking the Web Wide Open! (complete story)
Even the web giants like AOL, Google, MSN, and Yahoo need to observe these open standards, or they'll risk becoming the "walled gardens" of the new web and be coolio no more.
Editorial Note: Several months ago, AlwaysOn got a personal invitation from Yahoo founder Jerry Yang "to see and give us feedback on our new social media product, y!360." We were happy to oblige and dutifully showed up, joining a conference room full of hard-core bloggers and new, new media types. The geeks gave Yahoo 360 an overwhelming thumbs down, with comments like, "So the only services I can use within this new network are Yahoo services? What if I don't use Yahoo IM?" In essence, the Yahoo team was booed for being "closed web," and we heartily agreed. With Yahoo 360, Yahoo continues building its own "walled garden" to control its 135 million customers, an accusation also hurled at AOL in the early 1990s, before AOL migrated its private network service onto the web. As the Economist recently noted, "Yahoo, in short, has old media plans for the new-media era."
The irony to our view here is, of course, that today's AO Network is also a "closed web." In the end, Mr. Yang's thoughtful invitation and our ensuing disappointment in his new service led to the assignment of this article. It also confirmed our existing plan to completely revamp the AO Network around open standards. To tie it all together, we recruited the chief architect of our new site, the notorious Marc Canter, to pen this piece. We look forward to our reader feedback.
Breaking the Web Wide Open!
By Marc Canter
For decades, "walled gardens" of proprietary standards and content have been the strategy of dominant players in mainframe computer software, wireless telecommunications services, and the World Wide Web; it was their successful lock-in strategy of keeping their customers theirs. But like it or not, those walls are tumbling down. Open web standards are being adopted so widely, with such value and impact, that the web giants (Amazon, AOL, eBay, Google, Microsoft, and Yahoo) are facing the difficult decision of opening up to what they don't control.
The online world is evolving into a new open web (sometimes called the Web 2.0), which is all about being personalized and customized for each user. Not only open source software, but open standards are becoming an essential component.
Many of the web giants have been using open source software for years. Most of them use at least parts of the LAMP (Linux, Apache, MySQL, Perl/Python/PHP) stack, even if they aren't well-known for giving back to the open source community. For these incumbents that grew big on proprietary web services, the methods, practices, and applications of open source software development are difficult to fully adopt. And the next open source movementsÂwhich will be as much about open standards as about codeÂwill be a lot harder for the incumbents to exploit.
While the incumbents use cheap open source software to run their back-end systems, their business models largely depend on proprietary software and algorithms. But in our view, a new slew of open software, open protocols, and open standards will confront the incumbents with the classic Innovator's Dilemma. Should they adopt these tools and standards, painfully cannibalizing their existing revenue for a new unproven concept, or should they stick with their currently lucrative model, with the risk that eventually a bunch of upstarts eats their lunch?
Credit should go to several of the web giants who have been making efforts to "open up." Google, Yahoo, eBay, and Amazon all have Open APIs (Application Programming Interfaces) built into their data and systems. Any software developer can access and use them for whatever creative purposes they wish. This means that the API provider becomes an open platform for everyone to use and build on top of. This notion has expanded like wildfire throughout the blogosphere, so nowadays, Open APIs are pretty much required.
Other incumbents also have open strategies. AOL has got the RSS religion, providing a feedreader and RSS search in order to escape the "walled garden of content" stigma. Apple now incorporates podcasts, the "personal radio shows" that are the latest rage in audio narrowcasting, into iTunes. Even Microsoft is supporting open standards, for example by endorsing SIP (Session Initiation Protocol) for internet telephony and conferencing over Skype's proprietary format or one of its own devising.
But new open standards and protocols are in use, under construction, or being proposed every day, pushing the envelope of where we are right now. Many of these standards are coming from startup companies and small groups of developers, not from the giants. Together with the Open APIs, those new standards will contribute to a new, open infrastructure. Tens of thousands of developers will use and improve this open infrastructure to create new kinds of web-based applications and services, to offer web users a highly personalized online experience.
A Brief History of Openness
At this point, I have to admit that I am not just a passive observer, full-time journalist, or "just some blogger", but an active evangelist and developer of these standards. It's the vision of "open infrastructure" that's driving my company, and the reason why I'm writing this article. This article will give you some of the background behind these standards, and what the evolution of the next generation of open standards will look like.
Starting back in the 1980s, establishing a software standard was a key strategy for any software company. My former company, MacroMind (which became Macromedia), achieved this goal early on with Director. As Director evolved into Flash, the world saw that other companies besides Microsoft, Adobe, and Apple could establish true cross-platform, independent media standards.
Then Tim Berners-Lee and Marc Andreessen came along, and changed the rules of the software business and of entrepreneurialism. No matter how entrenched and "standardized" software was, the rug could still get pulled out from under it. Netscape did it to Microsoft, and then Microsoft did it back to Netscape. The web evolved, and lots of standards evolved with it. The leading open source standards (such as the LAMP stack) became widely used alternatives to proprietary closed-source offerings.
Open standards are more than just technology. Open standards mean sharing, empowering, and community support. Someone floats a new idea (or meme) and the community runs with it, with each person making their own contributions to the standard, evolving it without a moment's hesitation about "giving away their intellectual property."
One good example of this was Dave Sifry, who built the Technorati blog-tracking technology inspired by the Blogging Ecosystem, a weekend project by young hacker Phil Pearson. Dave liked what he saw and he ran with it, turning Technorati into what it is today.
Dave Winer has contributed enormously to this area of open standards. He defined and personally created several open standards and protocols, such as RSS, OPML, and XML-RPC. Dave has also helped build the blogosphere through his enthusiasm and passion.
By 2003, hundreds of programmers were working on creating and establishing new standards for almost everything. The best of these new standards have evolved into compelling web services platforms, such as del.icio.us, Webjay, or Flickr. Some have even spun off formal standards, like XSPF (a standard for playlists) or the instant messaging standard XMPP (also known as Jabber).
Today's Open APIs are complemented by standardized Schemas: the structure of the data itself and its associated meta-data. Take for example a podcasting feed. It consists of: a) the radio show itself, b) information on who is on the show, what the show is about, and how long the show is (the meta-data), and also c) API calls to retrieve a show (a single feed item) and play it from a specified server.
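To make that schema concrete, here is a minimal sketch of a single podcast feed item with an RSS 2.0 enclosure, built with Python's standard library; all of the show details and URLs are invented.

```python
# One RSS 2.0 <item> carrying a podcast episode: the enclosure points at the
# show itself, the other elements carry its meta-data. Values are invented.
import xml.etree.ElementTree as ET

item = ET.Element("item")
ET.SubElement(item, "title").text = "Show #42: Open Media Standards"
ET.SubElement(item, "description").text = "Who is on, what it covers, length."
ET.SubElement(item, "enclosure", {
    "url": "https://example.org/shows/42.mp3",  # the audio to fetch and play
    "length": "10485760",                       # size in bytes
    "type": "audio/mpeg",
})
ET.SubElement(item, "guid").text = "https://example.org/shows/42"

print(ET.tostring(item, encoding="unicode"))
```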
The combination of Open APIs, standardized schemas for handling meta-data, and an industry which agrees on these standards is breaking the web wide open right now. So what new open standards should the web incumbents (and you) be watching? Keep an eye on the following developments:
Identity
Attention
Open Media
Microcontent Publishing
Open Social Networks
Tags
Pinging
Routing
Open Communications
Device Management and Control
1. Identity
Right now, you don't really control your own online identity. At the core of just about every piece of online software is a membership system. Some systems allow you to browse a site anonymously, but unless you register with the site you can't do things like search for an article, post a comment, buy something, or review it. The problem is that each and every site has its own membership system. So you constantly have to register with new systems, which cannot share data, even if you'd want them to. By establishing a "single sign-on" standard, disparate sites can allow users to freely move from site to site, and let them control the movement of their personal profile data, as well as any other data they've created.
With Passport, Microsoft unsuccessfully attempted to force its proprietary standard on the industry. Instead, a world is evolving where most people assume that users want to control their own data, whether that data is their profile, their blog posts and photos, or some collection of their past interactions, purchases, and recommendations. As long as users can control their digital identity, any kind of service or interaction can be layered on top of it.
Identity 2.0 is all about users controlling their own profile data and becoming their own agents. This way the users themselves, rather than other intermediaries, will profit from their ID info. Once developers start offering single sign-on to their users, and users have trusted places to store their data (places which respect the limits and provide access controls over that data), users will be able to access personalized services which will understand and use their personal data.
Identity 2.0 may seem like some geeky, visionary future standard that isn't defined yet, but by putting each user's digital identity at the core of all their online experiences, Identity 2.0 is becoming the cornerstone of the new open web.
The Initiatives:
Right now, Identity 2.0 is under construction through various efforts from Microsoft (the "InfoCard" component built into the Vista operating system and its "Identity Metasystem"), Sxip Identity, Identity Commons, Liberty Alliance, LID (NetMesh's Lightweight ID), and SixApart's OpenID.
More Movers and Shakers:
Identity Commons and Kaliya Hamlin, Sxip Identity and Dick Hardt, the Identity Gang and Doc Searls, Microsoft's Kim Cameron, Craig Burton, Phil Windley, and Brad Fitzpatrick, to name a few.
2. Attention
How many readers know what their online attention is worth? If you don't, Google and Yahoo do: they make their living off our attention. They know what we're searching for, happily turn it into a keyword, and sell that keyword to advertisers. They make money off our attention. We don't.
Technorati and friends proposed an attention standard, Attention.xml, designed to "help you keep track of what you've read, what you're spending time on, and what you should be paying attention to." AttentionTrust is an effort by Steve Gillmor and Seth Goldstein to standardize on how captured end-user performance, browsing, and interest data are used.
Blogger Peter Caputa gives a good summary of AttentionTrust: "As we use the web, we reveal lots of information about ourselves by what we pay attention to. Imagine if all of that information could be stored in a nice neat little xml file. And when we travel around the web, we can optionally share it with websites or other people. We can make them pay for it, lease it ... we get to decide who has access to it, how long they have access to it, and what we want in return. And they have to tell us what they are going to do with our Attention data."
So when you give your attention to sites that adhere to the AttentionTrust, your attention rights (you own your attention, you can move your attention, you can pay attention and be paid for it, and you can see how your attention is used) are guaranteed. Attention data is crucial to the future of the open web, and Steve and Seth are making sure that no one entity or oligopoly controls it.
Movers and Shakers:
Steve Gillmor, Seth Goldstein, Dave Sifry and the other Attention.xml folks.
3. Open Media
Proprietary media standards (Flash, Windows Media, and QuickTime, to name a few) helped liven up the web. But they are proprietary standards that try to keep us locked in, and they weren't created from scratch to handle today's online content. That's why, for many of us, an Open Media standard has been a holy grail. Yahoo's new Media RSS standard brings us one step closer to achieving open media, as do the Ogg Vorbis audio codecs, XSPF playlists, or MusicBrainz. And several sites offer digital creators not only a place to store their content, but also to sell it.
Media RSS (being developed by Yahoo with help from the community) extends RSS and combines it with "RSS enclosures", adding metadata to any media item, to create a comprehensive solution for media "narrowcasters." To gain acceptance for Media RSS, Yahoo knows it has to work with the community. As an active member of this community, I can tell you that we'll create Media RSS equivalents for RDF (an alternative subscription format) and Atom (yet another subscription format), so no one will be able to complain that Yahoo is picking sides in format wars.
When Yahoo announced the purchase of Flickr, Yahoo founder Jerry Yang insinuated that Yahoo was acquiring "open DNA" to turn Yahoo into an open standards player. Yahoo is showing what happens when you take a multi-billion dollar company and make openness one of its core values. So Google, beware, even if Google does have more research fellows and Ph.D.s.
The open media landscape is far and wide, reaching from game machine hacks and mobile phone downloads to PC-driven bookmarklets, players, and editors, and it includes many other standardization efforts. XSPF is an open standard for playlists, and MusicBrainz is an alternative to the proprietary (and originally effectively stolen) database that Gracenote licenses.
Ourmedia.org is a community front-end to Brewster Kahle's Internet Archive. Brewster has promised free bandwidth and free storage forever to any content creators who choose to share their content via the Internet Archive. Ourmedia.org is providing an easy-to-use interface and community to get content in and out of the Internet Archive, giving ourmedia.org users the ability to share their media anywhere they wish, without being locked into a particular service or tool. Ourmedia plans to offer open APIs and an open media registry that interconnects other open media repositories into a DNS-like registry (just like the www domain system), so folks can browse and discover open content across many open media services. Systems like Brightcove and Odeo support the concept of an open registry, and hope to work with digital creators to sell their work to fulfill the financial aspect of the "Long Tail."
More Movers and Shakers:
Creative Commons, the Open Media Network, Jay Dedman, Ryanne Hodson, Michael Verdi, Eli Chapman, Kenyatta Cheese, Doug Kaye, Brad Horowitz, Lucas Gonze, Robert Kaye, Christopher Allen, Brewster Kahle, JD Lasica, and indeed, Marc Canter, among others.
4. Microcontent Publishing
Unstructured content is cheap to create, but hard to search through. Structured content is expensive to create, but easy to search. Microformats resolve the dilemma with simple structures that are cheap to use and easy to search.
The first kind of widely adopted microcontent is blogging. Every post is an encapsulated idea, addressable via a URL called a permalink. You can syndicate or subscribe to this microcontent using RSS or an RSS equivalent, and news or blog aggregators can then display these feeds in a convenient readable fashion. But a blog post is just a block of unstructured text: not a bad thing, but just a first step for microcontent. When it comes to structured data, such as personal identity profiles, product reviews, or calendar-type event data, RSS was not designed to maintain the integrity of the structures.
Right now, blogging doesn't have the underlying structure necessary for full-fledged microcontent publishing. But that will change. Think of local information services (such as movie listings, event guides, or restaurant reviews) that any college kid can access and use in her weekend programming project to create new services and tools.
Today's blogging tools will evolve into microcontent publishing systems, and will help spread the notion of structured data across the blogosphere. New ways to store, represent, and produce microcontent will create new standards, such as Structured Blogging and Microformats. Microformats differ from RSS feeds in that you can't subscribe to them. Instead, Microformats are embedded into webpages and discovered by search engines like Google or Technorati. Microformats are creating common definitions for "What is a review or event? What are the specific fields in the data structure?" They can also specify what we can do with all this information.

OPML (Outline Processor Markup Language) is a hierarchical file format for storing microcontent and structured data. It was developed by Dave Winer of RSS and podcast fame.
Events are one popular type of microcontent. OpenEvents is already working to create shared databases of standardized events, which would get used by a new generation of event portals, such as Eventful/EVDB, Upcoming.org, and WhizSpark. The idea of OpenEvents is that event-oriented systems and services can work together to establish shared events databases (and associated APIs) that any developer could then use to create and offer their own new service or application. OpenReviews is still in the conceptual stage, but it would make it possible to provide open alternatives to closed systems like Epinions, and establish a shared database of local and global reviews. Its shared open servers would be filled with all sorts of reviews for anyone to access.
Why is this important? Because I predict that in the future, 10 times more people will be writing reviews than maintaining their own blog. The list of possible microcontent standards goes on: OpenJobpostings, OpenRecipes, and even OpenLists. Microsoft recently revealed that it has been working on an important new kind of microcontent: Lists. So OpenLists will attempt to establish standards for the kind of lists we all use, such as lists of Links, lists of To Do Items, lists of People, Wish Lists, etc.
Movers and Shakers:
Tantek Çelik and Kevin Marks of Technorati, Danny Ayers, Eric Meyer, Matt Mullenweg, Rohit Khare, Adam Rifkin, Arnaud Leene, Seb Paquet, Alf Eaton, Phil Pearson, Joe Reger, and Bob Wyman, among others.
5. Open Social Networks
I'll never forget the first time I met Jonathan Abrams, the founder of Friendster. He was arrogant and brash, and he claimed he "owned" all his users, and that he was going to monetize them and make a fortune off them. This attitude robbed Friendster of its momentum, letting MySpace, Facebook, and other social networks take Friendster's place.
Jonathan's notion of social networks as a way to control users is typical of the Web 1.0 business model and its attitude towards users in general. Social networks have become one of the battlegrounds between old and new ways of thinking. Open standards for Social Networking will define those sides very clearly. Since meeting Jonathan, I have been working towards finding and establishing open standards for social networks. Instead of closed, centralized social networks with 10 million people in them, the goal is making it possible to have 10 million social networks that each have 10 people in them.
FOAF (which stands for Friend Of A Friend, and describes people and relationships in a way that computers can parse) is a schema to represent not only your personal profile's meta-data, but your social network as well. Thousands of researchers use the FOAF schema in their "Semantic Web" projects to connect people in all sorts of new ways. XFN is a microformat standard for representing your social network, while vCard (long familiar to users of contact manager programs like Outlook) is a microformat that contains your profile information. Microformats are baked into any xHTML webpage, which means that any blog, social network page, or webpage in general can "contain" your social network in it, and be used by any compatible tool, service, or application.
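As a sketch of what FOAF looks like in practice, here is a tiny profile and one social-network edge built with the rdflib Python library and serialized as Turtle; the people and mailbox are invented.

```python
# A minimal FOAF profile: two people and one foaf:knows relationship.
# Names and the mailbox URI are invented for illustration.
from rdflib import Graph, BNode, Literal, URIRef
from rdflib.namespace import FOAF, RDF

g = Graph()
me, friend = BNode(), BNode()

g.add((me, RDF.type, FOAF.Person))
g.add((me, FOAF.name, Literal("Alice Example")))
g.add((me, FOAF.mbox, URIRef("mailto:alice@example.org")))
g.add((me, FOAF.knows, friend))  # the social-network edge

g.add((friend, RDF.type, FOAF.Person))
g.add((friend, FOAF.name, Literal("Bob Example")))

print(g.serialize(format="turtle"))
```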
PeopleAggregator is an earlier project now being integrated into open content management framework Drupal. The PeopleAggregator APIs will make it possible to establish relationships, send messages, create or join groups, and post between different social networks. (Sneak preview: this technology will be available in the upcoming GoingOn Network.)
All of these open social networking standards mean that inter-connected social networks will form a mesh that will parallel the blogosphere. This vibrant, distributed, decentralized world will be driven by open standards: personalized online experiences are what the new open web will be all about, and what could be more personalized than people's networks?
Movers and Shakers:
Eric Sigler, Joel De Gan, Chris Schmidt, Julian Bond, Paul Martino, Mary Hodder, Drummond Reed, Dan Brickley, Randy Farmer, and Kaliya Hamlin, to name a few.
6. Tags
Nowadays, no self-respecting tool or service can ship without tags. Tags are keywords or phrases attached to photos, blog posts, URLs, or even video clips. These user- and creator-generated tags are an open alternative to what used to be the domain of librarians and information scientists: categorizing information and content using taxonomies. Tags are instead creating "folksonomies."
The recently proposed OpenTags concept would be an open, community-owned version of the popular Technorati Tags service. It would aggregate the usage of tags across a wide range of services, sites, and content tools. In addition to Technorati's current tag features, OpenTags would let groups of people share their tags in "TagClouds." Open tagging is likely to include some of the open identity features discussed above, to create a tag system that is resilient to spam, and yet trustable across sites all over the web.
OpenTags owes a debt to earlier versions of shared tagging systems, which include Topic Exchange and something called the k-collector (a knowledge management tag aggregator) from Italian company eVectors.
Movers & Shakers:
Phil Pearson, Matt Mower, Paolo Valdemarin, and Mary Hodder and Drummond Reed again, among others.
7. Pinging
Websites used to be mostly static. Search engines that crawled (or "spidered") them every so often did a good enough job to show reasonably current versions of your cousin's homepage or even Time magazine's weekly headlines. But when blogging took off, it became hard for search engines to keep up. (Google has only just managed to offer blog-search functionality, despite buying Blogger back in early 2003.)
To know what was new in the blogosphere, users couldn't depend on services that spidered webpages once in a while. The solution: a way for blogs themselves to automatically notify blog-tracking sites that they'd been updated. Weblogs.com was the first blog "ping service": it displayed the name of a blog whenever that blog was updated. Pinging sites helped the blogosphere grow, and more tools, services, and portals started using pinging in new and different ways. Dozens of pinging services and sites (most of which can't talk to each other) sprang up.
Matt Mullenweg (the creator of open source blogging software WordPress) decided that a one-stop service for pinging was needed. He created Ping-o-Matic, which aggregates ping services and simplifies the pinging process for bloggers and tool developers. With Ping-o-Matic, any developer can alert all of the industry's blogging tools and tracking sites at once. This new kind of open standard, with shared infrastructure, is critical to the scalability of Web 2.0 services.
As Matt said: "There are a number of services designed specifically for tracking and connecting blogs. However, it would be expensive for all the services to crawl all the blogs in the world all the time. By sending a small ping to each service you let them know you've updated, so they can come check you out. They get the freshest data possible, you don't get a thousand robots spidering your site all the time. Everybody wins."
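Mechanically, a ping is a one-method XML-RPC call. Here is a hedged sketch using Python's standard library; the Ping-o-Matic endpoint shown is an assumption based on its documented XML-RPC service, and the blog details are invented.

```python
# weblogUpdates.ping, the tiny XML-RPC protocol behind blog ping services.
# Endpoint assumed from Ping-o-Matic's docs; blog name and URL are invented.
import xmlrpc.client

server = xmlrpc.client.ServerProxy("http://rpc.pingomatic.com/")
result = server.weblogUpdates.ping(
    "My Example Blog",            # human-readable blog name
    "http://blog.example.org/",   # blog home page that was just updated
)
print(result)  # typically a struct like {'flerror': False, 'message': '...'}
```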
Movers and Shakers:
Matt Mullenweg, Jim Winstead, Dave Winer
8. Routing
Bloggers used to have to manually enter the links and content snippets of blog posts or news items they wanted to blog. Today, some RSS aggregators can send a specified post directly into an associated blogging tool: as bloggers browse through the feeds they subscribe to, they can easily specify and send any post they wish to "reblog" from their news aggregator or feed reader into their blogging tool. (This is usually referred to as "BlogThis.") As structured blogging comes into its own (see the section on Microcontent Publishing), it will be increasingly important to maintain the structural integrity of these pieces of microcontent when reblogging them.
The promising RedirectThis standard will combine a "BlogThis"-like capability with maintaining the integrity of the microcontent. RedirectThis will let bloggers and content developers attach a simple "PostThis" button to their posts. Clicking on that button will send that post to the reader/blogger's favorite blogging tool. This favorite tool is specified at the RedirectThis web service, where users register their blogging tool of choice. RedirectThis also helps maintain the integrity and structure of microcontent; then it's just up to the user to prefer a blogging tool that also attains that lofty goal of microcontent integrity.
OutputThis is another nascent web services standard, to let bloggers specify what "destinations" they'd like to have as options in their blogging tool. As new destinations are added to the service, more checkboxes would get added to their blogging tool, allowing them to route their published microcontent to additional destinations.
Movers and Shakers:
Michael Migurski, Lucas Gonze
9. Open Communications
Likely, you've experienced the joys of finding friends on AIM or Yahoo Messenger, or the convenience of Skyping with someone overseas. Not that you're about to throw away your mobile phone or BlackBerry, but for many, also having access to Instant Messaging (IM) and Voice over IP (VoIP) is crucial.
IM and VoIP are mainstream technologies that already enjoy the benefits of open standards. Entire industries are born, right this second, based around these open standards. Jabber has been an open IM technology for years; in fact, as XMPP, it was officially dubbed a standard by the IETF. Although becoming an official IETF standard is usually the kiss of death, Jabber looks like it'll be around for a while, as entire generations of collaborative, work-group applications and services have been built on top of its messaging protocol. For VoIP, Skype is clearly the leading standard today, though one could argue just how "open" it is (and defenders of the IETF's SIP standard often do). But it is free and user-friendly, so there won't be much argument from users about it being insufficiently open. Yet there may be a cloud on Skype's horizon: web behemoth Google recently released a beta of Google Talk, an IM client committed to open standards. It currently supports XMPP, and will support SIP for VoIP calls.
Movers and Shakers:
Jeremie Miller, Henning Schulzrinne, Jon Peterson, Jeff Pulver
10. Device Management and Control
To access online content, we're using more and more devices: BlackBerrys, iPods, Treos, you name it. As the web evolves, more and more different devices will have to communicate with each other to give us the content we want, when and where we want it. No one wants to be dependent on one vendor anymore (like, say, Sony) for their laptop, phone, MP3 player, PDA, and digital camera, just so that it all works together. We need fully interoperable devices, and the standards to make that work. And to fully make use of content that is moving online and of innovative web services, those standards need to be open.
MIDI (Musical Instrument Digital Interface), one of the very first open standards in music, connected disparate vendors' instruments, post-production equipment, and recording devices. But MIDI is limited, and MIDI II has been very slow to arrive. Now a new standard for controlling musical devices has emerged: OSC (Open Sound Control). This protocol is optimized for modern networking technology and inter-connects music, video, and controller devices with "other multimedia devices." OSC is used by a wide range of developers, and is being taken up in the mainstream MIDI marketplace.
Another open-standards-based device management technology is ZigBee, for building wireless intelligence and network monitoring into all kinds of devices. ZigBee is supported by many networking, consumer electronics, and mobile device companies.
· · · · · ·
The Change to Openness
The rise of open source software and its "architecture of participation" is completely shaking up the old proprietary-web-services-and-standards approach. Sun Microsystems, whose proprietary Java standard helped define the Web 1.0, is opening its Solaris OS and has even announced the apparent paradox of an open-source Digital Rights Management system.
Today's incumbents will have to adapt to the new openness of the Web 2.0. If they stick to their proprietary standards, code, and content, they'll become the new walled gardens: places users visit briefly to retrieve data and content from enclosed data silos, but not where users "live." The incumbents' revenue models will have to change. Instead of being "owned," users will know they own themselves, and will expect a return on their valuable identity and attention. Instead of being locked into incompatible media formats, users will expect easy access to digital content across many platforms.
Yesterday's web giants and tomorrow's users will need to find a mutually beneficial new balance: between open and proprietary, developer and user, hierarchical and horizontal, owned and shared, compatible and closed.
Marc Canter is an active evangelist and developer of open standards. Early in his career, Marc founded MacroMind, which became Macromedia. These days, he is CEO of Broadband Mechanics, a founding member of the Identity Gang and of ourmedia.org. Broadband Mechanics is currently developing the GoingOn Network (with the AlwaysOn Network), as well as an open platform for social networking called the PeopleAggregator.
A version of the above post appears in the Fall 2005 issue of AlwaysOn's quarterly print blogozine, and ran as a four-part series on the AlwaysOn Network website. (Via Marc's Voice.)
David Cowan is a partner at Bessemer Venture Partners and writes a blog called Who Has Time For This. He's on this list partially because he incubated the hottest and most anticipated company on the web right now, Flock.
Tim Draper invested in Skype. Done. He also sits on the board of SocialText, and his fund was in Baidu.
David Hornik is a General Partner at August Capital and writes a blog that has over 10,000 RSS readers.
Josh Kopelman, through First Round Capital, is quietly filtering through just about every young web 2.0 company, and investing in many of them.
Fred Wilson is a founding partner of Union Square Ventures and writes the extremely popular A VC. If you are new to web 2.0, start with his Blogging 1.0 post.
Jeff Clavier - Jeff is a former VC and still makes the odd angel investment (Feedster, Truveo, and a few others). His new venture allows him to work with pre-funding companies and get them ready for prime time.
Brad Feld - Brad is a managing director at Mobius Venture Capital and writes a must-read web 2.0 blog called Feld Thoughts. Read his posts on Term Sheets if you are in the process of raising capital.
O'Reilly AlphaTech Ventures - This is the only non-person on here. OATV just closed a $50 million fund to invest in young companies. Given the incredible access Tim O'Reilly has to these companies, OATV could quickly become an important fund in the web 2.0 space.
Pierre Omidyar - Pierre founded eBay and is the co-founder of Omidyar Network, where he's invested in a number of interesting companies, including EVDB, SocialText, and Feedster, among others.
Peter Rip - Peter is a founding partner of Leapfrog Ventures, a $100 million fund. Peter also writes Early Stage VC, another must-read blog. His investments include ojos, an incredible new photo-metadata service that is going to be extremely disruptive (and useful).
Peter Thiel - Peter, the former CEO of PayPal, has invested in LinkedIn, Friendster, and other web 2.0 companies. He's just created the Founders Fund.
Thomas Ball - Tom is a Venture Partner at Austin Ventures, a fund with $3 billion under management. He's their consumer and web 2.0 guy and seems to be spending a lot of time in Silicon Valley and at web 2.0 events.
Dan Grossman - Dan is a principal at Venrock Associates and has recently started a great blog called A Venture Forth (where he wrote a much-bookmarked post on Ajax).
Jason Pressman - Jason is a principal at Shasta Ventures, a young $200 million fund that has a deep commitment to and expertise in consumer-focused businesses.
Web 2.0: Conversation with Vinod Khosla:
Vinod started by explaining that, contrary to the rumor, he is not starting a fund, he is just investing his own capital - hiring a few people to help with this. He was a General Partner in all Kleiner funds he was involved in, but decided to forego his GP position in KPCB XI (which means that he is less involved in the fund, and the firm).
He sees opportunities in peer-to-peer infrastructure: communication, media distribution and mangling (as in being on the computer and IM'ing whilst watching TV, something I do all the time), etc. He also states that personalization has not yet started to deliver on its promises on the web.
Vinod states that there is still too much capital in the coffers of the VCs, even if a lot of the overhang ("leftovers" from the capital raised by the bubble funds) has been largely used up. This makes raising money "too easy", and he cautions startups being funded nowadays that they should not take this as a sign of future success (neither of their capacity to execute nor to raise further funds), and that management teams should spend frugally, trying things at small scale to get market response.
On the subject of search, he recalls Excite's rejection of Vinod's suggestion to buy Google's technology for $1M (very, very early on) because the Excite guys thought they could do so much better. Lesson: be open to others' capabilities to disrupt your turf - even when you are successful (actually, especially when you are successful, because you might become complacent).
The discussion moves to the topic of trust in the blogosphere, put in the perspective of MSM reporting on disasters like Katrina and the London bombings. Vinod argues that he would trust aggregated reports from hundreds of bloggers more than CNN, even if there is no blog trust/reputation infrastructure (I would argue that linking behavior is an OK proxy for now - Scoble would say that this is what we get with Memeorandum).
I don't think so re. Memeorandum (as I stated earlier, he is a genuine big-picture thinker and thought leader; Memeorandum is a step in the right direction, but in no way the final destination envisaged).
On the mobile revolution, he sees increased usage, way beyond ringtones and wallpapers, into communication and new types of interactions. An example he gives is using cell phones to help people improve their English (a real challenge for Chinese, Indian, and French people :-). In Shanghai alone, there are over 5M customers/users of such a product.
Moving to education, Vinod states that kids now need to learn about sifting through thousands and thousands of search engine results on any topic they might research, and critically assessing which sources to trust. He would use new communication mechanisms to offer remote tutoring, and would open source textbooks in a Wikipedia model - which would save billions of dollars every year in California that could be reinvested into more critical projects.
Staying with open source, he gives the example of patented seeds that could greatly benefit developing countries and their ability to engineer seeds matching their particular requirements and environment. This is not feasible today because of patent protection.
On improving on Google's relevance (question from Michael Yang from Become.com), Vinod sees value in collaborative filtering, and diverse applications of information retrieval and extraction.
(Via Software Only.)
Key data points:
- Market cap of big 5: $2B (2000 pre-IPO), $178B (2000 peak), $32B (2002 trough), $261B (2005)
- 27% of US Internet users read blogs
- 54MM registered Skype users (9/05) - fastest product ramp ever?
- China - More Internet users < age of 30 than anywhere
- S. Korea Broadband penetration of 70%+ - No. 1 in world
- Mobile is most important direction now
Conclusion: first ten years (1995-2005) of commercial Internet were a warm up act for what is about to happen
"(Via Silkworm Blog.)
Here is what my Virtuoso based blog system enables me to do with ease:
Web 2.0 Conference Trip Report: Mash-ups 2.0 - Where's the Business Model?: "
I attended the panel on business models for mash-ups hosted by Dave McClure,
Jeffrey McManus, Paul Rademacher, and Adam Trachtenberg. A mash-up used to mean remixing two songs into something new and cool, but now the term has been hijacked by geeks to mean mixing two or more web-based data sources and/or services.
Paul Rademacher is the author of the Housing Maps mash-up, which he used as a way to find a house using Craig's List + Google Maps. The data obtained from Craig's List is fetched via screen scraping. Although Craig's List has RSS feeds, they didn't meet his needs. Paul also talked about some of the issues he had with building the site, such as the fact that since most browsers block cross-site scripting using XMLHttpRequest, a server needs to be set up to aggregate the data instead of all the code running in the browser. The site has been very popular and has garnered over 900,000 unique visitors based solely on word-of-mouth.
The question was asked as to why he didn't make this a business but instead took a job at Google. He listed a number of very good reasons:
- He did not own the data that was powering the application.
- The barrier to entry for such an application was low since there was no unique intellectual property or user interface design to his application
I asked whether he'd gotten any angry letters from the legal department at Craig's List and he said they seem to be tolerating him because he drives traffic to their site and caches a bunch of data on his servers so as not to hit their servers with a lot of traffic.
A related mash-up site which scrapes real estate websites, called Trulia, was then demoed. A member of the audience asked whether Paul thought the complexity of mash-ups using more than two data sources and/or services increased in a linear or exponential fashion. Paul said he felt it increased in a linear fashion. This segued into a demo of SimplyHired, which integrates with a number of sites including PayScale, LinkedIn, job databases, etc.
At this point I asked whether they would have service providers giving their perspective on making money from mash-ups since they are the gating factor because they own the data and/or services mash-ups are built on. The reply was that the eBay & Yahoo folks would give their perspective later.
Then we got a demo of a Google Maps & eBay Motors mash-up. Unlike the Housing Maps mash-up, all the data is queried live instead of cached on the server. eBay has dozens of APIs that encourage people to build against their platform, and they have an affiliates program so people can make money from building on their API. We were also shown Unwired Buyer, which is a site that enables you to bid on eBay using your cell phone and even calls you just before an auction is about to close. Adam Trachtenberg pointed out that since there is a Skype API, perhaps some enterprising soul could mash-up eBay & Skype.
Jeffrey McManus of Yahoo! pointed out that you don't even need coding skills to build a Yahoo! Maps mash-up, since all it takes is specifying your RSS feed with longitude and latitude elements on each item to have it embedded in the map. I asked why, unlike Google Maps and MSN Virtual Earth, Yahoo! Maps doesn't allow users to host the maps on their page, nor does there seem to be an avenue for revenue sharing with mash-up authors via syndicated advertising. The response I got was that they polled various developers and there wasn't significant interest in embedding the maps on developers' sites, especially when this would require paying for hosting.
We were then shown a number of mapping mashups, including a mashup of the London bombings which used Google Maps, Flickr & RSS feeds of news (the presenter had the poor taste to point out opportunities to place ads on the site), a mashup from alkemis which mashes Google Maps, A9.com street level photos and traffic cams, and a mash-up from Analygis which integrates census data with Google Maps data.
The following items were then listed as the critical components of mash-ups:
 - AJAX (Jeffrey McManus said it isn't key but a few of the guys on the panel felt that at least dynamic UIs are better)
 - APIs
 - Advertising
 - Payment
 - Identity/Acct mgmt
 - Mapping Services
 - Content Hosting
 - Other?
On the topic of identity and account management, the problem of how mash-ups handle user passwords came up. If a website is password protected, then users often have to enter their usernames and passwords into third party sites. An example of this was the fact that PayPal used to store lots of username/password information of eBay users, which caused the company some consternation since eBay went through a lot of trouble to protect their sensitive data only to have a lot of it being stored on PayPal servers.
eBay's current solution is similar to that used by Microsoft Passport in that applications are expected to have users log in via the eBay website, then the user is redirected to the originating website with a ticket indicating they have been authenticated. I pointed out that although this works fine for websites, it offers no solution for people trying to build desktop applications that are not browser based. The response I got indicated that eBay hasn't solved this problem.
My main comment about this panel is that it didn't meet expectations. I'd expected to hear a discussion about turning mashups [and maybe the web platforms they are built on] into money making businesses. What I got was a show-and-tell of various mapping mashups. Disappointing.
"
just take a look at the oxymoronic Wikipedia 2.0 imbroglio to get my drift. In retrospect, I should have called on Esquire magazine to get the Web 2.0 article going :-) ). Anyway, back to Dare's analysis of Tim's 7 Web 2.0 litmus test items listed below (which Dare trimmed down to 3):
- Services, not packaged software, with cost-effective scalability
- Control over unique, hard-to-recreate data sources that get richer as more people use them
- Trusting users as co-developers
- Harnessing collective intelligence
- Leveraging the long tail through customer self-service
- Software above the level of a single device
- Lightweight user interfaces, development models, AND business models
Well, I would like to summarize this a little further using a few excerpts from my numerous contributions to the Web 2.0 talk page on Wikipedia (albeit mildly revised; see strikeouts etc.):
Exposes Web services that can be accessed on any device or platform by any developer or user. RSS feeds, RESTful APIs, and SOAP APIs are all examples of Web services. Harnesses the collective knowledge of its user base to benefit users. Leverages the long tail through customer self-service.
Web 2.0 is a web of executable service invocation endpoints (those Web Services URIs) and well-formed content (all of that RSS, Atom, RDF, XHTML, etc. based Web Content out on the NET). The executable service invocation endpoints and well-formed content are accessible via URIs. Put in even simpler terms, Web 2.0 is an incarnation of the web defined by URIs for invoking Web Services and/or consuming or syndicating well-formed content.
Looks like I've self edited my own definition in the process. :-)
If you don't grok this definition then consider using it as a trigger for taking a closer look at the dynamics that genuinely differentiate Web 1.0 and Web 2.0.
In another Wikipedia "talk page" contribution (regarding "Web 2.0 Business Impact") I attempt to answer the question posed here, which should also shed light on the premise of my definition above: Web 1.0 was about web sites geared towards interaction with human beings as opposed to computers. In a sense this mirrors the difference between HTML and XML.
A simple example (purchasing a book):
amazon.com provides value to you by enabling you to search and purchase the desired book online via the site http://www.amazon.com.
In the Web 1.0 era the process of searching for your desired book, and then eventually purchasing the book in question, required visible interaction with the site http://www.amazon.com. In today's Web 2.0 based Web the process of discovering a catalog of books, searching for your particular book of interest, and eventually purchasing the book, occurs via Web Services which amazon has chosen to expose via an executable endpoint (the Web point of presence for exposing its Web Services).
Direct interaction via http://www.amazon.com is no longer required. A weblog can quite easily associate keywords, tags, and post categories with items in amazon.com's catalogs. In addition, weblogs can also act as entry points for consuming the amazon.com value proposition (making books available for purchase online), by enabling you to purchase a book directly from the weblog (assuming the blog owner is an Amazon associate, etc.). Now compare the impact of this kind of value discovery and consumption cycle driven by software to the same process driven by human interaction with a static or dynamic HTML page (Web 1.0 site).
To summarize, Web 2.0 is a reflection of the potential of XML expressed through the collective impact of Web Services (XML based distributed computing) and Well-formed Content (Blogosphere, Wikisphere, XHTML micro content, etc.). The potential simply comes down to the ability to ultimately connect events, triggers, impulses (chatter, conversation, etc.), and data in general via URIs.
Let's never forget that XML is the reason why we have a blogosphere (RSS/Atom/RDF are applications of XML). Likewise, XML is also the reason why we have Web Services (doesn't matter what format).
As I have stated in the past, we must go by way of Web 2.0 en route to what is popularly referred to as the Semantic Web (it will be known by another name by the time we get there; 3.0 or 4.0, who knows or cares?). At the current time, the prerequisite activity of self annotation is in full swing on the current Web, thanks to the inflective effects of Web 2.0.
BTW - Would this URI to all Semantic Web related posts on my blog pass the Web 2.0 litmus test? Likewise, this URI to all Web 2.0 related posts? I wonder :-)
Tim O'Reilly has posted a meme map of Web 2.0, from the 'What is Web 2.0?' brainstorming session at FOO Camp 2005. Hat-tip Josh for the link. It's kind of a business model map:
The orange box in the middle and brown ovals at the bottom cover some of the themes I'll be writing about in the next chapter of the book Josh and I are writing. The chapter is tentatively titled 'Building a Web 2.0 Business' and will explore the principles of Web 2.0 business, e.g. 'Services, not packaged app'.
I've just finished my first chapter, which was a general introduction to the 'Web as Platform' concept. So this will be an interesting follow-on from that.
Alex Barnett has done a nice 'Microsoft mash-up', inserting links.
"(Via Read/Write Web.)
A Webpage is Not An API or a Platform (The Populicio.us Remix): "
A few months ago in my post GMail Domain Change Exposes Bad Design and Poor Code, I wrote Repeat after me, a web page is not an API or a platform. It seems some people are still learning this lesson the hard way. In the post The danger of running a remix service Richard MacManus writes
Populicio.us was a service that used data from social bookmarking site del.icio.us, to create a site with enhanced statistics and a better variety of 'popular' links. However the Populicio.us service has just been taken off air, because its developer can no longer get the required information from del.icio.us. The developer of Populicio.us wrote:
'Del.icio.us doesn't serve its homepage as it did and I'm not able to get all needed data to continue Populicio.us. Right now Del.icio.us doesn't show all the bookmarked links in the homepage so there is no way I can generate real statistics.'
This plainly illustrates the danger for remix or mash-up service providers who rely on third party sites for their data. del.icio.us can not only giveth, it can taketh away.
It seems Richard MacManus has missed the point. The issue isn't depending on a third party site for data. The problem is depending on screen scraping their HTML webpage. An API is a service contract which is unlikely to be broken without warning. A web page can change depending on the whims of the web master or graphic designer behind the site.
Versioning APIs is hard enough, let alone trying to figure out how to version an HTML website so screen scrapers are not broken. Web 2.0 isn't about screenscraping. Turning the Web into an online platform isn't about legitimizing bad practices from the early days of the Web. Screen scraping needs to die a horrible death. Web APIs and Web feeds are the way of the future.
" Amen! ]]>This site could provide a great exposure point for some very old Web 2.0 (nee "Web Services") demos from our Virtuoso tutorials / demos site.
I ran into two work-related problems today that left me feeling like there are some aspects of two very recent (Web 2.0-esque if we wish to join the buzzword orgy of late) architectures (REST/Services and XForms) that are problematic:
XML as a medium for remote communication (evangelized more with WSDL-related architectures than in REST) has overstated its usefulness in at least one concrete regard, in my estimation. I've had a hard time taking most of the architectural arguments on the pros/cons of SOAP/XML-RPC versus REST seriously because it seems to be nothing more than buzzword warfare. However, I recently came across a concrete, real world example of the pitfalls of implementing certain remote service needs on XML-based communication mediums (such as SOAP/XML-RPC).
If the objects/resources you wish to manipulate at the service endpoints are run of the mill (consider the standard cliche purchase order example(s)), then the benefits of communicating in XML are obvious: portability, machine readability, extensibility, etc. However, consider the scenario (which I face) in which the objects/resources you wish to manipulate are XML documents themselves! This scenario seems to work to the disadvantage of the communication architecture.
Let's say you have a repository at one end (which I do) that has XML documents you wish to manipulate remotely. How do you update the documents? I've discussed this before (see: Base64 encoded XML content from an XForm) so I'll spare the details of the problem. However, I will mention that in retrospect this particular problem further emphasizes the advantage of a MinimalistRemoteProcedureCall (MRPC) approach - MRPC is my alternative acronym for REST :).
Consider the setContent message:
<SOAP:Envelope>
  <SOAP:Body>
    <foo:setContent>
      <path> .. path to document .. </path>
      <src> ... new document as a fragment ... </src>
    </foo:setContent>
  </SOAP:Body>
</SOAP:Envelope>
Notice that the location of the resource we wish to update is embedded within the message transmitted (via SOAP), which is transported on top of another communication medium (HTTP) that already has the necessary semantics for saying the same thing:
Set the content of the resource identified by a path
In the SOAP scenario, the above message is delivered to a single service endpoint (which serves as an external gateway for all SOAP messages) which has to then parse the entire XML message in order to determine the method invoked (setContent in this case) and the parameters passed to it (both of which are only header information on a document that consists mostly of the new document).
However, in the MRPC scenario this service would be invoked simply as an HTTP PUT request sent directly to the XML document we wish to update:
Method: PUT
Protocol: HTTP/1.0
URI: http://remoteHost:port/< .. path to XML document ..>
CONTENT:
... new document in its entirety ...
Here, there is no need for a service middleman to interpret the service requested (and no need to parse a large XML document that contains another document embedded as a fragment). The HTTP request by itself specifies everything we need and does it using HTTP alone as the communication medium. This is even more advantageous when the endpoint is a repository that has a very well defined URI scheme or general addressing mechanism for its resources (which 4Suite, the repository in my case, does).
Since I didn't have the option of a REST-based service architecture (the preferred solution), I was relegated to having to base64 encode the new XML content and embed it within the XML message submitted to the service endpoint, like so:
<SOAP:Envelope>
  <SOAP:Body>
    <foo:setContent>
      <path> .. path to document .. </path>
      <src> ... base64 encoding of new document's serialization ... </src>
    </foo:setContent>
  </SOAP:Body>
</SOAP:Envelope>
Base 64 seemed like the obvious encoding mechanism, mostly because it would seem from an interpretation of the XForms specification that, due to the data binding restrictions of the Upload Control when bound to instances of type xsd:base64Binary, a conforming XForms processor is responsible for having the capability to encode to Base 64 on the fly. Now, this is fine and dandy if the XML content you wish to submit is retrieved from a file on the local file system of the client communicating remotely with the server. However, what if you wish to use an instance (a live DOM) as the source for the update? This seems like a very reasonable requirement given that one of the primary motivations of XForms is to encourage the use of XML instances as the user interface data model (providing a complete solution to the 'M' in the MVC architecture).
However:
(Via Uche Ogbuji.)
In the past I have expressed views that echo the essence of John's piece. It has been pretty darn clear to me that Microsoft is struggling as a result of its inability to handle challenges associated with the metaphoric "computing vase" which it sought to own solely, a quest pursued via its proclivity for crushing and/or alienating erstwhile technology partners (a process that commenced a long time ago, culminating in the contradiction and ultimate paradox called IE7; remember, not too long ago it was impossible to separate IE from Windows! It could only exist as an OS extension etc.).
Windows in its current incarnation fails to provide a productive working environment: you either have a plethora of viruses and spyware contending for your computing resources, or you have all the software in place to protect against these assaults, rendering the computing resources equally busy. The computing power lag is simply too much when using Windows, and this is its Achilles heel!
I have been using Windows since version 2.0, and although I have always found the Mac OS variations to be superior on the UI front, I never found any of the historic versions viable alternatives. In my case, this is all about providing a productive work environment across the following usage modes, in descending order of priority:
1. Power User (Outlook, Excel, Word, and other desktop productivity tools)
2. Product Testing and QA
3. Programmer Buddy (a Microsoft term)
4. Programming (for the most part prototyping)
The release of Mac OS X Tiger led me down an evaluation path that I have repeated many times in the past: test the viability of moving wholesale from Windows to Mac OS X and remaining functional (if really lucky, exceeding existing productivity levels). This time around I found that I could actually migrate over 6 years worth of emails, contacts, presentations, documents, and spreadsheets from Windows to Mac OS X. I also discovered that success extended all the way to my data linked documents that are transparently bound to back-end databases (in my case the norm rather than the exception, via ODBC).
I now use Mac OS X as my prime working platform (I still have to use Windows as the platform remains strategic for all our product offerings), and I am absolutely loving it! The joint feelings of euphoria and confusion that I experienced post migration were similar to how I felt after making the transition from "stick shift" to "automatic" geared cars (as I transitioned my residence from the UK to the U.S.). At the time I couldn't understand why anyone (other than a grand prix driver) would ever drive a "stick shift" by choice.
Today, I can't understand why I stuck with Windows for so long at the expense of my daily working productivity. The biggest bonus from this transition is that Mac OS X has made it easier for me to engage less technical individuals (family & friends) in the sheer joy and potential of Information Technology across a variety of realms, as opposed to being confined solely to the "business computing" realm. I can demonstrate the power and potential of the Internet, Web, Web Services, Blogosphere, and Wikisphere with much more sanity and coherence now that my machine responds in a timely fashion during these demos, amongst other benefits.
Some may deem this Windows bashing, but if they take the time to look a little deeper, this is simply "straight shooting" from a real computer user (I like my computers to deliver on their huge promised potential; I don't compromise this basic expectation; my computer and associated software should save me time and ramp up my productivity!). If Microsoft were still the company that it once was, then it would simply use this kind of commentary to rally its troops and get its act together! That's what I would do if a customer felt so badly about our technology (UDA or Virtuoso).
While I'm still trying to figure this out, you should read Shelley's original post, Steve Levy, Dave Sifry, and NZ Bear: You are Hurting Us, and see whether you think the arguments against blogrolls are as wrong as I think they are.
"..The Technorati Top 100 is too much like Google in that âÂÂnoiseâ becomes equated with âÂÂauthorityâÂÂ. Rather than provide a method to expose new voices, your list becomes nothing more than a way for those on top to further cement their positions. More, it can be easily manipulated with just the release of a piece of software.."
Advertising in RSS is just starting now, for all practical purposes. If we wanted to, as an industry, reject the idea, we could.
The Internet Archive initiative is building up an amazing collection of content that includes this "must watch" movie about the somewhat forgotten HyperCard development environment.
As I watched the HyperCard movie I obtained clear reassurance that my vision of Web 2.0 as critical infrastructure for a future Semantic Web isn't unfounded. The solution building methodology espoused by HyperCard is exactly how Semantic Web applications will be built, and this will be done by orchestrating the componentry of Web 2.0.
When watching this clip make the following mental adjustments:
Web 2.0 is a reflection of the web taking its first major step out of the technology stone age (certainly the case relative to the HyperCard movie and "pre web" application development in general).
I absolutely understand the frustration expressed in Dare's post. An additional comment from my perspective is that this devolution has been in motion for a while and it is an integral part of the Misinformation and Disinformation based marketing strategies of many companies.
Misinformation and Disinformation only work when the target audience is apathetic (unfortunately the sad reality to date!). The bad news for marketing strategies that assume perpetuation of the aforementioned apathy is that the Internet is fundamentally reducing the cost of knowledge acquisition; by implication, today's naive customer is tomorrow's knowledgeable decision maker. Vendors have a better option: build valuable products, and then market these products by disseminating knowledge. If a competitor's product is better than yours, get back to the labs (developers are actually stimulated and motivated by constructive challenges, especially as any developer worth his or her salt intrinsically believes, deep down, that they are the best at their craft; and so they should!).
In the imminent future (Internet time) I expect to see the Wikisphere, Blogosphere, and other Web 2.0 (and beyond) realms bring clarity to the futility of Misinformation and Disinformation based marketing and PR (see my post about the Wikipedia induced inflection on Marketing and PR ).
BTW -- Does anyone know what's the difference between an ESB and a Universal Server? Likewise, the difference between a Virtual Database and an EII solution?
When putting together a post yesterday about "Virtualization", I instinctively looked to GuruNet's "answers.com" service for information on the subject: Enterprise Information Integration (EII). Lo and behold! Here is what I found at the tail end of the answers.com article on this subject:
Now, I knew this was Wikipedia content repurposed by "answers.com", and I proceeded to clean up the article. The wikified article took a while to complete because, true to the "Wikipedia" ethos, I had to contribute knowledge as opposed to the original weenie marketing gunk. It's naturally easier to cut and paste marketing fluff for a misguided quick win attempt than it is to embed links, add knowledge, and discern Wiki Markup (but "Wiki" don't play that!).
This little exercise has broader implications for marketing as a whole, especially for the IT sector. The end of days for "Misinformation based Marketing" is nigh! Wikis, Blogs, Search Engines, Web Services, and Social Networking are rapidly destroying the historically prohibitive costs associated with customer pursuit of facts.
I am very confident that product quality will soon overshadow market share as the key determinant for product selection on the part of customers (this is no longer a pipe dream!). I also have increased hope that IT product development and associated product marketing by technology vendors will veer in the same direction.
Today is one of those days where one topic appears to be on the mind of many across cyberspace. You guessed right! It's that Web 2.0 thing again.
Paul Bausch brings Yahoo!'s most recent Web 2.0 contribution to our broader attention in this excerpt from his O'Reilly Network article:
I browse news, check stock prices, and get movie times with Yahoo! Even though I interact with Yahoo! technology on a regular basis, I've never thought of Yahoo! as a technology company. Now that Yahoo! has released a Web Services interface, my perception of them is changing. Suddenly having programmatic access to a good portion of their data has me seeing Yahoo! through the eyes of a developer rather than a user.
The great thing about this move by Yahoo! is twofold (IMHO):
The great thing about the Platform oriented Web 2.0 is the ability to syndicate your value proposition (aka products and services) instead of pursuing fallible email campaigns. It enables the auto-discovery of products and services by user agents (the content aspect). Web 2.0 also provides an infrastructure for user agents to enter into consumptive interactions with discrete or composite Web Services via published endpoints exposed by a platform (the execution aspect).
A scenario example:
You can obtain RSS feeds (electronic product catalogs) from Amazon today, although you have to explicitly locate these catalog-feeds since Amazon doesn't exploit feed auto-discovery within their domain.
If you use Firefox or another auto-discovery supporting RSS/Atom/RDF user agent, visit this URL; Firefox users should simply click on the little orange icon at the bottom right of the browser's window to see its RSS feed auto-discovery in action.
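For the curious, feed auto-discovery is driven by nothing more than a link element in the head of an XHTML page, along the following lines (the feed title and URL here are illustrative):

<head>
  <title>Product Catalog</title>
  <!-- user agents such as Firefox look for this link element -->
  <link rel="alternate" type="application/rss+xml"
        title="Product Catalog Feed"
        href="http://example.com/catalog.rss" />
</head>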
Anyway, once you have the feeds the next step is execution endpoints discovery within the Amazon domain (the conduits to Amazon's order processing system in this example). At the current time there isn't broad standardization of Web Services auto-discovery but it's certainly coming; WSIL is a potential front runner for small scale discovery while UDDI provides a heavier duty equivalent for larger scale tasks that includes discovery and other related functionality realms.
Back to the example trail: by having the RSS/Atom/RDF feed data within the confines of a user agent (an Internet Application to be precise), nothing stops the extraction of key purchasing data from these feeds, plus your consumer data, en route to assembling an execution message (as prescribed by the schema of the service in question) for Amazon's order processing / shopping cart service. All of this happens without ever seeing/eye-balling the Amazon site (a prerequisite of Web 1.0, hence the dated term: Web Site).
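Purely as a sketch of the idea (this is not Amazon's actual service schema; the namespace, element names, and values are all invented for illustration), such an execution message could take a SOAP form along these lines:

<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP:Body>
    <shop:AddToCart xmlns:shop="http://example.com/shop#">
      <!-- item identifier extracted from the catalog feed -->
      <shop:itemId>0596007973</shop:itemId>
      <shop:quantity>1</shop:quantity>
      <!-- associate id, so the referring weblog gets credit -->
      <shop:associateId>myblog-20</shop:associateId>
    </shop:AddToCart>
  </SOAP:Body>
</SOAP:Envelope>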
To summarize: Web 2.0 enables you to syndicate your value proposition and then have it consumed via Web Services, leveraging computer, as opposed to human, interaction cycles. This is how I believe Web 2.0 will ultimately impact the growth rates (in most cases exponentially) of those companies that comprehend its potential.
It is clear that in comparison to the Web of the last century, the nature of data on the Web later in this decade will be very different in the following aspects:
- Volume of data is growing by orders of magnitudes every year
- Multimedia and sensor data are becoming more and more common.
- Spatio-temporal attributes of data are important.
- Different data sources provide information to form the holistic picture.
- Users are not concerned with the location of data source, as long as its quality and credibility is assured. They want to know the result of the data assimilation (the big picture of the event).
- Real-time data processing is the only way to extract meaningful information
- Exploration, not querying, is the predominant mode of interaction, which makes context and state critical.
- The user is interested in experience and information, independent of the medium and the source.
Effectively, the nature of the knowledge on the Web is changing very fast. It used to be mostly static text documents; now it will be a combination of live and static multimedia, including text, data, and documents with spatio-temporal attributes. Considering these changes, will the search engines developed for static text documents be able to deal with the needs of the Web? [via E M E R G I C . o r g]
No, but this doesn't render them useless since we wouldn't be at this point without the likes of Google, Yahoo! et al. But building upon the data substrate that web data oriented search engines provide is where the next batch of Information access and Knowledge discovery solutions will carve out their space. The symbiotic relationship between Google (data) and Gurunet's Answers.com (Information and Knowledge) is one interesting example.
The Web is a distributed collection of databases that implement a variety of data storage models but are commonly accessible via protocols that rely on HTTP for transport (in-bound and out-bound messages) services. These databases increasingly use well-formed XML for query result (data contextualization) persistence and URIs for permanent reference. "What Database?" you might ask. "What you once called your Web Site, Blog, Wiki, etc.," my timeless reply.
When you have the database that I describe above, and a collection of entry points, available from one or more Internet domains, from which discrete or composite Web Services can be invoked, you end up with what I prefer to call a "Web 2.0" presence, or what Richard MacManus describes as "The Web as a Platform".
Here is a collection of posts I have made in the past relating to Web 2.0, note that this list is dynamic since this blog is Virtuoso based (predictably):
Free Text Search with XHTML results page (with Virtuoso generated URIs for RSS, Atom, and RDF): http://www.openlinksw.com/blog/search.vspx?blogid=127&q=web+2.0&type=text&output=html
It's also no secret that I believe that Virtuoso is a bleeding edge Web 2.0 technology platform (and more..). The URIs that I am exposing provide the foundation layer for other complementary Web initiatives such as the Semantic Web (Web 2.0 provides infrastructure for the Semantic Web, as time will show). They are also completely usable outside the realm of this blog.
BTW - Jon Udell is writing, experimenting with, and demonstrating similar concepts across feeds within his Web 2.0 domain.
These are indeed fun times!
Have RSS feeds killed the email star? silicon.com Feb 28 2005 12:58PM GMT
Business Week has a special report on Web services...
.......
That's a big difference. It means their business model can scale, and the bigger they get, the more profitable they become because they're building on that initial research and development investment." [via E M E R G I C . o r g]
On the issue of scale I would like to add (to the excerpt above) the fact that we will ultimately come to realize that Web Services facilitate computer cycle based value proposition consumption. This is exponentially greater than human interaction based value proposition consumption -- the fundamental difference between Web 1.0 and Web 2.0, IMHO.
It would be interesting to track the growth of companies in line with their adoption of Web Services based initiatives. I expect that when such a graph is produced in the future it will validate my point.
By Uche Ogbuji, IBM developerWorks
The world of XML and Web services is huge, and growing. developerWorks does much to map it out for you, but when you're looking for a schema or a public Web service to meet some pressing need, it's useful to have several key resources handy. This tip shows you how to comb through the enormous variety of Internet resources to find schemata and Web services using common search criteria. The best known source for finding public SOAP Web services is XMethods. It has a comprehensive list of SOAP services that you can sort by several criteria. It also provides a demo client so you can try out the services right from the index site. You can also keep track of the listings on XMethods programmatically using UDDI, RSS, and other means. Other sites that provide directories of Web services include RemoteMethods.com and Web Service List. A chronicle of interesting Web services is Web service of the Day.
One resource that straddles the Web services and Semantic Web worlds is WSindex.org, a directory of Web services, XML, SOAP, UDDI, WSDL, and Semantic Web resources. This site is a hierarchical and searchable directory.
http://www-106.ibm.com/developerworks/xml/library/x-tiplkws.html
By Bill Gates, Microsoft Executive Mail
Microsoft's product interoperability strategy: "First, we continue to support customers' needs for software that works well with what they have today. Second, we are working with the industry to define a new generation of software and Web services based on eXtensible Markup Language (XML), which enables software to efficiently share information and opens the door to a greater degree of 'interoperability by design' across many different kinds of software. Our goal is to harness all the power inherent in modern (and not so modern) business software, and enable them to work together so that the whole is greater than the sum of the parts. We want to further eliminate friction among heterogeneous architectures and applications without compromising their distinctive underlying capabilities... The XML-based architecture for Web services, known as WS-* ('WS-Star'), is being developed in close collaboration with dozens of other companies in the industry including IBM, Sun, Oracle and BEA. This standard set of protocols significantly reduces the cost and complexity of connecting disparate systems, and it enables interoperability not just within the four walls of an organization, but also across the globe."
http://www.microsoft.com/mscorp/execmail/2005/02-03interoperability.asp
Amen Bill! As long as this doesn't covertly imply "Windows Specificity" by way of "Interoperability" becoming a "Windows Unique Selling Point"!
As per usual, the devil will be in the implementation details of your company's products.
Answers.com was launched a month ago, and its stock is practically on fire! Does this graph tell you anything about subject searches vs keyword searches?
The burgeoning Semantic Web will disrupt the search market in a big way (and for the better IMHO).
An immediate implication is that you can generate Google AdWords based ads using any development environment (Virtuoso's SQL Stored Procedure Language, any .NET bound language, Java, C/C++, PHP, Ruby, Perl, Python, TCL etc.) that supports SOAP, WSDL, and, I would presume, WS-Security.
An even more interesting offshoot of this initiative from Google, is the fact that it could bring a degree of clarity to the issue of multi-protocol and multi-purpose servers (what I call Universal Servers e.g. OpenLink Virtuoso). For instance, you could manage AdWords campaigns across product portfolios using Triggers (the SQL database kind) or Notification Services.
By Ingo Rammer, Microsoft MSDN Library.
In this article the author shows how you can create and use a custom security token manager with the Web Services Enhancements 2.0 for Microsoft .NET to check for X.509 certificates, map them to roles and populate context information with custom principal and identity objects.
He shows how easy it is to use WS-Policy from within Visual Studio .NET to add declarative checking of role membership to your applications. The advantage of this approach based on WS-Security when compared to classic HTTP based security is that it doesn't rely on transport-level integrity or security but instead works solely with the SOAP message. This provides you with end-to-end security capabilities over multiple hops and protocols.
http://msdn.microsoft.com/library/en-us/dnwse/html/wserolebasedsec.asp
See also WS-Security references: http://xml.coverpages.org/ws-security.html
By Martin LaMonica, CNET News.com
The World Wide Web consortium, the standards body in charge of developing XML, said Tuesday that it has issued three recommendations designed to make handling XML-formatted data more efficient. The specifications have the backing of large industry software providers, including IBM, Microsoft and BEA Systems, which provide the software infrastructure to build and run XML data and Web services applications.
The W3C and vendors are looking at a variety of methods of speeding up the performance of XML, which can be slow for certain applications.
http://news.com.com/2110-1013_3-5551788.html
See also the news story:
http://xml.coverpages.org/ni2005-01-25-a.html
I have always believed that self-annotation will ultimately drive the realization of the semantic web vision. GuruNet is an interesting effort that should lead down this path.
Here is GuruNet's answer to the question: What Is SQL?
The Web Services, XML, and RDF angles should be pretty obvious (I hope!).
BTW - GuruNet does have a sync latency issue re. Wikipedia that it will need to address sooner rather than later.
Stickiness is a defining characteristic of Web 1.0. It's all about eyeballs (site visitors), which ultimately implied that all early Web business models ended up down the advertising route.
I always felt that Web 1.0 was akin to having a crowd of people at your reception area seeking a look at your corporate brochures, and then someone realizes that you could start selling ad space in these brochures in response to the growing crowd size and frequency of congregation. The long-term folly of this approach is now obvious, as many organizations forgot their core value propositions (expressed via product offerings) in the process and wandered blindly down the ad model cul-de-sac, and we all know what happened down there...
Web 2.0 is taking shape (the inflection is in its latter stages), and the defining characteristics of Web 2.0 are:
When you factor in all of the above, the real question is whether Google and others are equipped to exploit Web 2.0? "To some degree" is the best answer at the current time, as they have commenced the transition from "content only" web site to web platform (via the many Web Services initiatives that expose SOAP and REST interfaces to various services), but there is much more to this journey, and that's the devil in the "competitive landscape details".
From my obviously biased perspective, I think Virtuoso and Yukon+WinFS provide the server models for driving Web 2.0 points of presence (single server instances that implement multiple protocols). Thus, if Google, Yahoo! et al. aren't exploiting these or similar products, then they will be vulnerable over the long term to the competitive challenges that a Web 2.0 landscape will present.
Mozilla, Firefox, Opera et al. are all viable alternatives. Even better, get with the Web 2.0 program using the emerging pool of RSS Reader / Web Browser hybrids (note: unfortunately many still use IE for browsing by default).
What really gets to me is the fact that once the ill-perceived destruction of Netscape was achieved, Microsoft went into predictable mode with IE (nothing to kill, so why innovate? I mean, we only innovate to kill products that potentially re-route users away from the Windows lock-in / technology cul-de-sac etc.).
Internet Explorer Frame Injection Vulnerability: "Mark Laurence has discovered a 6 year old vulnerability in Microsoft Internet Explorer, allowing malicious people to spoof the content of websites. The problem is that Internet Explorer doesn't check if a target frame belongs to a website containing a malicious link, which therefore doesn't prevent one browser window from loading content in a named frame in another window. Successful exploitation allows a malicious website to load arbitrary content in an arbitrary frame in another browser window owned by e.g. a trusted site. Secunia has constructed a test, which can be used to check if your browser is affected by this issue. This vulnerability is similar to an old vulnerability fixed by MS98-020 in Internet Explorer version 3 and 4. The vulnerability has been confirmed in a fully patched Internet Explorer 6 running on Microsoft Windows XP. Other versions of Internet Explorer may also be affected. Solution: Disable the following security setting: 'Navigate sub-frames across different domains'. [Tools/Internet Options/Security tab in an Internet Explorer window or Internet Options/Security tab from Control Panel.] Do not visit or follow links from untrusted websites."
..
If this "Internet Operating System" and Web 2.0 stuff is really happening, I think I've just found the filesystem we'll all be using--in one form or another.
Tim Berners-Lee provided a keynote at WWW2004 earlier this week, and Paul Ford provided a keynote breakdown from which I have scrapped a poignant excerpt that helps me illuminate Virtuoso's role in the inevitable semantic web.
First off, I see the Semantic Web as a core component of Web 2.x (a minor upgrade of Web 2.0), and I see Virtuoso as a definitive Web 2.0 (and beyond) technology, hence the use today of the branding term "Universal Server". A term that I expect to become a common product moniker in the not too distant future.
The first challenge that confronts the semantic web is the creation of Semantic content. How will the content be created? Ideally, this should come from data; at the end of the day, this is a data contextualization process. The excerpt below from Paul's article highlights the point:
Rather than concerning themselves unduly with hewing to existing ontologies, Berners-Lee pushed developers to start using RDF and triples more aggressively. In particular, he wants to see existing databases exported as RDF, with ontologies created ad-hoc to match the structure of that data. Rather than using PHP scripts only to produce HTML, he suggested, create RDF as well. Then, when all of the RDF is aggregated, apply rules and see what happens. "Let's not fall back on handmade markup."
Data in existing databases does not have to be exported as RDF, especially if sensitivity to change is a specific contextual requirement. Naturally, the assumption is made that most databases don't have the ability to produce RDF, so an additional tool would be required to perform the data exports and transformation, and then a separate HTTP server makes this repurposed RDF data accessible over HTTP.
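To make the suggestion concrete, here is a minimal sketch of what exporting a single row of a hypothetical relational "products" table as RDF/XML might look like, using an ad-hoc ontology as Berners-Lee suggests (all names and URIs are illustrative):

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.com/schema#">
  <!-- one RDF resource per table row, keyed on the primary key -->
  <rdf:Description rdf:about="http://example.com/products/42">
    <ex:name>Widget</ex:name>
    <ex:price>9.99</ex:price>
  </rdf:Description>
</rdf:RDF>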
Later in the talk, he described a cascade of Semantic Web connections, postulating that one day, individuals may be able to follow links from a parts catalog to order status, from location to weather to taxes.
The final excerpt (above) outlines the kinds of interactions that the Semantic Web facilitates. The traversal from a "part catalog" to "order status", or from "location" to "weather" to "taxes", illustrates the roles that services and service orchestration will also play in the Semantic Web era.
Thus, we can safely deduce the following about the semantic web:
I would also like to conclude that what we know today, as the monolithic "point of presence" on the web called a "Web Site" (which infers browsing and page serving), is naturally going to morph into a different kind of "point of presence" that is capable of delivering the following from a single process:
This is what Virtuoso is all about, and why it is described as a "Universal Server"; a server instance that speaks many protocols, delivering a plethora of functionality (Database, Web Services Platform, Orchestration Engine, and more).
Databases get a grip on XML
From InfoWorld.
The next iteration of the SQL standard was supposed to arrive in 2003. But SQL standardization has always been a glacially slow process, so nobody should be surprised that SQL:2003 -- now known as SQL:200n -- isn't ready yet. Even so, 2003 was a year in which XML-oriented data management, one of the areas addressed by the forthcoming standard, showed up on more and more developers' radar screens. >> READ MORE
This article rounds up products for 2003 in the critical area of Enterprise Database Technology. It certainly provides an apt reflection of how Virtuoso compares with offerings from some of the larger (but certainly slower to implement) database vendors in this space. As usual, Jon Udell's quote pretty much sums this up:
"While the spotlight shone on the heavyweight contenders, a couple of agile innovators made noteworthy advances in 2003. OpenLink Software?s Virtuoso 3.0, which we reviewed in March, stole thunder from all three major players. Like Oracle, it offers a WebDAV-accessible XML repository. Like DB2 Information Integrator, it functions as database middleware that can perform federated ?joins? across SQL and XML sources. And like the forthcoming Yukon, it embeds the .Net CLR (Common Language Runtime), or in the case of Linux, Novell/Ximian?s Mono."
Albeit still somewhat unknown to the broader industry, we have remained true to our "innovator" discipline, which still remains our chosen path to market leadership. Thus, it's worth a quick recap of Virtuoso's release history and features as we get set to up the ante even further in 2004:
1998 - Virtuoso's initial public beta release with functional emphasis on Virtual Database Engine for ODBC and JDBC Data Sources.
1999 - Virtuoso's official commercial release, with emphasis still on Virtual Database functionality for ODBC, JDBC accessible SQL Databases.
2000 - Virtuoso 2.0 adds XML Storage, XPath, XML Schema, XQuery, XSL-T, WebDAV, SOAP, UDDI, HTTP, Replication, Free Text Indexing (*feature update*), POP3, and NNTP support.
2002 - Virtuoso 2.7 extends Virtualization prowess beyond data access via enhancements to its Web Services protocol stack implementation by enabling SQL Stored Procedures to be published as Web Services. It also debuts its Object-Relational engine enhancements that include the incorporation of Java and Microsoft .NET Objects into its User Defined Type, User Defined Functions, and Stored Procedure offerings.
2003 - Virtuoso 3.0 extends data and application logic virtualization into the Application Server realm (basically a Virtual Application server too!), by adding support for ASP.NET, PHP, Java Server Pages runtime hosting (making applications built using any of these languages deployable using Virtuoso across all supported platforms).
Collectively, each of these releases has contributed to a very premeditated architecture and vision that will ultimately unveil the inherent power of critical I.S. infrastructure virtualization along the following lines: data storage, data access, and application logic, via coherent integration of SQL, XML, Web Services, and Persistent Stored Modules (.NET, Java, and other object based component building blocks).
RSS information routers will emerge in 2004 with the following characteristics:
- Persistent storage of XHTML full-text/graphics/audio/video of RSS feeds
- XPATH search across local and Net stores
- Self-forming and reordering subscriptions lists based on the aggregated priorities of user-chosen domain experts
- Use of IM notification for post notification to aggregate affinity groups and active conversations
- Integration of Hydra-like collaborative tools for multi-author conference transcripts
- Videoconferencing routing and broadcast/recording tools
- Integration of speech recognition and real-time indexing to allow quoting of linear audio and video streams
- Mesh networked peer-to-peer synchronization engine for item propagation across shared spaces on multiple clients, including phones, iPods, and eventually Longhorn PDAs (circa 2006).
Armed with these tools, new industries will emerge in rapid succession:
- Metadata-driven directories that dynamically create RSS feeds based on affinity
- Virtual conferences
- IM/RSS presence networks for rich collaboration and e-mail replacement
- Content-generation tools based on small, routable XHTML objects
- A DRM network with enough creative and hardware support to blunt the Microsoft/RIAA DRM threat to peer-to-peer port hijacking.
It starts here with one of many excerpts from Scoble's blog:
Dave Winer, Jon Udell, and now Gerald Bauer says that Microsoft is killing the Web. Or trying to.
The guys above are pretty seasoned individuals (they save me a lot of writing too amongst other things).
Now here is a response from Microsoft's Blog evangelist supremo Scoble to their comments and genuine concerns.
OK, let's assume that's true.
Microsoft has 55,000 employees. $50 billion or so in the bank.
Yet what has gotten me to use the Web less and less lately? RSS 2.0.
Seriously. I rarely use the browser anymore (except to post my weblog since I use Radio UserLand).
See the irony there? Dave Winer (who at minimum popularized RSS 2.0) has done more to get me to move away from the Web than a huge international corporation that's supposedly focused on killing the Web.
Now, let's look at what's really going on here. We're going back to being a great platform company. We're trying to provide a platform that lets developers build new applications that are impossible to build on other platforms. At the PDC you saw some of that. New kinds of forms. New kinds of games. New kinds of business apps. New kinds of experiences.
But, we also are looking for ways to make the Web better too. Now, we haven't talked about what we're doing with the browser. I hear that'll come later. Astute Longhorn testers have already seen that we snuck a pop-up ad blocker into the browser without telling anyone about it. Whoa. That means we're gonna turn off MSN's capabilities of selling popup ads.
I hear there's more coming too. But, why should we do it all? Wasn't the point of the past four years to get Microsoft to stop trying to do it all? The DOJ and now the European Union are still after us cause we tried to do it all. Instead, let's just go back and be a great platform company.
We just gave you a great foundation for a killer new kind of application. One that goes FAR beyond HTML. And, even if you stick with Mozilla, your experiences on Longhorn will get better. For instance, fonts are being rendered in the GPU now on Longhorn. Your Web pages will look better and behave better on Longhorn than they will on any other platform. Period.
And wait until Mozilla's and other developers start exploiting things like WinFS to give you new features that display Internet-based information in whole new ways.
If Microsoft really wants to create a better platform, shouldn't this be truly futuristic? If so, then it should issue the first major salvo by dropping the restrictions on Rotor.
We are moving into the distributed component based computing age, where runtime environments (.NET CLR, Mono, J2EE, and others) act as Component Execution Junction boxes (instead of the Monolithic Operating Systems of today) in a continuum of services orchestrated by messages in response to events emanating from value consumption requests (what we call application behaviour today) from a myriad of value consumers (application users).
There is no need for covert and protracted protection of an obsolete Windows Operating System (the underlying fear that keeps Rotor shackled in my opinion), since its obsolescence is in full motion as Longhorn clearly demonstrates.
Imagine a fusion of sorts across Microsoft .NET, Mono, and Rotor, with a single portable runtime as the end product (slotting nicely into its place in the imminent distributed component and services era). All the benefits of programming language independence in true glory - the ECMA-CLI is all about programming language independence. Now that would be unequivocally revolutionary, and Microsoft would actually be doing what I think it has been desperately trying to achieve for a long time: the delivery of really cool technology that seriously impacts us all in a positive way, without the usual World Domination concerns.
Anyway, back to the current reality, where covert attempts to lock us all into Windows get more and more transparent per technology release cycle. The very antithesis of what I espoused in the last paragraph (or dream). I believe that Scoble's instincts lie in this realm too, and you never know, this evangelist may turn Messiah :-)
Here's the final excerpt from Scoble's post:
There's a whole lot of more useful stuff coming. Both for the Web and for newer Internet-centric rich-client approaches. Personally, it's about time. I'm already using the Web less and less thanks to things like RSS 2.0.
I'm watching 636 sites every day. Try to do THAT in your Web browser.
So, yes, blame it on me. I'm trying to kill the Web. Isn't it time to move on? Didn't we move on from the Apple II? Didn't we move on from DOS? Didn't we move on from Windows 3.11? Can't you see a day when we move on from the Web and get something even more fantastic? I can. Dave Winer can. Why not you? [via The Scobleizer Weblog]
If you kill the Web en route to getting us a Portable Execution Junction box from Microsoft, I think you would have served mankind pretty damned well. We won't have to gripe about Web 1.0 (Browser Driven Web) because we would be well into Web 2.0 and beyond (which doesn't define the Web experience predominantly via browsing).
NETWORK WORLD NEWSLETTER: MARK GIBBS ON WEB APPLICATIONS
Today's focus: A Virtuoso of a server
By Mark Gibbs
One of the bigger drags of Web applications development is that building a system of even modest complexity is a lot like herding cats - you need a database, an applications server, an XML engine, etc., etc. And as they all come from different vendors you are faced with solving the constellation of integration issues that inevitably arise.
If you are lucky, your integration results in a smoothly functioning system. If not, you have a lot of spare parts flying in loose formation with the risk of a crash and burn at any moment.
An alternative is to look for all of these features and services in a single package, but you'll find few choices in this arena.
One that is available and looks very promising is OpenLink's Virtuoso (see links below).
Virtuoso is described as a cross-platform (runs on Windows, all Unix flavors, Linux, and Mac OS X) universal server that provides databases, XML services, a Web application server and supporting services all in a single package.
OpenLink's list of supported standards is impressive and includes .Net, Mono, J2EE, XML Web Services (Simple Object Access Protocol, Web Services Description Language, WS-Security, Universal Description, Discovery and Integration), XML, XPath, XQuery, XSL-T, WebDav, HTTP, SMTP, LDAP, POP3, SQL-92, ODBC, JDBC and OLE-DB.
Virtuoso provides an HTTP-compliant Web Server; native XML document creation, storage and management; a Web services platform for creation, hosting and consumption of Web services; content replication and synchronization services; a free-text index server; mail delivery and storage; and an NNTP server.
Another interesting feature is that with Virtuoso you can create Web services from existing SQL Stored Procedures, Java classes, C++ classes, and 'C' functions, as well as create dynamic XML documents from ODBC and JDBC data sources.
This is an enormous product and implies a serious commitment on the part of adopters due to its scope and range of services.
Virtuoso is enormous by virtue of its architectural ambitions, but its actual disk requirements are modest.
Q: Amazon.com now runs sites and on-line operations for retailers such as Target and Toys 'R' Us. What's the future for that services business?

A: It's a rapidly growing part of our business. And that goes from [large] companies that are customers of that all the way down to individuals using our Web services to tap into the fundamental platform that is Amazon.com. They can build their own applications very effectively. It's almost closer to an ecosystem.

Q: So Amazon is becoming a kind of software platform a bit like Microsoft (MSFT)?

A: People are building stuff that surprises us. That's what's so interesting about this. We've built this big base of technology to serve ourselves, and now we're opening it up and letting people access it. They're taking these fundamental pieces and building completely new things that not only would we have never gotten around to but in some cases maybe never even have thought of. There are thousands of developers who are building applications using Amazon Web services. The sky's the limit on their creativity.

Q: What arises from all those efforts?

A: People will be able to build very powerful applications by hooking together a whole bunch of Web services from a whole bunch of different companies.

Q: What benefit is Amazon.com getting from this?

A: It's too early to say. It's certainly not a major source of revenue for us. But when people use our Web services, they give us credit for that. That turns out to be very helpful.

A few years ago the race was on to simply have a Web Site; then this requirement evolved into a requirement for a database-driven site. Today we are seeing the final stages of the Web 2.0 inflection, which will inevitably change the focus toward the need for a Point of Presence on the Web for exposing or invoking Web Services and/or Syndicating or Subscribing to XML based content.
EBay has a new developer evangelist (Jeffrey McManus) and today he started a weblog. [via The Scobleizer Weblog]
I found out that eBay is no longer a Web Services laggard relative to Google and Amazon.
What is the eBay API?
The Application Programming Interface (API) is the heart of the Developers Program. Normally, users buy and sell items using the eBay online interface, interacting with eBay directly. But with the eBay API, you communicate directly with the eBay database in XML format. By using the API, your application can provide a custom interface, functionality and specialized operations not otherwise afforded by the eBay interface.
Using the API, you can create programs that interact with eBay programmatically. Because the API is not dependent on the eBay user interface, it allows you to create stable, custom functionality and interfaces that best meet your business needs.
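To make the message-based interaction concrete, here is a minimal Python sketch of the kind of XML-over-HTTP exchange such an API enables. The endpoint URL and the request payload are hypothetical placeholders, not eBay's actual wire contract:

    # Hypothetical sketch of an XML-over-HTTP API call; the endpoint and
    # payload are illustrative stand-ins, not eBay's real contract.
    import urllib.request

    ENDPOINT = "https://api.example-ebay.test/ws"  # hypothetical endpoint

    request_xml = """<?xml version="1.0" encoding="utf-8"?>
    <GetItemRequest>
      <ItemID>1234567890</ItemID>
    </GetItemRequest>"""

    req = urllib.request.Request(
        ENDPOINT,
        data=request_xml.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode("utf-8"))  # raw XML response from the service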
The definition of Web Services has just gotten simpler:
Web Services define technology that lets you buy and sell from eBay or Amazon without browsing either site.
Cool!
In the year 2000 the question of the shape and form of XML data was unclear to many, and reading the article below basically took me back in time to when we released Virtuoso 2.0 (we are now at release 3.0 commercially, with a 3.2 beta dropping any minute).
RSS is a great XML application, and it does a great job of demonstrating how XML -- the new data access foundation layer -- will galvanize the next generation Web (I refer to this as Web 2.0).
RSS: INJAN (It's not just about news)
RSS is not just about news, according to Ian Davis on rss-dev.
He presents a nice list of alternatives, which I reproduce here (and to which I'd add, of course, bibliography management):
- Sitemaps: one of the S's in RSS stands for summary. A sitemap is a summary of the content on a site; the items are pages or content areas. This is clearly a non-chronological ordering of items. Is a hierarchy of RSS sitemaps implied here -- how would the linking between them work? How hard would it be to hack a web browser to pick up the RSS sitemap and display it in a sidebar when you visit the site?
- Small ads: also known as classifieds. These expire, so there's some kind of dynamic going on here, but the ordering of items isn't necessarily chronological. How do you describe the location of the seller, the condition of the item, or even the price? Not every ad is selling something -- perhaps it's to rent out a room.
- Personals: similar model to the small ads. No prices though (I hope). Comes with a ready-made vocabulary of terms that could be converted to an RDF schema. Probably should do that just for the hell of it anyway -- gsoh.
- Weather reports: how about a week's worth of weather in an RSS channel? If an item is dated in the future, should an aggregator display it before time? Alternate representations include maps of temperature and pressure etc.
- Auctions: again, related to small ads, but these are much more time-limited since there is a hard cutoff after which the auction is closed. The sequence of bids could be interesting -- would it make sense to thread them like a discussion so you can see the tactics?
- TV listings: this is definitely chronological but with a twist -- the items have durations. They also have other metadata such as cast lists, classification ratings, widescreen, stereo, program type. Some types have additional information such as director and production year.
- Top ten listings: top ten singles, books, dvds, richest people, ugliest, rear of the year etc. Not chronological, but has definite order. May update from day to day or even more often.
- Sales reporting: imagine if every department of a company reported their sales figures via RSS. Then the divisions aggregate the departmental figures and republish to the regional offices, who aggregate and add value up the chain. The chairman of the company subscribes to one super-aggregate feed.
- Membership lists / buddy lists: could I publish my buddy list from Jabber or other instant messengers? Maybe as an interchange format, or perhaps it could be used to look for shared contacts. Lots of potential overlap with FOAF here.
- Mailing lists: or in fact any messaging system such as usenet. There are some efforts at doing this already (e.g. yahoogroups) but we need more information -- threads, references, headers, links into archives.
- Price lists / inventory: the items here are products or services. No particular ordering, but it'd be nice to be able to subscribe to a catalog of products and prices from a company. The aggregator should be able to pick out price rises or bargains given enough history.
Thus, if we can comprehend RSS (the blog article below does a great job), we should be able to see the fundamental challenges before any organization seeking to exploit the potential of the imminent Web 2.0 inflection: how will you cost-effectively create XML data from existing data sources, without upgrading or switching database engines, operating systems, or programming languages? Put differently, how can you exploit this phenomenon without losing your ever-dwindling technology choices? (Believe me, choices are dwindling fast, but most are oblivious to this fact.)
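To make the "not just news" point concrete, here is a minimal Python sketch that assembles an RSS 2.0 item carrying a price-list entry rather than a news story. The price and condition elements are hypothetical extensions, not part of the RSS core:

    # Builds an RSS 2.0 <item> describing a catalog entry; <price> and
    # <condition> are hypothetical extension elements for illustration.
    import xml.etree.ElementTree as ET

    item = ET.Element("item")
    ET.SubElement(item, "title").text = "Widget, model A-100"
    ET.SubElement(item, "link").text = "http://example.com/catalog/a-100"
    ET.SubElement(item, "description").text = "Reconditioned widget, 30-day warranty"
    ET.SubElement(item, "price").text = "19.95"      # hypothetical extension
    ET.SubElement(item, "condition").text = "used"   # hypothetical extension

    print(ET.tostring(item, encoding="unicode"))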
The sluggish U.S. economy has impacted many companies' investments in Web services projects, but it has not killed these projects, according to a survey by Gartner.
Um, Joi, did you have some bad sushi before you wrote this?
Let me explain why you're wrong.
First of all, Microsoft is investing a LOT in "non PC devices." So, even if you're right that the desktop will become less important (hint: you're not), I don't think you can count Microsoft out, or say it'll become less relevant.
Second of all, TONS of people are getting camera phones. What's the first thing they do? Post them on a web site, right? OK. So far, camera phones + server means that the desktop is outta the picture, right? But, where do people view those camera phone pictures? I'll tell you where I look at Chris Pirillo's moblog, for instance: on my Tablet PC.
So, how again did the new device that came along decrease the relevance of the desktop?
Now, I predict Joi's answer will be that Japanese kids don't use PCs and they just use cell phones for everything. Well, sorry. Viewing a photo off of one of those new Nikon multi-mega-pixel pro cameras on a small cell phone screen just isn't my idea of fun. And trying to type ASCII characters into a weblog on a cell phone's keypad ain't my idea of fun either (and, yes, I've played with the latest in phones -- a co-worker just brought a bunch back from Tokyo). The fact that some kid somewhere is doing that, doesn't prove a thing.
But, I've been corrupted. I can predict the future a bit since I've seen a ton of secret stuff inside Microsoft. I certainly don't think the desktop becomes less relevant. In fact, to a whole raft of users, the desktop (or, the Tablet top, if you will) will be more important in 2005, not less.
"A Manager's Guide," as the title suggests, is the perfect pragmatic guide for managing a current web-services project. If you want to know what works today, right down to the specific products from individual vendors, Anne's book is the one to buy. .NET versus Java? Which J2EE platform or UDDI registry server? The current state of the basic protocols: SOAP, WSDL, UDDI? You'll find the answers in one place. As with my book, there are no code fragments or XML listings. It's for managers, not programmers. But this book is the one to buy for your tactical requirements.
"Loosely Coupled," on the other hand, takes a more strategic view, and in a sense picks up where Anne's book leaves off. I don't explain any of the protocols. In fact I rarely mention them by name. I assume (a) you'll learn about them somewhere else (such as from Anne's book), and (b) they'll change quickly anyway. Anne has a 30-page chapter on "Advanced Web-Services Standards," which is where my book kicks in. As the subtitle suggests, I look more deeply at the missing pieces of web services: transactions, security, reliable asynchronous messaging, orchestration and choreography, QoS, contracts and other business issues, infrastructure, and the big one: industry-specific semantics.
Both books cover the fundamental concepts of web services such as service-oriented architectures. Anne, however, sees web services as being fundamentally about application integration, which clearly is the sweet spot today. I look at the issues surrounding inter-organizational loosely coupled web services, taking a longer-term and more strategic view. If you're thrust into managing a web-services project, need to ramp up quickly, select vendors and products, and be able to communicate with your developers, buy Anne's book. If you need to develop a long-term web-services strategy for your organization, buy mine. In other words: buy them both. I think you'll like the combination.
Tim O'Reilly wrote some thoughts about network-aware software. A good summary and nice ideas on why not only blogs should be net-aware (and where even blogs can be improved ;) )
"For the desktop, my personal vision is to see existing software instrumented to become increasingly web aware. It seems that Apple are doing a good job with this. (What does web aware mean for me? Being able to grok URIs, speaking WebDAV, and using open standard data formats.)" -- Edd Dumbill[via Bitflux Blog]
Rendezvous-like functionality for automatic discovery of and potential synchronization with other instances of the application on other computers. Apple is showing the power of this idea with iChat and iTunes, but it really could be applied in so many other places. For example, if every PIM supported this functionality, we could have the equivalent of "phonester" where you could automatically ask peers for contact information. Of course, that leads to guideline 2.
Another application is discovery of ODBC data sources and database servers. Rendezvous can also simplify security and administration of data sources accessible via either of these standard data access mechanisms. The same applies to XML databases and the data sources they expose.
The very point I continue to make about Internet Points of Presence being actual data access points; in short, these end points should be served by database server processes. This is the very basis of Virtuoso; the inevitability of this realization remains the underpinning of this product. There are other products out there that have some sense of this vision too, but there is a little snag (at least so far in my research efforts), and that is the tendency to create a dedicated independent server per protocol (an ultimate integration, administration, and maintenance nightmare).
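For a flavor of what Rendezvous-style discovery looks like from code, here is a minimal sketch using the third-party python-zeroconf package. The "_odbc._tcp.local." service type is a hypothetical example; any advertised service type can be browsed the same way:

    # Browses mDNS/Zeroconf (Rendezvous) service announcements on the LAN.
    # "_odbc._tcp.local." is a hypothetical service type for illustration.
    import time
    from zeroconf import ServiceBrowser, ServiceListener, Zeroconf

    class Listener(ServiceListener):
        def add_service(self, zc, type_, name):
            info = zc.get_service_info(type_, name)
            addrs = info.parsed_addresses() if info else []
            print(f"found {name} at {addrs}")

        def remove_service(self, zc, type_, name):
            print(f"lost {name}")

        def update_service(self, zc, type_, name):
            pass

    zc = Zeroconf()
    browser = ServiceBrowser(zc, "_odbc._tcp.local.", listener=Listener())
    time.sleep(5)  # let announcements arrive for a few seconds
    zc.close()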
Ahh, Doc Searls is covering the Corporate Weblogging thing.
Personally, I think corporate weblogging is a non-event. For instance? Am I a corporate weblogger? I don't think so. I don't have Microsoft's executive blessing for this.
The blessing isn't the point. Corporations have always blogged (or attempted to; they just never called it blogging, or simply lacked cohesive technology to make the concept gel). Every second of the day in any corporation, data comes in and goes out (after numerous transformations across a plethora of contexts).
Every corporation knows that it has to create, persist, and disseminate knowledge, and like the Internet, Web, XML, Web Services, and now Blogging, technology is simply catching up in a somewhat standardized form.
Funny, I was talking with my boss's boss today. Vic Gundotra (General Manager of Platform Evangelism). I asked him "so, from a Microsoft's exec point of view, what would you like me to do on my weblog?"
He answered: "I don't want to tell you what to do, because anything I tell you will only screw it up and make it boring."
Oh, you mean like Eric Rudder's weblog? Now I'm in trouble... ;-)
[via The Scobleizer Weblog]
Your boss was right on every count :-)
This Blog Site is actually powered by Virtuoso 3.2 (and has been since before the announcement). Hmm, product utilization preceding press release? Why not?
OpenLink adds Weblog client and server functionality to Virtual Database Engine for SQL, XML, and Web Services
Burlington, MA. June 25, 2003 - OpenLink Software, Inc., a leading provider of universal data access and enterprise information integration middleware, announces Virtuoso 3.2, the latest edition of its cross-platform Virtual Database for SQL, XML, and Web Services, for Mac® OS X.
The new release incorporates full client and server support for the Blogger, Movable Type, and MetaWeblog APIs, providing users with choice over location, format, data storage, development environment, and host operating system, for personal, community, and corporate Weblogs. The new release also facilitates the transparent integration of Weblog data with other enterprise data sources.
Putting together the community site took 5 minutes and it basically involved the following steps:
1. Standard installation from the installer program (Mac OS X in this case, but Windows, Linux, and UNIX are supported)
2. Creation of a WebDAV user account for the WebDAV repository (where all the gems reside)
3. Clicking on the "Generate Web Site" button situated in the Weblog menu tree of the Virtuoso HTML-based Admin UI
4. Filling my channel and blogrolls by asking Virtuoso to use its very old web content aggregation functionality
5. Setting up my upstreams (so that I post once and propagate to my numerous blog sites on a conditional basis)
6. Creating a Virtuoso HTTP Virtual Domain for the community/personal Blog
7. Blogging with any Blog Client that supports the Blogger, MetaWeblog, or Movable Type APIs (a minimal client sketch follows below)
No more, no less. Most importantly, I have a choice of programming languages (VSP, VSX, PHP, ASP.NET, JSP, Perl, Python), operating systems, and databases that constitute the shape and form of my blog home.
See the Virtuoso FAQ for how this all comes together.
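And here is the client-side sketch promised in step 7: posting through the MetaWeblog API with nothing but Python's standard library. The endpoint URL, blog id, and credentials are placeholders for your own server's details:

    # metaWeblog.newPost(blogid, username, password, struct, publish) is the
    # standard MetaWeblog call; only the endpoint and credentials vary.
    import xmlrpc.client

    server = xmlrpc.client.ServerProxy("http://blog.example.com/RPC2")  # placeholder endpoint

    post = {
        "title": "Hello from a MetaWeblog client",
        "description": "<p>Posted once, propagated upstream by the server.</p>",
    }
    post_id = server.metaWeblog.newPost("blog-id", "user", "password", post, True)
    print("created post", post_id)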
RSS feeds are everywhere, and they are changing the Web landscape fast. The Web is shifting from a distributed freeform database to a distributed semi-structured database.
Amazon.com RSS Feeds They never got around to it, so we set up 160+ separate RSS channels for darn near every type of product on Amazon.com for you. If you have any feedback for this new (free) service, please let us know immediately! We're looking to make it an outstanding and permanent part to your collection. Enjoy! (Chris) [via Lockergnome's Bits and Bytes]
Your Web Site is gradually becoming a database (what?). Yes, your Web Site needs to be driven by database software that can rapidly create RSS feeds for your organization's non-XML and XML data sources. Your web site needs to provide direct data access to users, bots, and Web Services.
Here is my blog database, for instance; you can query the XML data in this database using XQuery, XPath, and Web Services (if I decide to publish any of my XML Query Templates as Web Services).
Note the teaser here: each XML document is zero bytes! This is because these are live Virtuoso SQL-XML documents that produce a variety of XML documents on the fly, which means they retain a high degree of sensitivity to changes in the underlying databases supplying the data. I could have chosen to make these persistent XML docs with interval-based synchronization with the backend data sources (but I chose not to, for maximum effect).
As you can see, SQL and XML (Relational and Hierarchical model) engines can co-exist in a single server; ditto Object-Relational (which might be hidden from view but could be used in the SQL that serves the SQL-XML docs); ditto Full Text (see the search feature of this blog); and finally, ditto the directed graph model for accessing my RDF data (more on this as the RDF data pool increases).
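As a taste of consuming these documents from the outside, here is a minimal Python sketch that fetches one of the dynamically generated XML documents over HTTP and runs an XPath-style query against it client-side. The document URL is a placeholder:

    # Fetches a live XML document and extracts element text with an
    # XPath-style query; the URL is a placeholder for any of the docs above.
    import urllib.request
    import xml.etree.ElementTree as ET

    DOC_URL = "http://example.com/blog/gems/posts.xml"  # placeholder URL

    with urllib.request.urlopen(DOC_URL) as resp:
        root = ET.fromstring(resp.read())

    # ElementTree supports a useful subset of XPath; the server's own
    # XQuery/XPath services can take over for richer queries.
    for title in root.findall(".//title"):
        print(title.text)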
Just yesterday we had an article about how Amazon's technology was becoming their biggest product, but that could soon change as people continue to innovate around Amazon's web services offering, letting just about anyone access Amazon's vast database, and build interesting and useful applications on it. When they originally launched this offering a number of developers thought it was cool, but weren't sure what could actually be done with it. However, given some time, data, and an open API, creative developers are always going to come up with interesting solutions. I don't know if any of these are really a "killer app" yet, but Amazon now has a vision of being the "e-commerce platform" for the world. There's something appealing about that notion. If, anytime you wanted to sell something on your website, you could easily hook into Amazon's catalog, transaction processing, and fulfillment process, there are some interesting possibilities. Right now, it's just simple things, such as creating a way to automatically match up the top song titles being played on the radio with those CDs at Amazon. In the future, though, you could see how an even bigger and more powerful Amazon could become something of a central "bucket of e-commerce" which many other sites pull from in creative ways. So, then, the question becomes how big is this opportunity, really? As I said, it's an appealing idea, but how many people actually buy through these sorts of applications vs. those who just go to Amazon and buy it themselves. The "killer app" built on top of Amazon would need to have really compelling reasons to buy directly through it - and I don't think anyone's gotten that far yet. [via Techdirt]
There is nothing wrong with embracing Open Standards; Amazon is demonstrating this in practice.
Amazon has pretty much got it right!
The perennial question re. Web Services has been how one defines Web Services in simple terms. My response has always been:
The ability to interact with a Web Point of Presence without visual navigation. A good example being the ability to send the "amazon.com" site a message in order to order a book instead of physically navigating to the site.
This has been my definition since 2001, long before Amazon implemented its Web Services APIs.
In recent times I came across this post in the general blogosphere at Ecademy (sheer coincidence, I might add; I wasn't looking for it, but that's what this emerging semantic web experience is all about):
I thought I'd kick off that old chestnut - "What is a web service?" - again with the definition according to the W3C. They should know ... shouldn't they ...
A Web service is a software system identified by a URI, whose public interfaces and bindings are defined and described using XML. Its definition can be discovered by other software systems. These systems may then interact with the Web service in a manner prescribed by its definition, using XML based messages conveyed by Internet protocols.
http://www.w3.org/TR/2002/WD-ws-arch-20021114/#whatisws
Accurate, but kind of obscure for the non-technical reader.
Software companies always seek to reach the land of critical mass (this is the single destination of every software vendor), and critical mass implies the creation of an ecosystem served by the software vendor (Microsoft is king of critical mass, and this is the secret of their success!).
Amazon, as an eCommerce pioneer, has pretty much figured this out (their patent pounding sometimes compromises this reality; I certainly don't like that part of their behavior), and they have correctly used Web Services as the vehicle.
Google has pretty much figured this out too, and before Amazon I might add.
Amazon's Software Emerges As Valuable Product I'm surprised that it's taken people this long to realize that the most valuable part of Amazon.com's business might not be their stores, but their ability to run stores for others. Amazon.com still has, by far, some of the best technology out there for running an e-commerce site. In the early days of e-commerce, any good online shopping innovation was quickly copied, but more recently it seems that no one has been able to keep up with Amazon's advancements. It's not clear if this is due to Amazon's patent-crazy nature, or if most others have simply given up the fight. Either way, Amazon is doing their best to capitalize on their technology lead, and it seems that there's no shortage of willing customers. [via Techdirt]
See this futuristic piece (How Google beat Amazon and eBay to the Semantic Web) that sheds some speculative light on how this could play out.
The next release of SQL Server promises increased developer productivity and reduced DBA workload.
by Roger Jennings June 2003 Issue .NET Magazine
After reading this article I decided to put together a simple comparative analysis of our existing product and the soon-to-be-released Yukon.
Our Universal Server product called Virtuoso will compete head-on with this future release of SQL Server in many regards (.NET CLR hosting, Native XML Types, SQL-XML, XMLA, Web Services etc.), but I am also keen to see what interesting perspectives Microsoft's implementation brings to the table. Here is a summary comparison; note that some of the hyperlinks in the table below actually take you to live functionality demos (for effect, these links point to a Linux server, and you can change the machine part of the URL from "demo" to "kingsleydemo" to see the equivalent demos on an XP server).
Ingres (technically, Advantage Ingres Enterprise) is, arguably, the forgotten database. There used to be five major databases: Oracle, DB2, Sybase, Informix and Ingres. Then along came Microsoft and, if you listened to most press comment (or the lack of it), you would think that there were only two of these left, plus SQL Server. [From IT-Director]
Oracle, Microsoft, and IBM would certainly like to preserve the illusion of a three-horse race, as this is the only way they can induce Ingres, Informix, and Sybase users to jump ship -- and this even though database migrations are by far the most risk-prone and problematic aspects of any IT infrastructure.
Here is the interesting logic from the self-made big three: if you want to take advantage of new paradigms and technologies such as XML, Web Services, and anything else in the pipeline, you have to move all your data out of these databases, and then get all the mission-critical applications re-associated with one of their databases -- and, by the way, when you do so it is advisable that you use native interfaces (so that sometime in the future you have no chance whatsoever of repeating this folly at their expense).
The simple fact of the matter (which the self-made big three do not want you to know) is that you can put ODBC, JDBC, or even platform-specific data access APIs such as OLE DB and ADO.NET atop any of these databases, and then explore and exploit the benefits of new technologies and paradigms as long as the tool pool supports one or more of these standards.
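Here is a minimal sketch of that point, using the third-party pyodbc module: the same standards-based code path works whether the DSN points at Ingres, Informix, Sybase, or anything else with an ODBC driver. The DSN, credentials, and table are placeholders:

    # Plain SQL-92 over ODBC; swapping the DSN swaps the backend database
    # without touching the application code. All names are placeholders.
    import pyodbc

    conn = pyodbc.connect("DSN=LegacyIngres;UID=user;PWD=secret")  # placeholder DSN
    cur = conn.cursor()
    cur.execute("SELECT customer_id, name FROM customers")
    for row in cur.fetchall():
        print(row.customer_id, row.name)
    conn.close()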
Unfortunately, the no-brainer above appears to be the more difficult of the choices before decision makers. In other words, many would rather dig themselves into a deeper hole (unknowingly, I can only presume) that ultimately leads to technology lock-in.
The biggest challenge before any RDBMS-based infrastructure today isn't which of the self-made big three to migrate to wholesale, but rather how to make progressive use of the pool of disparate applications and application databases that proliferate across the enterprise.
This is another way of understanding the burgeoning market for Virtual Databases, which in my opinion represent the new frontier in database technology.
Key quote:
Since the dawn of the database era more than three decades ago, enterprises have been amassing an ever-increasing volume of information - both current and historical - about their operations. For the past two of those three decades, the database world has struggled with the problem of somehow integrating information that natively resides in multiple database systems or other information sources (Landers and Rosenberg).
This is the root cause of many of the systems integration challenges facing many IT decision makers. They want to exploit the new and emerging technologies, but the internal disparity of data and application logic presents many obstacles.
Michael had this to say in his introduction:
The IT world knows this problem today as the enterprise information integration (EII) problem: enterprise applications need to be able to easily access and combine information about a given business entity from a distributed and highly varied collection of information sources. Relevant sources include various relational database systems (RDBMSs); packaged applications from vendors such as Siebel, PeopleSoft, SAP, and others; "homegrown" proprietary systems; and an increasing number of data sources that are starting to speak XML, such as XML files and Web services.
Virtuoso (which, coincidentally, has been used to build and host this blog) has been developed to address the challenges presented above by providing a Virtual Database Engine for disparate data and application logic (all the GEMs on this page have been generated on the fly using its SQL-XML functionality).
Additional article excerpts:
With XQuery, the solution sketched above can be implemented by viewing the enterprise's different data sources all as virtual XML documents and functions. XQuery can stitch the distributed customer information together into a comprehensive, reusable base view.
A critical issue at this point is how sensitive the XML VIEW is to underlying data source changes. Enterprises are dynamic, so static XML VIEWs are going to be suboptimal in many situations. Applications are only as relevant as the underlying data fluidity served up by the data access layer (this issue is data-format agnostic).
Virtuoso addresses this problem through its support of Persistent and Transient forms of XML VIEWs (which are derived from SQL, XML, Web Services, or any combination of these).
Final excerpt:
The relational data sources can be exposed using simple default XML Schemas, and the other sources - SAP and the credit-checking Web service - can be exposed to XQuery as callable XQuery functions with appropriate signatures.
Unfortunately, XML Schemas aren't easy, so making them a requirement for producing XML VIEWs is somewhat problematic (or should I say challenging). Of course this approach has its merits, but it does put a significant knowledge acquisition burden on the end-user or developer. This is why Virtuoso also supports an approach based on SQL extensions for generating XML from SQL, which facilitate the production of Well Formed and/or Valid XML documents on the fly from heterogeneous SQL Data Sources (this syntax is identical to the FOR XML RAW | AUTO | EXPLICIT modes of SQL Server). It can also use its in-built XSL-T engine to further transform other non-SQL XML data sources (and then generate an XML Schema for the final product, if required, and validate against this schema using its in-built XML Schema validation engine).
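To illustrate the SQL-to-XML extension named above, here is a hedged sketch: a FOR XML AUTO query hands back its result set as an XML fragment instead of rows. The connection details, table names, and the exact shape of the returned fragment are assumptions for illustration:

    # Runs a FOR XML AUTO query over ODBC (via the third-party pyodbc
    # module) and prints the XML the server generates on the fly.
    # DSN, credentials, and schema are placeholders.
    import pyodbc

    conn = pyodbc.connect("DSN=Virtuoso;UID=user;PWD=secret")  # placeholder DSN
    cur = conn.cursor()
    cur.execute("""
        SELECT c.name, o.order_date
        FROM customers c JOIN orders o ON o.customer_id = c.id
        FOR XML AUTO
    """)
    print(cur.fetchone()[0])  # the XML fragment, generated on the fly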
This article certainly sheds light on the kinds of problems that EII based technologies such as Virtual Databases are positioned to address.
There is a live XQuery demo of Virtuoso at: http://demo.openlinksw.com:8890/xqdemo
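For the curious, here is a minimal sketch of driving an XQuery endpoint like that demo over plain HTTP from Python. The form-parameter name ("query") and the sample document name are assumptions, not documented facts:

    # POSTs an XQuery expression to an HTTP endpoint and prints the result.
    # The "query" parameter name and "catalog.xml" document are assumptions.
    import urllib.parse
    import urllib.request

    xquery = """
    for $p in document("catalog.xml")//product
    where $p/price < 20.00
    return $p/title
    """

    data = urllib.parse.urlencode({"query": xquery}).encode("ascii")
    with urllib.request.urlopen("http://demo.openlinksw.com:8890/xqdemo", data) as resp:
        print(resp.read().decode("utf-8"))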
When will these guys get it? You don't implement industry standards in order to become product or vendor dependent. Web Services support should not reduce your choice of Application Servers. I guess we need to show them what I mean via our eCRM; its services will be SOAP-consumable via a WSDL file, and that's it.
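Here is a minimal sketch of what "SOAP consumable" means in practice: any client that can POST a SOAP envelope over HTTP can call the service, with no dependence on a particular application server. The envelope body, namespace, and endpoint below are hypothetical:

    # POSTs a hand-built SOAP 1.1 envelope over HTTP; the operation name,
    # namespace, and endpoint are hypothetical placeholders.
    import urllib.request

    envelope = """<?xml version="1.0" encoding="utf-8"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <GetContact xmlns="urn:example:ecrm">
          <ContactID>42</ContactID>
        </GetContact>
      </soap:Body>
    </soap:Envelope>"""

    req = urllib.request.Request(
        "http://crm.example.com/soap",  # hypothetical endpoint
        data=envelope.encode("utf-8"),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": "urn:example:ecrm#GetContact",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode("utf-8"))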
IBM TO SHIP DB2 INTEGRATION SOFTWARE
Posted May 15, 2003 4:46 PM Pacific Time
IBM on Tuesday plans to announce availability of its DB2 Information Integrator software for integrating and analyzing multiple forms of information, the company acknowledged on Thursday.
In beta since February, the software is intended to enable customers to centrally manage data, text, images, photos, and video and audio files stored in different databases, according to IBM. XML content and Web services are also supported.
Interesting Quote:
"If we move to information as a utility for giant data grids, this is key technology for hiding or making unimportant the location and type of data. This software enables the data to be accessed transparently wherever it might be," Jones said.
Product Pricing
DB2 Information Integrator will be available for $20,000 per processor and $15,000 per data source connector.
Details will also be available on Tuesday.
The cost for a bulk adapter license is about $75,000. If change capture is involved, the adapter license costs about $150,000. Real-time integration costs are mips-based, with a starting cost of about $300,000. One adapter can be used to translate and make native calls to all environments.
Very interesting pricing!
For the full story: http://www.infoworld.com/article/03/05/15/HNdb2integrate_1.html