Details

Kingsley Uyi Idehen
Lexington, United States

Subscribe

Post Categories

Subscribe

E-Mail:

Recent Articles

Display Settings

articles per page.
order.
Web 2.0's Open Data Access Conundrum

Open Data Access and Web 2.0 have a very strange relationship that continues to blur the lines of demarcation between where Web 2.0 ends and where Web.Next (i.e Web 3.0, Semantic/Data Web, Web of Databases etc.) starts. But before I proceed, let me attempt to define Web 2.0 one more time:

A phase in the evolution web usage patterns that emphasizes Web Services based interaction between “Web Users” and “Points of Web Presence” over traditional “Web Users” and “Web Sites” based interaction. Basically, a transition from visual site interaction to presence based interaction.

BTW - Dare Obasanjo also commented about Web usage patterns in his post titled: The Two Webs. Where he concluded that we had a dichotomy along the lines of: HTTP-for-APIs (2.0) and HTTP-for-Browsers (1.0). Which Jon Udell evolved into: HTTP-Services-Web and HTTP-Intereactive-Web during our recent podcast conversation.

With definitions in place, I will resume my quest to unveil the aforementioned Web 2.0 Data Access Conundrum:

  • Emphasis on XML's prowess in the realms of Data and Protocol Modeling alongside Data Representation. Especially as SOAP or REST styles of Web Services and various XML formats (RSS 0.92/1.0/1.1/2.0, Atom, OPML, OCS etc.) collectively define the Web 2.0 infrastructure landscape
  • Where a modicum of Data Access appreciation and comprehension does exist it is inherently compromised by business models that mandate some form of “Walled Gardens” and “Data Silos”
  • Mash-ups are a response to said “Walled Gardens” and “Data Silos” . Mash-ups by definition imply combining things that were not built for recombination.

As you can see from the above, Open Data access isn't genuinely compatible with Web 2.0.

We can also look at the same issue by way of the popular M-V-C (Model View Controller) pattern. Web 2.0 is all about the “V” and “C” with a modicum of “M” at best (data access, open data access, and flexible open data access are completely separate things). The “C” items represent application logic exposed by SOAP or REST style web services etc. I'll return to this later in this post.

What about Social Networking you must be thinking? Isn't this a Web 2.0 manifestation? Not at all (IMHO). The Web was developed / invented by Tim Berners-Lee to leverage the “Network Effects” potential of the Internet for connecting People and Data. Social Networking on the other hand, is simply one of several ways by which construct network connections. I am sure we all accept the fact that connections are built for many other reasons beyond social interaction. That said, we also know that through social interactions we actually develop some of our most valuable relationships (we are social creatures after-all).

The Web 2.0 Open Data Access impedance reality is ultimately going to be the greatest piece of tutorial and usecase material for the Semantic Web. I take this position because it is human nature to seek Freedom (in unadulterated form) which implies the following:

  • Access Data from a myriad of data sources (irrespective of structural differences at the database level)
  • Mesh (not Mash) data in new and interesting ways
  • Share the meshed data with as many relevant people as possible for social, professional, political, religious, and other reasons
  • Construct valuable networks based on data oriented connections

Web 2.0 by definition and use case scenarios is inherently incompatible with the above due to the lack of Flexible and Open Data Access.

If we take the definition of Web 2.0 (above) and rework it with an appreciation Flexible and Open Data Access you would arrive at something like this:

A phase in the evolution of the web that emphasizes interaction between “Web Users” and “Web Data” facilitated by Web Services based APIs and an Open & Flexible Data Access Model “.


In more succinct form:

A pervasive network of people connected by data or data connected by people.


Returning to M-V-C and looking at the definition above, you now have a complete of ”M“ which is enigmatic in Web 2.0 and the essence of the Semantic Web (Data and Context).

To make all of this possible a palatable Data Model is required. The model of choice is the Graph based RDF Data Model - not to be mistaken for the RDF/XML serialization which is just that, a data serialization that conforms to the aforementioned RDF data model.

The Enterprise Challenge

Web 2.0 cannot and will not make valuable inroads into the the enterprise because enterprises live and die by their ability to exploit data. Weblogs, Wikis, Shared Bookmarking Systems, and other Web 2.0 distributed collaborative applications profiles are only valuable if the data is available to the enterprise for meshing (not mashing).

A good example of how enterprises will exploit data by leveraging networks of people and data (social networks in this case) is shown in this nice presentation by Accenture's Institute for High Performance Business titled: Visualizing Organizational Change.

Web 2.0 commentators (for the most part) continue to ponder the use of Web 2.0 within the enterprise while forgetting the congruency between enterprise agility and exploitation of people & data networks (The very issue emphasized in this original Web vision document by Tim Berners-Lee). Even worse, they remain challenged or spooked by the Semantic Web vision because they do not understand that Web 2.0 is fundamentally a Semantic Web precursor due to Open Data Access challenges. Web 2.0 is one of the greatest demonstrations of why we need the Semantic Web at the current time.

Finally, juxtapose the items below and you may even get a clearer view of what I am an attempting to convey about the virtues of Open Data Access and the inflective role it plays as we move beyond Web 2.0:

Information Management Proposal - Tim Berners-Lee
Visualizing Organizational Change - Accenture Institute of High Performance Business

# PermaLink Comments [0]
09/02/2006 16:47 GMT-0500 Modified: 11/16/2006 15:51 GMT-0500
Data Spaces and Web of Databases

Note: An updated version of a previously unpublished blog post:

Continuing from our recent Podcast conversation, Jon Udell sheds further insight into the essence of our conversation via a “Strategic Developer” column article titled: Accessing the web of databases.

Below, I present an initial dump of a DataSpace FAQ below that hopefully sheds light on the DataSpace vision espoused during my podcast conversation with Jon.

What is a DataSpace?

A moniker for Web-accessible atomic containers that manage and expose Data, Information, Services, Processes, and Knowledge.

What would you typically find in a Data Space? Examples include:

  • Raw Data - SQL, HTML, XML (raw), XHTML, RDF etc.

  • Information (Data In Context) - XHTML (various microformats), Blog Posts (in RSS, Atom, RSS-RDF formats), Subscription Lists (OPML, OCS, etc), Social Networks (FOAF, XFN etc.), and many other forms of applied XML.
  • Web Services (Application/Service Logic) - REST or SOAP based invocation of application logic for context sensitive and controlled data access and manipulation.
  • Persisted Knowledge - Information in actionable context that is also available in transient or persistent forms expressed using a Graph Data Model. A modern knowledgebase would more than likely have RDF as its Data Language, RDFS as its Schema Language, and OWL as its Domain Definition (Ontology) Language. Actual Domain, Schema, and Instance Data would be serialized using formats such as RDF-XML, N3, Turtle etc).

How do Data Spaces and Databases differ?
Data Spaces are fundamentally problem-domain-specific database applications. They offer functionality that you would instinctively expect of a database (e.g. AICD data management) with the additonal benefit of being data model and query language agnostic. Data Spaces are for the most part DBMS Engine and Data Access Middleware hybrids in the sense that ownership and control of data is inherently loosely-coupled.

How do Data Spaces and Content Management Systems differ?
Data Spaces are inherently more flexible, they support multiple data models and data representation formats. Content management systems do not possess the same degree of data model and data representation dexterity.

How do Data Spaces and Knowledgebases differ?
A Data Space cannot dictate the perception of its content. For instance, what I may consider as knowledge relative to my Data Space may not be the case to a remote client that interacts with it from a distance, Thus, defining my Data Space as Knowledgebase, purely, introduces constraints that reduce its broader effectiveness to third party clients (applications, services, users etc..). A Knowledgebase is based on a Graph Data Model resulting in significant impedance for clients that are built around alternative models. To reiterate, Data Spaces support multiple data models.

What Architectural Components make up a Data Space?

  • ORDBMS Engine - for Data Modeling agility (via complex purpose specific data types and data access methods), Data Atomicity, Data Concurrency, Transaction Isolation, and Durability (aka ACID).

  • Virtual Database Engine - for creating a single view of, and access point to, heterogeneous SQL, XML, Free Text, and other data. This is all about Virtualization at the Data Access Level.
  • Web Services Platform - enabling controlled access and manipulation (via application, service, or protocol logic) of Virtualized or Disparate Data. This layer handles the decoupling of functionality from monolithic wholes for function specific invocation via Web Services using either the SOAP or REST approach.

Where do Data Spaces fit into the Web's rapid evolution?
They are an essential part of the burgeoning Data Web / Semantic Web. In short, they will take us from data “Mash-ups” (combining web accessible data that exists without integration and repurposing in mind) to “Mesh-ups” (combining web accessible data that exists with integration and repurposing in mind).

Where can I see a DataSpace along the lines described, in action?

Just look at my blog, and take the journey as follows:

What about other Data Spaces?

There are several and I will attempt to categorize along the lines of query method available:
Type 1 (Free Text Search over HTTP):
Google, MSN, Yahoo!, Amazon, eBay, and most Web 2.0 plays .

Type 2 (Free Text Search and XQuery/XPath over HTTP)
A few blogs and Wikis (Jon Udell's and a few others)

Type 3 (RDF Data Sets and SPARQL Queryable):
Type 4 (Generic Free Text Search, OpenSearch, GData, XQuery/XPath, and SPARQL):
Points of Semantic Web presence such as the Data Spaces at:

What About Data Space aware tools?

  • OpenLink Ajax Toolkit - provides Javascript Control level binding to Query Services such as XMLA for SQL, GData for Free Text, OpenSearch for Free Text, SPARQL for RDF, in addition to service specific Web Services (Web 2.0 hosted solutions that expose service specific APIs)
  • Semantic Radar - a Firefox Extension
  • PingTheSemantic - the Semantic Webs equivalent of Web 2.0's weblogs.com
  • PiggyBank - a Firefox Extension
# PermaLink Comments [1]
08/28/2006 19:38 GMT-0500 Modified: 09/04/2006 18:58 GMT-0500
Standards as social contracts

Standards as social contracts: "Looking at Dave Winer's efforts in evangelizing OPML, I try to draw some rough lines into what makes a de-facto standard. De Facto standards are made and seldom happen on their own. In this entry, I look back at the history of HTML, RSS, the open source movement and try to draw some lines as to what makes a standard.

"

(Via Tristan Louis.)

I posted a comment to the Tristan Louis' post along the following lines:

Analysis is spot on re. the link between de facto standardization and bootstrapping. Likewise, the clear linkage between boostrapping and connected communities (a variation of the social networking paradigm).

Dave built a community around a XML content syndication and subscription usecase demo that we know today as the blogosphere. Superficially, one may conclude that Semantic Web vision has suffered to date from a lack a similar bootstrap effort. Whereas in reality, we are dealing with "time and context" issues that are critical to the base understanding upon which a "Dave Winer" style bootstrap for the Semantic Web would occur.

Personally, I see the emergence of Web 2.0 (esp. the mashups phenomenon) as the "time and context" seeds from which the Semantic Web bootstrap will sprout. I see shared ontologies such as FOAF and SIOC leading the way (they are the RSS 2.0's of the Semantic Web IMHO).

# PermaLink Comments [0]
07/04/2006 17:25 GMT-0500 Modified: 07/04/2006 14:53 GMT-0500
SPARQL Parameterized Queries (Virtuoso using SPARQL in SQL)

SPARQL with SQL (Inline)

Virtuoso extends its SQL3 implementation with syntax for integrating SPARQL into queries and subqueries.Thus, as part of a SQL SELECT query or subquery, one can write the SPARQL keyword and a SPARQL query as part of query text processed by Virtuoso's SQL Query Processor.

Example 1 (basic) :

Using Virtuoso's Command line or the Web Based ISQL utility type in the following (note: "SQL>" is the command line prompt for the native ISQL utility):

SQL> sparql select distinct ?p where { graph ?g { ?s ?p ?o } };

Which will return the following:

   p varchar
     ----------
     http://example.org/ns#b
     http://example.org/ns#d
     http://xmlns.com/foaf/0.1/name
     http://xmlns.com/foaf/0.1/mbox
     ...   

Example 2 (a subquery variation):

SQL> select distinct subseq (p, strchr (p, '#')) as fragment
 from (sparql select distinct ?p where { graph ?g { ?s ?p ?o } } ) as all_predicates
 where p like '%#%' ;
     fragment varchar
     ----------
     #query
     #data
     #name
     #comment
     ...

Parameterized Queries:

You can pass parameters to a SPARQL query using a Virtuoso-specific syntax extension. '??' or '$?' indicates a positional parameter similar to '?' in standard SQL. '??' can be used in graph patterns or anywhere else where a SPARQL variable is accepted. The value of a parameter should be passed in SQL form, i.e. this should be a number or an untyped string. An IRI ID can not be passed, but an absolute IRI can. Using this notation, a dynamic SQL capable client (ODBC, JDBC, ADO.NET, OLEDB, XMLA, or others) can execute parametrized SPARQL queries using parameter binding concepts that are common place in dynamic SQL. Which implies that existing SQL applications and development environments (PHP, Ruby, Python, Perl, VB, C#, Java, etc.) are capable of issuing SPARQL queries via their existing SQL bound data access channels against RDF Data stored in Virtuoso.

Note: This is the Virtuoso equivalent of a recently published example using Jena (a Java based RDF Triple Store).

Example:

Create a Virtuoso Function by execting the following:

SQL> create function param_passing_demo ();
 {
        declare stat, msg varchar;
        declare mdata, rset any;
        exec ('sparql select ?s where { graph ?g { ?s ?? ?? }}',
                        stat, msg,
                        vector ('http://www.w3.org/2001/sw/DataAccess/tests/data/Sorting/sort-0#int1',
                                   4 ), -- Vector of two parameters 
                        10,                     -- Max. result-set rows
                        mdata,          -- Variable for handling result-set metadata
                        rset            -- Variable for handling query result-set
                 ); 
     return rset[0][0];
 }

Test new "param_passing_demo" function by executing the following:
SQL> select param_passing_demo ();

Which returns:

callret VARCHAR
 _______________________________________________________________________________
http://www.w3.org/2001/sw/DataAccess/tests/data/Sorting/sort-0#four
1 Rows. -- 00000 msec.

Using SPARQL in SQL Predicates:

A SPARQL ASK query can be used as an argument of the SQL EXISTS predicate.

create function sparql_ask_demo () returns varchar
  {
                if (exists (sparql ask where { graph ?g { ?s ?p 4}})) return 'YES';
                else return 'NO';
   };


Test by executing:

SQL> select sparql_ask_demo ();

Which returns:

_________________________
YES
# PermaLink Comments [0]
05/11/2006 18:54 GMT-0500 Modified: 06/22/2006 08:56 GMT-0500
Swoogle knows how Semantic Web ontologies are used

Swoogle knows how Semantic Web ontologies are used: "

The Dublin Core Metadata Initiative is updating the RDF expression of DC and might add range restrictions to some properties. Mikael Nilsson wondered if we would use the Swoogle Semantic Web search engine to see what types of values are being used with DC properties.

This kind of query is just the ticket for Swoogle. Well, almost. The current web-based interface supports a limited number of query types. Many more can be asked if you use SQL directly to query Swoogle’s underlying databases. We don’t want to provide a direct SQL query service over the main Swoogle database because it’s easy to ask a query that will take a looooooong time to answer and some could even crash the database server. We are planning to put up a second server with a copy of the database and we give Swoogle Power Users (SPUs) access to it.

We ran a simple SQL query to generate some initial data for Mikael showing fall of the DC properties. For each one, we list all of the ranges that values were drawn from and the number of separate documents and triples for each combination. For example

Property
Range
Documents
Triples
dc:creater rdfs:Literal
32
648
dc:creator rdfs:Literal
234655
2477665
dc:creator wn:Person
2714
1138250
dc:creator cc:Agent
4090
6359
dc:creator foaf:Person
2281
5969
dc:creator foaf:Agent
1723
3234

Notice that the first property in this partial table is an obvious typo. You can see the complete table as pdf file or as an excel spreadsheet.

[Tim Finin, UMBC ebiquity lab]

"

(Via Planet RDF.)

# PermaLink Comments [0]
04/05/2006 20:00 GMT-0500 Modified: 06/22/2006 08:56 GMT-0500
This Week’s Semantic Web

(Via Danny Ayers.):

This Week’s Semantic Web:

"Ok, my first attempt at a round-up (in response to Phil’s observation of Planetary damage). Thanks to the conference there’s loads more here than there’s likely to be subsequent weeks, although it’s still only a fairly random sample and some of the links here are to heaps of other resources…
Incidentally, if anyone’s got a list/links for SemWeb-related blogs that aren’t on Planet RDF, I’d be grateful for a pointer. PS. Ok, I forget… are there any blogs that aren’t on Dave’s list yet..?

Quote of the week:

In the Semantic Web, it is not the Semantic which is new, it is the Web which is new.

- Chris Welty, IBM (lifted from TimBL’s slides)

Events

Docs etc

Software and stuff

  • Semantic Web Challenge applications (winner: CONFOTO - congrats bengee!)
  • Piggy Bank 2.1.1 released.
  • IRIS is a semantic desktop application framework that enables users to create a ‘personal map’ across their office-related information objects. IRIS includes a machine-learning platform to help automate this process. It provides ‘dashboard’ views, contextual navigation, and relationship-based structure across an extensible suite of office applications, including a calendar, web and file browser, e-mail client, and instant messaging client.
    (open source release due Jan 2006)
  • MKSearch - ‘A new kind of search engine’ - RDF-backed (Sesame) with Web crawler, extracts and indexes metadata.
  • FOAFRealm - Our goal is to design and implement D-FOAF, a distributed authentication and trust infrastructure without a centralised authority. D-FOAF will be a backbone for trust applications based on social relationships and will establish identity of users similar to the way we establish identify and trust in real life.
  • Perl Net::Flickr::RDF
  • WordPress SIOC (Semantically-Interlinked Online Communities) plugin updated (just copy wp-sioc.php into the root of your WP install and it just works)
  • OntoMedia is intended for the representation of heterogenous media through description of the semantic content of that media. The representation may be limited to the description of some or all of the elements contained within the source or may include information regarding the narrative relationship that these elements have both to the media and to each other.
  • mSpace is an interaction model to help explore relationships in information - ‘Imagine Google on iTunes’

Blog post title of the week:

Don’t give me that monkey-ass Web 1.0, either

- Uche Ogbuji

Also…a new threat to Semantic Web developers has been discovered: typhoid!, and the key to the Web’s full potential is…Tetris."

# PermaLink Comments [0]
11/14/2005 19:44 GMT-0500 Modified: 06/22/2006 08:56 GMT-0500
Breaking the Web Wide Open!

Marc Canter's Breaking the Web Wide Open! article is something I found pretty late (by my normal discovery standards). This was partly due to the pre- and post- Web 2.0 event noise levels that have dumped the description of an important industry inflection into the "Bozo Bin" of many. Personally, I think we shouldn't confuse the Web 2.0 traditional-pitch-fest conference with an attempt to identify an important industry inflection).

Anyway, Marc's article is a very refreshing read because it provides a really good insight into the general landscape of a rapidly evolving Web alongside genuine appreciation of our broader timeless pursuit of "Openness".

To really help this document provide additional value have scrapped the content of the original post and dumped it below so that we can appreciate the value of the links embedded within the article (note: thanks to Virtuoso I only had to paste the content into my blog, the extraction to my Linkblog and Blog Summary Pages are simply features of my Virtuoso based Blog Engine):

Breaking the Web Wide Open! (complete story)

Even the web giants like AOL, Google, MSN, and Yahoo need to observe these open standards, or they'll risk becoming the "walled gardens" of the new web and be coolio no more.

Editorial Note: Several months ago, AlwaysOn got a personal invitation from Yahoo founder Jerry Yang "to see and give us feedback on our new social media product, y!360." We were happy to oblige and dutifully showed up, joining a conference room full of hard-core bloggers and new, new media types. The geeks gave Yahoo 360 an overwhelming thumbs down, with comments like, "So the only services I can use within this new network are Yahoo services? What if I don't use Yahoo IM?" In essence, the Yahoo team was booed for being "closed web," and we heartily agreed. With Yahoo 360, Yahoo continues building its own "walled garden" to control its 135 million customers—an accusation also hurled at AOL in the early 1990s, before AOL migrated its private network service onto the web. As the Economist recently noted, "Yahoo, in short, has old media plans for the new-media era."

The irony to our view here is, of course, that today's AO Network is also a "closed web." In the end, Mr. Yang's thoughtful invitation and our ensuing disappointment in his new service led to the assignment of this article. It also confirmed our existing plan to completely revamp the AO Network around open standards. To tie it all together, we recruited the chief architect of our new site, the notorious Marc Canter, to pen this piece. We look forward to our reader feedback.

Breaking the Web Wide Open!
By Marc Canter

For decades, "walled gardens" of proprietary standards and content have been the strategy of dominant players in mainframe computer software, wireless telecommunications services, and the World Wide Web—it was their successful lock-in strategy of keeping their customers theirs. But like it or not, those walls are tumbling down. Open web standards are being adopted so widely, with such value and impact, that the web giants—Amazon, AOL, eBay, Google, Microsoft, and Yahoo—are facing the difficult decision of opening up to what they don't control.

The online world is evolving into a new open web (sometimes called the Web 2.0), which is all about being personalized and customized for each user. Not only open source software, but open standardsare becoming an essential component.

Many of the web giants have been using open source software for years. Most of them use at least parts of the LAMP (Linux, Apache, MySQL, Perl/Python/PHP) stack, even if they aren't well-known for giving back to the open source community. For these incumbents that grew big on proprietary web services, the methods, practices, and applications of open source software development are difficult to fully adopt. And the next open source movements—which will be as much about open standards as about code—will be a lot harder for the incumbents to exploit.

While the incumbents use cheap open source software to run their back-ends systems, their business models largely depend on proprietary software and algorithms. But our view a new slew of open software, open protocols, and open standards will confront the incumbents with the classic Innovator's Dilemma. Should they adopt these tools and standards, painfully cannibalizing their existing revenue for a new unproven concept, or should they stick with their currently lucrative model with the risk that eventually a bunch of upstarts eat their lunch?

Credit should go to several of the web giants who have been making efforts to "open up." Google, Yahoo, eBay, and Amazon all have Open APIs (Application Programming Interfaces) built into their data and systems. Any software developer can access and use them for whatever creative purposes they wish. This means that the API provider becomes an open platform for everyone to use and build on top of. This notion has expanded like wildfire throughout the blogosphere, so nowadays, Open APIs are pretty much required.

Other incumbents also have open strategies. AOL has got the RSS religion, providing a feedreader and RSS search in order to escape the "walled garden of content" stigma. Apple now incorporates podcasts, the "personal radio shows" that are latest rage in audio narrowcasting, into iTunes. Even Microsoft is supporting open standards, for example by endorsing SIP (Session Initiation Protocol) for internet telephony and conferencing over Skype's proprietary format or one of its own devising.

But new open standards and protocols are in use, under construction, or being proposed every day, pushing the envelope of where we are right now. Many of these standards are coming from startup companies and small groups of developers, not from the giants. Together with the Open APIs, those new standards will contribute to a new, open infrastructure. Tens of thousands of developers will use and improve this open infrastructure to create new kinds of web-based applications and services, to offer web users a highly personalized online experience.

A Brief History of Openness

At this point, I have to admit that I am not just a passive observer, full-time journalist or "just some blogger"—but an active evangelist and developer of these standards. It's the vision of "open infrastructure" that's driving my company and the reason why I'm writing this article. This article will give you some of the background behind on these standards, and what the evolution of the next generation of open standards will look like.

Starting back in the 1980s, establishing a software standard was a key strategy for any software company. My former company, MacroMind (which became Macromedia), achieved this goal early on with Director. As Director evolved into Flash, the world saw that other companies besides Microsoft, Adobe, and Apple could establish true cross-platform, independent media standards.

Then Tim Berners-Lee and Marc Andreessen came along, and changed the rules of the software business and of entrepreneurialism. No matter how entrenched and "standardized" software was, the rug could still get pulled out from under it. Netscape did it to Microsoft, and then Microsoft did it back to Netscape. The web evolved, and lots of standards evolved with it. The leading open source standards (such as the LAMP stack) became widely used alternatives to proprietary closed-source offerings.

Open standards are more than just technology. Open standards mean sharing, empowering, and community support. Someone floats a new idea (or meme) and the community runs with it – with each person making their own contributions to the standard – evolving it without a moment's hesitation about "giving away their intellectual property."

One good example of this was Dave Sifry, who built the Technorati blog-tracking technology inspired by the Blogging Ecosystem, a weekend project by young hacker Phil Pearson. Dave liked what he saw and he ran with it—turning Technorati into what it is today.

Dave Winer has contributed enormously to this area of open standards. He defined and personally created several open standards and protocols—such as RSS, OPML, and XML-RPC. Dave has also helped build the blogosphere through his enthusiasm and passion.

By 2003, hundreds of programmers were working on creating and establishing new standards for almost everything. The best of these new standards have evolved into compelling web services platforms – such as del.icio.us, Webjay, or Flickr. Some have even spun off formal standards – like XSPF (a standard for playlists) or instant messaging standard XMPP (also known as Jabber).

Today's Open APIs are complemented by standardized Schemas—the structure of the data itself and its associated meta-data. Take for example a podcasting feed. It consists of: a) the radio show itself, b) information on who is on the show, what the show is about and how long the show is (the meta-data) and also c) API calls to retrieve a show (a single feed item) and play it from a specified server.

The combination of Open APIs, standardized schemas for handling meta-data, and an industry which agrees on these standards are breaking the web wide open right now. So what new open standards should the web incumbents—and you—be watching? Keep an eye on the following developments:

Identity
Attention
Open Media
Microcontent Publishing
Open Social Networks
Tags
Pinging
Routing
Open Communications
Device Management and Control



1. Identity

Right now, you don't really control your own online identity. At the core of just about every online piece of software is a membership system. Some systems allow you to browse a site anonymously—but unless you register with the site you can't do things like search for an article, post a comment, buy something, or review it. The problem is that each and every site has its own membership system. So you constantly have to register with new systems, which cannot share data—even you'd want them to. By establishing a "single sign-on" standard, disparate sites can allow users to freely move from site to site, and let them control the movement of their personal profile data, as well as any other data they've created.

With Passport, Microsoft unsuccessfully attempted to force its proprietary standard on the industry. Instead, a world is evolving where most people assume that users want to control their own data, whether that data is their profile, their blog posts and photos, or some collection of their past interactions, purchases, and recommendations. As long as users can control their digital identity, any kind of service or interaction can be layered on top of it.

Identity 2.0 is all about users controlling their own profile data and becoming their own agents. This way the users themselves, rather than other intermediaries, will profit from their ID info. Once developers start offering single sign-on to their users, and users have trusted places to store their data—which respect the limits and provide access controls over that data, users will be able to access personalized services which will understand and use their personal data.

Identity 2.0 may seem like some geeky, visionary future standard that isn't defined yet, but by putting each user's digital identity at the core of all their online experiences, Identity 2.0 is becoming the cornerstone of the new open web.

The Initiatives:
Right now, Identity 2.0 is under construction through various efforts from Microsoft (the "InfoCard" component built into the Vista operating system and its "Identity Metasystem"), Sxip Identity, Identity Commons, Liberty Alliance, LID (NetMesh's Lightweight ID), and SixApart's OpenID.

More Movers and Shakers:
Identity Commons and Kaliya Hamlin, Sxip Identity and Dick Hardt, the Identity Gang and Doc Searls, Microsoft's Kim Cameron, Craig Burton, Phil Windley, and Brad Fitzpatrick, to name a few.


2. Attention

How many readers know what their online attention is worth? If you don't, Google and Yahoo do—they make their living off our attention. They know what we're searching for, happily turn it into a keyword, and sell that keyword to advertisers. They make money off our attention. We don't.

Technorati and friends proposed an attention standard, Attention.xml, designed to "help you keep track of what you've read, what you're spending time on, and what you should be paying attention to." AttentionTrust is an effort by Steve Gillmor and Seth Goldstein to standardize on how captured end-user performance, browsing, and interest data are used.

Blogger Peter Caputa gives a good summary of AttentionTrust:
"As we use the web, we reveal lots of information about ourselves by what we pay attention to. Imagine if all of that information could be stored in a nice neat little xml file. And when we travel around the web, we can optionally share it with websites or other people. We can make them pay for it, lease it ... we get to decide who has access to it, how long they have access to it, and what we want in return. And they have to tell us what they are going to do with our Attention data."

So when you give your attention to sites that adhere to the AttentionTrust, your attention rights (you own your attention, you can move your attention, you can pay attention and be paid for it, and you can see how your attention is used) are guaranteed. Attention data is crucial to the future of the open web, and Steve and Seth are making sure that no one entity or oligopoly controls it.

Movers and Shakers:
Steve Gillmor, Seth Goldstein, Dave Sifry and the other Attention.xml folks.


3. Open Media

Proprietary media standards—Flash, Windows Media, and QuickTime, to name a few —helped liven up the web. But they are proprietary standards that try to keep us locked in, and they weren't created from scratch to handle today's online content. That's why, for many of us, an Open Media standard has been a holy grail. Yahoo's new Media RSS standard brings us one step closer to achieving open media, as do Ogg Vorbis audio codecs, XSPF playlists, or MusicBrainz. And several sites offer digital creators not only a place to store their content, but also to sell it.

Media RSS (being developed by Yahoo with help from the community) extends RSS and combines it with "RSS enclosures" —adds metadata to any media item—to create a comprehensive solution for media "narrowcasters." To gain acceptance for Media RSS, Yahoo knows it has to work with the community. As an active member of this community, I can tell you that we'll create Media RSS equivalents for rdf (an alternative subscription format) and Atom (yet another subscription format), so no one will be able to complain that Yahoo is picking sides in format wars.

When Yahoo announced the purchase of Flickr, Yahoo founder Jerry Yang insinuated that Yahoo is acquiring "open DNA" to turn Yahoo into an open standards player. Yahoo is showing what happens when you take a multi-billion dollar company and make openness one of its core values—so Google, beware, even if Google does have more research fellows and Ph.D.s.

The open media landscape is far and wide, reaching from game machine hacks and mobile phone downloads to PC-driven bookmarklets, players, and editors, and it includes many other standardization efforts. XSPF is an open standard for playlists, and MusicBrainz is an alternative to the proprietary (and originally effectively stolen) database that Gracenote licenses.

Ourmedia.org is a community front-end to Brewster Kahle's Internet Archive. Brewster has promised free bandwidth and free storage forever to any content creators who choose to share their content via the Internet Archive. Ourmedia.org is providing an easy-to-use interface and community to get content in and out of the Internet Archive, giving ourmedia.org users the ability to share their media anywhere they wish, without being locked into a particular service or tool. Ourmedia plans to offer open APIs and an open media registry that interconnects other open media repositories into a DNS-like registry (just like the www domain system), so folks can browse and discover open content across many open media services. Systems like Brightcove and Odeo support the concept of an open registry, and hope to work with digital creators to sell their work to fulfill the financial aspect of the "Long Tail."

More Movers and Shakers:
Creative Commons, the Open Media Network, Jay Dedman, Ryanne Hodson, Michael Verdi, Eli Chapman, Kenyatta Cheese, Doug Kaye, Brad Horowitz, Lucas Gonze, Robert Kaye, Christopher Allen, Brewster Kahle, JD Lasica, and indeed, Marc Canter, among others.


4. Microcontent Publishing

Unstructured content is cheap to create, but hard to search through. Structured content is expensive to create, but easy to search. Microformats resolve the dilemma with simple structures that are cheap to use and easy to search.

The first kind of widely adopted microcontent is blogging. Every post is an encapsulated idea, addressable via a URL called a permalink. You can syndicate or subscribe to this microcontent using RSS or an RSS equivalent, and news or blog aggregators can then display these feeds in a convenient readable fashion. But a blog post is just a block of unstructured text—not a bad thing, but just a first step for microcontent. When it comes tostructureddata, such as personal identity profiles, product reviews, or calendar-type event data, RSS was not designed to maintain the integrity of the structures.

Right now, blogging doesn't have the underlying structure necessary for full-fledged microcontent publishing. But that will change. Think of local information services (such as movie listings, event guides, or restaurant reviews) that any college kid can access and use in her weekend programming project to create new services and tools.

Today's blogging tools will evolve into microcontent publishing systems, and will help spread the notion of structured data across the blogosphere. New ways to store, represent and produce microcontent will create new standards, such as Structured Blogging and Microformats. Microformats differ from RSS feeds in that you can't subscribe to them. Instead, Microformats are embedded into webpages and discovered by search engines like Google or Technorati. Microformats are creating common definitions for "What is a review or event? What are the specific fields in the data structure?" They can also specify what we can do with all this information.OPML (Outline Processor Markup Language) is a hierarchical file format for storing microcontent and structured data. It was developed by Dave Winer of RSS and podcast fame.

Events are one popular type of microcontent. OpenEvents is already working to create shared databases of standardized events, which would get used by a new generation of event portals—such as Eventful/EVDB, Upcoming.org, and WhizSpark. The idea of OpenEvents is that event-oriented systems and services can work together to establish shared events databases (and associated APIs) that any developer could then use to create and offer their own new service or application. OpenReviews is still in the conceptual stage, but it would make it possible to provide open alternatives to closed systems like Epinions, and establish a shared database of local and global reviews. Its shared open servers would be filled with all sorts of reviews for anyone to access.

Why is this important? Because I predict that in the future, 10 times more people will be writing reviews than maintaining their own blog. The list of possible microcontent standards goes on: OpenJobpostings, OpenRecipes, and even OpenLists. Microsoft recently revealed that it has been working on an important new kind of microcontent: Lists—so OpenLists will attempt to establish standards for the kindof lists we all use, such as lists of Links, lists of To Do Items, lists of People, Wish Lists, etc.

Movers and Shakers:
Tantek Çelik and Kevin Marks of Technorati, Danny Ayers, Eric Meyer, Matt Mullenweg, Rohit Khare, Adam Rifkin, Arnaud Leene, Seb Paquet, Alf Eaton, Phil Pearson, Joe Reger, Bob Wyman among others.


5. Open Social Networks

I'll never forget the first time I met Jonathan Abrams, the founder of Friendster. He was arrogant and brash and he claimed he "owned" all his users, and that he was going to monetize them and make a fortune off them. This attitude robbed Friendster of its momentum, letting MySpace, Facebook, and other social networks take Friendster's place.

Jonathan's notion of social networks as a way to control users is typical of the Web 1.0 business model and its attitude towards users in general. Social networks have become one of the battlegrounds between old and new ways of thinking. Open standards for Social Networking will define those sides very clearly. Since meeting Jonathan, I have been working towards finding and establishing open standards for social networks. Instead of closed, centralized social networks with 10 million people in them, the goal is making it possible to have 10 million social networks that each have 10 people in them.

FOAF (which stands for Friend Of A Friend, and describes people and relationships in a way that computers can parse) is a schema to represent not only your personal profile's meta-data, but your social network as well. Thousands of researchers use the FOAF schema in their "Semantic Web" projects to connect people in all sorts of new ways. XFN is a microformat standard for representing your social network, while vCard (long familiar to users of contact manager programs like Outlook) is a microformat that contains your profile information. Microformats are baked into any xHTML webpage, which means thatanyblog, social network page, or any webpage in general can "contain" your social network in it—and be used byanycompatible tool, service or application.

PeopleAggregator is an earlier project now being integrated into open content management framework Drupal. The PeopleAggregator APIs will make it possible to establish relationships, send messages, create or join groups, and post between different social networks. (Sneak preview: this technology will be available in the upcoming GoingOn Network.)

All of these open social networking standards mean that inter-connected social networks will form a mesh that will parallel the blogosphere. This vibrant, distributed, decentralized world will be driven by open standards: personalized online experiences are what the new open web will be all about—and what could be more personalized than people's networks?

Movers and Shakers:
Eric Sigler, Joel De Gan, Chris Schmidt, Julian Bond, Paul Martino, Mary Hodder, Drummond Reed, Dan Brickley, Randy Farmer, and Kaliya Hamlin, to name a few.


6. Tags

Nowadays, no self-respecting tool or service can ship without tags. Tags are keywords or phrases attached to photos, blog posts, URLs, or even video clips. These user- and creator-generated tags are an open alternative to what used to be the domain of librarians and information scientists: categorizing information and content using taxonomies. Tags are instead creating "folksonomies."

The recently proposed OpenTags concept would be an open, community-owned version of the popular Technorati Tags service. It would aggregate the usage of tags across a wide range of services, sites, and content tools. In addition to Technorati's current tag features, OpenTags would let groups of people share their tags in "TagClouds." Open tagging is likely to include some of the open identity features discussed above, to create a tag system that is resilient to spam, and yet trustable across sites all over the web.

OpenTags owes a debt to earlier versions of shared tagging systems, which include Topic Exchange and something called the k-collector—a knowledge management tag aggregator—from Italian company eVectors.

Movers & Shakers:
Phil Pearson, Matt Mower , Paolo Valdemarin, and Mary Hodder and Drummond Reed again, among others.


7. Pinging

Websites used to be mostly static. Search engines that crawled (or "spidered") them every so often did a good enough job to show reasonably current versions of your cousin's homepage or even Timemagazine's weekly headlines. But when blogging took off, it became hard for search engines to keep up. (Google has only just managed to offer blog-search functionality, despite buying Blogger back in early 2003.)

To know what was new in the blogosphere, users couldn't depend on services that spidered webpages once in a while. The solution: a way for blogs themselves to automatically notify blog-tracking sites that they'd been updated. Weblogs.com was the first blog "ping service": it displayed the name of a blog whenever that blog was updated. Pinging sites helped the blogosphere grow, and more tools, services, and portals started using pinging in new and different ways. Dozens of pinging services and sites—most of which can't talk to each other—sprang up.

Matt Mullenweg (the creator of open source blogging software WordPress) decided that a one-stop service for pinging was needed. He created Ping-o-Matic—which aggregates ping services and simplifies the pinging process for bloggers and tool developers. With Ping-o-Matic, any developer can alert all of the industry's blogging tools and tracking sites at once. This new kind of open standard, with shared infrastructure, is a critical to the scalability of Web 2.0 services.

As Matt said:
There are a number of services designed specifically for tracking and connecting blogs. However it would be expensive for all the services to crawl all the blogs in the world all the time. By sending a small ping to each service you let them know you've updated so they can come check you out. They get the freshest data possible, you don't get a thousand robots spidering your site all the time. Everybody wins.

Movers and Shakers:
Matt Mullenweg, Jim Winstead, Dave Winer


8. Routing

Bloggers used to have to manually enter the links and content snippets of blog posts or news items they wanted to blog. Today, some RSS aggregators can send a specified post directly into an associated blogging tool: as bloggers browse through the feeds they subscribe to, they can easily specify and send any post they wish to "reblog" from their news aggregator or feed reader into their blogging tool. (This is usually referred to as "BlogThis.") As structured blogging comes into its own (see the section on Microcontent Publishing), it will be increasingly important to maintain the structural integrity of these pieces of microcontent when reblogging them.

Promising standard RedirectThis will combine a "BlogThis"-like capability while maintaining the integrity of the microcontent. RedirectThis will let bloggers and content developers attach a simple "PostThis" button to their posts. Clicking on that button will send that post to the reader/blogger's favorite blogging tool. This favorite tool is specified at the RedirectThis web service, where users register their blogging tool of choice. RedirectThis also helps maintain the integrity and structure of microcontent—then it's just up to the user to prefer a blogging tool that also attains that lofty goal of microcontent integrity.

OutputThis is another nascent web services standard, to let bloggers specify what "destinations" they'd like to have as options in their blogging tool. As new destinations are added to the service, more checkboxes would get added to their blogging tool—allowing them to route their published microcontent to additional destinations.

Movers and Shakers:
Michael Migurski, Lucas Gonze


9. Open Communications

Likely, you've experienced the joys of finding friends on AIM or Yahoo Messenger, or the convenience of Skyping with someone overseas. Not that you're about to throw away your mobile phone or BlackBerry, but for many, also having access to Instant Messaging (IM) and Voice over IP (VoIP) is crucial.

IM and VoIP are mainstream technologies that already enjoy the benefits of open standards. Entire industries are born—right this second—based around these open standards. Jabber has been an open IM technology for years—in fact, as XMPP, it was officially dubbed a standard by the IETF. Although becoming an official IETF standard is usually the kiss of death, Jabber looks like it'll be around for a while, as entire generations of collaborative, work-group applications and services have been built on top of its messaging protocol. For VoIP, Skype is clearly the leading standard today—though one could argue just how "open" it is (and defenders of the IETF's SIP standard often do). But it is free and user-friendly, so there won't be much argument from users about it being insufficiently open. Yet there may be a cloud on Skype's horizon: web behemoth Google recently released a beta of Google Talk, an IM client committed to open standards. It currently supports XMPP, and will support SIP for VoIP calls.

Movers and Shakers:
Jeremie Miller, Henning Schulzrinne, Jon Peterson, Jeff Pulver


10. Device Management and Control

To access online content, we're using more and more devices. BlackBerrys, iPods, Treos, you name it. As the web evolves, more and more different devices will have to communicate with each other to give us the content we want when and where we want it. No-one wants to be dependent on one vendor anymore—like, say, Sony—for their laptop, phone, MP3 player, PDA, and digital camera, so that it all works together. We need fully interoperable devices, and the standards to make that work. And to fully make use of how content is moving online content and innovative web services, those standards need to be open.

MIDI (musical instrument digital interface), one of the very first open standards in music, connected disparate vendors' instruments, post-production equipment, and recording devices. But MIDI is limited, and MIDI II has been very slow to arrive. Now a new standard for controlling musical devices has emerged: OSC (Open SoundControl). This protocol is optimized for modern networking technology and inter-connects music, video and controller devices with "other multimedia devices." OSC is used by a wide range of developers, and is being taken up in the mainstream MIDI marketplace.

Another open-standards-based device management technology is ZigBee, for building wireless intelligence and network monitoring into all kinds of devices. ZigBee is supported by many networking, consumer electronics, and mobile device companies.


· · · · · ·

The Change to Openness

The rise of open source software and its "architecture of participation" are completely shaking up the old proprietary-web-services-and-standards approach. Sun Microsystems—whose proprietary Java standard helped define the Web 1.0—is opening its Solaris OS and has even announced the apparent paradox of an open-source Digital Rights Management system.

Today's incumbents will have to adapt to the new openness of the Web 2.0. If they stick to their proprietary standards, code, and content, they'll become the new walled gardens—places users visit briefly to retrieve data and content from enclosed data silos, but not where users "live." The incumbents' revenue models will have to change. Instead of "owning" their users, users will know they own themselves, and will expect a return on their valuable identity and attention. Instead of being locked into incompatible media formats, users will expect easy access to digital content across many platforms.

Yesterday's web giants and tomorrow's users will need to find a mutually beneficial new balance—between open and proprietary, developer and user, hierarchical and horizontal, owned and shared, and compatible and closed.


Marc Canter is an active evangelist and developer of open standards. Early in his career, Marc founded MacroMind, which became Macromedia. These days, he is CEO of Broadband Mechanics, a founding member of the Identity Gang and of ourmedia.org. Broadband Mechanics is currently developing the GoingOn Network (with the AlwaysOn Network), as well as an open platform for social networking called the PeopleAggregator.

A version of the above post appears in the Fall 2005 issue of AlwaysOn's quarterly print blogozine, and ran as a four-part series on the AlwaysOn Network website.

(Via Marc's Voice.)

# PermaLink Comments [0]
10/26/2005 19:28 GMT-0500 Modified: 06/22/2006 08:56 GMT-0500
Wired News: Pumping Indies on MTV

Wired News: Pumping Indies on MTV: "Search: Pumping Indies on MTV Page 1 of 1 By Ryan Singel| Also by this reporter 02:00 AM Oct. 21, 2005 PT Norbury and Finch are a folk duo based on Vancouver Island. They aren't signed to a music label, but fans of the Ozzy Osbourne reality show and HBO's Real Sex have heard their music, thanks to an innovative music-licensing company that has placed thousands of songs by little-known artists on big-name television shows. Pump Audio is a Hudson Valley, New York-based company that helps independent musicians and artists who are on small labels, or no label, get paid for their art. The company provides hard drives full of music to harried production teams at networks such as MTV and the Food Network. See also Bands Embrace Social Networking You, Too, Could Be in Advertising Bands to Labels: Play With Us DAT's Entertainment, So Enjoy Today's Top 5 Stories Creating the Global Hot Spot Hear, Hear for Audio Erotica Pumping Indies on MTV Cliff Notes From the Blog Wor"

(Via .)

# PermaLink Comments [0]
10/22/2005 13:37 GMT-0500 Modified: 06/22/2006 08:56 GMT-0500
World Wide Web of Junk
After digesting Oblique Angle's post titled: World Wide Web of Junk, it was nice to be reassured that I am not part of a shrinking minority of increasingly peturbed Web users. The post excerptbelow is what compelled me to contributesome of my thoughts about the current state of the Web and a future "Semantic Web".
The value of the Internet as a repository of useful information is very low. Carl Shapiro in “Information Rules” suggests that the amount of actually useful information on the Internet would fit within roughly 15,000 books, which is about half the size of an average mall bookstore. To put this in perspective: there are over 5 billion unique, static & publicly accessible web pages on the www. Apparently Only 6% of web sites have educational content (Maureen Henninger, “Don’t just surf the net: Effective research strategies”. UNSW Press). Even of the educational content only a fraction is of significant informational value.
Noise is taking over the Web at an alarming rate (to be expected in a sense ), and even though Tim Berners-Lee (TBL) had the foresight to create the Web,many see nothing butfutilityin hisvision for a "Semantic Web" (I don't!). A recent exampleof such commentary comes from Eric Nee's CIO article, titled: Web Future is Not Semantic, Or Overly Orderly. I take issue with this article because, like most (who have been bitten at least once), I don't like mono culture. This article inadvertently promotes "Google Mono Culture". I haveexcerpted the more frustrating parts of this article below:

..As Stanford students, Larry Page and Sergey Brin looked at the same problem—how to impart meaning to all the content on the Web—and decided to take a different approach. The two developed sophisticated software that relied on other clues to discover the meaning of content, such as which Web sites the information was linked to. And in 1998 they launched Google..

You mean noise ranking. Now, I don't think Larry and Sergey set out to do this, but Google page ranks are ultimately based on the concept of "Google Juice" (aka links). The value quotient of this algorithm is accelerating at internet speed (ironically, but naturally). Human beings are smarter than computers, we just processdata (not information!)much slower that's all. Thus, we can conjure up numerous ways to bubble up the google link ranking algorithms in no time (as is the case today).

..What most differentiates Google's approach from Berners-Lee's is that Google doesn't require people to change the way they post content..

The Semantic Web doesn't require anyone to change how they post content either! It just provides a roadmap for intelligent content managment and consumption through innovative products.

..As Sergey Brin told Infoworld's 2002 CTO Forum, "I'd rather make progress by having computers under-stand what humans write, than by forcing -humans to write in ways that computers can understand." In fact, Google has not participated at all in the W3C's formulation of Semantic Web standards, says Eric Miller..

Semantic Content generated by next generation content managers will make more progress, and they certainly won't require humans to write any differently. If anything, humans will find the process quite refreshing as and when participation is required e.g. clickingbookmarklets associated with tagging services such as'del.icio.us', 'de.lirio.us', or Unalog and others. But this is only the beginning, if I can click on a bookmarklet to post this blog post to a tagging service, then why wouldn't I be able to incorporate the "tag service post" into the same process that saves my blog post (the post is content that ends up in a content management system aka blog server)?

Yet Google's impact on the Web is so dramatic that it probably makes more sense to call the next generation of the Web the "Google Web" rather than the "Semantic Web."

Ah! so you think wereally want the noisy "Google Web" as opposed to a federation of distributed Information- and Knowledgbases ala the "Semantic Web"? I don't think so somehow!

Today we are generally excited about "tagging" but fail to see its correlation with the "Semantic Web", somehow? I have said this before, and I will say it again, the "Semantic Web" is going to be self-annotated by humanswith theaid ofintelligent and unobtrusive annotation technology solutions. These solutions willprovide context and purposeby using ourour social essence as currency. The annotationeffort will be subliminal, therewon't be a "Semantic Web Day" parade or anything of the like.It will appear before us all, in all its glory, without any fanfare.Funnily enough, wemight not even call it "The Semantic Web", who cares? But it will have the distinct attributes of being very "Quiet" and highly "Valuable"; withno burden on"how wewrite", but constructiveburden on "why wewrite" as part of the content contributionprocess (less Google/Yahoo/etc juicechasingfor more knowledgeassembly and exchange).

We are social creatures at our core. The Internet and Web have collectively reduced the connectivity hurdles thatonce made social network oriented solutions implausible. The eradication ofthese hurdles ultimately feeds the very impulses that trigger the critical self-annotation that is the basis of my fundamental belief in the realization of TBL's Semantic Web vision.

# PermaLink Comments [0]
05/20/2005 23:07 GMT-0500 Modified: 06/22/2006 08:56 GMT-0500
Social Construction of Reality

An interesting article by Brad Cox. (inventor of Objective-C) that's provides great foundation for a understanding number of issuesthatare relevant to social networking systems.

# PermaLink Comments [0]
05/13/2005 11:31 GMT-0500 Modified: 06/22/2006 08:56 GMT-0500
<< | 1 | 2 | 3 | >>
Powered by OpenLink Virtuoso Universal Server
Running on Linux platform
The posts on this weblog are my personal views, and not those of OpenLink Software.