Enjoy!
The deadline for submitting papers, presentations, demo, and tutorial proposals is the 28th of January, 2008.
The quality of service factors above nullify many of the typical concerns associated with data-driven business models, such as:
Like most of us in the Linked Data community, he sees the upcoming Linked Data Conference by Jupiter as a watershed moment.
Here is a simple What and Why guide covering the essence of Data Spaces.
A Data Space is a point of presence on a network, where every Data Object (item or entity) is given a Name (e.g., a URI) by which it may be Referenced or Identified.
In a Data Space, every Representation of those Data Objects (i.e., every Object Representation) has an Address (e.g., a URL) from which it may be Retrieved (or "gotten").
In a Data Space, every Object Representation is a time variant (that is, it changes over time), streamable, and format-agnostic Resource.
An Object Representation is simply a Description of that Object. It takes the form of a graph, pictorially constructed from sets of 3 elements which are themselves named Subject, Predicate, and Object (or SPO); or Entity, Attribute, and Value (or EAV). Each Entity+Attribute+Value or Subject+Predicate+Object set (or triple), is one datum, one piece of data, one persisted observation about a given Subject or Entity.
The underlying Schema that defines and constrains the construction of Object Representations is based on Logic, specifically First-Order Logic. Each Object Representation is a collection of persisted observations (Data) about a given Subject, which aid observers in materializing their perception (Information), and ultimately comprehension (Knowledge), of that Subject.
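The triple structure described above can be sketched in a few lines of Python. This is only an illustration: the `Triple` type, the `describe` helper, and the example.org names are assumptions made up for the sketch, not part of any particular Data Space implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # Entity: the Name (e.g., a URI) of the thing described
    predicate: str  # Attribute
    obj: str        # Value: a literal or the Name of another Entity


# Hypothetical observations, for illustration only.
triples = [
    Triple("http://example.org/people#alice", "name", "Alice"),
    Triple("http://example.org/people#alice", "knows", "http://example.org/people#bob"),
    Triple("http://example.org/people#bob", "name", "Bob"),
]


def describe(subject, data):
    """Collect every persisted observation about one Subject into a description."""
    description = defaultdict(list)
    for triple in data:
        if triple.subject == subject:
            description[triple.predicate].append(triple.obj)
    return dict(description)


alice = describe("http://example.org/people#alice", triples)
```

Here `alice` plays the role of an Object Representation: the set of observations that share one Subject, grouped by Attribute.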
In the real-world -- which is networked by nature -- data is heterogeneously (or "differently") shaped, and disparately located.
Data has been increasing at an alarming rate since the advent of computing; the interWeb simply provides context that makes this reality more palpable and more exploitable, and in the process virtuously ups the ante through increasingly exponential growth rates.
We can't stop data heterogeneity; it is endemic to the nature of its producers -- humans and/or human-directed machines. What we can do, though, is create a powerful Conceptual-level "bus" or "interface" for data integration, based on Data Description oriented Logic rather than Data Representation oriented Formats. Basically, it's possible for us to use a Common Logic as the basis for expressing and blending SPO- or EAV-based Object Representations in a variety of Formats (or "dialects").
The roadmap boils down to:
Assigning unambiguous Object Names to:
Every record (or, in table terms, every row);
Every record attribute (or, in table terms, every field or column);
Every record relationship (that is, every relationship between one record and another);
Every record container (e.g., every table or view in a relational database, every named graph, every spreadsheet, every text file, etc.);
Making each Object Name resolve to an Address through which Create, Read, Update, and Delete ("CRUD") operations can be performed against (can access) the associated Object Representation graph.
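The naming half of this roadmap can be sketched as follows, assuming a relational row held as a Python dict. The `BASE` address, the URI layout, and the helper names are hypothetical choices for illustration; the sketch does not cover resolving a Name to an Address for CRUD operations.

```python
BASE = "http://example.org/northwind"  # hypothetical Data Space address


def record_name(table, key):
    """Name for a record (row) inside a record container (table)."""
    return f"{BASE}/{table}/{key}#this"


def row_to_triples(table, key_column, row, references=None):
    """Name a record, its attributes, and its relationships as SPO triples.

    `references` maps a foreign-key column to the table it points at, so the
    value becomes the Name of another record instead of a bare literal.
    """
    references = references or {}
    container = f"{BASE}/{table}"                       # record container
    record = record_name(table, row[key_column])        # record
    triples = [(record, f"{BASE}/schema#container", container)]
    for column, value in row.items():
        attribute = f"{BASE}/schema/{table}#{column}"   # record attribute
        if column in references:                        # record relationship
            value = record_name(references[column], value)
        triples.append((record, attribute, value))
    return triples


# Illustrative row only; the values are not taken from a real database.
order = {"OrderID": 10643, "CustomerID": "ALFKI"}
order_triples = row_to_triples("Orders", "OrderID", order, {"CustomerID": "Customers"})
```

The design choice worth noting is that the foreign key is emitted as a Name rather than a string, which is what lets one record's description point at another's.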
Here is how I see Linked Data providing tangible value to MDM tools vendors and users:
Of course Virtuoso was designed and developed to deliver the above from day one (circa. 1998 re. the core and 2005 re. the use of RDF for the final mile) as depicted below:
Virtuoso is an extremely compact product that is very easy to install. The ease of installation carries over to the PHP runtime when bound to Virtuoso.
Enjoy!
The keynote, Creating, Deploying, and Exploiting Linked Data, sought to achieve one fundamental goal: demystifying the concept of "Linked Data" using anecdotal material that resonates with enterprise decision makers.
To my pleasure, 90% of the audience members confirmed familiarization with the "Data Source Name" concept of Open Database Connectivity (ODBC). Thus, all I had to do was map "Linked Data" to ODBC, and then unveil the fundamental add-ons that "Linked Data" delivers:
I believe a majority of attendees came to realize that the combination above injects a new Web interaction dynamic: access to "Subject matter Concepts" and Named Entities contained within a page via HTTP-based Data Source Names (URIs).
BTW - My presentation is a Linked Data Space in its own right courtesy of the Bibliographic Ontology (which provides slide show modeling) and RDFa that allows me to embed annotations into my Slidy-based presentation :-)
During this particular podcast interview, I deliberately wanted to have a conversation about the practical value of Linked Data, rather than the technical innards. The fundamental utility of Linked Data remains somewhat mercurial, and I am certainly hoping to do my bit at the upcoming Linked Data Planet conference re. demonstrating and articulating linked data value across the blurring realms of "the individual" and "the enterprise".
Note to my old schoolmates on Facebook: when you listen to this podcast you will at least reconcile "Uyi Idehen" with "Kingsley Idehen". Unfortunately, Facebook refuses to let me Identify myself in the manner I choose. Ideally, I would like to have the name: "Kingsley (Uyi) Idehen" associated with my Facebook ID since this is the Identifier known to my personal network of friends, family, and old schoolmates. This Identity predicament is a long running Identity case study in the making.
Both browsers should lead you to the posts from Danny, Nova, and Tim. In both cases, the URI of this post is a pointer to structured data (in my Blog Data Space) if your user agent (browser or other Web Client) requests an RDF representation of this post via its HTTP request (which is what the browsers do via the "Accept:" header).
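A minimal sketch of what such a user agent does, using Python's standard library. The example.org URI and the `rdf_request` helper are placeholders invented for illustration; the request is only constructed here, not sent.

```python
from urllib.request import Request


def rdf_request(uri, media_type="application/rdf+xml"):
    """Build an HTTP GET request that asks for an RDF representation of `uri`."""
    return Request(uri, headers={"Accept": media_type})


# Hypothetical post URI; substitute the URI of any Linked Data resource.
req = rdf_request("http://example.org/weblog/post-1")

# Passing `req` to urllib.request.urlopen would send it; a server that
# supports content negotiation may then answer with RDF instead of HTML.
```

The same URI serves both audiences: a browser sending `Accept: text/html` gets the page, while an RDF-aware client gets the structured data.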
As you can see the Data Web is actually here! Without RDF generation upheaval (or Tax).
The items that follow attempt to demonstrate the point by way of SIOC (Semantically-Interlinked Online Communities Ontology) and MO (Music Ontology) domain exploration:
Linked Data or Dynamic Data Web Pages:
Semantic Web Browser Sessions:
Key point: if you are modeling People, Communities, Organizations, Documents, and other entities in the People, Organizations, Documents etc. Data Space, don't forget to FOAF-FOAF-FOAF it Up! :-)
Naturally, this triggered an obvious opportunity to demonstrate the prowess of Linked Data on the Semantic Web. What follows is a quick dump of what I sent to the foaf-dev mailing list:
Here are a variety of FOAF Views built using:
Enabling you to explore the following lines:
The journey towards this watershed moment started with the Semantic Web Project, gained focus and pragmatism via the Linked Data meme, attained substance & credibility via efforts such as DBpedia and the resulting cloud of Open Linked Data Spaces, and finally arrived at the most important destination of all: broad comprehension and coherence, via RDFa.
Over the years, I've chronicled the journey above via entries in this particular data space (my blog) and most recently, via my rapid-fire comments and debates on Twitter (basically hashtag: #linkeddata, account: kidehen).
On a parallel front re. my chronicles, I've periodically had conversations with Jon Udell, who has always provided a coherent sounding board and reconciliation framework for my world views and open data access vision; naturally, this has a lot to do with his holistic grasp of the big picture issues, associated technical details, and special communication prowess :-)
Against this backdrop, I refer you to my most recent podcast conversation with Jon, which is about how the tandem of HTML+RDFa and the GoodRelations vocabulary deliver the critical missing links re. broad comprehension of the Semantic Web vision en route to mass exploitation.
I would tweak the law modification expressed in Mike Bergman's post, which states:
the value of a Linked Data network is proportional to the square of the number of links between the data objects.
By simply injecting "Context", which is what a high-fidelity linked data mesh facilitates (i.e., a mesh of weighted links endowed with specific types, as opposed to a single ambiguous, type-unspecific link), you end up with even more insight into the power of a Linked Data Web.
How about Einstein's famous equation: E=mc²? I am talking Energy (vitality) and Mass equivalence, where "E" is for Energy, "m" for Network Mesh based Mass (where each entity network node contains sub-particles that are themselves dense network meshes all endowed with typed links and weightings), and "c" is for computer processing speed (processing speed is growing exponentially!). When you beam queries down a context-rich mesh (a giant global graph comprised of named and dereferenceable data sources), especially a mesh to which we are all connected, what do you get? Infrastructure for generating an unbelievable amount of intellectual energy (the result of exploding the sub-data-graphs within graph nodes) that is much better equipped to handle current and future challenges. Even better, we end up making constructive use of Einstein's findings (remember, we built a bomb the first time around!). TimBL articulates this fundamental value of the Web in slightly different language, but at the core, this is the essence of the Web as I believe he envisioned it: the ability to connect us all in such a way that we exploit our collective manpower and knowledge constructively and unobtrusively, en route to making the world a much better place :-)
Note: None of this is incongruent with being compensated (i.e. making money) for contributing tangible value into, or around, the Mesh we know as the Web :-)
Although the Web continues to shrink the planet by removing the restrictions of geographic location, meeting people face-to-face remains invaluable (*priceless in Mastercard ad speak*). Naturally, meeting and chatting with as many LOD community members as possible was high up on my agenda.
As one of the co-chairs of the Linking Open Data Workshop (LODW), I had a 5 minute workshop opening slot during which I spoke about the following:
We have DBpedia as a major hub on the burgeoning Linked Data Web. When OpenLink offered to host DBpedia (a combination of Virtuoso DBMS Software and sizable backend Hardware infrastructure), it did so knowing that such an effort would emphatically address the "chicken and egg" conundrum that, prior to this undertaking, stifled the ability to demonstrate practical utility of HTTP based Linked Data.
Today, the Linked Data bootstrap mission has been accomplished.
Although DBpedia is a hub (ground zero of Linked Data), we have to put it into perspective in relation to a new set of needs and expectations moving forward. Today, DBpedia is a Sun at the heart of a Solar System within the Linked Data Galaxy. But unlike Space as we know it, in Cyberspace we can have connectivity and collaboration across Solar Systems -- life exists elsewhere and we are part of a collaborative collective unimpeded by constraints of space travel etc. Thus, expect to see the emergence of other Solar Systems accessible to DBpedia and its collections of planets (see the LOD diagram). Examples underway include UMBEL, which will serve the Linked Data planets from OpenCyc (Subject Matter Concepts), Yago (Named Entities), and Bio2RDF (which provides a powerful Bioinformatics-based Linked Data planet).
I urged the community to veer more aggressively towards developing and demonstrating practical Linked Data driven solutions that are aligned to well known problems. Of course, I encouraged all presenters to make this an integral part of their presentations :-)
The workshop was well attended and I found all the presentations engaging and full of enthusiasm.
As the sessions progressed, it became clear during a number of accompanying Q&A sessions that a new Linked Data exploitation frontier is emerging. The frontier in question takes the form of a Linked Data substrate capable of addressing the taxonomic needs of solutions aimed at automated Named Entity Extraction, Disambiguation, and Subject matter Concept alignment, transparently integrated with existing Web Content. Thus, we are moving beyond the minting and deployment of dereferenceable URIs and RDF data sets to automagically associating existing Web Content with Named Entities (People, Organizations, Places, Events etc.) and Subject matter Concepts (Politics, Music, Sports, and others) while remaining true to the Linking Open Data Community creed, i.e., ensuring the Named Entity and Subject matter Concept URIs are available to user agents or users seeking to produce alternative data views (i.e., Mesh-ups).
I will get to part 2 of this report once the actual workshop session slides go live (*these are different from the pre-event PDF links*).
As I can't quite remix Videos on the spur of the moment (yet), I would encourage you to watch the video and then click on the link to my FOAF Profile, then follow the "Linked Data" tab to see how Linked Data oriented platforms (in my case OpenLink Data Spaces) that exist today actually deliver what's explained in the video.
"What You Know" (Data & Friend Networks) ultimately trumps "Who You Know" (Friend only Networks). The exploitation power of this reality is enhanced exponentially via the Linked Data Web once the implications of beaming SPARQL queries down specific URIs (entry points to Linked Data graphs) become clearer :-)
Information overload and Data Portability are two of the most pressing and imminent challenges affecting every individual connected to the global village exposed by the Internet and World Wide Web. I wrote an earlier post titled: Why We Need Linked Data that shed light on frequently overlooked realities about the Document Web.
The real Killer application of the Semantic Web (imho) is Linked Data (or Hyperdata), just as the killer application of the Document Web was Linked Documents (Hyperlinks). Linked Data enables human users (indirectly) and software agents (directly in response to human instruction) to traverse Web Data Spaces (Linked Data enclaves within the Giant Global Graph).
Semantic Web applications (conduits between humans and agents) that take advantage of Linked Data include:
DBpedia - General Knowledge sourced from Wikipedia and a host of other Linked Data Spaces.
Various Linked Data Browsers: Zitgist Data Viewer, OpenLink RDF Browser, DISCO Browser, and TimBL's Tabulator.
zLinks - Linked Data Lookup technology for Web Content Publishing systems (note: more to come on this in a future post).
OpenLink Data Spaces - a solution for Data Portability via a Linked Data Junction Box for Web 1.0 ((X)HTML Document Webs), 2.0 (XML Web Services based Content Publishing, Content Syndication, and Aggregation), and 3.0 (Linked Data) Data Spaces. Thus, via my URI (when viewed through a Linked Data Browser/Viewer) you can traverse my Data Space (i.e my Linked Data Graph) generated by the following activities:
Virtuoso - a Universal Server Platform that includes RDF Data Management, RDFization Middleware, SQL-RDF Mapping, RDF Linked Data Deployment, alongside a hybrid/multi-model, virtual/federated data service in a single product offering.
BTW - There is a Linked Data Workshop at this year's World Wide Web conference. Also note the Healthcare & Life Science Workshop, which covers a related realm of Linked Data technology and Semantic Web best practices.
Linked Data offers everyone a Web-scale, Enterprise-grade mechanism for platform-independent creation, curation, access, and integration of data.
The fundamental steps to creating Linked Data are as follows:
Choose a Name Reference Mechanism — i.e., URIs.
Choose a Data Model with which to Structure your Data — minimally, you need a model which clearly distinguishes
Choose one or more Data Representation Syntaxes (also called Markup Languages or Data Formats) to use when creating Resources with Content based on your chosen Data Model. Some Syntaxes in common use today are HTML+RDFa, N3, Turtle, RDF/XML, TriX, XRDS, GData, OData, OpenGraph, and many others.
Choose a URI Scheme that facilitates binding Referenced Names to the Resources which will carry your Content -- your Structured Data.
Create Structured Data by using your chosen Name Reference Mechanism, your chosen Data Model, and your chosen Data Representation Syntax, as follows:
You can create Linked Data (hypermedia-based data representations) Resources from or for many things. Examples include: personal profiles, calendars, address books, blogs, photo albums; there are many, many more.
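The steps above can be sketched for a personal profile as follows. This is a rough illustration under stated assumptions: the example.org names are hypothetical, the FOAF vocabulary is one possible Data Model vocabulary, and telling URIs from literals by their `http` prefix is a shortcut taken for brevity. The output is N-Triples-style lines, which are also valid Turtle.

```python
FOAF = "http://xmlns.com/foaf/0.1/"    # FOAF vocabulary namespace
ME = "http://example.org/profile#me"   # hypothetical Name for the profile owner


def term(value):
    """Serialize one term: <URI> for Names, "..." for plain literal Values."""
    if value.startswith(("http://", "https://")):  # assumption: URIs look like this
        return f"<{value}>"
    escaped = (
        value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"'


def to_ntriples(triples):
    """Render (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


profile = [
    (ME, FOAF + "name", "Example Person"),
    (ME, FOAF + "knows", "http://example.org/profile#friend"),
]
doc = to_ntriples(profile)
```

Publishing `doc` at an address that `ME` resolves to would complete the binding between the Name and the Resource carrying its description.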
A few months ago, Aldo Bucchi posted a message to the LOD mailing list seeking a discussion space for more business and marketing oriented topics in relation to Linked Data. At the time, my assumption was that the existing LOD mailing list served that purpose absolutely fine, but in due course I came to realize that Aldo's request had a much larger foundation than I initially suspected.
Linked Data, like its umbrella Semantic Web Project, has suffered from an inadvertent oversight on the parts of many of its enthusiasts (myself included): 100% of the discussion spaces are created by, geared towards, or dominated by researchers (from Academia primarily) and/or developers. Thus, at the very least, we've been operating in an echo chamber that only feeds the existing void between the core community and those who are more interested in discussing business and marketing related topics.
The new discussion space seeks to cover the following:
How Do I Join The Conversation? Simply sign up on the Google hosted BOLD mailing list, introduce yourself (ideally), and then start conversing! :-)
The HTTP URI is the secret sauce of the Web that is powerfully and unobtrusively reintroduced via the Linked Data meme (classic back to the future act). This powerful sauce possesses a unique power courtesy of its inherent duality, i.e., how it uniquely combines Data Item Identity (think keys in traditional DBMS parlance) with Data Access (e.g., access to negotiable representations of associated metadata).
As you can see, I've made no mention of RDF or SPARQL, and I can still articulate the inherent value of the "Linked Data" dimension that the "Linked Data" meme adds to the World Wide Web.
As per usual this post is a live demonstration of Linked Data (dog-food style) :-)
The acronym stands for: Resource Description Framework. And that's just what it is.
RDF is comprised of a Data Model (EAV/CR Graph) and Data Representation Formats such as: N3, Turtle, RDF/XML etc.
RDF's essence is about: "Entities" and "Attributes" being URI based, while "Values" may be URI or Literals (typed or untyped) based.
URIs are Entity Identifiers.
Short for "Web of Linked Data" or "Linked Data Web".
A term coined by TimBL that describes an HTTP based "data access by reference pattern" that uses a single pointer or handle for "referring to" and "obtaining actual data about" an entity.
Linked Data uses the deceptively simple messaging scheme of HTTP to deliver a granular entity reference and access mechanism that transcends traditional computing boundaries such as: operating system, application, database engines, and networks.
Linked Data simply mandates the following re. RDF:
Note: by Entity I am also referring to: a resource (Web parlance), data item, data object, real-world object, or datum.
Linked Data is also about using URIs and HTTP's content negotiation feature to separate the presentation, representation, access, and identity of data items. Even better, content negotiation can be driven by user agent and/or data server based quality of service algorithms (representation preference order schemes).
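As a rough sketch of a representation preference order scheme, the following picks a representation from an HTTP `Accept` header using its `q` weights. The function names are made up for illustration, and the sketch ignores wildcards such as `*/*` and other media type parameters, which a real server would need to handle.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    preferences = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        quality = 1.0  # HTTP default when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not acceptable"
        preferences.append((media_type, quality))
    # sorted() is stable, so equal weights keep the order given in the header.
    return sorted(preferences, key=lambda pref: pref[1], reverse=True)


def negotiate(header, available):
    """Return the client's most preferred media type the server can produce."""
    for media_type, quality in parse_accept(header):
        if quality > 0 and media_type in available:
            return media_type
    return None


chosen = negotiate(
    "text/html;q=0.5, text/turtle, application/rdf+xml;q=0.9",
    ["text/html", "application/rdf+xml"],
)
```

In this example the client prefers Turtle, the server cannot produce it, and the next best match (`application/rdf+xml`) wins over HTML.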
To conclude, Linked Data is ultimately about the realization that: Data is the new Electricity, and its conductors are URIs :-)
Tip to governments of the world: we are in exponential times, the current downturn is but one side of the "exponential times ledger", the other side of the "exponential times ledger" is simply about unleashing "raw data" -- in structured form -- into the Web, so that "citizen analysts" can blossom and ultimately deliver the transparency desperately sought at every level of the economic value chain. Think: "raw data ready" whenever you ponder about "shovel ready" infrastructure projects!
"..There is evidence that they promote LINKED DATA at any expense without understanding the rationale behind other approaches...".
To answer the question above, Linked Data is always relevant as long as we are actually talking about "Data" which is simply the case all of the time, irrespective of interaction medium.
If XBRL can be disconnected in any way from Linked Data, I desperately would like to be enlightened (as per my comments to the post). Why wouldn't anyone desire the ability to navigate the linked data inherent in any financial report? Every entity in an XBRL instance document is an entity, directly or indirectly related to other entities. Why "Mash" the data when you can harmonize XBRL data via a Generic Financial Dictionary (schema or ontology) such that descriptions of Balance Sheet, P&L, and other entities are navigable via their attributes and relationships? In short, why "Mash" (code based brute force joining across disparately shaped data) when you can "Mesh" (natural joining of structured data entities)?
"Linked Data" is about the ability to connect all our observations (data), perceptions (information), and inferences / conclusions (knowledge) across a spectrum of interaction media. And it just so happens that the RDF data model (Entity-Attribute-Value + Class Relationships + HTTP based Object Identifiers), a range of RDF data model serialization formats, and SPARQL (Query Language and Web Service combo) actually make this possible, in a manner consistent with the essence of the global space we know as the World Wide Web.
As more Linked Data is injected into the Web from the Linking Open Data community and other initiatives, it's important to note that "Linked Data" is available in a variety of forms such as:
Note: The common glue across the different types of Linked Data remains the commitment to data object (entity) identification and access via de-referencable URIs (aka. record / entity level data source names).
As stated in my recent post titled Semantic Web: Travails to Harmony Illustrated, harmonious intersections of instance data and data dictionaries (schemas, ontologies, rules etc.) provide a powerful substrate (smart data) for the development and deployment of "People" and/or "Machine" oriented solutions. Of course, others have commented on these matters and expressed similar views (see related section below).
The clickable Venn diagram below provides a simple exploration path that exposes the linkage that already exists, across the different Linked Data types, within the burgeoning Linked Data Web.
Overemphasis on Description Logics (RDFS, OWL, Inference & Reasoning etc.) matters without any actual real-world instance data (e.g., lots of reasoning over RDF in zip files or local drives).
Overemphasis on Instance Data without Data Dictionary appreciation and utilization (e.g., Linked Data instance level linkage via "owl:sameAs").
Here we are dealing with numerous applications and frameworks that inextricably bind Instance Data Management and Data Dictionaries. Basically, an all or nothing proposition, if you want to delve into the RDF Linked Data solutions realm.
Often overlooked, is the fact that the Linked Data Web - as an aspect of the Semantic Web innovation continuum - is fundamentally about designing and constructing an "Open World" compatible DBMS for the Internet. Thus, erstwhile "Closed World" DBMS components such as Data Dictionaries (handlers of Data Definition, Referential Integrity etc.) and actual Instance Data, are now distributed and loosely coupled. Thus, your data could be in one Data Space while the data dictionary resides in another. In actual fact, you could have several loosely bound data dictionaries that serve the specific Inference and Reasoning needs of a variety of applications, services, or agents.
Understanding potential Linked Data Web business models, relative to other Web based market segments, is best pursued via a BCG Matrix diagram, such as the one I've constructed below:
To conclude, the Linked Data Web's market opportunities are all about the evolution of the Web into a powerful substrate that offers a unique intersection of "Link Density" and "Relevance", exploitable across horizontal and vertical market segments to solutions providers. Put differently, SDQ is how you take "The Ad" out of "Advertising" when matching Web users to relevant things :-)
The statement above resonates with a lot of my fundamental views about the essence of the Web. It also drives right at the core of what we are trying to address with the OpenLink Data Explorer (ODE), which isn't simply about Linked Data visualization, but about the combination of visualization, user interaction, and unobtrusive exposure and exploitation of Linked Data Entities culled from the existing Web of Linked Documents. ODE consumes and processes URIs or URLs. Thus, as long as the (X)HTML container / host document keeps URIs or URLs in "agent view", ODE will give you the option to interact with the data behind Web information resources (e.g., Web Pages, Images, Audio etc.).
Do remember, "mission-critical" is no longer a corporate / enterprise theme. The lines of demarcation between the individual and enterprise are blurring at warp speed.
The big deal about LINQ has been the singular focus on addressing point 1, in particular.
I've already written about the Linq2Rdf effort that meshes the best of .NET with the virtues of the "Linked Data Web".
Here is an architecture diagram that seeks to illustrate the powerful data access and manipulation options that the combination of Linq2RDF and Linked Data deliver:
What may not have been obvious to most in the past, is the fact that Mapping from Object Models to Relational Models wasn't really the solution to the problem at hand. Instead, the mapping should have been the other way around i.e., Relational to Object Model mapping. The emergence of RDF and RDBMS to RDF mapping technology is what makes this age-old headache addressable in very novel ways.
Naturally, we've decided to join the Crunchbase RDFization party, and have just completed a Virtuoso Sponger Cartridge (an RDFizer) for Crunchbase. What we add in our particular cartridge is additional meshing with DBpedia and Wikicompany Linked Data Spaces, plus RDFization of the Crunchbase (X)HTML pages :-)
As I've postulated for a while, Linked Data is about data "Meshing" and "Meshups". This isn't a buzzword play. I am pointing out an important distinction between "Mashups" and "Meshups", which goes as follows: "Mashups" are about code level joining devoid of structured modelling, hence the revelation of code as opposed to data when you look behind a "Mashup". "Meshups", on the other hand, are about joining disparate structured data sources across the Web. And when you look behind a "Meshup" you see structured data (preferably Linked Data) that enables further "Meshing".
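To make the "Meshup" idea concrete, here is a small sketch in which two structured sources are joined purely by a shared entity Name, with no source-specific glue code. The `mesh` function, the two sources, and all values are invented for illustration.

```python
def mesh(*graphs):
    """Union triples from several sources, keyed by subject Name.

    No source-specific join logic is needed: descriptions of the same entity
    line up because every source uses the same Name for it.
    """
    merged = {}
    for graph in graphs:
        for subject, predicate, obj in graph:
            merged.setdefault(subject, {}).setdefault(predicate, set()).add(obj)
    return merged


ACME = "http://example.org/company/acme#this"  # hypothetical shared Name

# Two hypothetical Data Spaces describing the same entity.
source_a = [(ACME, "founded", "2005")]
source_b = [(ACME, "industry", "Software")]

company = mesh(source_a, source_b)[ACME]
```

The result is itself structured data, so it can be meshed again with a third source in exactly the same way.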
I truly believe that we are now inches away from critical mass re. Linked Data, and because we are dealing with data, the network-effect will be sky-high! I shudder to think about the state of the Linked Data Web in 12 months' time. Yes, I am giving the explosion 12 months (or less). These are very exciting times.
Demo Links:
For best experience I encourage you to look at the OpenLink Data Explorer extension for Firefox (2.x - 3.x). This enables you to go to Crunchbase (X)HTML pages (and other sites on the Web of course), and then simply use the "View | Linked Data Sources" main or context menu sequence to unveil the Linked Data Sources associated with any Web Page.
Of course there is much more to come!
Anyway, thanks to the Blogosphere, I can attempt to fix this problem myself -- via this post :-)
Q. If you wanted to provide a bewildered but still curious novice a public example of Linked Data at work in their everyday life, what would it be?
Kingsley Idehen: Any one of the following:
My Linking Open Data community Profile Page - the Linked Data integration is exposed via the "Explore Data" Tab
My Linked Data Space - viewed via OpenLink's AJAR (Asynchronous Javascript and RDF) based Linked Data Browser
My Events Calendar Tag Cloud - a Linked Data view of my Calendar Space using an RDF-aware browser
In all cases, you have the ability to explore my data spaces by simply clicking on the links, which on the surface appear to be standard hypertext links, although in reality you are dealing with hyperdata links (i.e., links to entities that result in the generation of entity description pages that expose entity properties via hyperdata links). Thus, you have a single page that describes me in a very rich way since it encompasses all data associated with me, covering: personal profile, blog posts, bookmarks, tag clouds, social networks etc.
Q. What would you show the CEO or CTO of a company outside the tech industry?
Kingsley Idehen: A link to the Entity ALFKI, from the popular Northwind Database associated with Microsoft Access and SQL Server database installations. This particular link exposes a typical enterprise data space (orders, customers, employees, suppliers ...) in a single page. The hyperdata links represent intricate data relationships common to most business systems that will ultimately seek to repurpose existing legacy data sources and SOA services as Linked Data. Alternatively, I would show the same links via the Zitgist Data Viewer (another Linked Data-aware browser). In both cases, I am exploiting direct access to entities via HTTP due to the protocols incorporation into the Data Source Naming scheme.
]]>Note: the enhanced hyperlink (typed data link) lookup presents options to perform an Explore (all data about the Subject across Domains in the data space, i.e., data links to and from the Subject) or a Dereference (specific data in the Subject's Domain, i.e., data links originating from the Subject).
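To make the difference between the two lookup options concrete, here is a minimal sketch over a handful of made-up triples. The `ex:` identifiers and both function names are purely illustrative assumptions, and the sketch only models link direction (it ignores the cross-Domain aspect of Explore):

```python
# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:kingsley", "foaf:name", "Kingsley Idehen"),
    ("ex:kingsley", "foaf:knows", "ex:daniel"),
    ("ex:post1", "dc:creator", "ex:kingsley"),
    ("ex:daniel", "foaf:name", "Daniel Lewis"),
]


def dereference(triples, subject):
    """Return only the data links that originate from the subject."""
    return [t for t in triples if t[0] == subject]


def explore(triples, subject):
    """Return the data links to and from the subject."""
    return [t for t in triples if t[0] == subject or t[2] == subject]


if __name__ == "__main__":
    print(dereference(TRIPLES, "ex:kingsley"))
    print(explore(TRIPLES, "ex:kingsley"))
```

In this toy data set, Explore additionally surfaces the blog post that points at `ex:kingsley`, which Dereference alone would not show.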
I built these Linked Data Pages by simply doing the following:
DBpedia is a community effort to provide a contemporary deductive database derived from Wikipedia content. Project contributions can be partitioned as follows:
Comprising the nucleus of the Linked Open Data effort, DBpedia also serves as a fulcrum for the burgeoning Web of Linked Data by delivering a dense and highly-interlinked lookup database. In its most basic form, DBpedia is a great source of strong and resolvable identifiers for People, Places, Organizations, Subject Matter, and many other data items of interest. Naturally, it provides a fantastic starting point for comprehending the fundamental concepts underlying TimBL's initial Linked Data meme.
Depending on your particular requirements, whether personal or service-specific, DBpedia offers the following:
OpenLink Software has preloaded the DBpedia 3.6 datasets into a preconfigured Virtuoso Cluster Edition database, and made the package available for easy installation.
The DBpedia+Virtuoso package provides a cost-effective option for personal or service-specific incarnations of DBpedia.
For instance, you may have a service that isn't best-served by competing with the rest of the world for ad-hoc query time and resources on the live instance, which itself operates under various restrictions which enable this ad-hoc query service to be provided at Web Scale.
Now you can easily commission your own instance and quickly exploit DBpedia and Virtuoso's database feature set to the max, powered by your own hardware and network infrastructure.
Pre-requisites are simply:
To install the Virtuoso Cluster Edition simply perform the following steps:
Set key environment variables and start the OpenLink License Manager, using command (this may vary depending on your shell):
. /opt/virtuoso/virtuoso-enterprise.sh
mkcluster.sh script, which defaults to a 4 node cluster
VIRTUOSO_HOME environment variable -- if you want to start cluster databases distinct from single-server databases via a distinct root directory for database files (one that isn't adjacent to single-server database directories)
virtuoso-start.sh
virtuoso-stop.sh
To install your personal or service specific edition of DBpedia simply perform the following steps:
dbpedia-install.sh
chmod 755 dbpedia-install.sh
VIRTUOSO_HOME environment variable, e.g., to the current directory, via command (this may vary depending on your shell):
export VIRTUOSO_HOME=`pwd`
sh dbpedia-install.sh
Once the installation completes (approximately 1 hour and 30 minutes from start time), perform the following steps:
http://localhost:[port]/conductor
http://localhost:[port]/fct
http://localhost:[port]/resource/DBpedia
At the end of the marathon session, it was clear to me that a blog post was required for future reference, at the very least :-)
"Data Access by Reference" mechanism for Data Objects (or Entities) on HTTP networks. It enables you to Identify a Data Object and Access its structured Data Representation via a single Generic HTTP scheme based Identifier (HTTP URI). Data Object representation formats may vary; but in all cases, they are hypermedia oriented, fully structured, and negotiable within the context of a client-server message exchange.
Information makes the world tick!
Information doesn't exist without data to contextualize.
Information is inaccessible without a projection (presentation) medium.
All information (without exception, when produced by humans) is subjective. Thus, to truly maximize the innate heterogeneity of collective human intelligence, loose coupling of our information and associated data sources is imperative.
Linked Data is exposed to HTTP networks (e.g. World Wide Web) via hypermedia resources bearing structured representations of data object descriptions. Remember, you have a single Identifier abstraction (generic HTTP URI) that embodies: Data Object Name and Data Representation Location (aka URL).
A structured representation of data exists when an Entity (Datum), its Attributes, and its Attribute Values are clearly discernible. In the case of a Linked Data Object, structured descriptions take the form of a hypermedia based Entity-Attribute-Value (EAV) graph pictorial -- where each Entity, its Attributes, and its Attribute Values (optionally) are identified using Generic HTTP URIs.
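As a rough illustration of the EAV structure just described, the sketch below holds a couple of Entity-Attribute-Value statements as plain tuples and writes them out in N-Triples notation. The sample statements are made up for the example, and the rule "a value that starts with http:// or https:// is a URI, anything else is a literal" is a simplifying assumption, not part of any specification:

```python
# Made-up sample statements: (entity, attribute, value).
STATEMENTS = [
    ("http://dbpedia.org/resource/Linked_Data",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "Linked Data"),
    ("http://dbpedia.org/resource/Linked_Data",
     "http://xmlns.com/foaf/0.1/isPrimaryTopicOf",
     "http://en.wikipedia.org/wiki/Linked_Data"),
]


def to_ntriples(statements):
    """Serialize EAV tuples as N-Triples lines (simplified sketch)."""
    lines = []
    for entity, attribute, value in statements:
        if value.startswith(("http://", "https://")):
            # Assumption: treat URL-looking values as identifiers.
            obj = f"<{value}>"
        else:
            # Otherwise emit a plain literal with basic escaping.
            escaped = (value.replace("\\", "\\\\")
                            .replace('"', '\\"')
                            .replace("\n", "\\n"))
            obj = f'"{escaped}"'
        lines.append(f"<{entity}> <{attribute}> {obj} .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples(STATEMENTS))
```

Each output line is one datum: an Entity, one of its Attributes, and that Attribute's Value.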
Examples of structured data representation formats (content types) associated with Linked Data Objects include:
You mark up resources by expressing distinct entity-attribute-value statements (basically, these are 3-tuple records) using a variety of notations:
You can achieve this task using any of the following approaches:
Our data access middleware heritage (which spans 16+ years) has enabled us to assemble a rich portfolio of coherently integrated products that enable cost-effective evaluation and utilization of Linked Data, without writing a single line of code, or exposing you to the hidden, but extensive admin and configuration costs. Post installation, the benefits of Linked Data simply materialize (along the lines described above).
Our main Linked Data oriented products include:
Enjoy!
]]>Enjoy!
]]>The primary topic of a meme penned by TimBL in the form of a Design Issues Doc (note: this is how TimBL has shared his thoughts since the Beginning of the Web).
There are a number of dimensions to the meme, but its primary purpose is the reintroduction of the HTTP URI -- a vital component of the Web's core architecture.
They possess an intrinsic duality that combines persistent and unambiguous Data Identity with platform & representation format independent Data Access. Thus, you can use a string of characters that look like a contemporary Web URL to unambiguously achieve the following:
Enabling more productive use of the Web by users and developers alike. All of which is achieved by tweaking the Web's Hyperlinking feature such that it now includes Hypertext and Hyperdata as link types.
Note: Hyperdata Linking is simply what an HTTP URI facilitates.
Example problems solved by injecting Linked Data into the Web:
If all of the above still falls into the technical mumbo-jumbo realm, then simply consider Linked Data as delivering Open Data Access in granular form to Web accessible data -- that goes beyond data containers (documents or files).
The value proposition of Linked Data is inextricably linked to the value proposition of the World Wide Web. This is true, because the Linked Data meme is ultimately about an enhancement of the current Web; achieved by reintroducing its architectural essence -- in new context -- via a new level of link abstraction, courtesy of the Identity and Access duality of HTTP URIs.
As a result of Linked Data, you can now have Links on the Web for a Person, Document, Music, Consumer Electronics, Products & Services, Business Opening & Closing Hours, Personal "WishLists" and "OfferLists", an Idea, etc., in addition to links for Properties (Attributes & Values) of the aforementioned. Ultimately, all of these links will be indexed in a myriad of ways, providing the substrate for the next major period of Internet & Web driven innovation, within our larger human-ingenuity driven innovation continuum.
Dr. Dre is one of the artists in the Linked Data Space we host for the BBC. He is also referenced in music oriented data spaces such as DBpedia, MusicBrainz and Last.FM (to name a few).
How do I obtain a holistic view of the entity "Dr. Dre" across the BBC, MusicBrainz, and Last.FM data spaces? We know the BBC published Linked Data, but what about Last.FM and MusicBrainz? Both of these data spaces only expose XML or JSON data via REST APIs.
The following took place:
The new enhanced URI for Dr. Dre now provides a rich holistic view of the aforementioned "Artist" entity. This URI is usable anywhere on the Web for Linked Data Conduction :-)
Note: In proper Web parlance, a data object is referred to as a resource.
In the Linked Data realm, if you want to make a reference to the Linked Data meme in a blog post, you are better off using the resource URI: http://dbpedia.org/resource/Linked_Data, instead of the Web page URL: http://dbpedia.org/page/Linked_Data, which is the address of a physical document (an information conveying artifact) that at best visually presents the negotiated representation of a resource description.
In the simplest sense, you only have one focal point for referencing (referring to) and de-referencing (retrieving data about) a given Web resource. It protects you from the impact of Web document location changes (amongst many other things).
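Here is a minimal sketch of what de-referencing a resource URI looks like from the client side, using only Python's standard library. The function names are mine, and whether a particular server honours a `text/turtle` request is an assumption you would need to check against that server:

```python
import urllib.request


def build_description_request(resource_uri, media_type="text/turtle"):
    """Prepare an HTTP GET that asks for a specific representation."""
    return urllib.request.Request(resource_uri, headers={"Accept": media_type})


def fetch_description(resource_uri, media_type="text/turtle"):
    """De-reference the URI; returns (final address, document text).

    urlopen follows redirects, so the first element is the address of
    the document that was actually retrieved, which may differ from
    the resource URI used for referencing.
    """
    request = build_description_request(resource_uri, media_type)
    with urllib.request.urlopen(request, timeout=30) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.geturl(), body


if __name__ == "__main__":
    # Requires network access; the URI is the one quoted in the text.
    address, text = fetch_description("http://dbpedia.org/resource/Linked_Data")
    print(address)
    print(text[:300])
```

The point of the sketch is that the caller only ever holds on to the resource URI; the document address is discovered at request time.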
Remember, a single URI is a conduit into a realm where the identity, access, representation, presentation, and storage of a resource (data object) are completely distinct. It's the mechanism for conducting data across network, machine, operating system, dbms engine, application, and service (API) boundaries. Thus, without "linked data meme" prescribed URI referencing and de-referencing, we are simply back to "business as usual" re. the industry at large, where networks, operating systems, dbms engines, applications, and services (APIs) become the basis for "data lock-in" and silo construction.
Take a second to think about the profound virtues of the ubiquitous Web of Linked Document URLs that we have today, and then apply that thinking to the burgeoning Web of Linked Data URIs, which has just turned the corner and is heading in everyone's direction at full blast.
Note to "Social Media" players: Who you know isn't the canonical object of sociality. What you are i.e., your description and the data objects it exposes, are real objects of your sociality :-)
Dimension | Web 1.0 | Web 2.0 | Web 3.0 |
Simple Definition | Interactive / Visual Web | Programmable Web | Linked Data Web |
Unit of Presence | Web Page | Web Service Endpoint | Data Space (named structured data enclave) |
Unit of Value Exchange | Page URL | Endpoint URL for API | Resource / Entity / Object URI |
Data Granularity | Low (HTML) | Medium (XML) | High (RDF) |
Defining Services | Search | Community (Blogs to Social Networks) | Find |
Participation Quotient | Low | Medium | High |
Serendipitous Discovery Quotient | Low | Medium | High |
Data Referencability Quotient | Low (Documents) | Medium (Documents) | High (Documents and their constituent Data) |
Subjectivity Quotient | High | Medium (from A-list bloggers to select source and partner lists) | Low (everything is discovered via URIs) |
Transclusence | Low | Medium (Code driven Mashups) | High (Data driven Meshups) |
What You See Is What You Prefer (WYSIWYP) | Low | Medium | High (negotiated representation of resource descriptions) |
Open Data Access (Data Accessibility) | Low | Medium (Silos) | High (no Silos) |
Identity Issues Handling | Low | Medium (OpenID) | High (FOAF+SSL) |
Solution Deployment Model | Centralized | Centralized with sprinklings of Federation | Federated with function specific Centralization (e.g. Lookup hubs like LOD Cloud or DBpedia) |
Data Model Orientation | Logical (Tree based DOM) | Logical (Tree based XML) | Conceptual (Graph based RDF) |
User Interface Issues | Dynamically generated static interfaces | Dynamically generated interfaces with semi-dynamic interfaces (courtesy of XSLT or XQuery/XPath) | Dynamic Interfaces (pre- and post-generation) courtesy of self-describing nature of RDF |
Data Querying | Full Text Search | Full Text Search | Full Text Search + Structured Graph Pattern Query Language (SPARQL) |
What Each Delivers | Democratized Publishing | Democratized Journalism & Commentary (Citizen Journalists & Commentators) | Democratized Analysis (Citizen Data Analysts) |
Star Wars Edition Analogy | Star Wars (original fight for decentralization via rebellion) | Empire Strikes Back (centralization and data silos make comeback) | Return of the JEDI (FORCE emerges and facilitates decentralization from "Identity" all the way to "Open Data Access" and "Negotiable Descriptive Data Representation") |
Naturally, I am not expecting everyone to agree with me. I am simply making my contribution to what will remain fascinating discourse for a long time to come :-)
Here are some examples of how we distill Entities (People, Places, Music, and other things) from Freebase (X)HTML pages (meaning: we don't have to start from RDF information resources as data sources for the eventual RDF Linked Data we generate):
Tip: Install our OpenLink Data Explorer extension for Firefox. Once installed, simply browse through Freebase, and whenever you encounter a page about something of interest, simply use the following sequences to distill (via the Page Description feature) the entities from the page you are reading:
Here is a look at our offerings by product family:
As you explore the Linked Data graph exposed via our product portfolio, I expect you to experience, or at least spot, the virtuous potential of high SDQ (Serendipitous Discovery Quotient) courtesy of Linked Data, which is Web 3.0's answer to SEO. For instance, how Database, Operating System, and Processor family paths in the product portfolio graph (data network) unveil a lot more about OpenLink Software than meets the proverbial "eye" :-)
]]>First up, the Library of Congress, take a look at the following pages which are "Human" and machine based "User Agent" friendly:
Key point: The pages above are served up in line with Linked Data deployment and publishing tenets espoused by the Linking Open Data Community (LOD) which include (in my preferred terminology):
The items above are features that users and decision makers should start to home in on when seeking, and evaluating, platforms that facilitate cost-effective exploitation of the Linked Data Web.
]]>Ivan's presentation titled: State of the Semantic Web, is a must view for those who need a quick update on where things are re. the Semantic Web in general.
I also liked the fact that in proper "Lead by example" manner, his presentation isn't PDF or PPT based, it's a Web Document :-)
Hint: as per usual, this post contains a Linked Data demo nugget. This time around, it's in the form of a shared calendar covering a large number of Semantic Web Technology events. All I had to do was subscribe to a number of WebDAV accessible iCal files from my Calendar Data Space and the platform did the rest i.e. produce Linked Data Objects for events associated with a plethora of conferences.
If you assimilate Ivan's presentation properly, you will note I've just generated, and shared, a large number of URIs covering a range of conference events. Thus, you can extend my contributions (thereby enriching the GGG) by simply associating additional data from your Linked Data Space with mine. All you have to do is use my calendar data objects URIs in your statements.
]]>Of course, I also believe that Linked Data serves Web Data Integration across the Internet very well too, and that it will be beneficial to businesses in a big way. No individual or organization is an island; I think the Internet and Web have done a good job of demonstrating that thus far :-) We're all data nodes in a Giant Global Graph.
Daniel Lewis did shed light on the read-write aspects of the Linked Data Web, which is actually very close to the callout for a Wikipedia for Data. TimBL has been working on this via Tabulator (see Tabulator Editing Screencast); Benjamin Nowack has also added similar functionality to ARC; and of course we support the same SPARQL UPDATE into an RDF information resource via the RDF Sink feature of our WebDAV and ODS-Briefcase implementations.
]]>The great thing about the Linked Data Web is that it's much easier to discover and respond to these points of view before the ink dries :-) Ben certainly needs to take a look at the Semantic Web FAQ pre or post assimilation of Daniel's response.
]]>Evoluation is evolution devoid of the randomness of mutation. A state of being in which it is possible to evaluate and choose evolutionary paths.
Evoluation actually describes where we are today in relation to the World Wide Web; to the Linking Open Data community (LOD), it's taking the path towards becoming a Giant Global Graph of Linked Data; to the Web 2.0 community, it's simply a collection of Web Services and associated APIs; and to many others, it remains an opaque collection of interlinked documents.
The great thing about the Web is that it allows netizens to explore a plethora of paths without adversely affecting the paths of others. That said, controlling one's path may take mutation out of evolution, but we are still left with the requirement to adapt and eventually survive in a competitive environment. Thus, although we can evaluate and choose from the many paths the Web's evolution offers us, the path that delivers the most benefits ultimately dominates. :-)
]]>In the form above (the norm), Wordpress data can be injected into the Linked Data Web via RDFization middleware such as the Virtuoso Sponger (built into all Virtuoso instances) and Triplr. The downside of this approach is that the blog owner doesn't necessarily possess full control over their contributions to the emerging Giant Global Graph of Linked Data.
Another route to Linked Data exposure is via Virtuoso's Metaschema Language for producing RDF Views over ODBC/JDBC accessible Data Sources, that enables the following setup:
Alternatively, you can also exploit Virtuoso as the SQL DBMS, RDF DBMS, Application Server, and Linked Data Deployment platform:
How do I map the WordPress SQL Schema to RDF using Virtuoso?
Read the Meta Schema Language guide or simply apply our "WordPress SQL Schema to RDF" script to your Virtuoso hosted instance. Of course, there are other mappings that cover other PHP applications deployed via Virtuoso:
Now I can simply state the following using Linked Data (hyperdata) links:
OpenLink Software's product portfolio comprises the following product families:
We no longer have to explain (repeatedly) why our drivers exist in Express, Lite, and Multi-Tier Edition formats, or why you ultimately need Multi-Tier Drivers over Single-Tier Drivers (Express or Lite Editions), since you ultimately need high performance, data encryption, and policy-based security across each of the data access driver formats.
]]>A while back, I wrote a post titled: Why we need Linked Data. The aim of the post was to bring attention to the implications of exponential growth of User Generated Content (typically, semi-structured and unstructured data) on the Web. The growth in question is occurring within a fixed data & information processing timeframe (i.e., there will always be 24hrs in a day), which sets the stage for Information Overload, as expressed in a recent post from ReadWriteWeb titled: Visualizing Social Media Fatigue.
The emerging "Web of Linked Data" augments the current "Web of Linked Documents" by providing a structured data corpus partitioned by containers I prefer to call: Data Spaces. These spaces enable Linked Data aware solutions to deliver immense value, such as complex data graph traversal, starting from document beachheads, that exposes relevant data within a fraction of the time it would take to achieve the same thing using traditional document web methods such as full text search patterns, scraping, and mashing.
Remember, our DNA based data & information system far exceeds that of any inorganic system when it comes to reasoning, but it remains immensely incapable of accurately and efficiently processing huge volumes of data & information -- irrespective of data model.
The Idea behind the Semantic Web has always been about an evolution of the Web into a structured data collective comprised of interlinked Data items and Data Containers (Data Spaces). Of course we can argue forever about the Semantics of the solution (ironically), but we can't shirk away from the impending challenges that "Information Overload" is about to unleash on our limited processing time and capabilities.
For those looking for a so-called "killer application" for the Semantic Web, I would urge you to align this quest with the "Killer Problem" of our times, because when you do so you will find that all routes lead to: Linked Data that leverages existing Web Architecture.
Once you understand the problem, you will hopefully understand that we all need some kind of "Data Junction Box" that provides a "Data Access Focal Point" for all of the data we splatter across the net as we sign up for the next greatest and latest Web X.X hosted service, or as we work on a daily basis with a variety of tools within enterprise Intranets.
BTW - these "Data Junction Boxes" will also need to be unobtrusively bound to our individual Identities.
]]>Writing a JDBC Driver for SPARQL is a little overkill. OpenOffice.org simply needs to make XML or Web Data (HTML, XHTML, and XML) bona fide data sources within its "Pivot Table" functionality realm. All that would then be required is a SPARQL SELECT Query transported via the SPARQL Protocol, with results sent back using the SPARQL XML results serialization format (all part of a single SPARQL Protocol URL).
Excel successfully consumes the following information resource URI: http://tinyurl.com/yvoccj (a tiny url for a SPARQL SELECT against my FOAF file).
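To show how little a spreadsheet-style client actually needs, here is a minimal sketch that turns a document in the SPARQL XML results format into column names plus rows. The sample document and its values are made up; only the namespace and element names come from the W3C results format:

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format.
SPARQL_NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Made-up sample result set for illustration only.
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="name"/><variable name="homepage"/></head>
  <results>
    <result>
      <binding name="name"><literal>Alice</literal></binding>
      <binding name="homepage"><uri>http://example.org/alice</uri></binding>
    </result>
    <result>
      <binding name="name"><literal>Bob</literal></binding>
    </result>
  </results>
</sparql>"""


def results_to_rows(xml_text):
    """Return (columns, rows); unbound variables become None."""
    root = ET.fromstring(xml_text)
    columns = [v.get("name")
               for v in root.findall("sr:head/sr:variable", SPARQL_NS)]
    rows = []
    for result in root.findall("sr:results/sr:result", SPARQL_NS):
        row = dict.fromkeys(columns)
        for binding in result.findall("sr:binding", SPARQL_NS):
            value = binding[0]  # the <uri>, <literal>, or <bnode> child
            row[binding.get("name")] = value.text
        rows.append([row[column] for column in columns])
    return columns, rows


if __name__ == "__main__":
    columns, rows = results_to_rows(SAMPLE)
    print(columns)
    for row in rows:
        print(row)
```

The output is already table shaped, which is exactly what a pivot table or spreadsheet import expects.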
Alternatively, and currently achievable, you could simply use SPASQL (SPARQL within SQL) using a DBMS engine that supports SQL, SPARQL, and SPASQL, e.g., Virtuoso.
Virtuoso SPASQL support is exposed via its ODBC and/or JDBC Drivers. Thus you can do things such as:
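As one illustration of the driver route, the sketch below wraps a SPARQL query as a SPASQL statement (an SQL statement that starts with the SPARQL keyword) and runs it through any Python DB-API style cursor. The helper names are mine, and the commented connection line assumes the third-party pyodbc module plus a hypothetical DSN called "Local Virtuoso":

```python
def to_spasql(sparql_query):
    """Prefix a SPARQL query with the SPARQL keyword for use over SQL."""
    return "SPARQL " + sparql_query.strip()


def run_spasql(cursor, sparql_query):
    """Execute the query on a DB-API style cursor and fetch all rows."""
    cursor.execute(to_spasql(sparql_query))
    return cursor.fetchall()


# Hypothetical usage (assumes pyodbc and a configured Virtuoso DSN):
#
#   import pyodbc
#   connection = pyodbc.connect("DSN=Local Virtuoso;UID=...;PWD=...")
#   rows = run_spasql(connection.cursor(),
#                     "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10")
```

Because the result comes back as an ordinary SQL result set, existing ODBC or JDBC aware tools can consume it without knowing anything about RDF.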
BTW - My New Year's Resolution: get my act together and shrink the ever-increasing list of "simple & practical Virtuoso use case demos" on my todo list, which now spans all the way back to 2006 :-(
]]>A dynamically generated Web Page comprised of Semantic Data Web style data links (formally typed links) and traditional Document Web links (generic links lacking type specificity).
Linked Data Pages will ultimately enable Facebook users to inject their public data into the Semantic Data Web as RDF based Linked Data. For instance, my Facebook Profile & Photo albums data is now available as RDF, without paying a cent of RDF handcrafting tax, thanks to the Virtuoso Sponger (middleware for producing RDF from non-RDF data sources), which is now equipped with a new RDFizer Cartridge for the Facebook Query Language (FQL) and RESTful Web Service.
Demo Notes:
When you click on a link in DLD pages, you will be presented with a lookup that exposes the different interaction options associated with a given URI. Examples include:
Remember, the Facebook URLs (links to web pages) are being converted, on the fly, into RDF based Structured Data (graph model database), i.e., Entity Sets that possess formally defined characteristics (attributes) and associations (relationships).
A declarative query language from the W3C for querying structured propositional data (in the form of 3-tuple [triples] or 4-tuple [quads] records) stored in a deductive database (colloquially referred to as triple or quad stores in Semantic Web and Linked Data parlance).
SPARQL is inherently platform independent. Like SQL, the query language and the backend database engine are distinct. Database clients capture SPARQL queries which are then passed on to compliant backend databases.
Like SQL for relational databases, it provides a powerful mechanism for accessing and joining data across one or more data partitions (named graphs identified by IRIs). The aforementioned capability also enables the construction of sophisticated Views, Reports (HTML or those produced in native form by desktop productivity tools), and data streams for other services.
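The idea of joining data across named graphs can be sketched without any database at all. In the toy example below, the quads, the `urn:g:` graph names, and the `ex:` identifiers are all made up, and the two nested loops stand in for what a SPARQL engine does with a two-pattern graph query:

```python
# Made-up sample data: (graph, subject, predicate, object) quads.
QUADS = [
    ("urn:g:people", "ex:alice", "foaf:knows", "ex:bob"),
    ("urn:g:people", "ex:bob", "foaf:name", "Bob"),
    ("urn:g:music", "ex:bob", "ex:likes", "ex:dr_dre"),
]


def match(quads, graph=None, s=None, p=None, o=None):
    """Yield quads that fit the pattern; None acts as a wildcard."""
    pattern = (graph, s, p, o)
    for quad in quads:
        if all(want is None or want == got
               for want, got in zip(pattern, quad)):
            yield quad


def friends_likes(quads, person):
    """Join two patterns: whom does `person` know, and what do they like?"""
    results = []
    for _, _, _, friend in match(quads, s=person, p="foaf:knows"):
        for graph, _, _, liked in match(quads, s=friend, p="ex:likes"):
            # Keep the graph name so the data partition stays visible.
            results.append((friend, liked, graph))
    return results


if __name__ == "__main__":
    print(friends_likes(QUADS, "ex:alice"))
```

The join key is simply the shared identifier `ex:bob`, even though the two facts live in different data partitions.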
Unlike SQL, SPARQL includes result serialization formats and an HTTP based wire protocol. Thus, the ubiquity and sophistication of HTTP is integral to SPARQL i.e., client side applications (user agents) only need to be able to perform an HTTP GET against a URL en route to exploiting the power of SPARQL.
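A minimal sketch of the "HTTP GET against a URL" point: the helper below packs a query into a SPARQL Protocol URL using the protocol's `query` and `default-graph-uri` parameters. The endpoint address `http://localhost:8890/sparql` and the graph name are assumptions for illustration; substitute whatever your own instance exposes:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sparql_get_url(endpoint, query, default_graph=None):
    """Build a SPARQL Protocol URL suitable for a plain HTTP GET."""
    params = [("query", query)]
    if default_graph:
        params.append(("default-graph-uri", default_graph))
    return endpoint + "?" + urlencode(params)


def query_of(url):
    """Recover the query text from a SPARQL Protocol URL (for checking)."""
    return parse_qs(urlsplit(url).query)["query"][0]


if __name__ == "__main__":
    url = sparql_get_url(
        "http://localhost:8890/sparql",  # assumed local endpoint
        "SELECT * WHERE { ?s ?p ?o } LIMIT 5",
        default_graph="http://example.org/graph",  # hypothetical graph name
    )
    print(url)
    print(query_of(url))
```

The resulting URL can be pasted into a browser, a spreadsheet's Web query dialog, or any other HTTP client.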
What follows is a very simple guide for using SPARQL against your own instance of Virtuoso:
Note: the data source URL doesn't even have to be RDF based -- which is where the Virtuoso Sponger Middleware comes into play (download and install the VAD installer package first) since it delivers the following features to Virtuoso's SPARQL engine:
Public SPARQL endpoints are emerging at an ever increasing rate. Thus, we've set up a DNS lookup service that provides access to a large number of SPARQL endpoints. Of course, this doesn't cover all existing endpoints, so if your endpoint is missing please ping me.
Here is a collection of commands for using DNS-SD to discover SPARQL endpoints:
SPARQL Endpoint: Linked Open Data Cache (8.5 Billion+ Quad Store which includes data from Geonames and the Linked GeoData Project Data Sets).
Sticking with the TechCrunch layout, here is why all roads simply lead to Linked Data come 2010 and beyond:
As I've stated in the past (across a variety of mediums), you cannot build applications that have long term value without addressing the following issues:
The items above basically showcase the very essence of the HTTP URI abstraction that drives HTTP based Linked Data; which is also the basic payload unit that underlies REST.
I simply hope that the next decade marks a period of broad appreciation and comprehension of Data Access, Integration, and Management issues on the parts of: application developers, integrators, analysts, end-users, and decision makers. Remember, without structured Data we cannot produce or share Information, and without Information, we cannot produce or share Knowledge.
Note to Web Programmers: Linked Data is about Data (Wine) and not about Code (Fish). Thus, it isn't a "programmer only zone", far from it. More than anything else, it's inherently inclusive and spreads its participation net widely across: Data Architects, Data Integrators, Power Users, Knowledge Workers, Information Workers, Data Analysts, etc. Basically, everyone that can "click on a link" is invited to this particular party; remember, it is about "Linked Data" not "Linked Code", after all. :-)
Here is an example of a Linked Data value pyramid that I am stumbling across --with some frequency-- these days (note: 1 being the pyramid apex):
Basically, Linked Data deployment (assigning de-referencable HTTP URIs to DBMS records, their attributes, and attribute values [optionally] ) is occurring last. Even worse, this happens in the context of Linked Open Data oriented endeavors, resulting in nothing but confusion or inadvertent perpetuation of the overarching pragmatically challenged "Semantic Web" stereotype.
As you can imagine, hitting SPARQL as your introduction to Linked Data is akin to hitting SQL as your introduction to Relational Database Technology; neither is an elevator-style value prop. relay mechanism.
In the relational realm, killer demos always started with desktop productivity tools (spreadsheets, report-writers, SQL QBE tools etc.) accessing relational data sources en route to unveiling the "Productivity" and "Agility" value prop. that such binding delivered, i.e., the desktop applications (clients) and the databases (servers) are distinct, but operating in a mutually beneficial manner to all, courtesy of data access standards such as ODBC (Open Database Connectivity).
In the Linked Data realm, learning to embrace and extend best practices from the relational dbms realm remains a challenge. A lot of this has to do with hangovers from a misguided perception that RDF databases will somehow completely replace RDBMS engines, rather than complement them. Thus, you have a counterproductive variant of NIH (Not Invented Here) in play, taking us to the dreaded realm of: Break the Pot and You Own It (exemplified by the 11+ year Semantic Web Project comprehension and appreciation odyssey).
From my vantage point, here is how I believe the Linked Data value pyramid should be layered, especially when communicating the essential value prop.:
If you perform the steps above, on any HTTP network (e.g. World Wide Web), you implicitly bind the Names/Identifiers of things to negotiable representations of their metadata (description) bearing documents.
Also note, you can create and deploy the resulting RDF metadata using any of the following approaches:
section from IETF's Domain Keys spec. (paraphrased by me).
The Linked Data meme is based on the use of HTTP based URIs as reference / identifier labels associated with the "identity abstraction" referred to above. Thus, when you de-reference (request information about) an HTTP based URI you ultimately end up with a resource URL that exposes the "constellation of characteristics" mentioned above, in a representation negotiated at request time -- between an HTTP client and server e.g., (X)HTML, JSON, XML, RDF/XML, N3, Turtle, Trix, others :-)
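The "representation negotiated at request time" step can be sketched from the server's point of view. The function below reads a client's Accept header and picks one of the formats the server can produce. It is deliberately simplified (it ranks by q-value only and ignores the specificity rules of the HTTP specification), and the `SUPPORTED` list is an assumption for the example:

```python
# Assumed list of formats this hypothetical server can produce,
# in the server's own order of preference.
SUPPORTED = ["text/html", "application/rdf+xml", "text/turtle",
             "application/json"]


def parse_accept(header):
    """Return [(media_type, q)] ordered by descending q-value."""
    preferences = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default weight when no q parameter is given
        for field in fields[1:]:
            if field.lower().startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not wanted"
        preferences.append((media_type, q))
    # sorted() is stable, so ties keep the client's original order.
    return sorted(preferences, key=lambda item: item[1], reverse=True)


def negotiate(header, supported=SUPPORTED):
    """Pick the representation to serve, or None if nothing matches."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return supported[0]
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # "text/*" -> "text/"
            for candidate in supported:
                if candidate.startswith(prefix):
                    return candidate
        elif media_type in supported:
            return media_type
    return None


if __name__ == "__main__":
    print(negotiate("text/turtle;q=0.9, text/html;q=0.5"))
    print(negotiate("image/png"))
```

A browser sending a preference for HTML and an RDF-aware agent sending a preference for Turtle would therefore receive different documents describing the same thing, while both started from the same URI.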
With great joy and pride, I wish Structured Dynamics all the success they deserve. Naturally, the collaborations and close relationship between OpenLink Software and its latest technology partner will continue -- especially as we collectively work towards a more comprehendible and pragmatic Web of Linked Data for developers (across Web 1.0, 2.0, 3.0, and beyond), end-users (information- and knowledge-workers), and entrepreneurs (driven by quality and tangible value contribution).
Excerpted from the project home page:
The NeuroCommons project seeks to make all scientific research materials - research articles, annotations, data, physical materials - as available and as useable as they can be. We do this by both fostering practices that render information in a form that promotes uniform access by computational agents - sometimes called "interoperability". We want knowledge sources to combine meaningfully, enabling semantically precise queries that span multiple information sources.
In a nutshell, a great project that makes practical use of Linked Data Web technology in the areas of computational biology and neuroscience.
A pre-installed and fully tuned edition of Virtuoso that includes a fully configured Neurocommons Knowledgebase (in RDF Linked Data form) on Amazon's EC2 Cloud platform.
Generally, it provides a no-hassles mechanism for instantiating personal-, organization-, or service-specific instances of a very powerful research knowledgebase within approximately 1.15 hours compared to a lengthy rebuild from RDF source data alternative that takes 14 hours or more, depending on machine hardware configuration and host operating system resources.
A pre-installed and fully tuned edition of Virtuoso that includes a fully configured DBpedia instance on Amazon's EC2 Cloud platform.
Generally, it provides a no hassles mechanism for instantiating personal, organization, or service specific instances of DBpedia within approximately 1.5 hours as opposed to a lengthy rebuild from RDF source data that takes between 8 - 22 hours depending on machine hardware configuration and host operating system resources.
From a Web Entrepreneur perspective it offers all of the generic benefits of a Virtuoso EC2 AMI plus the following:
Here are a few live examples of DBpedia resource URIs deployed and de-referencable via one of my EC2 based personal data spaces:
Basically this is how it works.
DBpedia replica implies:
Tomorrow is the official go live day (due to last minute price changes), but you can instantiate a paid Virtuoso AMI starting now :-)
To be continued...
]]>Typically, Orri's posts are targeted at the hard core RDF and SQL DBMS audiences, but in this particular post, he shoots straight at the business community revealing "Opportunity Cost" containment as the invisible driver behind the business aspects of any market inflection.
Remember, the Web isn't ubiquitous because its users mastered the mechanics and virtues of HTML and/or HTTP. Web ubiquity is a function of the opportunity cost of not being on the Web, courtesy of the network effects of hyperlinked documents -- i.e., the instant gratification of traversing documents on the Web via a single click action. In similar fashion, the Linked Data Web's ubiquity will simply come down to the opportunity cost of not being "inside the Web", courtesy of the network effects of hyperlinked entities (documents, people, music, books, and other "Things").
Here are some excerpts from Orri's post:
Every time there is a major shift in technology, this shift needs to be motivated by addressing a new class of problem. This means doing something that could not be done before. The last time this happened was when the relational database became the dominant IT technology. At that time, the questions involved putting the enterprise in the database and building a cluster of line of business applications around the database. The argument for the RDBMS was that you did not have to constrain the set of queries that might later be made, when designing the database. In other words, it was making things more ad hoc. This was opposed then on grounds of being less efficient than the hierarchical and network databases which the relational eventually replaced. Today, the point of the Data Web is that you do not have to constrain what your data can join or integrate with, when you design your database. The counter-argument is that this is slow and geeky and not scalable. See the similarity? A difference is that we are not specifically aiming at replacing the RDBMS. In fact, if you know exactly what you will query and have a well defined workload, a relational representation optimized for the workload will give you about 10x the performance of the equivalent RDF warehouse. OLTP remains a relational-only domain. However, when we are talking about doing queries and analytics against the Web, or even against more than a handful of relational systems, the things which make RDBMS good become problematic.
If we think about Web 1.0 as a period where the distinguishing noun was: "Author", and Web 2.0 the noun: "Journalist", we should be able to see that what comes next is the noun: "Analyst". This new generation analyst would be equipped with de-referencable Web Identity courtesy of their Person Entity URI. The analyst's URI would also be the critical component of a Web based low cost attribution ecosystem; one that ultimately turns the URI into the analyst's brand emblem / imprint.
If the generated RDF results in an entity-to-entity level network (graph) in which each entity is endowed with a de-referencable HTTP based ID (a URI), we end up with an enhancement to the Web that adds Hyperdata linking across extracted entities to the existing Hypertext based Web of linked documents (pages, images, and other information resource types). Thus, I can use the same URL linking mechanism to reference a broader range of "Things" i.e., documents, things that documents are about, or things loosely associated with documents.
The Virtuoso Sponger is an example of an RDF Middleware solution from OpenLink Software. It's an in-built component of the Virtuoso Universal Server, and deployable in many forms e.g., Software as Service (SaaS) or traditional software installation. It delivers RDF-ization services via a collection of Web information resource specific Cartridges/Providers/Drivers covering Wikipedia, Freebase, CrunchBase, WikiCompany, OpenLibrary, Digg, eBay, Amazon, RSS/Atom/OPML feed sources, XBRL, and many more.
RDF-ization alone doesn't ensure valuable RDF based Linked Data on the Web. The process of producing RDF Linked Data is ultimately about the art of effectively describing resources with an eye for context.
The animation that follows illustrates the process (5,000 feet view), from grabbing resources via HTTP GET, to injecting RDF Linked Data back into the Web cloud:
Note: the Shredder is a Generic Cartridge, so you would have one of these per data source type (information resource type).
]]>From the RWW Top-Down category, which I interpret as: technologies that produce RDF from non RDF data sources. Our product portfolio is comprised of the following: Virtuoso Universal Server, OpenLink Data Spaces, OpenLink Ajax Toolkit, and OpenLink Data Explorer (which includes ubiquity commands).
Of course you could have simply looked up OpenLink Software's FOAF based Profile page (*note the Linked Data Explorer tab*), or simply passed the FOAF profile page URL to a Linked Data aware client application such as: OpenLink Data Explorer, Zitgist Data Viewer, Marbles, and Tabulator, and obtained information. Remember, OpenLink Software is an Entity of Type: foaf:Organization, on the burgeoning Linked Data Web :-)
"Artificial intelligence is supposed to let machines do things for people. The risk is that we may rely too much on them. Two months ago, for instance, writer Nicolas Carr asked whether Google is making us stupid. In my recent blog series "The Age of Google," I extended Carr’s discussion. Due to the success of Google, we are relying more on objective search than on active thinking to answer questions. In consequence, the more Google has advanced its service, the farther Google users have drifted from active thinking."
"But at least one form of human thinking cannot be replaced by machines. I am not talking about inference/discovery (which machines may be capable of doing) but about creation/generation-from-nothing (which I don’t believe machines may ever do)."
I tend to describe our ability to create/generate-from-nothing as "Zero-based Cognition", which is initially about "thought" and then eventually about "speed of thought dissemination" and "global thought meshing".
In a peculiar sense, Zero-based cognition is analogous to Zero-based budgeting from the accounting realm :-)
]]>If your Web presence goes beyond (X)HTML pages, via the addition of REST or SOAP based Web Services, then you're participating in Web usage dimension 2.0.
If your Web presence includes all of the above, with the addition of structured data interlinked with structured data across other points of presence on the Web, then you are participating in Web usage dimension 3.0 i.e., "Linked Data Web" or "Web of Data" or "Data Web".
BTW - If you've already done all of the above, and you have started building intelligent agents that exploit the aforementioned structured interlinked data substrate, then you are already in Web usage dimension 4.0.
A while back I watched Kevin Kelly's 5,000 days presentation at TED. During the presentation, I kept on scratching my head, wondering why phrases like "Linked Data", "Semantic Web", "Web of Data", "Data Web" were so unnaturally disconnected from his session narrative.
Yesterday I watched IMINDI's TechCrunch 50 presentation, and once again I saw the aforementioned pattern repeat itself. This time around, the poor founders of this "Linked Data Web" oriented company (which is what they are in reality) took a totally undeserved pasting from a bunch of panelists incapable of seeing beyond today (Web 2.0) and yesterday (initial Web bootstrap).
Anyway, thanks to the Web, this post will make a small contribution towards re-connecting the missing phrases to these "Linked Data Web" presentations.
]]>Courtesy of Linked Data, we are now able to extend the "document to document" linking mechanism of the Web (Hypertext Linking) to more granular "entity to entity" level linking. And in doing so, we have a layer of abstraction that in one swoop alleviates all of the infrastructure oriented data access impediments of yore. I know this sounds simplistic, but rest assured, imbibing Linked Data's value proposition is really just that simple, once you engage solutions (e.g. Virtuoso) that enable you to deploy Linked Data across your enterprise.
Microsoft ACCESS, SQL Server, and Virtuoso all use the Northwind SQL DB Schema as the basis of the demonstration database shipped with each DBMS product. This schema is comprised of common IS/MIS entities that include: Customers, Contacts, Orders, Products, Employees etc.
What we all really want to do as data, information, and knowledge consumers and/or dispatchers, is be no more than a single "mouse click" away from relevant data/information/knowledge data access and/or exploration. Even better (but not always so obvious), we also want anyone in our network (company, division, department, cube-cluster) to inherit these data access efficiencies.
In this example, the Web Page about the Customer "ALKI" provides me with a myriad of exploration and data access paths e.g., when I click on the foaf:primaryTopic property value link.
This simple example, via a single Web Page, should put to rest any doubts about the utility of Linked Data. Of course this is an old demo, but this time around the UI is minimalist as my prior attempts skipped a few steps i.e., starting from within a Linked Data explorer/browser.
Important note: I haven't exported SQL into an RDF data warehouse, I am converting the SQL into RDF Linked Data on the fly which has two fundamental benefits:
Enjoy!
Note: You can substitute my examples using any Web resource URL. The underlying RDFization and Linked Data deployment functionality of the Virtuoso demo instance takes care of everything else. Also note that the HTML based resource description page capability is now deployed as part of the Virtuoso Sponger component of every Virtuoso installation starting with version 5.0.8.
]]>My contribution to the developing discourse takes the form of a Q&A session. I've taken the questions posed and provided answers that express my particular points of view:
Q: Is the desktop of the future going to just be a web-hosted version of the same old-fashioned desktop metaphors we have today?
A: No, it's going to be a more Web Architecture aware and compliant variant exposed by appropriate metaphors.
Q: The desktop of the future is going to be a hosted web service
A: A vessel for exploiting the virtues of the Linked Data Web.
Q: The Browser is Going to Swallow Up the Desktop
A: Literally, of course not! Metaphorically, of course! And then the Browser metaphor will decompose into function specific bits of Web interaction amenable to orchestration by its users.
Q: The focus of the desktop will shift from information to attention
A: No! Knowledge, Information, and Data sharing courtesy of Hyperdata & Hypertext Linking.
Q: Users are going to shift from acting as librarians to acting as daytraders
A: They were Librarians at Web 1.0, Journalists at Web 2.0, and Analysts in Web 3.0 (i.e., analyze structured and interlinked data), and CEOs in Web 4.0 (i.e., get Agents to do stuff intelligently en route to making decisions).
Q: The Webtop will be more social and will leverage and integrate collective intelligence
A: The Linked Data Web vessel will only require you to fill in your profile (once) and then serendipitous discovery and meshing of relevant data will simply happen (the serendipity quotient will grow in line with Linked Data Web density).
Q: The desktop of the future is going to have powerful semantic search and social search capabilities built-in
A: It is going to be able to "Find" rather than "Search" for stuff courtesy of the Linked Data Web.
Q: Interactive shared spaces will replace folders
A: Data Spaces and their URIs (Data Source Names) replace everything. You simply choose the exploration metaphor that best suits your space interaction needs.
Q: The Portable Desktop
A: Ubiquitous Desktop i.e. do the same thing (all answers above) on any device connected to the Web.
Q: The Smart Desktop
A: Vessels with access to Smart Data (Linked Data + Action driven Context sprinklings).
Q: Federated, open policies and permissions
A: More federation for sure, XMPP will become a lot more important, and OAuth will enable resurgence of the federated aspects of the Web and Internet.
Q: The personal cloud
A: Personal Data Spaces plugged into Clouds (Intranet, Extranet, Internet).
Q: The WebOS
A: An operating system endowed with traditional Database and Host Operating system functionality such as: RDF Data Model, SPARQL Query Language, URI based Pointer mechanism, and HTTP based message Bus.
Q: Who is most likely to own the future desktop?
A: You! And all you need is a URI (an ID or Data Source Name for "Entity You") and a Profile Page (a place where "Entity You" is Described by You).
You can get a feel for the future desktop by downloading and then installing the OpenLink Data Explorer plugin for Firefox, which allows you to switch viewing modes between Web Page and Linked Data behind the page. :-)
By coincidence, Glenn and I presented at this month's Cambridge Semantic Web Gathering.
I've provided a dump of Glenn's issues and my responses below:
RDF is a Graph based Data Model; it stands for Resource Description Framework. The Metadata angle comes from its Meta Content Framework (MCF) origins. You can express and serialize data based on the RDF Data Model using: Turtle, N3, TriX, N-Triples, and RDF/XML.
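To make the model-versus-serialization split concrete, here is a minimal Python sketch that holds a tiny graph as in-memory triples and writes it out as N-Triples, the simplest of the formats listed. The `Triple` type, the `to_ntriples` helper, and the example.org URIs are all hypothetical names introduced for illustration; only the FOAF namespace is real. A production system would use an RDF library rather than hand-rolled string formatting.

```python
from typing import Iterable, NamedTuple


class Triple(NamedTuple):
    subject: str             # URI of the thing being described
    predicate: str           # URI of the attribute or relationship
    obj: str                 # either a URI or a plain literal value
    obj_is_uri: bool = True  # False when obj is a literal


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialize triples as N-Triples, one statement per line."""
    lines = []
    for t in triples:
        if t.obj_is_uri:
            obj = f"<{t.obj}>"
        else:
            # Escape the characters that N-Triples string literals require.
            escaped = (
                t.obj.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r")
            )
            obj = f'"{escaped}"'
        lines.append(f"<{t.subject}> <{t.predicate}> {obj} .")
    return "\n".join(lines)


FOAF = "http://xmlns.com/foaf/0.1/"
alice = "http://example.org/id/alice"  # hypothetical entity URI

graph = [
    Triple(alice, FOAF + "name", "Alice", obj_is_uri=False),
    Triple(alice, FOAF + "knows", "http://example.org/id/bob"),
]
print(to_ntriples(graph))
```

The same `graph` list could equally be written as Turtle or RDF/XML; the data model stays put while the serialization changes.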
These are just appeasement:
- old query paradigm: fishing in dark water with superstitiously tied lures; only works well in carefully stocked lakes
- we don't ask questions by defining answer shapes and then hoping they're dredged up whole.
SPARQL, MQL, and Entity-SQL are Graph Model oriented Query Languages. Query Languages always accompany Database Engines. SQL is the Relational Model equivalent.
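As a rough illustration of what a Graph Model query looks like in practice, the Python sketch below wraps a small SPARQL query in the GET form defined by the SPARQL Protocol (the query travels in a `query` parameter). The endpoint address is an assumption made for the example, and nothing is sent over the network; the code only builds the URL.

```python
from urllib.parse import urlencode

# Assumed endpoint address, for illustration only.
ENDPOINT = "http://example.org/sparql"

# A graph pattern: find things typed as foaf:Person, along with their names.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE { ?person a foaf:Person ; foaf:name ?name . }
LIMIT 10
""".strip()


def sparql_get_url(endpoint: str, query: str) -> str:
    """Return the SPARQL Protocol GET URL for a query (no request is made)."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

Where SQL matches rows in tables, the WHERE clause here matches a pattern of subject, predicate, and object against the graph.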
Noble attempt to ground the abstract, but:
- URI dereferencing/namespace/open-world issues focus too much technical attention on cross-source cases where the human issues dwarf the technical ones anyway
- FOAF query over the people in this room? forget it.
- link asymmetry doesn't scale
- identity doesn't scale
- generating RDF from non-graph sources: more appeasement, right where the win from actually converting could be biggest!
Innovative use of HTTP to deliver "Data Access by Reference" to the Linked Data Web.
When you have a Data Model, Database Engine, and Query Language, the next thing you need is a Data Access mechanism that provides "Data Access by Reference". ODBC and JDBC (amongst others) provide "Data Access by Reference" via Data Source Names. Linked Data is about the same thing (URIs are Data Source Names) with the following differences:
Hugely motivating and powerful idea, worthy of a superhero (Graphius!), but:
- giant and global parts are too hard, and starting global makes every problem harder
- local projects become unmanageable in global context (Cyc, Freebase data-modeling lists...).
And thus my plea, again. Forget "semantic" and "web", let's fix the database tech first:
- node/arc data-model, path-based exploratory query-model
- data-graph applications built easily on top of this common model; building them has to be easy, because if it's hard, they'll be bad
- given good database tech, good web data-publishing tech will be trivial!
- given good tools for graphs, the problems of uniting them will be only as hard as they have to be.
Giant Global Graph is just another moniker for a "Web of Linked Data" or "Linked Data Web".
Multi-Model Database technology that meshes the best of the Graph & Relational Models exists. In a nutshell, this is what Virtuoso is all about and it's existed for a very long time :-)
Virtuoso is also a Virtual DBMS engine (so you can see Heterogeneous Relational Data via Graph Model Context Lenses). Naturally, it is also a Linked Data Deployment platform (or Linked Data Server).
The issue isn't the "Semantic Web" moniker per se; it's about how Linked Data (foundation layer of Semantic Web) gets introduced to users. As I said during the MIT Gathering: "The Web is experienced via Web Browsers primarily, so any enhancement to the Web must be exposed via traditional Web Browsers", which is why we've opted to simply add "View Linked Data Sources" to the existing set of common Browser options that includes:
By exposing the Linked Data Web option as described above, you enable the Web user to knowingly transition from the traditional Rendered (X)HTML page view to the Linked Data View (i.e., structured data behind the page). This simple "User Interaction" tweak makes the notion of exploiting a Structured Web somewhat clearer.
The Linked Data Web isn't a panacea. It's just an addition to the existing Web that enriches the things you can do with the Web. Its predominance, like any application feature, will be subject to the degrees to which it delivers tangible value or materializes internal and external opportunity costs.
Note: The Web isn't ubiquitous today because all its users grokked HTML Markup. Its ubiquity is a function of opportunity costs: there simply came a point in the Web bootstrap when nobody could afford the opportunity costs associated with being off the Web. The same thing will play out with Linked Data and the broader Semantic Web vision.
Links: URIs are simple to use, i.e., you simply click on them via a user agent's UI. However, URLs, when incorporated into Data Source Naming en route to constructing HTTP based Identifiers that deliver HTTP based pointers to the location / address of Resource Descriptions, are another matter.
I touched on this issue in my Linked Data Planet keynote last week, and I must say, it did set off a light.
I believe, we can only get the broader Web community to comprehend the utility of URIs (Web Data Source Names) by exposing said utility via the Web's Universal Client (Web Browser). For instance, how do URN based Identity / Naming schemes help in a world dominated by Web Browsers that only grok "http://"? From my vantage point, the practical solution is for data providers who already have "doi", "lsid" and other Handle based Identifiers in place, to embark upon http-to-native-naming-scheme-proxying.
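A rough Python sketch of what http-to-native-naming-scheme proxying amounts to: wrap the native name in an HTTP URI that an ordinary browser can follow. The `to_http_uri` helper is hypothetical, and the LSID proxy base is a placeholder rather than a real service; the `https://doi.org/` prefix is the public DOI resolver. This is not a description of how any particular browser extension does it.

```python
def to_http_uri(native_id: str, lsid_proxy: str = "http://lsid.example.org/") -> str:
    """Map a doi: or urn:lsid: name to an HTTP URI that browsers can follow.

    lsid_proxy is a placeholder; substitute whatever resolver a data
    provider actually operates.
    """
    lowered = native_id.lower()
    if lowered.startswith("doi:"):
        # Strip the scheme label and hand the rest to the DOI resolver.
        return "https://doi.org/" + native_id[len("doi:"):]
    if lowered.startswith("urn:lsid:"):
        # Keep the full URN and prefix it with the proxy address.
        return lsid_proxy + native_id
    raise ValueError(f"unsupported naming scheme: {native_id!r}")


print(to_http_uri("doi:10.1000/182"))
print(to_http_uri("urn:lsid:example.org:names:1234"))
```

The native identifier survives intact inside the HTTP URI, so nothing about the original naming scheme is lost by proxying it.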
In my usual "dog-fooding" and "practice what you preach" fashion, this is exactly what we do in the new Linked Data Web extension that we've decided to reveal to the public (albeit late beta). Thus, when you use an existing browser to view pages with "lsid" or "doi" URNs, you still enjoy the utility of getting at the "Raw Linked Data Sources" that these names expose.
]]>Here is the list:
For the time challenged (i.e. those unable to view this post using its permalink / URI as a data source via the OpenLink RDF Browser, Zitgist Data Viewer, DISCO Hyperdata Browser, or Tabulator), the benefits of this post are as follows:
Put differently, I cost-effectively contribute to the GGG across all Web interaction dimensions (1.0, 2.0, 3.0) :-)
]]>Trent Adams, Steve Greenberg, and I, also had a podcast chat about Web Data Portability and Accessibility (Linked Data). I also remixed Jon Breslin's "Data Portability & Me" presentation to produce: "Data Accessibility & Me".
The podcast interviews and presentations provide contributions to the broadening discourse about Open Data Access / Connectivity on the Web.
]]>*On* the ubiquitous Web of "Linked Documents", HREF means (by definition and usage): Hypertext Reference to an HTTP accessible Data Object of Type: "Document" (an information resource). Of course we don't make the formal connection of Object Type when dealing with the Web on a daily basis, but whenever you encounter the "resource not found" condition notice the message: HTTP/1.0 404 Object Not Found, from the HTTP Server tasked with retrieving and returning the resource.
*In* the Web of "Linked Data", a complementary addition to the current Web of "Linked Documents", HREF is used to reference Data Objects that are of a variety of "Types", not just "Documents". And the way this is achieved, is by using Data Object Identifiers (URIs / IRIs that are generated by the Linked Data deployment platform) in the strict sense i.e. Data Identity (URI) is separated from Data Address (URL). Thus, you can reference a Person Data Object (aka an instance of a Person Class) in your HREF and the HTTP Server returns a Description of the Data Object via a Document (again, an information resource). A document containing the Description of a Data Object typically contains HREFs to other Data Objects that expose the Attributes and Relationships of the initial Person Data Object, and it is this collection of Data Objects that is technically called a "Graph" -- which is what RDF models.
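One widely used convention for keeping Data Identity apart from Data Address is the "hash URI": the fragment names the thing, and the part before the fragment is the address of the document describing it. The Python sketch below illustrates that convention only; it is an assumption that a given deployment platform uses it, and the URI is hypothetical.

```python
from urllib.parse import urldefrag

# Hypothetical identifier for a Person Data Object (not a document).
PERSON_URI = "http://example.org/people/alice#this"


def identity_and_address(entity_uri: str) -> tuple[str, str]:
    """Return (identity, address): the full URI names the entity, while
    the fragment-less URL is what an HTTP client actually retrieves."""
    address, _fragment = urldefrag(entity_uri)
    return entity_uri, address


identity, address = identity_and_address(PERSON_URI)
print("Identity (URI):", identity)
print("Address (URL): ", address)
```

HTTP clients never send the fragment to the server, which is why the two can differ while still being resolvable through a single link.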
What I describe above is basic stuff for anyone that's familiar with Object Database or Distributed Objects technology and concepts.
The Linked Document Web is a collection of physical resources that traverse the Web Information Bus in palatable format i.e., documents. Thus, Document Object Identity and Document Object Data Address can be the same thing i.e. a URL can serve as the ID/URI of a Document Data Object.
The Linked Data Web on the other hand, is a Distributed Object Database, and each Data Object must be uniquely defined, otherwise we introduce ambiguity that ultimately taints the Database itself (making it incomprehensible to reasoning challenged machines). Thus we must have unique Object IDs (URIs / IRIs) for People, Places, Events, and other things that aren't Documents. Once we follow the time tested rules of Identity, People can then be associated with the things they create (blog posts, web pages, bookmarks, wikiwords etc). RDF is about expressing these graph model relationships, while RDF serialization formats enable the transport of these data-object-link-laden information resources to requesting User Agents.
Put in more succinct terms, all documents on the Web are compound documents in reality (e.g., most contain at least an image these days). The Linked Data Web is about a Web where Data Object IDs (URIs) enable us to distill source data from the information contained in a compound document.
The degree of unobtrusiveness of new technology, concepts, or new applications of existing technology, is what ultimately determines eventual uptake and meme virulence (network effects). For a while, the Semantic Web meme was mired in confusion and general misunderstanding due to a shortage of practical use case scenario demos.
The emergence of the SPARQL Query Language has provided critical infrastructure for a number of products, projects, and demos, that now make the utility of the Semantic Web vision much clearer via the simplicity of Linked Data, as exemplified by the following:
Daniel Lewis has just published a nice blog post titled: The Data Space Philosophy, that puts the underlying Data Space concept in perspective.
The Linked Data Web is a Giant Global Graph of Data Spaces (meshes of data and identity exposed by graphs connecting data and identity)
Data Portability ultimately depends on platforms that provide unobtrusive generation of Linked Data (for data referencing) alongside support for a plethora of industry standard data formats -- which is what OpenLink Data Spaces has been about for a very long time :-)
If you want to explore who I know, what I read, and what I've tagged (amongst other things), all you have to do is:
Some Tools that help you comprehend what I am saying:
Senator Barack Obama is a beacon of change within the democratic party while Senator Hillary Clinton is status quo.
According to the data in the GovtTrack.us data space:
Senator Barack Obama is a rank-and-file Democrat according to GovTrack's analysis of his track record in congress. Whereas, Senator Hillary Clinton is a radical democrat, according to the same Govt. Track analysis of her track record in congress.
Who do we believe? The GovtTrack.us performance data, old media pundits, or postulations of the candidates? GovtTrack.us is a new approach to candidate vetting. It provides data in traditional Document Web and Linked Data Web forms, placing analytic power in the hands of the citizen.
Here are insights into the track records of Senators Hillary Clinton and Barack Obama via the Zitgist Linked Data Viewer:
Note: I am not aligned to any political party or candidate; this is just a demonstration of Linked Data that has a high degree of poignancy relative to US primary elections etc.
]]>So, unlike Scoble, I am able to make my Facebook Data portable without violating Facebook rules (no data caching outside Facebook realm) by doing the following:
In a nutshell, my Linked Data Space enables you to reference data in my data space via Object Identifiers (URIs), and in some cases the Object IDs and Graphs are constructed on the fly via RDFization middleware.
Here are my URIs that provide different paths to my Facebook Data Space:
To conclude, 2008 is clearly the inflection year during which we will finally unshackle Data and Identity from the confines of "Web Data Silos" by leveraging the HTTP, SPARQL, and RDF induced virtues of Linked Data.
Related Posts:
Introducing a new preloaded and preconfigured Virtuoso (Cluster Edition) AMI for the Amazon EC2 Cloud that hosts combined Linked Datasets from:
Predictably instantiate a powerful database with high quality data and cross links within minutes, for personal or service specific use.
Simply follow the instructions in our Amazon EC2 guide for the BBC + DBpedia 3.6 Linked Dataset.
Your installation steps are as follows:
The DBpedia + BBC Combo Linked Dataset is a preconfigured Virtuoso Cluster (4 Virtuoso Cluster Nodes, each comprised of one Virtuoso Instance; initial deployment is to a single Cluster Host, but license may be converted for physically distributed deployment), available via the Amazon EC2 Cloud, preloaded with the following datasets:
The BBC has been publishing Linked Data from its Web Data Space for a number of years. In line with best practices for injecting Linked Data into the World Wide Web (Web), the BBC datasets are interlinked with other datasets such as DBpedia and MusicBrainz.
Typical follow-your-nose exploration using a Web Browser (or even via sophisticated SPARQL query crawls) isn't always practical once you get past the initial euphoria that comes from comprehending the Linked Data concept. As your queries get more complex, the overhead of remote sub-queries increases its impact, until query results take so long to return that you simply give up.
Thus, maximizing the effects of the BBC's efforts requires Linked Data that shares locality in a Web-accessible Data Space — i.e., where all Linked Data sets have been loaded into the same data store or warehouse. This holds true even when leveraging SPARQL-FED style virtualization — there's always a need to localize data as part of any marginally-decent locality-aware cost-optimization algorithm.
This DBpedia + BBC dataset, exposed via a preloaded and preconfigured Virtuoso Cluster, delivers a practical point of presence on the Web for immediate and cost-effective exploitation of Linked Data at the individual and/or service specific levels.
Download Virtuoso installer archive(s). You must deploy the Personal or Enterprise Edition; the Open Source Edition does not support Shared-Nothing Cluster Deployment.
Set key environment variables and start the OpenLink License Manager, using command (this may vary depending on your shell and install directory):
. /opt/virtuoso/virtuoso-enterprise.sh
Optional: To keep the default single-server configuration file and demo database intact, set the VIRTUOSO_HOME environment variable to a different directory, e.g.,
export VIRTUOSO_HOME=/opt/virtuoso/cluster-home/
Note: You will have to adjust this setting every time you shift between this cluster setup and your single-server setup. Either may be made your environment's default through the virtuoso-enterprise.sh and related scripts.
Set up your cluster by running the mkcluster.sh script. Note that initial deployment of the DBpedia + BBC Combo requires a 4 node cluster, which is the default for this script.
Start the Virtuoso Cluster with this command:
virtuoso-start.sh
Stop the Virtuoso Cluster with this command:
virtuoso-stop.sh
Navigate to your installation directory.
Download the combo dataset installer script, bbc-dbpedia-install.sh.
For best results, set the downloaded script to fully executable using this command:
chmod 755 bbc-dbpedia-install.sh
Shut down any Virtuoso instances that may be currently running.
Optional: As above, if you have decided to keep the default single-server configuration file and demo database intact, set the VIRTUOSO_HOME environment variable appropriately, e.g.,
export VIRTUOSO_HOME=/opt/virtuoso/cluster-home/
Run the combo dataset installer script with this command:
sh bbc-dbpedia-install.sh
The combo dataset typically deploys to EC2 virtual machines in under 90 minutes; your time will vary depending on your network connection speed, machine speed, and other variables.
Once the script completes, perform the following steps:
Verify that the Virtuoso Conductor (HTTP-based Admin UI) is in place via:
http://localhost:[port]/conductor
Verify that the Virtuoso SPARQL endpoint is in place via:
http://localhost:[port]/sparql
Verify that the Precision Search & Find UI is in place via:
http://localhost:[port]/fct
Verify that the Virtuoso hosted PivotViewer is in place via:
http://localhost:[port]/PivotViewer
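The four checks above differ only in their path, so the URLs can be generated from the port in one place. This is a minimal Python sketch; `verification_urls` is a hypothetical helper, and the port 8890 used in the example is an assumption, so substitute the HTTP port your own instance is configured with.

```python
def verification_urls(port: int, host: str = "localhost") -> dict[str, str]:
    """Return the post-install URLs to check, keyed by component name."""
    paths = {
        "Conductor": "/conductor",
        "SPARQL endpoint": "/sparql",
        "Precision Search & Find": "/fct",
        "PivotViewer": "/PivotViewer",
    }
    return {name: f"http://{host}:{port}{path}" for name, path in paths.items()}


# 8890 is an assumed port for this example; use your instance's HTTP port.
for name, url in verification_urls(8890).items():
    print(f"{name}: {url}")
```

Opening each printed URL in a browser, or fetching it with any HTTP client, confirms that the corresponding component is in place.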
Looking retrospectively at any technology failure -- enterprises or industry at large -- you will eventually discover -- at the core -- messy conflation of at least one of the following:
The Internet & World Wide Web (InterWeb) are massive successes because their respective architectural cores embody the critical separation outlined above.
The Web of Linked Data is going to become a global reality, and massive success, because it leverages inherently sound architecture -- bar conflationary distractions of RDF. :-)
]]>The problems typically take the following form:
To start addressing these problems, here is a simple guide for generating and publishing Linked Data using Virtuoso.
Existing RDF data can be added to the Virtuoso RDF Quad Store via a variety of built-in data loader utilities.
Many options allow you to easily and quickly generate RDF data from other data sources:
Install the Faceted Browser VAD package (fct_dav.vad), which delivers the following:
Three simple steps allow you, your enterprise, and your customers to consume and exploit your newly deployed Linked Data --
http://<cname>[:<port>]/describe/?uri=<entity-uri>
<cname>[:<port>] gets replaced by the host and port of your Virtuoso instance.
<entity-uri> gets replaced by the URI you want to see described -- for instance, the URI of one of the resources you let the Sponger handle.
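To make the substitution concrete, here is a small Python sketch that fills in the URL pattern above. The function name and the example host, port, and entity URI are hypothetical; percent-encoding the entity URI is my own precaution (so that characters like "#" survive the trip inside a query string), not something the pattern itself mandates.

```python
from urllib.parse import quote


def describe_url(cname, entity_uri, port=None):
    """Fill in http://<cname>[:<port>]/describe/?uri=<entity-uri>."""
    host = f"{cname}:{port}" if port else cname
    # Percent-encode the whole entity URI so it is safe as a parameter value.
    return f"http://{host}/describe/?uri={quote(entity_uri, safe='')}"


if __name__ == "__main__":
    # Hypothetical host, port, and entity URI, for illustration only.
    print(describe_url("localhost", "http://example.org/thing#this", 8890))
```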
I ended up with what I can best describe as the Data 3.0 Manifesto: a manifesto for standards-compliant access to structured data object (or entity) descriptors.
Alex James (Program Manager, Entity Frameworks at Microsoft) put together something quite similar to this via his Base4 blog (around the Web 2.0 bootstrap time). Sadly -- quoting Alex -- that post has gone where discontinued blogs and their host platforms go (deep deep irony here).
It's also important to note that this manifesto is a variant of TimBL's Linked Data Design Issues meme, but totally decoupled from RDF (the data representation formats aspect) and SPARQL which -- in my world view -- remain implementation details.
Here are some insights on DBpedia, from the perspective of someone intimately involved with the other three-quarters of the project.
A live Web accessible RDF model database (Quad Store) derived from Wikipedia content snapshots, taken periodically. The RDF database underlies a Linked Data Space comprised of: HTML (and most recently HTML+RDFa) based data browser pages and a SPARQL endpoint.
Note: DBpedia 3.4 now exists in snapshot (warehouse) and Live Editions (currently being hot-staged). This post is about the snapshot (warehouse) edition, I'll drop a different post about the DBpedia Live Edition where a new Delta-Engine covers both extraction and database record replacement, in realtime.
As an idea under the moniker "DBpedia" it was conceptualized in late 2006 by researchers at University of Leipzig (led by Soren Auer) and Freie University, Berlin (led by Chris Bizer). The first public instance of DBpedia (as described above) was released in February 2007. The official DBpedia coming out party occurred at WWW2007, Banff, during the inaugural Linked Data gathering, where it showcased the virtues and immense potential of TimBL's Linked Data meme.
OpenLink Software (developers of OpenLink Virtuoso and providers of Web Hosting infrastructure), University of Leipzig, and Freie University, Berlin. In addition, there is a burgeoning community of collaborators and contributors responsible for DBpedia based applications, cross-linked data sets, ontologies (OpenCyc, SUMO, UMBEL, and YAGO) and other utilities. Finally, DBpedia wouldn't be possible without the global content contribution and curation efforts of Wikipedians, a point typically overlooked (albeit inadvertently).
The steps are as follows:
In a nutshell, there are four distinct and vital components to DBpedia. Thus, DBpedia wouldn't exist if all the project offered was a collection of RDF data dumps. Likewise, it wouldn't exist without a fully populated SPARQL compliant Quad Store. Last but not least, it wouldn't exist if the fully loaded SPARQL compliant Quad Store weren't up to the cocktail of challenges (query load and complexity) presented by live Web database accessibility.
It remains a live exemplar for any individual or organization seeking to publish or exploit HTTP based Linked Data on the World Wide Web. Its existence continues to stimulate growth in both density and quality of the burgeoning Web of Linked Data.
In the most basic sense, simply browse the HTML based resource descriptor pages en route to discovering hitherto undiscovered relationships that exist across named entities and subject matter concepts / headings. Beyond that, simply look at DBpedia as a master lookup table in a Web hosted distributed database setup, enabling you to mesh your local domain specific details with DBpedia records via structured relations (triples or 3-tuple records) comprised of HTTP URIs from both realms, e.g., via owl:sameAs relations.
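As a rough illustration of the "master lookup table" idea, the Python sketch below emits a single owl:sameAs relation, in N-Triples syntax, between a local record and a DBpedia record. The local URI is hypothetical and the DBpedia URI is just an example; the helper name is mine.

```python
# The owl:sameAs property URI from the OWL vocabulary.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(local_uri, dbpedia_uri):
    """Return one N-Triples statement: <local> owl:sameAs <dbpedia> ."""
    return f"<{local_uri}> <{OWL_SAME_AS}> <{dbpedia_uri}> ."


if __name__ == "__main__":
    # Hypothetical local record URI meshed with an example DBpedia URI.
    print(same_as_triple(
        "http://example.com/offices/berlin#this",
        "http://dbpedia.org/resource/Berlin",
    ))
```

Loading statements like this alongside your own data is what lets a query that starts from a local record continue into DBpedia's description of the same entity.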
Expanding on the Master-Details point above, you can use its rich URI corpus to alleviate tedium associated with activities such as:
As the "Linked Data" meme has gained momentum you've more than likely been on the receiving end of dialog with Linked Open Data community members (myself included) that goes something like this:
"Do you have a URI", "Get yourself a URI", "Give me a de-referencable URI", etc.
And each time, you respond with a URL -- which to the best of your Web knowledge is a bona fide URI. But to your utter confusion you are told: Nah! You gave me a Document URI instead of the URI of a real-world thing or object.
Well our everyday use of the Web is an unfortunate conflation of two distinct things, which have Identity: Real World Objects (RWOs) & Address/Location of Documents (Information bearing Resources).
The "Linked Data" meme is about enhancing the Web by unobtrusively reintroducing its core essence: the generic HTTP URI, a vital piece of Web Architecture DNA. Basically, it's about realizing the full capabilities of the Web as a platform for Open Data Identification, Definition, Access, Storage, Representation, Presentation, and Integration.
People, Places, Music, Books, Cars, Ideas, Emotions etc..
A Uniform Resource Identifier. A global identifier mechanism for network addressable data items. Its sole function is Name oriented Identification.
The constituent parts of a URI (from URI Generic Syntax RFC) are depicted below:
A location oriented HTTP scheme based URI. The HTTP scheme introduces a powerful and inherent duality that delivers:
So far so good!
The kind of URI Linked Data aficionados mean when they use the term: URI.
An HTTP URI is an HTTP scheme based URI. Unlike a URL, this kind of HTTP scheme URI is devoid of any Web Location orientation or specificity. Thus, its inherent duality provides a more powerful level of abstraction. Hence, you can use this form of URI to assign Names/Identifiers to Real World Objects (RWO). Even better, courtesy of the Identity/Address duality of the HTTP scheme, a single URI can deliver the following:
Data about Data. Put differently, data that describes other data in a structured manner.
The predominant model for metadata is the Entity-Attribute-Value + Classes & Relationships model (EAV/CR). A model that's been with us since the inception of modern computing (long before the Web).
The Resource Description Framework (RDF) is a framework for describing Web addressable resources. In a nutshell, it's a framework for adding Metadata bearing Information Resources to the current Web. It's comprised of:
The ubiquitous use of the Web is primarily focused on a Linked Mesh of Information bearing Documents. URLs rather than generic HTTP URIs are the prime mechanism for Web tapestry; basically, we use URLs to conduct Information -- which is inherently subjective -- instead of using HTTP URIs to conduct "Raw Data" -- which is inherently objective.
Note: Information is "data in context", it isn't the same thing as "Raw Data". Thus, if we can link to Information via the Web, why shouldn't we be able to do the same for "Raw Data"?
The meme simply provides a set of guidelines (best practices) for producing Web architecture friendly metadata. Meaning: when producing EAV/CR model based metadata, endow Subjects, their Attributes, and Attribute Values (optionally) with HTTP URIs. By doing so, a new level of Link Abstraction on the Web is possible, i.e., "Data Item to Data Item" level links (aka hyperdata links). Even better, when you de-reference an RWO hyperdata link you end up with a negotiated representation of its metadata.
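To show what "negotiated representation" means in practice, here is a minimal Python sketch that prepares (but does not send) an HTTP request for the same URI with different Accept headers. The helper name is mine, and the DBpedia URI and media types are examples only; which formats a given server actually honors is an assumption you would need to verify.

```python
from urllib.request import Request


def negotiated_request(uri, media_type):
    """Build a GET request asking for one representation of the entity's metadata."""
    return Request(uri, headers={"Accept": media_type})


if __name__ == "__main__":
    uri = "http://dbpedia.org/resource/Berlin"  # example entity URI
    # Same Name (URI), different requested representations of its description.
    for media_type in ("text/html", "application/rdf+xml", "text/turtle"):
        req = negotiated_request(uri, media_type)
        print(req.full_url, "->", req.get_header("Accept"))
        # To actually de-reference the URI, pass req to urllib.request.urlopen().
```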
Linked Data is ultimately about an HTTP URI for each item in the Data Organization Hierarchy :-)
Today, we put stuff on the Web because we want it to be discovered as part of a "sharing act". Likewise, we make regular use of Search Engine Services because we want to "Find" stuff in a productive manner.
Putting the above in context, you don't need to be Einstein to figure out that to date the Web hasn't enabled vendors to describe their products and services clearly. Likewise, it hasn't enabled us to describe what we want, when we want it, and how much we are willing to pay, etc. Basically, the SDQ of Web Content is excruciatingly low!
The Linked Data meme is about using the essence of the Web -- HTTP URIs -- as the mechanism for conducting data across the Web that unambiguously unveils basic things like:
A Web of Linked Data enables a complete redefinition of eCommerce, and that's just for starters :-)
At the current time we have loaded 100% of all the very large data sets from the LOD Cloud. As a result, we can start the process of exposing Linked Data virtues in a manner that's palatable to users, developers, and database professionals across the Web 1.0, 2.0, and 3.0 spectrums.
You can use the "Search & Find", "URI Lookup", or SPARQL endpoint associated with the LOD cloud hosting instance to perform the following tasks:
If you don't want to use the SPARQL based Web Service, or other Linked Data Web oriented APIs for interacting with the LOD cloud programmatically, you can simply use the powerful REST style Web Service that provides URL parameters for performing full text oriented "Search", entity oriented "Find" queries, and faceted navigation over the huge data corpus with results data returned in JSON and XML formats.
Amazon have agreed to add all the LOD Cloud data sets to their existing public data sets collective. Thus, the data sets we are loading will be available in "raw data" (RDF) format on the public data sets page via Named Elastic Block Storage (EBS) Snapshots; meaning, you can make an EC2 AMI (e.g. Linux, Windows, Solaris) and install an RDF quad or triple store of choice into your AMI, then simply load data from the LOD cloud based on your needs.
In addition to the above, we are also going to offer a Virtuoso 6.0 Cluster Edition based LOD Cloud AMI (as we've already done with DBpedia, MusicBrainz, NeuroCommons, and Bio2Rdf) that will enable you to simply instantiate a personal and service specific edition of Virtuoso with all the LOD data in place and fully tuned for performance and scalability; basically, you will simply press "Instantiate AMI" and a LOD cloud data space, in true Linked Data form, will be at your disposal within minutes (i.e. the time it takes the DB to start).
Work on the migration of the LOD data to EC2 starts this week. Thus, if you are interested in contributing an RDF based data set to the LOD cloud, now is the time to get your archive links in place (see: ESW Wiki page for LOD Data Sets).
]]>Jason:
Scoble is sensing what comes next, but in my opinion, describes it using an old obtrusive advertising model anecdote.
I've penned a post or two about the "Magic of You" which is all about the new Web power broker (Entity: "You").
Personally, I've long envisaged a complete overhaul of advertising where obtrusive advertising simply withers away; ultimately replaced by an unobtrusive model that is driven by individualized relevance and high doses of serendipity. Basically, this is ultimately about "taking the Ad out of item placement in Web pages".
The fundamental ingredients of an unobtrusive advertising landscape would include the following Human facts:
Ideally, we would like to be able to simply state the following, via a Web accessible profile:
Now put the above into the context of an evolving Web where data items are becoming more visible by the second, courtesy of the "Linked Data" meme. Thus, things that weren't discernable via the Web: "People", "Places", "Music", "Books", "Products", etc., become much easier to identify and describe.
Assuming the comments above hold true re. the Web's evolution into a collection of Linked Data Spaces, and the following occur:
Wish-Lists and Offer-Lists will gradually start bonding with increasing degrees of serendipity courtesy of exponential growth in Linked Data Web density.
So based on what I've stated so far, Scoble would simply browse the Web or visit his profile page, and in either scenario enjoy a "minority report" style of experience albeit all under his control (since he is the one driving his Web user agent).
What I describe above simply comes down to "Wish-lists" and associated recommendations becoming the norm outside the confines of Amazon's data space on the Web. Serendipitous discovery, intelligent lookups, and linkages are going to be the fundamental essence of Linked Data Web oriented applications, services, agents.
Beyond Scoble, it's also important to note that access to data will be controlled by entity "You". Your data space on the Web will be something you control access to in a myriad of ways, and it will include the option to provide licensed access to commercial entities on your terms. Naturally, you will also determine the currency that facilitates the value exchange :-)
Enter search pattern: Microsoft
You will get the usual result from a full text pattern search i.e., hits and text excerpts with matching patterns in boldface. This first step is akin to throwing your net out to sea while fishing.
Now you have your catch, what next? Basically, this is where traditional text search value ends since regex or xpath/xquery offer little when the structure of literal text is the key to filtering or categorization based analysis of real-world entities. Naturally, this is where the value of structured querying of linked data starts, as you seek to use entity descriptions (combination of attribute and relationship properties) to "Find relevant things".
Continuing with the demo.
Click on the "Properties" link within the Navigation section of the browser page, which results in a distillation and aggregation of the properties of the entities associated with the search results. Then use the "Next" link to page through the properties until you find the properties that best match what you seek. Note, this particular step is akin to using the properties of the catch (using the fishing analogy) for query filtering, with each subsequent property link click narrowing your selection further.
Using property based filtering is just one perspective on the data corpus associated with the text search pattern; thus, you can alter perspectives by clicking on the "Class" link so that you can filter your search results by entity type. Of course, in a number of scenarios you would use a combination of entity type and entity property filters to locate the entities of interest to you.
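The search-then-filter flow described above can be mimicked in a few lines of Python. This is a toy, in-memory sketch of the general idea only; the sample entities, field names, and functions are all made up for illustration and have nothing to do with how the actual faceted browser is implemented.

```python
from collections import Counter

# Hypothetical sample data standing in for entity descriptions.
ENTITIES = [
    {"label": "Microsoft", "type": "Company",
     "props": {"industry": "Software", "location": "Redmond"}},
    {"label": "Microsoft Office", "type": "Product",
     "props": {"developer": "Microsoft"}},
    {"label": "Bill Gates", "type": "Person",
     "props": {"employer": "Microsoft", "location": "Seattle"}},
]


def text_search(entities, pattern):
    """Step 1, casting the net: match the pattern in labels or property values."""
    p = pattern.lower()
    return [
        e for e in entities
        if p in e["label"].lower()
        or any(p in str(v).lower() for v in e["props"].values())
    ]


def property_facets(hits):
    """The "Properties" view: how many hits carry each property."""
    return Counter(name for e in hits for name in e["props"])


def class_facets(hits):
    """The "Class" view: how many hits belong to each entity type."""
    return Counter(e["type"] for e in hits)


def filter_by(hits, entity_type=None, prop=None):
    """Narrow the catch by entity type, by property, or by both."""
    return [
        e for e in hits
        if (entity_type is None or e["type"] == entity_type)
        and (prop is None or prop in e["props"])
    ]


if __name__ == "__main__":
    hits = text_search(ENTITIES, "Microsoft")
    print(property_facets(hits))
    print(class_facets(hits))
    print([e["label"] for e in filter_by(hits, entity_type="Person", prop="location")])
```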
In 2009 I hope the following happens re. "Linked Data":
2009 is about a reboot on a monumental scale. We need new thinking, new technology, new approaches, and new solutions. No matter what route we take, we can't negate the importance of "Data". When dealing with organic or inorganic computer systems -- Data is simply everything!
The ability of individuals and enterprises to access, mesh, and disseminate data to relevant nodes across public and private networks will ultimately determine the winners and losers in the new frontier, ushered in by 2009.
Do not take data access and data management technology for granted. User interfaces come and go, application logic comes and goes, but your data stays with you forever. If you are mystified by data access technology then make 2009 the year of data access technology demystification :-)
A community developed knowledgebase comprised of Bio Informatics data from across 30 or so public data sources. The standard deployment of Bio2Rdf includes a federation of SPARQL endpoints provided by project members and collaborators.
An Amazon EC2 hosted variant of the Bio2Rdf knowledgebase. In addition to providing a SPARQL endpoint, the data exposed by the Amazon AMI is published in compliance with Linked Data publishing best practices espoused by the Linking Open Data community (LOD).
The ability to instantiate a personal or service-specific variant of this powerful knowledgebase via the Amazon EC2 Cloud. Instead of a 22+ hour error prone odyssey, you simply get down to the task of data analysis and integration within 1.5 hrs (when setting up your AMI for the first time).
"Only one improves with age. With apologies to the originator of the phrase - “Hardware is like fish, operating systems are like wine.”
Yes! Applications are like Fish and Data like Wine, which is basically what Linked Data is fundamentally about, especially when you inject memes such as "Cool URIs" into the mix. Remember, the essence of Linked Data is all about a Web of Linked Data Objects endowed with Identifiers that don't change i.e., they occupy one place in public (e.g. World Wide Web) or private (your corporate Intranet or Extranet) networks, keeping the data that they expose relevant (as in fresh), accessible, and usable in many forms courtesy of the data access & representation dexterity that HTTP facilitates, when incorporated into object identifiers.
Here is another excerpt from his post that rings true (amongst many others):
What am I talking about? Processes change, and need to change. Baking data into the application is a bad idea because the data can’t then be extended in useful, and “unexpected ways”. But not expecting corporate data to be used in new ways is kind of like not expecting the Spanish Inquisition. But… “NOBODY expects the Spanish Inquisition! Amongst our weaponry are such diverse elements as: fear, surprise, ruthless efficiency, an almost fanatical devotion to the Pope.” (sounds like Enterprise Architecture ...).
A pre-installed edition of Virtuoso for Amazon's EC2 Cloud platform.
From the DBMS engine perspective it provides you with one or more pre-configured instances of Virtuoso that enable immediate exploitation of the following services:
From a Middleware perspective it provides:
From the Web Server Platform perspective it provides an alternative to LAMP stack components such as MySQL and Apache by offering:
From the general System Administrator's perspective it provides:
Higher level user oriented offerings include:
For Web 2.0 / 3.0 users, developers, and entrepreneurs it offers Distributed Collaboration Tools & Social Media realm functionality, courtesy of ODS, that includes:
If we could just take "The Semantic Web" moniker for what it was -- a code name for an aspect of the Web -- and move on, things will get much clearer, fast!
Basically, what is/was the "Semantic Web" should really have been code named: ("You" Oriented Data Access) as a play on: Yoda's appreciation of the FORCE (Fact ORiented Connected Entities) -- the power of inter galactic, interlinked, structured data, fashioned by the World Wide Web courtesy of the HTTP protocol.
As stated in an earlier post, the next phase of the Web is all about the magic of entity "You". The single most important item of reference to every Web user would be the Person Entity ID (URI). Just by remembering your Entity ID, you will have intelligent pathways across, and into, the FORCE that the Linked Data Web delivers. The quality of the pathways and increased density of the FORCE are the keys to high SDQ (tomorrow's SEO). Thus, the SDQ of URIs will ultimately be the unit determinant of value to Web Users, along the following personal lines, hence the critical platform questions:
While most industry commentators continue to ponder and pontificate about what "The Semantic Web" is (unfortunately), the real thing (the "FORCE") is already here, and self-enhancing rapidly.
Assuming we now accept the FORCE is simply an RDF based Linked Data moniker, and that RDF Linked Data is all about the Web as a structured database, we should start to move our attention over to practical exploitation of this burgeoning global database, and in doing so we should not discard knowledge from the past such as the many great examples available gratis from the Relational Database realm. For instance, we should start paying attention to the discovery, development, and deployment of high level tools such as query builders, report writers, and intelligence oriented analytic tools, none of which should -- at first point of interaction -- expose raw RDF or the SPARQL query language. Along similar lines of thinking, we also need development environments and frameworks that are counterparts to Visual Studio, ACCESS, File Maker, and the like.
Like Apache, Virtuoso is a bona-fide Web Application Server for PHP based applications. Unlike Apache, Virtuoso is also the following:
As a result of the above, when you deploy a PHP application using Virtuoso, you inherit the following benefits:
As indicated in prior posts, producing RDF Linked Data from the existing Web, where a lot of content is deployed by PHP based content managers, should simply come down to RDF Views over the SQL Schemas and deployment / publishing of the RDF Views in RDF Linked data form. In a nutshell, this is what Virtuoso delivers via its PHP runtime hosting and pre packaged VADs (Virtuoso Application Distribution packages), for popular PHP based applications such as: phpBB3, Drupal, WordPress, and MediaWiki.
In addition to the RDF Linked Data deployment, we've also taken the traditional LAMP installation tedium out of the typical PHP application deployment process. For instance, you don't have to rebuild PHP 3.5 (32 or 64 Bit) on Windows, Mac OS X, or Linux to get going; simply install Virtuoso, then select a VAD package for the relevant application and you're set. If the application of choice isn't pre-packaged by us, simply install as you would when using Apache, which comes down to situating the PHP files in your Web structure under the Web Application's root directory.
At the current time, I've only provided links to ZIP files containing the Virtuoso installation "silent movies". This approach is a short-term solution to some of my current movie publishing challenges re. YouTube and Vimeo -- where the compressed output hasn't been of acceptable visual quality. Once resolved, I will publish much more "Multimedia Web" friendly movies :-)
]]>Our diagram depicts the myriad of data sources from which RDF Linked Data is generated "on the fly" via our data source specific RDF-zation cartridges/drivers. It also unveils how the sponger leverages the Linked Data constellations of UMBEL, DBpedia, Bio2Rdf, and others for lookups.
]]>Ubiquity from Mozilla Labs provides an alternative entry point for experiencing the "Controller" aspect of the Web's natural compatibility with the MVC development pattern. As I've noted (in various posts), Web Services, as practiced by the REST oriented Web 2.0 community or SOAP oriented SOA community within the enterprise, are fundamentally about the "Controller" aspect of MVC.
Ubiquity provides a commandline interface for direct invocation of Web Services. For instance, in our case, we can expose Virtuoso's in-built RDF Middleware ("Sponger") and Linked Data deployment services via a single command of the form: describe-resource <url>
To experience this neat addition to Firefox you need to do the following:
Enjoy!
]]>As per usual, this is part post and part Linked Data demo. This time around, I am showcasing Proxy/Wrapper based dereferencable URIs and a new "Page Description" feature that showcases the capabilities of Virtuoso's in-built RDFization Middleware. Also note, the resource descriptions (RDF) are presented using an HTML page.
]]>Ansgar Bernardi, deputy head of the Knowledge Management Department at Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI, or the German Research Center for Artificial Intelligence) and Nepomuk's coordinator, explains, "The basic problem that we all face nowadays is how to handle vast amounts of information at a sensible rate." According to Bernardi, Nepomuk takes a traditional approach by creating a meta-data layer with well-defined elements that services can be built upon to create and manipulate the information.
The comment above echoes my sentiments about the imminence of "information overload" due to the vast amounts of user generated content on the Internet as a whole. We are going to need to process more and more data within a fixed 24 hour timeframe, while attempting to balance our professional and personal lives. Rest assured, this is a very serious issue, and you cannot even begin to address it without a Web of Linked Data.
"The first idea of building the semantic desktop arose from the fact that one of our colleagues could not remember the girlfriends of his friends," Bernardi says, more than half-seriously. "Because they kept changing -- you know how it is. The point is, you have a vast amount of information on your desktop, hidden in files, hidden in emails, hidden in the names and structures of your folders. Nepomuk gives a standard way to handle such information."
If you get a personal URI for Entity "You", via a Linked Data aware platform (e.g. OpenLink Data Spaces) that virtualizes data across your existing Web data spaces (blogs, feed subscriptions, wikis, shared bookmarks, photo galleries, calendars, etc.), you then only have to remember your URI whenever you need to "Find" something, imagine that!
To conclude, "information overload" is the imminent challenge of our time, and the keys to alleviating it lie in our ability to construct and maintain (via solutions) a few context lenses (URIs) that provide coherent conduits into the dense mesh of structured Linked Data on the Web.
]]>CrunchBase: When we released the CrunchBase API, you were one of the first developers to step up and quickly released a CrunchBase Sponger Cartridge. Can you explain what a CrunchBase Sponger Cartridge is?
Me: A Sponger Cartridge is a data access driver for Web Resources that plugs into our Virtuoso Universal Server (DBMS and Linked Data Web Server combo amongst other things). It uses the internal structure of a resource and/or a web service associated with a resource, to materialize an RDF based Linked Data graph that essentially describes the resource via its properties (Attributes & Relationships).
CrunchBase: And what inspired you to create it?
Me: Bengee built a new space with your data, and we've built a space on the fly from your data which still resides in your domain. Either solution extols the virtues of Linked Data i.e. the ability to explore relationships across data items with high degrees of serendipity (also colloquially known as: following-your-nose pattern in Semantic Web circles).
Bengee posted a notice to the Linking Open Data Community's public mailing list announcing his effort. Bearing in mind the fact that we've been using middleware to mesh the realms of Web 2.0 and the Linked Data Web for a while, it was a no-brainer to knock something up based on the conceptual similarities between Wikicompany and CrunchBase. In a sense, a quadrant of orthogonality is what immediately came to mind re. Wikicompany, CrunchBase, Bengee's RDFization efforts, and ours.
Bengee created an RDF based Linked Data warehouse based on the data exposed by your API, which is exposed via the Semantic CrunchBase data space. In our case we've taken the "RDFization on the fly" approach, which produces a transient Linked Data View of the CrunchBase data exposed by your APIs. Our approach is in line with our world view: all resources on the Web are data sources, and the Linked Data Web is about incorporating HTTP into the naming scheme of these data sources so that the conventional URL based hyperlinking mechanism can be used to access a structured description of a resource, which is then transmitted using a range of negotiable representation formats. In addition, based on the fact that we house and publish a lot of Linked Data on the Web (e.g. DBpedia, PingTheSemanticWeb, and others), we've also automatically meshed CrunchBase data with related data in DBpedia and Wikicompany.
CrunchBase: Do you know of any apps that are using CrunchBase Cartridge to enhance their functionality?
Me: Yes, the OpenLink Data Explorer which provides CrunchBase site visitors with the option to explore the Linked Data in the CrunchBase data space. It also allows them to "Mesh" (rather than "Mash") CrunchBase data with other Linked Data sources on the Web without writing a single line of code.
CrunchBase: You have been immersed in the Semantic Web movement for a while now. How did you first get interested in the Semantic Web?
Me: We saw the Semantic Web as a vehicle for standardizing conceptual views of heterogeneous data sources via context lenses (URIs). In 1998, as part of our strategy to expand our business beyond the development and deployment of ODBC, JDBC, and OLE-DB data providers, we decided to build a Virtual Database Engine (see: Virtuoso History), and in doing so we sought a standards based mechanism for the conceptual output of the data virtualization effort. As of the time of the seminal unveiling of the Semantic Web in 1998 we were clear about two things, in relation to the effects of the Web and Internet data management infrastructure inflections: 1) Existing DBMS technology had reached its limits 2) Web Servers would ultimately hit their functional limits. These fundamental realities compelled us to develop Virtuoso with an eye to leveraging the Semantic Web as a vehicle for completing its technical roadmap.
CrunchBase: Can you put into layman’s terms exactly what RDF and SPARQL are and why they are important? Do they only matter for developers or will they extend past developers at some point and be used by website visitors as well?
Me: RDF (Resource Description Framework) is a Graph based Data Model that facilitates resource description using the Subject, Predicate, and Object principle. Associated with the core data model, as part of the overall framework, are a number of markup languages for expressing your descriptions (just as you express presentation markup semantics in HTML or document structure semantics in XML) that include: RDFa (simple extension of HTML markup for embedding descriptions of things in a page), N3 (a human friendly markup for describing resources), RDF/XML (a machine friendly markup for describing resources).
SPARQL is the query language associated with the RDF Data Model, just as SQL is a query language associated with the Relational Database Model. Thus, when you have RDF based structured and linked data on the Web, you can query against the Web using SPARQL just as you would against an Oracle/SQL Server/DB2/Informix/Ingres/MySQL/etc. DBMS using SQL. That's it in a nutshell.
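To make the Subject, Predicate, Object idea concrete, here is a minimal sketch of a toy in-memory triple list with a pattern matcher. It is not a real RDF library or SPARQL engine; the `ex:` names, the `TRIPLES` data, and the `match` helper are all hypothetical and exist only for illustration. A `None` in the pattern plays the role of a SPARQL variable.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical example data: each tuple is one (subject, predicate, object) statement.
TRIPLES = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:name", "Bob"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None matches anything."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# Roughly the idea behind: SELECT ?s WHERE { ?s a foaf:Person }
people = [s for s, _, _ in match(TRIPLES, p="rdf:type", o="foaf:Person")]

# Roughly the idea behind: SELECT ?o WHERE { ex:alice foaf:knows ?o }
known_by_alice = [o for _, _, o in match(TRIPLES, s="ex:alice", p="foaf:knows")]
```

A real SPARQL engine adds joins across patterns, named graphs, and much more, but the single-pattern lookup above is the basic building block.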
CrunchBase: On your website you wrote that “RDF and SPARQL as productivity boosters in everyday web development”. Can you elaborate on why you believe that to be true?
Me: I think the ability to discern a formal description of anything via its discrete properties is of immense value re. productivity, especially when the capability in question results in a graph of Linked Data that isn't confined to a specific host operating system, database engine, application or service, programming language, or development framework. RDF Linked Data is about infrastructure for the true materialization of the "Information at Your Fingertips" vision of yore. Even though it's taken the emergence of RDF Linked Data to make the aforementioned vision tractable, the comprehension of the vision's intrinsic value has been clear for a very long time. Most organizations and/or individuals are quite familiar with the adage: Knowledge is Power. Well, there isn't any knowledge without accessible Information, and there isn't any accessible Information without accessible Data. The Web has always been grounded in accessibility to data (albeit via compound container documents called Web Pages).
Bottom line, RDF based Linked Data is about Open Data access by reference using URIs (HTTP based Entity IDs / Data Object IDs / Data Source Names), and as I said earlier, the intrinsic value is pretty obvious bearing in mind the costs associated with integrating disparate and heterogeneous data sources -- across intranets, extranets, and the Internet.
CrunchBase: In his definition of Web 3.0, Nova Spivack proposes that the Semantic Web, or Semantic Web technologies, will be the force behind much of the innovation that will occur during Web 3.0. Do you agree with Nova Spivack? What role, if any, do you feel the Semantic Web will play in Web 3.0?
Me: I agree with Nova. But I see Web 3.0 as a phase within the Semantic Web innovation continuum. Web 3.0 exists because Web 2.0 exists. Both of these Web versions express usage and technology focus patterns. Web 2.0 is about the use of Open Source technologies to fashion Web Services that are ultimately used to drive proprietary Software as a Service (SaaS) style solutions. Web 3.0 is about the use of "Smart Data Access" to fashion a new generation of Linked Data aware Web Services and solutions that exploit the federated nature of the Web to maximum effect; proprietary branding will simply be conveyed via quality of data (cleanliness, context fidelity, and comprehension of privacy) exposed by URIs.
Here are some examples of the CrunchBase Linked Data Space, as projected via our CrunchBase Sponger Cartridge:
]]>When tagging a document, the semantic tagging service passes the content of a target document through a processing pipeline (a distillation process of sorts) that results in automagic extraction of the following:
Once the extraction phase is completed, a user is presented with a list of "suggested tags" using a variety of user interaction techniques. The literal values of elected Tags are then associated with one or more Tag and Tag Meaning Data Objects, with each Object type endowed with a unique Identifier.
Broad acceptance that: "Context is king", is gradually taking shape. That said, "Context" landlocked within Literal values offers little over what we have right now (e.g. at Del.icio.us or Technorati), long term. By this I mean: if the end product of semantically enhanced tagging leaves us with: Literal Tag values only, Tags associated with Tag Data Objects endowed with platform specific Identifiers, or Tag Data Objects with any other Identity scheme that excludes HTTP, the ability of Web users to discern or derive multiple perspectives from the base Context (exposed by semantically enhanced Tags) will be lost, or severely impeded at best.
The shape, form, and quality of the lookup substrate that underlies semantic tagging services ultimately affect "context fidelity" matters such as Entity Disambiguation. The importance of quality lookup infrastructure on the burgeoning Linked Data Web is the reason why OpenLink Software is intimately involved with the DBpedia and UMBEL projects.
I am immensely happy to see that the Web 2.0 and Semantic Web communities are beginning to coalesce around the issue of "Context". This was the case at the WWW2008 Linked Data Workshop, and I am feeling a similar vibe emerging from the Semantic Web Technologies conference currently nearing completion in San Jose. Of course, I will be talking about, and demonstrating the practical utility of, all of this at the upcoming Linked Data Planet conference.
ODBC identifies data sources using Data Source Names (DSNs).
WODBC (Web Open Database Connectivity) delivers open data access to Web Databases / Data Spaces. The Data Source Naming scheme: URI or IRI, is HTTP based thereby enabling data access by reference via the Web.
ODBC DSNs bind ODBC client applications to Tables, Views, Stored Procedures.
WODBC DSNs bind you to a Data Space (e.g. my FOAF based Profile Page where you can use the "Explore Data Tab" to look around if you are a human visitor) or a specific Entity within a Data Space (i.e., Person Entity Me).
ODBC Drivers are built using APIs (DBMS Call Level Interfaces) provided by DBMS vendors. Thus, a DBMS vendor can choose not to release an API, or do so selectively, for competitive advantage or market disruption purposes (it's happened!).
WODBC Drivers are also built using APIs (Web Services associated with a Web Data Space). These drivers are also referred to as RDF Middleware or RDFizers. The "Web" component of WODBC ensures openness, you publish Data with URIs from your Linked Data Server and that's it; your data space or specific data entities are live and accessible (by reference) over the Web!
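The "data access by reference" idea can be sketched as an ordinary HTTP request in which the client names the entity by its URI and states which representation format it would like back. The snippet below only builds the request and does not send it; the helper name `linked_data_request` is hypothetical, and whether a given server honors the `Accept` header for a given media type is an assumption that has to be checked per data space.

```python
from urllib.request import Request


def linked_data_request(uri: str, media_type: str = "application/rdf+xml") -> Request:
    """Build (but do not send) a GET request asking for a structured description.

    The URI acts as the Data Source Name; the Accept header carries the
    preferred representation format for content negotiation.
    """
    return Request(uri, headers={"Accept": media_type})


# URI taken from the SPARQL examples elsewhere in this document.
req = linked_data_request("http://myopenlink.net/dataspace/person/kidehen")
```

Passing `req` to `urllib.request.urlopen` would perform the actual retrieval; that step is left out here so the sketch stays free of network access.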
So we have come full circle (or cycle): the Web is becoming more of a structured database every day! What's new is old, and what's old is new!
Data Access is everything; without "Data" there is no information or knowledge. Without "Data" there's no notion of vitality, purpose, or value.
URIs make or break everything in the Linked Data Web just as ODBC DSNs do within the enterprise.
I've deliberately left JDBC, ADO.NET, and OLE-DB out of this piece due to their programming language and framework specificity. None of these mechanisms matches the platform availability breadth of ODBC.
The Web as a true M-V-C pattern is now crystalizing. The "M" (Model) component of M-V-C is finally rising to the realm of broad attention courtesy of the "Linked Data" meme and "Semantic Web" vision.
By the way, M-V-C lines up nicely with Web 1.0 (Web Forms / Pages), Web 2.0 (Web Services based APIs), and Web 3.0 (Data Web, Web of Data, or Linked Data Web) :-)
]]>In today's primarily Document centric Web, the pursuit of Context is akin to pursuing a mirage in a desert of user generated content. The quest is labor intensive, and you ultimately end up without water at the end of the pursuit :-)
Listening to Christine Connor's podcast interview with Talis simply reinforces my strong belief that "Context, Context, Context" is the Semantic Web's equivalent of Real Estate's "Location, Location, Location" (ignore the subprime loans mess for now). The critical thing to note is that you cannot unravel "Context" from existing Web content without incorporating powerful disambiguation technology into an "Entity Extraction" process. Of course, you cannot even consider seriously pursuing any entity extraction and disambiguation endeavor without a lookup backbone that exposes "Named Entities" and their relationships to "Subject matter Concepts" (BTW - this is what UMBEL is all about). Thus, when looking at the broad subject of the Semantic Web, we can also look at "Context" as the vital point of confluence for the Data oriented (Linked Data) and the "Linguistic Meaning" oriented perspectives.
I am even inclined to state publicly that "Context" may ultimately be the foundation for a 4th "Web Interaction Dimension" where practical use of AI leverages a Linked Data Web substrate en route to exposing new kinds of value :-)
"Context" may also be the focal point of concise value proposition articulation to VCs, as in: "My solution offers the ability to discover and exploit 'Context' iteratively, at the rate of $X.XX per iteration, across a variety of market segments" :-)
]]>Daniel simplifies my post by using diagrams to depict the different paths for PHP based applications exposing Linked Data - especially those that already provide a significant amount of the content that drives Web 2.0.
If all the content in Web 2.0 information resources is distillable into discrete data objects endowed with HTTP based IDs (URIs), with zero "RDF handcrafting Tax", what do we end up with? A Giant Global Graph of Linked Data; the Web as a Database.
So, what used to apply exclusively within enterprise settings re. Oracle, DB2, Informix, Ingres, Sybase, Microsoft SQL Server, MySQL, PostgreSQL, Progress Open Edge, Firebird, and others, now applies to the Web. The Web becomes the "Distributed Database Bus" that connects database records across disparate databases (or Data Spaces). These databases manage and expose records that are remotely accessible "by reference" via HTTP.
As I've stated at every opportunity in the past, Web 2.0 is the greatest thing that ever happened to the Semantic Web vision :-) Without the "Web 2.0 Data Silo Conundrum" we wouldn't have the cry for "Data Portability" that brings a lot of clarity to some fundamental Web 2.0 limitations that end-users ultimately find unacceptable.
In the late '80s, the SQL Access Group (now part of X/Open) addressed a similar problem with RDBMS silos within the enterprise, which led to the SAG CLI that exists today as Open Database Connectivity.
In a sense we now have WODBC (Web Open Database Connectivity), comprised of Web Services based CLIs and/or traditional back-end DBMS CLIs (ODBC, JDBC, ADO.NET, OLE-DB, or Native), Query Language (SPARQL Query Language), and a Wire Protocol (HTTP based SPARQL Protocol) delivering Web infrastructure equivalents of SQL and RDA, but much better, and with much broader scope for delivering profound value due to the Web's inherent openness. Today's PHP, Python, Ruby, Tcl, Perl, ASP.NET developer is the enterprise 4GL developer of yore, without enterprise confinement. We could even be talking about 5GL development once the Linked Data interaction is meshed with dynamic languages (delivering higher levels of abstraction at the language and data interaction levels). Even the underlying schemas and basic design will evolve from Closed World (solely) to a mesh of Closed & Open World view schemas.
]]>The list is nice, but actual execution can be challenging. For instance, when writing a blog post, or constructing a WikiWord, would you have enough disposable time to go searching for these URIs? Or would you compromise and continue to inject "Literal" values into the Web, leaving it to the reasoning endowed human reader to connect the dots?
Anyway, OpenLink Data Spaces is now equipped with a Glossary system that allows me to manage terms, meaning of terms, and hyper-linking of phrases and words matching, or associated with, my terms. The great thing about all of this is that everything I do is scoped to my Data Space (my universe of discourse); I don't break or impede the other meanings of these terms outside my Data Space. The Glossary system can be shared with anyone I choose to share it with, and even better, it makes my upstreaming (rules based replication) style of blogging even more productive :-)
Remember, on the Linked Data Web, who you know doesn't matter as much as what you are connected to, directly or indirectly. Jason Kolb covers this issue in his post: People as Data Connectors, and so does Frederick Giasson via a recent post titled: Networks are everywhere. For instance, this blog post (or the entire Blog) is a bona fide RDF Linked Data Source; you can use it as the Data Source of a SPARQL Query to find things that aren't even mentioned in this post, since all you are doing is beaming a query through my Data Space (a container of Linked Data Graphs). On that note, let's re-watch Jon Udell's "On-Demand-Blogosphere" screencast from 2006 :-)
]]>The aforementioned qualification is increasingly necessary for the following reasons:
The terms GGG, Linked Data, Data Web, Web of Data, and Web 3.0 (when I use this term) all imply URI driven Open Data Access for the Web Database (maybe call this ODBC for the Web) -- the ability to point to records across data spaces without any adverse effect on the remote data spaces. It's really important to note that none of the aforementioned terms has anything to do with the "Linguistic Meaning of blurb". Building a smarter document exposed via a URL without exposing descriptive data links doesn't provide open access to information data sources.
As human beings we are all endowed with reasoning capability. But we can't reason without access to data. Dearth of openly accessible structured data is the source of many ills in cyberspace and across society in general. Today we still have Subjectivity reigning over Objectivity due to the prohibitive costs of open data access.
We can't cost-effectively pursue objectivity without cost-effective infrastructure for creating alternative views of the data behind information sources (e.g. Web Pages). More Objectivity and less Subjectivity is what the next Web Frontier is about. At OpenLink we simply use the moniker: Analysis for All! Everyone becomes a data analyst in some form, and even better, the analyses are easily accessible to anyone connected to the Web. Of course, you will be able to share special analyses with your private network of friends and family, or if you so choose, not at all :-)
To recap, it's important to note that Linked Data is the foundation layer of the Semantic Web vision. It not only facilitates open data access, it also enables data integration (Meshing as opposed to Mashing) across disparate data schemas.
As demonstrated by DBpedia and the Linked Data Solar system emerging around it, if you URI everything, then everything is Cool.
Linked Data and Information Silos are mutually exclusive concepts. Thus, you cannot produce a web accessible Information Silo and then refer to it as "Semantic Web" technology. Of course, it might be very Semantic, but it's fundamentally devoid of critical "Semantic Web" essence (DNA).
My acid test for any Semantic Web solution is simply this (using a Web User Agent or Client):
Here is the Acid test against my Data Space:
The goal of this effort is standardization of approaches (syntax and methodology) for mapping Relational Data Model instance data to RDF (Graph Data Model).
Every record in a relational table/view/stored procedure (Table Valued Functions/Procedures) is declaratively morphed into an Entity (instance of a Class associated with a Schema/Ontology). The derived entities become part of a graph that exposes relationships and relationship traversal paths that have lower JOIN Costs than attempting the same thing directly via SQL. In a nutshell, you end up with a conceptual interface atop a logical data layer that enables a much more productive mechanism for exploring homogeneous and/or heterogeneous data without confinement at the DB instance, SQL DBMS type, host operating system, local area network, or wide area network levels.
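The row-to-Entity idea can be sketched in a few lines. What follows is a deliberately naive illustration, not the mapping syntax under standardization and not any product's implementation: the function name `row_to_triples`, the `http://example.com` base URI, and the URI layout (`/<table>/<key>` for entities, `/schema/<table>#<column>` for attributes) are all assumptions made up for this example.

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, Any]


def row_to_triples(base_uri: str, table: str, key_column: str, row: Dict[str, Any]) -> List[Triple]:
    """Turn one relational row into triples describing a single entity.

    The primary key becomes part of the entity URI, the table becomes the
    class, and every other non-NULL column becomes one attribute triple.
    """
    subject = f"{base_uri}/{table}/{row[key_column]}"
    triples: List[Triple] = [(subject, "rdf:type", f"{base_uri}/schema/{table}")]
    for column, value in row.items():
        if column != key_column and value is not None:
            triples.append((subject, f"{base_uri}/schema/{table}#{column}", value))
    return triples


# Hypothetical row from a hypothetical "company" table.
row = {"id": 7, "name": "Acme", "city": "Boston", "fax": None}
triples = row_to_triples("http://example.com", "company", "id", row)
```

Foreign keys are where the graph becomes interesting: mapping a foreign key value to the URI of the referenced row (rather than to a literal) is what turns a JOIN into a link that can simply be followed.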
Just as we have to mesh the Linked Data and Document Webs unobtrusively, it's also important that the same principles apply to the exposure of RDBMS hosted data as RDF based Linked Data.
We all know that a large amount of data driving the IT engines of most enterprises resides in Relational Databases. And contrary to recent RDBMS vs RDF database misunderstandings espoused (hopefully inadvertently) by some commentators, Relational Database engines aren't going away anytime soon. Meshing Relational (logical) and Graph (conceptual) data models is a natural progression along an evolutionary path towards: Analysis for All. By the way, there is a parallel evolution occurring in other realms such as Microsoft's ADO.NET's Entity Framework.
To Unobtrusively expose existing data sources as RDF Linked Data. The links that follow provide examples:
BTW - Benjamin Nowack penned an interesting post titled: Semantic Web Aliases, that covers a variety of labels used to describe the Semantic Web. The great thing about this post is that it provides yet another demonstration-in-the-making for the virtues of Linked Data :-)
Labels are harmless when their sole purpose is the creation of routes of comprehension for concepts. Unfortunately, Labels aren't always constructed with concept comprehension in mind, most of the time they are artificial inflectors and deflectors servicing marketing communications goals.
Anyway, irrespective of actual intent, I've endowed all of the labels from Bengee's post with URIs as my contribution to the important disambiguation effort re. the Semantic Web:
As per usual this post is best appreciated when processed via a Linked Data aware user agent.
]]>A query language for the burgeoning Structured & Linked Data Web (aka Semantic Web / Giant Global Graph). Like SQL, for the Relational Data Model, it provides a query language for the Graph based RDF Data Model.
It's also a REST or SOAP based Web Service that exposes SPARQL access to RDF Data via an endpoint.
In addition, it's also a Query Results Serialization format that includes XML and JSON support.
It brings important clarity to the notion of the "Web as a Database" by transforming existing Web Sites, Portals, and Web Services into bona fide corpus of Mesh-able (rather than Mash-able) Data Sources. For instance, you can perform queries that join one or more of the aforementioned data sources in exactly the same manner (albeit different syntax) as you would one or more SQL Tables.
-- SPARQL equivalent of SQL SELECT * against my personal data space hosted FOAF file
SELECT DISTINCT ?s ?p ?o FROM <http://myopenlink.net/dataspace/person/kidehen> WHERE {?s ?p ?o}
-- SPARQL against my social network -- Note: My SPARQL will be beamed across all of the contacts in the social networks of my contacts as long as they are all HTTP URI based within each data space
PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT DISTINCT ?Person FROM <http://myopenlink.net/dataspace/person/kidehen> WHERE {?s a foaf:Person; foaf:knows ?Person}
Note: you can use the basic SPARQL Endpoint, SPARQL Query By Example, or SPARQL Query Builder Demo tool to experiment with the demonstration queries above.
SPARQL is implemented by RDF Data Management Systems (Triple or Quad Stores) just as SQL is implemented by Relational Database Management Systems. The aforementioned data management systems will typically expose SPARQL access via a SPARQL endpoint.
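As a rough sketch of what talking to a SPARQL endpoint over HTTP looks like, the snippet below builds a SPARQL Protocol style GET request: the query travels in a `query` parameter and the desired results serialization is requested via the `Accept` header. The endpoint URL `http://example.org/sparql` and the helper name `sparql_get_request` are hypothetical; the request is constructed but never sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint, for illustration only

# The social network query shown earlier, laid out over several lines.
QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?Person
FROM <http://myopenlink.net/dataspace/person/kidehen>
WHERE { ?s a foaf:Person ; foaf:knows ?Person }"""


def sparql_get_request(endpoint: str, query: str,
                       accept: str = "application/sparql-results+json") -> Request:
    """Build (but do not send) a GET request carrying a SPARQL query."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": accept})


req = sparql_get_request(ENDPOINT, QUERY)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(req.full_url).query)["query"][0]
```

Swapping the `accept` argument for `application/sparql-results+xml` would ask for the XML results serialization instead; which formats a particular endpoint actually supports has to be checked against that endpoint.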
A SPARQL implementors Testimonial page accompanies the SPARQL press release. In addition, there is a growing collection of implementations on the ESW Wiki Page for SPARQL compliant RDF Triple & Quad Stores.
Yes! SPARQL facilitates an unobtrusive manifestation of a Linked Data Web by way of natural extension of the existing Document Web, i.e., these Web enclaves co-exist in symbiotic fashion.
As DBpedia very clearly demonstrates, Linked Data makes the Semantic Web demonstrable and much easier to comprehend. Without SPARQL there would be no mechanism for Linked Data deployment, and without Linked Data there is no mechanism for Beaming Queries (directly or indirectly) across the Giant Global Graph of data hosted by Social Networks, Shared Bookmarks Services, Weblogs, Wikis, RSS/Atom/OPML feeds, Photo Galleries and other Web accessible Data Sources (Data Spaces).
Download Links:
My Comments:
Hyperdata is short for HyperLinked Data :-) The same applies to Linked Data. Thus, we have two literal labels for the same core Concept. HTTP is the enabling protocol for "Hyper-linking" Documents and associated Structured Data via the World Wide Web (Web for short). Data Links are associated with Structured Data contained in, or hosted by, Documents on the Web.
RDFa, eRDF, GRDDL, SPARQL Query Language, SPARQL Protocol (SOAP or REST service), SPARQL Results Serializations (XML or JSON) collectively provide a myriad of unobtrusive routes to structured data embedded within, or associated with, existing Web Documents.
As Danny already states, ontologies are not prerequisites for producing structured data using the RDF Data Model. They simply aid the ability to express one's self clearly (i.e. no repetition or ambiguity) across a broad audience of machines (directly) and their human masters (indirectly).
Using the crux of this post as the anecdote: The Semantic Data Web would simplify the process of claiming and/or proving that Linked Data and Hyperdata describe the same concept. It achieves this by using Triples (Subject, Predicate, Object) expressed in various forms (N3, Turtle, RDF/XML etc.) to formalize claims in a form palatable to electronic agents (machines) operating on behalf of Humans. In a nutshell, this increases human productivity by completely obliterating the erstwhile exponential costs of discovering data, information, and knowledge.
BTW - for full effect, view this post via an RDF Browser (i.e. cut and paste the Permalink URI of this post, below, into one) such as:
This article, like the one from Mike, and our soon to be released Linked Data Deployment white paper, collectively address the main topic without inadvertent distraction by the misnomer: non-information resource. For instance, the OAI article uses the term: Generic Resource instead of Non-information Resource.
The Semantic Data Web is here, but we need to diffuse this reality across a broader spectrum of Web communities, so as to avoid unnecessary uptake inertia that can arise due to basic incomprehension of key concepts such as Linked Data deployment.
]]>In my "Linked Data & Web Information BUS" post (plus a few LOD mailing list posts), I had the delight and displeasure (on the brain primarily) of attempting to get terminology right with regards to Information- and Non-Information Web Resources. I eventually settled for Data Sources instead of the simpler and more obvious term: Data Resources :-)
Thus, I redefine the URIs from my earlier post as follows:
Thanks to today's internet connectivity, it took a simple Skype ping from Mike Bergman, and a 30 minute (or so) session that followed for us to arrive at "Data Resource" as a clearer term for Non Information Resources.
Mike has promised to write a detailed post covering our Linked Data and the Structured Web terminology meshing odyssey.
]]>Of course, this also enables me to provide yet another Semantic Data Web demo in the form of additional viewing perspectives for the aforementioned FAQ (just click to see):
Lee also embarked on a similar embellishment effort re. the SPARQL Query Language FAQ thereby enabling me to also offer alternative viewing perspectives along similar lines:
]]>Well, I'll have a crack at helping him out i.e. defining the Semantic Data Web in simple terms with linked examples :-)
Tip: Watch the recent TimBL video interview re. the Semantic Data Web before, during, or after reading this post.
Here goes!
The popular Web is a "Web of Documents". The Semantic Data Web is a "Web of Data". Going down a level, the popular web connects documents across the web via hyperlinks. The Semantic Data Web connects data on the web via hyperlinks. Next level, hyperlinks on the popular web have no inherent meaning (lack context beyond: "there is another document"). Hyperlinks on the Semantic Data Web have inherent meaning (they possess context: "there is a Book" or "there is a Person" or "this is a piece of Music" etc.).
Very simple example:
Click the traditional web document URLs for Dan Connolly and Tim Berners-Lee. Then attempt to discern how they are connected. Of course you will see some obvious connections by reading the text, but you won't easily discern other data driven connections. Basically, this is no different to reading about either individual in a print journal, bar the ability to click on hyperlinks that open up other pages. The Data Extraction process remains labour intensive :-(
Repeat the exercise using the traditional web document URLs as Data Web URIs: this time around, paste the hyperlinks above into an RDF aware Browser (in this case the OpenLink RDF Browser). Note, we are making a subtle but critical change, i.e. the URLs are now being used as Semantic Data Web URIs (a small-big-deal kind of thing).
If you're impatient or simply strapped for time (aren't we all these days), simply take a look at these links:
Note: There are other RDF Browsers out there such as:
All of these RDF Browsers (or User Agents) demonstrate the same core concepts in subtly different ways.
If I haven't lost you, proceed to a post I wrote a few weeks ago titled: Hello Data Web (Take 3 - Feel the "RDF" Force).
If you've made it this far, simply head over to DBpedia for a lot of fun :-)
Note Re. my demos: we make use of SVG in our RDF Browser which makes them incompatible with IE (6 or 7) and Safari. That said, Firefox (1.5+), Opera 9.x, WebKit (Open Source Safari), and Camino work fine.
Note to Scoble:
All the Blogs, Wikis, Shared Bookmarks, Image Galleries, Discussion Forums and the like are Semantic Web Data Spaces. The great thing about all of this is that through RSS 2.0's wild popularity, the Blogosphere has done what I postulated about a while back: The Semantic Web would be self-annotating, and so it has come to be :-)
To prove the point above: paste your blog's URL into the OpenLink RDF Browser and see it morph into a Semantic Data Web URI (a pointer to Web Data that you've created) once you click the "Query" button (click on the TimeLine tab for full effect). The same applies to del.icio.us, Flickr, Googlebase, and basically any REST style Web Service as per my RDF Middleware post.
Lazy Semantic Web Callout:
If you're a good animator (pro or hobbyist), please produce an animation of a document going through a shredder. The strips that emerge from the shredder represent the granular data that was once the whole document. The same thing is happening on the Web right now, we are putting photocopies of (X)HTML documents through the shredder (in a good way) en route to producing granular items of data that remain connected to the original copy while developing new and valuable connections to other items of Web Data.
That's it!
]]>Geonames announced the release of its Geonames ontology v1.2. The new ontology has a few enhancements. It introduced the notion of linked data and made a clear distinction between URIs intended for linking documents and URIs intended for linking ontology concepts.
Different types of geospatial data are of different spatial granularity. Data of different spatial granularity may relate to each other by the containment relation. For example, countries contain states, states contain cities and so on. Some geospatial data are of similar spatial granularity (e.g., two cities that are near each other, or two countries that neighbor each other). To support the knowledge representation of these relationships, the ontology introduced three new properties: childrenFeatures, nearbyFeatures and neighbouringFeatures.
In the Semantic Web, both ontology concepts and physical web documents are linked by URIs. Sometimes in applications, it’s useful to make clear whether the use of a URI is intended for linking documents or for linking ontology concepts. The new Geonames ontology introduced a URI convention for identifying the intended usage of a URI. This convention also simplifies the discovery of geospatial data using Geonames web services.
Here is an example:
Other interesting ontology properties include wikipediaArticle and locationMap. The former links a Feature instance to a Web article on Wikipedia, and the latter links a Feature instance to a digital map Web page.
For additional information about Geonames ontology v1.2, see Marc’s post at the Geonames blog.
"(Via Geospatial Semantic Web Blog.)
]]>For additional clarity re. my comments above, you can also look at the SPARQL & SIOC Usecase samples document for our OpenLink Data Spaces platform. Bottom line, the Semantic Web and SPARQL aren't BORING. In fact, quite the contrary, since they are essential ingredients of a more powerful Web than the one we work with today!
Enjoy the rest of John's post:
]]>Creating connections between discussion clouds with SIOC:
(Extract from our forthcoming BlogTalk paper about browsers for SIOC.)
SIOC provides a unified vocabulary for content and interaction description: a semantic layer that can co-exist with existing discussion platforms. Using SIOC, various linkages are created between the aforementioned concepts, which allow new methods of accessing this linked data, including:
- Virtual Forums. These may be a gathering of posts or threads which are distributed across discussion platforms, for example, where a user has found posts from a number of blogs that can be associated with a particular category of interest, or an agent identifies relevant posts across a certain timeframe.
- Distributed Conversations. Trackbacks are commonly used to link blog posts to previous posts on a related topic. By creating links in both directions, not only across blogs but across all types of internet discussions, conversations can be followed regardless of what point or URI fragment a browser enters at.
- Unified Communities. Apart from creating a web page with a number of relevant links to the blogs or forums or people involved in a particular community, there is no standard way to define what makes up an online community (apart from grouping the people who are members of that community using FOAF or OPML). SIOC allows one to simply define what objects are constituent parts of a community, or to say to what community an object belongs (using sioc:has_part / part_of): users, groups, forums, blogs, etc.
- Shared Topics. Technorati (a search engine for blogs) and BoardTracker (for bulletin boards) have been leveraging the free-text tags that people associate with their posts for some time now. SIOC allows the definition of such tags (using the subject property), but also enables hierarchical or non-hierarchical topic definition of posts using sioc:topic when a topic is ambiguous or more information on a topic is required. Combining with other Semantic Web vocabularies, tags and topics can be further described using the SKOS organisation system.
- One Person, Many User Accounts. SIOC also aims to help the issue of multiple identities by allowing users to define that they hold other accounts or that their accounts belong to a particular personal identity (via foaf:holdsOnlineAccount or sioc:account_of). Therefore, all the posts or comments made by a particular person using their various associated user accounts across platforms could be identified.
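The last item in the list above lends itself to a small sketch: if each user account is linked to the person who holds it (the role played by sioc:account_of), then everything written under any of those accounts can be gathered for that person. The data below is entirely hypothetical, and plain Python dictionaries and tuples stand in for what would really be RDF statements.

```python
from typing import Dict, List, Tuple

# Hypothetical account -> person links (standing in for sioc:account_of statements).
ACCOUNT_OF: Dict[str, str] = {
    "blog:jsmith": "person:john",
    "forum:john_s": "person:john",
    "blog:mary": "person:mary",
}

# Hypothetical (post, creating account) pairs gathered from several platforms.
POSTS: List[Tuple[str, str]] = [
    ("post:1", "blog:jsmith"),
    ("post:2", "forum:john_s"),
    ("post:3", "blog:mary"),
]


def posts_by_person(person: str) -> List[str]:
    """Collect posts made under any account that belongs to the given person."""
    return [post for post, account in POSTS if ACCOUNT_OF.get(account) == person]


johns_posts = posts_by_person("person:john")
```

The same join, expressed as a SPARQL query over SIOC data, would let a browser follow one person across blogs, forums, and other discussion platforms.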
In the past I have expressed views that echo the essence of John's piece. It has been pretty darn clear to me that Microsoft is struggling as a result of its inability to handle challenges associated with the metaphoric "computing vase" which it sought to own solely, as a result of its proclivity for crushing and/or alienating erstwhile technology partners as part of this quest (a process that commenced a long time ago, culminating in the contradiction and ultimate paradox called IE7; remember, not too long ago it was impossible to separate IE from Windows! It could only exist as an OS extension, etc.).
Windows in its current incarnation fails to provide a productive working environment: you either have a plethora of viruses and spyware contending for your computing resources, or you have all the software in place to protect against these assaults, rendering the computing resources equally busy. The computing power lag is simply too much when using Windows, and this is its Achilles' heel!
I have been using Windows since version 2.0, and although I have always found the Mac OS variations to be superior on the UI front, I never found any of the historic versions viable alternatives. In my case, this is all about providing a productive work environment across the following usage modes, in descending order of priority:
1. Power User (Outlook, Excel, Word, and other desktop productivity tools)
2. Product Testing and QA
3. Programmer Buddy (a Microsoft term)
4. Programming (for the most part prototyping)
The release of Mac OS X Tiger led me down an evaluation path that I have repeated many times in the past: test the viability of moving wholesale from Windows to Mac OS X and remain functional (if really lucky, exceed existing productivity levels). This time around I found that I could actually migrate over 6 years' worth of emails, contacts, presentations, documents, and spreadsheets from Windows to Mac OS X. I also discovered that success extended all the way to my data-linked documents that are transparently bound to back-end databases (in my case the norm rather than the exception, via ODBC).
I now use Mac OS X as my prime working platform (I still have to use Windows as the platform remains strategic for all our product offerings), and I am absolutely loving it! The joint feelings of euphoria and confusion that I experienced post migration were similar to how I felt after making the transition from "stick shift" to "automatic" geared cars (as I transitioned my residence from the UK to the U.S.). At the time I couldn't understand why anyone (other than a grand prix driver) would ever drive a "stick shift" by choice.
Today, I can't understand why I stuck with Windows for so long at the expense of my daily working productivity. The biggest bonus from this transition is that Mac OS X has made it easier for me to engage less technical individuals (family & friends) in the sheer joy and potential of Information Technology across a variety of realms, as opposed to being confined to the "business computing" realm solely. I can demonstrate the power and potential of the Internet, Web, Web Services, Blogosphere, and Wikisphere with much more sanity and coherence now that my machine responds in a timely fashion during these demos, amongst other benefits.
Some may deem this Windows bashing, but if they take the time to look a little deeper, this is simply about "straight shooting" from a real computer user (I like my computers to deliver on their huge promised potential; I don't compromise on this basic expectation; my computer and associated software should save me time and ramp up my productivity!). If Microsoft is the company that it once was, then it would simply use this kind of commentary to rally its troops and get its act together! That's what I would do if a customer felt so badly about our technology (UDA or Virtuoso).
]]>The value of the Internet as a repository of useful information is very low. Carl Shapiro in “Information Rules” suggests that the amount of actually useful information on the Internet would fit within roughly 15,000 books, which is about half the size of an average mall bookstore. To put this in perspective: there are over 5 billion unique, static & publicly accessible web pages on the www. Apparently only 6% of web sites have educational content (Maureen Henninger, “Don’t just surf the net: Effective research strategies”. UNSW Press). Even of the educational content, only a fraction is of significant informational value.
..As Stanford students, Larry Page and Sergey Brin looked at the same problem—how to impart meaning to all the content on the Web—and decided to take a different approach. The two developed sophisticated software that relied on other clues to discover the meaning of content, such as which Web sites the information was linked to. And in 1998 they launched Google..
You mean noise ranking. Now, I don't think Larry and Sergey set out to do this, but Google page ranks are ultimately based on the concept of "Google Juice" (aka links). The value quotient of this algorithm is accelerating at internet speed (ironically, but naturally). Human beings are smarter than computers; we just process data (not information!) much slower, that's all. Thus, we can conjure up numerous ways to bubble up the Google link ranking algorithms in no time (as is the case today).
..What most differentiates Google's approach from Berners-Lee's is that Google doesn't require people to change the way they post content..
The Semantic Web doesn't require anyone to change how they post content either! It just provides a roadmap for intelligent content management and consumption through innovative products.
..As Sergey Brin told Infoworld's 2002 CTO Forum, "I'd rather make progress by having computers understand what humans write, than by forcing humans to write in ways that computers can understand." In fact, Google has not participated at all in the W3C's formulation of Semantic Web standards, says Eric Miller..
Semantic Content generated by next generation content managers will make more progress, and they certainly won't require humans to write any differently. If anything, humans will find the process quite refreshing as and when participation is required e.g. clicking bookmarklets associated with tagging services such as 'del.icio.us', 'de.lirio.us', or Unalog and others. But this is only the beginning, if I can click on a bookmarklet to post this blog post to a tagging service, then why wouldn't I be able to incorporate the "tag service post" into the same process that saves my blog post (the post is content that ends up in a content management system aka blog server)?
Yet Google's impact on the Web is so dramatic that it probably makes more sense to call the next generation of the Web the "Google Web" rather than the "Semantic Web."
Ah! So you think we really want the noisy "Google Web" as opposed to a federation of distributed Information- and Knowledgebases a la the "Semantic Web"? I don't think so somehow!
Today we are generally excited about "tagging" but somehow fail to see its correlation with the "Semantic Web". I have said this before, and I will say it again: the "Semantic Web" is going to be self-annotated by humans with the aid of intelligent and unobtrusive annotation technology solutions. These solutions will provide context and purpose by using our social essence as currency. The annotation effort will be subliminal; there won't be a "Semantic Web Day" parade or anything of the like. It will appear before us all, in all its glory, without any fanfare. Funnily enough, we might not even call it "The Semantic Web"; who cares? But it will have the distinct attributes of being very "Quiet" and highly "Valuable"; with no burden on "how we write", but a constructive burden on "why we write" as part of the content contribution process (less Google/Yahoo/etc. juice chasing, more knowledge assembly and exchange).
We are social creatures at our core. The Internet and Web have collectively reduced the connectivity hurdles that once made social network oriented solutions implausible. The eradication of these hurdles ultimately feeds the very impulses that trigger the critical self-annotation that is the basis of my fundamental belief in the realization of TBL's Semantic Web vision.
]]>
The thing that most surprised me today in the SoftEdge panel on Social Software was the reaction to RSS. I should be clear that I am an RSS true believer. It seems to me that metadata as a byproduct of social software engines (be it blogging or social networking or whatever) is not only enviable, it is inevitable. RSS and FOAF and other yet-to-be-determined social software data protocols will become standards because it simply makes good sense for them to be standardized. Anyone paying attention to the unbelievable development and adoption curve of wireless can appreciate the immense value driven by standards -- and, in particular, standards that are truly standard. So it came as a bit of a shock to me that when I questioned the panelists on the implications of RSS and the Semantic Web, they were less sold on the inevitability of it all.
When asked the question of whether the proliferation of RSS and FOAF might make it possible for reader technology to be the next killer application in knowledge management, I got very strong reactions from both Reid Hoffman and Meg Hourihan. Reid stated that he did not believe that RSS was sufficiently robust to provide significant value at any level. Meg followed up with a general indictment of the semantic web, which she views merely as a geek utopia. I will admit that I'm a fan of Candide (particularly at the hands of Bernstein), but I hardly view myself as Pangloss. One need look no further than, for example, the tools that Oddpost has incorporated into its web email client to allow an integrated email and blog experience. Better yet, through a relatively simple web service, Oddpost can deliver an RSS feed of a particular Google News search so that you can keep track of keywords that are of interest to you without having to visit Google repeatedly to find out if your company or candidate or favorite band has been mentioned in today's news. The same is true of watch lists on Technorati. Rather than periodically check to see if someone has linked to your blog, Technorati will do the work for you and deliver the info to your inbox only when there is information to be delivered. These examples are just the tip of the iceberg, but they demonstrate the nascent power of RSS and related standards. I'll have to wait for another panel to have that argument with Reid and Meg.
[via The Scobleizer Weblog]
Now this is good news from Microsoft! This means that products like Virtuoso can now compete head-on with Yukon (on a level playing field when it arrives) as far as Visual Studio.NET integration goes. Hopefully I will no longer have to rant about any of the following:
I wonder if the same degree of openness could extend to Web Matrix? That would be something indeed!
]]>A simple guide usable by any Perl developer seeking to exploit SPARQL without hassles.
SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.
SPARQL queries are actually HTTP payloads (typically). Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL compliant data server and receive a payload for local processing.
#
# Demonstrating use of a single query to populate a
# Virtuoso Quad Store via Perl.
#
# HTTP URL is constructed accordingly with CSV query results format as the default via mime type.
#
use strict;
use warnings;
use CGI qw/:standard/;
use LWP::UserAgent;
use Data::Dumper;
use Text::CSV_XS;

sub sparqlQuery {
    my ($query, $baseURL, $format) = @_;
    my %params = (
        "default-graph" => "",
        "should-sponge" => "soft",
        "query"         => $query,
        "debug"         => "on",
        "timeout"       => "",
        "format"        => $format,
        "save"          => "display",
        "fname"         => ""
    );

    # URL-encode each parameter and join them into the query string.
    my @fragments;
    foreach my $k (keys %params) {
        push @fragments, "$k=" . CGI::escape($params{$k});
    }
    my $sparqlURL = "${baseURL}?" . join("&", @fragments);

    my $ua = LWP::UserAgent->new;
    $ua->agent("MyApp/0.1 ");
    my $req = HTTP::Request->new(GET => $sparqlURL);
    my $res = $ua->request($req);
    my $str = $res->content;

    # Parse the CSV payload line by line into an array of rows.
    my $csv = Text::CSV_XS->new({ binary => 1 });
    my @rows;
    foreach my $line (split(/^/, $str)) {
        $csv->parse($line);
        push @rows, [ $csv->fields() ];
    }
    return \@rows;
}

# Setting Data Source Name (DSN)
my $dsn = "http://dbpedia.org/resource/DBpedia";

# Virtuoso pragma for instructing the SPARQL engine to perform an HTTP GET using the IRI in
# the FROM clause as Data Source URL en route to DBMS record inserts.
# The SELECT after the pragma is generic (non Virtuoso-specific) SPARQL.
# Note: on its own, the SELECT will not add records to the DBMS.
my $query = "DEFINE get:soft \"replace\"\nSELECT DISTINCT * FROM <$dsn> WHERE {?s ?p ?o}";

my $data = sparqlQuery($query, "http://localhost:8890/sparql/", "text/csv");

print "Retrieved data:\n";
print Dumper($data);
Retrieved data: $VAR1 = [ [ 's', 'p', 'o' ], [ 'http://dbpedia.org/resource/DBpedia', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://www.w3.org/2002/07/owl#Thing' ], [ 'http://dbpedia.org/resource/DBpedia', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://dbpedia.org/ontology/Work' ], [ 'http://dbpedia.org/resource/DBpedia', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://dbpedia.org/class/yago/Software106566077' ], ...
CSV was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a Perl developer that already knows how to use Perl for HTTP based data access within HTML. SPARQL just provides an added bonus to URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.
A simple guide usable by any Javascript developer seeking to exploit SPARQL without hassles.
SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.
SPARQL queries are actually HTTP payloads (typically). Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL compliant data server and receive a payload for local processing.
/*
  Demonstrating use of a single query to populate a
  Virtuoso Quad Store via Javascript.
*/

/*
  HTTP URL is constructed accordingly with JSON query results format as the default via mime type.
*/
function sparqlQuery(query, baseURL, format) {
  if (!format) format = "application/json";
  var params = {
    "default-graph": "",
    "should-sponge": "soft",
    "query": query,
    "debug": "on",
    "timeout": "",
    "format": format,
    "save": "display",
    "fname": ""
  };

  var querypart = "";
  for (var k in params) {
    querypart += k + "=" + encodeURIComponent(params[k]) + "&";
  }
  var queryURL = baseURL + "?" + querypart;

  var xmlhttp;
  if (window.XMLHttpRequest) {
    xmlhttp = new XMLHttpRequest();
  } else {
    xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
  }
  // Synchronous request: the third argument to open() is false.
  xmlhttp.open("GET", queryURL, false);
  xmlhttp.send();
  return JSON.parse(xmlhttp.responseText);
}

/* Setting Data Source Name (DSN) */
var dsn = "http://dbpedia.org/resource/DBpedia";

/*
  Virtuoso pragma DEFINE get:soft "replace" instructs the Virtuoso SPARQL engine to perform
  an HTTP GET using the IRI in the FROM clause as Data Source URL with regards to
  DBMS record inserts.
*/
var query = "DEFINE get:soft \"replace\"\nSELECT DISTINCT * FROM <" + dsn + "> WHERE {?s ?p ?o}";
var data = sparqlQuery(query, "/sparql/");
Place the snippet above into the <script/> section of an HTML document to see the query result.
JSON was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a Javascript developer that already knows how to use Javascript for HTTP based data access within HTML. SPARQL just provides an added bonus to URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.
A simple guide usable by any PHP developer seeking to exploit SPARQL without hassles.
SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.
SPARQL queries are actually HTTP payloads (typically). Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL compliant data server and receive a payload for local processing e.g. local object binding re. PHP.
#!/usr/bin/env php
<?php
#
# Demonstrating use of a single query to populate a
# Virtuoso Quad Store via PHP.
#

# HTTP URL is constructed accordingly with JSON query results format in mind.
function sparqlQuery($query, $baseURL, $format="application/json")
{
    $params=array(
        "default-graph" => "",
        "should-sponge" => "soft",
        "query" => $query,
        "debug" => "on",
        "timeout" => "",
        "format" => $format,
        "save" => "display",
        "fname" => ""
    );

    $querypart="?";
    foreach($params as $name => $value) {
        $querypart=$querypart . $name . '=' . urlencode($value) . "&";
    }

    $sparqlURL=$baseURL . $querypart;
    return json_decode(file_get_contents($sparqlURL));
}

# Setting Data Source Name (DSN)
$dsn="http://dbpedia.org/resource/DBpedia";

# Virtuoso pragmas for instructing SPARQL engine to perform an HTTP GET
# using the IRI in FROM clause as Data Source URL
$query="DEFINE get:soft \"replace\"\nSELECT DISTINCT * FROM <$dsn> WHERE {?s ?p ?o}";

$data=sparqlQuery($query, "http://localhost:8890/sparql/");

print "Retrieved data:\n" . json_encode($data);
?>
Retrieved data: {"head": {"link":[],"vars":["s","p","o"]}, "results": {"distinct":false,"ordered":true, "bindings":[ {"s": {"type":"uri","value":"http:\/\/dbpedia.org\/resource\/DBpedia"},"p": {"type":"uri","value":"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type"},"o": {"type":"uri","value":"http:\/\/www.w3.org\/2002\/07\/owl#Thing"}}, {"s": {"type":"uri","value":"http:\/\/dbpedia.org\/resource\/DBpedia"},"p": {"type":"uri","value":"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type"},"o": {"type":"uri","value":"http:\/\/dbpedia.org\/ontology\/Work"}}, {"s": {"type":"uri","value":"http:\/\/dbpedia.org\/resource\/DBpedia"},"p": {"type":"uri","value":"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type"},"o": {"type":"uri","value":"http:\/\/dbpedia.org\/class\/yago\/Software106566077"}}, ...
JSON was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a PHP developer that already knows how to use PHP for HTTP based data access. SPARQL just provides an added bonus to URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.
A simple guide usable by any Python developer seeking to exploit SPARQL without hassles.
SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.
SPARQL queries are actually HTTP payloads (typically). Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL compliant data server and receive a payload for local processing e.g. local object binding re. Python.
#!/usr/bin/env python3
#
# Demonstrating use of a single query to populate a
# Virtuoso Quad Store via Python.
#
import json
import urllib.parse
import urllib.request

# HTTP request is constructed accordingly with JSON query results format in mind.
def sparqlQuery(query, baseURL, format="application/json"):
    params={
        "default-graph": "",
        "should-sponge": "soft",
        "query": query,
        "debug": "on",
        "timeout": "",
        "format": format,
        "save": "display",
        "fname": ""
    }
    # Passing a body to urlopen() sends the URL-encoded parameters as an HTTP POST.
    querypart=urllib.parse.urlencode(params).encode("utf-8")
    response = urllib.request.urlopen(baseURL, querypart).read()
    return json.loads(response)

# Setting Data Source Name (DSN)
dsn="http://dbpedia.org/resource/DBpedia"

# Virtuoso pragmas for instructing SPARQL engine to perform an HTTP GET
# using the IRI in FROM clause as Data Source URL
query="""DEFINE get:soft "replace"
SELECT DISTINCT * FROM <%s> WHERE {?s ?p ?o}""" % dsn

data=sparqlQuery(query, "http://localhost:8890/sparql/")

print("Retrieved data:\n" + json.dumps(data, sort_keys=True, indent=4))

#
# End
Retrieved data: { "head": { "link": [], "vars": [ "s", "p", "o" ] }, "results": { "bindings": [ { "o": { "type": "uri", "value": "http://www.w3.org/2002/07/owl#Thing" }, "p": { "type": "uri", "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" }, "s": { "type": "uri", "value": "http://dbpedia.org/resource/DBpedia" } }, ...
JSON was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a Python developer that already knows how to use Python for HTTP based data access. SPARQL just provides an added bonus to URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.
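Once the JSON payload has been bound to local objects, working with it is mostly a matter of walking the "head" and "results" keys shown in the output above. The following is a minimal sketch, not part of the original guide: the helper name rows_from_results is hypothetical, and the sample document is a trimmed copy of the output above so that the snippet runs without a SPARQL endpoint.

```python
import json


def rows_from_results(results):
    """Flatten a SPARQL JSON results document into a list of {variable: value} dicts."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # A variable can be unbound in a given solution, so check before indexing.
        rows.append({v: binding[v]["value"] if v in binding else None for v in variables})
    return rows


# Trimmed copy of the sample output; a live call would pass the value
# returned by sparqlQuery() instead.
sample = json.loads("""
{"head": {"link": [], "vars": ["s", "p", "o"]},
 "results": {"bindings": [
   {"s": {"type": "uri", "value": "http://dbpedia.org/resource/DBpedia"},
    "p": {"type": "uri", "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"},
    "o": {"type": "uri", "value": "http://www.w3.org/2002/07/owl#Thing"}}
 ]}}
""")

for row in rows_from_results(sample):
    print(row["s"], row["p"], row["o"])
```

Each row ends up keyed by the query's variable names, which keeps downstream code independent of the nesting used by the results format.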
A simple guide usable by any Ruby developer seeking to exploit SPARQL without hassles.
SPARQL is a powerful query language, results serialization format, and an HTTP based data access protocol from the W3C. It provides a mechanism for accessing and integrating data across Deductive Database Systems (colloquially referred to as triple or quad stores in Semantic Web and Linked Data circles) -- database systems (or data spaces) that manage proposition oriented records in 3-tuple (triples) or 4-tuple (quads) form.
SPARQL queries are actually HTTP payloads (typically). Thus, using a RESTful client-server interaction pattern, you can dispatch calls to a SPARQL compliant data server and receive a payload for local processing e.g. local object binding re. Ruby.
#!/usr/bin/env ruby
#
# Demonstrating use of a single query to populate a
# Virtuoso Quad Store.
#
require 'net/http'
require 'cgi'
require 'csv'

#
# We opt for CSV based output since handling this format is straightforward in Ruby, by default.
# HTTP URL is constructed accordingly with CSV as query results format in mind.
def sparqlQuery(query, baseURL, format="text/csv")
  params={
    "default-graph" => "",
    "should-sponge" => "soft",
    "query" => query,
    "debug" => "on",
    "timeout" => "",
    "format" => format,
    "save" => "display",
    "fname" => ""
  }
  querypart=""
  params.each { |k,v|
    querypart+="#{k}=#{CGI.escape(v)}&"
  }

  sparqlURL=baseURL+"?#{querypart}"

  response = Net::HTTP.get_response(URI.parse(sparqlURL))

  return CSV::parse(response.body)
end

# Setting Data Source Name (DSN)
dsn="http://dbpedia.org/resource/DBpedia"

# Virtuoso pragmas for instructing SPARQL engine to perform an HTTP GET
# using the IRI in FROM clause as Data Source URL
query="DEFINE get:soft \"replace\"
SELECT DISTINCT * FROM <#{dsn}> WHERE {?s ?p ?o} "

# Assume use of local installation of Virtuoso
# otherwise you can change URL to that of a public endpoint
# for example DBpedia: http://dbpedia.org/sparql
data=sparqlQuery(query, "http://localhost:8890/sparql/")

puts "Got data:"
p data

#
# End
Got data: [["s", "p", "o"], ["http://dbpedia.org/resource/DBpedia", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://www.w3.org/2002/07/owl#Thing"], ["http://dbpedia.org/resource/DBpedia", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://dbpedia.org/ontology/Work"], ["http://dbpedia.org/resource/DBpedia", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://dbpedia.org/class/yago/Software106566077"], ...
CSV was chosen over XML (re. output format) since this is about a "no-brainer installation and utilization" guide for a Ruby developer that already knows how to use Ruby for HTTP based data access. SPARQL just provides an added bonus to URL dexterity (delivered via URI abstraction) with regards to constructing Data Source Names or Addresses.
A service from OpenLink Software, available at: http://uriburner.com, that enables anyone to generate structured descriptions -on the fly- for resources that are already published to HTTP based networks. These descriptions exist as hypermedia resource representations where links are used to identify:
The hypermedia resource representation outlined above is what is commonly known as an Entity-Attribute-Value (EAV) Graph. The use of generic HTTP scheme based Identifiers is what distinguishes this type of hypermedia resource from others.
The virtues (dual pronged serendipitous discovery) of publishing HTTP based Linked Data across public (World Wide Web) or private (Intranets and/or Extranets) is rapidly becoming clearer to everyone. That said, the nuance laced nature of Linked Data publishing presents significant challenges to most. Thus, for Linked Data to really blossom the process of publishing needs to be simplified i.e., "just click and go" (for human interaction) or REST-ful orchestration of HTTP CRUD (Create, Read, Update, Delete) operations between Client Applications and Linked Data Servers.
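For the REST-ful side of that simplification, here is a minimal Python sketch of how the four CRUD operations are conventionally lined up with HTTP methods. The mapping is a common convention rather than something a particular Linked Data Server mandates, the resource URL is hypothetical, and the requests are only built, never sent.

```python
import urllib.request

# Conventional (not mandated) mapping of CRUD operations onto HTTP methods.
CRUD_TO_HTTP = {"create": "POST", "read": "GET", "update": "PUT", "delete": "DELETE"}


def crud_request(operation, url, body=None, content_type="text/turtle"):
    """Build (but do not send) the HTTP request for one CRUD operation."""
    method = CRUD_TO_HTTP[operation.lower()]
    data = body.encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    if data is not None:
        request.add_header("Content-Type", content_type)
    return request


# Hypothetical resource URL; nothing goes over the network here.
doc = "http://example.org/data/xyz"
for op in ("create", "read", "update", "delete"):
    payload = "<#this> a <#Thing> ." if op in ("create", "update") else None
    req = crud_request(op, doc, body=payload)
    print(op, "->", req.get_method(), req.full_url)
```

Passing one of these objects to urllib.request.urlopen() would perform the actual operation against a server that accepts it.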
In a similar vein to the role played by FeedBurner with regards to Atom and RSS feed generation during the early stages of the Blogosphere, it enables anyone to publish Linked Data bearing hypermedia resources on an HTTP network. Thus, its usage covers two profiles: Content Publisher and Content Consumer.
The steps that follow cover all you need to do:
That's it! The discoverability (SDQ) of your content has just multiplied significantly, its structured description is now part of the Linked Data Cloud with a reference back to your site (which is now a bona fide HTTP based Linked Data Space).
HTML+RDFa based representation of a structured resource description:
<link rel="describedby" title="Resource Description (HTML)" type="text/html" href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>
JSON based representation of a structured resource description:
<link rel="describedby" title="Resource Description (JSON)" type="application/json" href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>
N3 based representation of a structured resource description:
<link rel="describedby" title="Resource Description (N3)" type="text/n3" href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>
RDF/XML based representations of a structured resource description:
<link rel="describedby" title="Resource Description (RDF/XML)" type="application/rdf+xml" href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>
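The four elements above differ only in their title and type, so a publisher with many pages could generate them. The sketch below is illustrative only: the function names are hypothetical, and the description URL pattern (service base, then the page's scheme, host, and path) is inferred from the example.org links above rather than taken from URIBurner documentation.

```python
from html import escape
from urllib.parse import urlsplit

# Label / MIME type pairs taken from the four <link/> examples above.
FORMATS = [
    ("HTML", "text/html"),
    ("JSON", "application/json"),
    ("N3", "text/n3"),
    ("RDF/XML", "application/rdf+xml"),
]


def description_url(page_url, base="http://linkeddata.uriburner.com/about/id"):
    """Assumed pattern: <base>/<scheme>/<host><path>, as in the examples above."""
    parts = urlsplit(page_url)
    return f"{base}/{parts.scheme}/{parts.netloc}{parts.path}"


def describedby_links(page_url):
    """Return one <link rel="describedby"/> element per representation format."""
    href = escape(description_url(page_url), quote=True)
    return [
        f'<link rel="describedby" title="Resource Description ({label})" '
        f'type="{mime}" href="{href}"/>'
        for label, mime in FORMATS
    ]


print("\n".join(describedby_links("http://example.org/xyz.html")))
```

All four links share one href; the type attribute is what tells a client which representation to ask for.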
As an end-user, obtaining a structured description of any resource published to an HTTP network boils down to the following steps:
If you are a developer, you can simply perform an HTTP operation request (from your development environment of choice) using any of the URL patterns presented below:
HTML:
URIBurner is a "deceptively simple" solution for cost-effective exploitation of HTTP based Linked Data meshes. It doesn't require any programming or customization en route to immediately realizing its virtues.
If you like what URIBurner offers, but prefer to leverage its capabilities within your domain -- such that resource description URLs reside in your domain, all you have to do is perform the following steps:
When you install your own URIBurner instances, you also have the ability to perform customizations that increase resource description fidelity in line with your specific needs. All you need to do is develop a custom extractor cartridge and/or meta cartridge.
Anyway, Socialtext and Mike 2.0 (they aren't identical, and the juxtaposition isn't seeking to imply this) provide nice demonstrations of what socially enhanced collaboration for individuals and/or enterprises is all about:
As is typically the case in this emerging realm, the critical issue of discrete "identifiers" (record keys, in a sense) for data items, data containers, and data creators (individuals and groups) is overlooked, albeit unintentionally.
Rather than using platform constrained identifiers such as:
It enables you to leverage the platform independence of HTTP scheme Identifiers (Generic URIs) such that Identifiers for:
simply become conduits into a mesh of HTTP -- referenceable and accessible -- Linked Data Objects endowed with High SDQ (Serendipitous Discovery Quotient). For example, my Personal WebID is all anyone needs to know if they want to explore:
Even when you reach a point of equilibrium where your daily activities trigger orchestration of CRUD (Create, Read, Update, Delete) operations against Linked Data Objects within your socially enhanced collaboration network, you still have to deal with the thorny issues of security, which include the following:
FOAF+SSL, an application of HTTP based Linked Data, enables you to enhance your Personal HTTP scheme based Identifier (or WebID) via the following steps (performed by a FOAF+SSL compliant platform):
Contrary to conventional experiences with all things PKI (Public Key Infrastructure) related, FOAF+SSL compliant platforms typically handle the PKI issues as part of the protocol implementation; thereby protecting you from any administrative tedium without compromising security.
Understanding how new technology innovations address long standing problems, or understanding how new solutions inadvertently fail to address old problems, provides time tested mechanisms for product selection and value proposition comprehension that ultimately save scarce resources such as time and money.
If you want to understand real world problem solution #1 with regards to HTTP based Linked Data look no further than the issues of secure, socially aware, and platform independent identifiers for data objects, that build bridges across erstwhile data silos.
If you want to cost-effectively experience what I've outlined in this post, take a look at OpenLink Data Spaces (ODS), which is a distributed collaboration engine (enterprise or individual) built around the Virtuoso database engine. It simply enhances existing collaboration tools via the following capabilities:
Addition of Social Dimensions via HTTP based Data Object Identifiers for all Data Items (if missing)
Since the beginning of the modern IT era, each period of innovation has inadvertently introduced its fair share of Data Silos. The driving force behind this anomaly remains an overemphasis on the role of applications when selecting problem solutions. Unfortunately, most solution selecting decision makers remain oblivious to the fact that most applications are architecturally monolithic; i.e., they fail to separate the following five layers that are critical to all solutions:
The rise of the Internet, and its exponentially-growing user-friendly enclave known as the World Wide Web, is bringing the intrinsic costs of the monolithic application architecture anomaly to bear -- in manners unanticipated by many. For example, the emergence of network-oriented solutions across the realms of Enterprise 2.0-based Collaboration and Web 2.0-based Software-as-a-Service (SaaS), combined with the overarching influence of Social Media, are producing more heterogeneously-structured and disparately-located data sources than people can effectively process.
As is often the case, a variety of problem and product monikers have emerged for the data access and integration challenges outlined above. Contemporary examples include Enterprise Information Integration, Master Data Management, and Data Virtualization. Labeling aside, the fundamental issues of the unresolved Data Integration challenge boil down to the following:
Effectively solving today's data integration challenges requires a move away from monolithic application architecture to loosely-coupled, network-centric application architectures. Basically, we need a ubiquitous network-centric application protocol that lends itself to loosely-coupled across-the-wire orchestration of data interactions. In short, this will be what revitalizes the art of application development and deployment.
The World Wide Web is built around a network application protocol called HTTP. This protocol intrinsically separates the five layers listed earlier, thereby enabling:
Virtuoso is uniquely designed to address today's escalating Data Access and Integration challenges without compromising performance, security, or platform independence. At its core lies an unrivaled commitment to industry standards combined with unique technology innovation that transcends erstwhile distinct realms such as:
When Virtuoso is installed and running, HTTP-based Data Objects are automatically created as a by-product of its powerful data virtualization, transcending data sources and data representation formats. The benefits of such power extend across profiles such as:
Bottom line, Virtuoso delivers unrivaled flexibility and scalability, without compromising performance or security.
]]>
In this post I provide a brief re-introduction to this essential aspect of Virtuoso.
This component of Virtuoso is known as the Virtual Database Engine (VDBMS). It provides transparent high-performance and secure access to disparate data sources that are external to Virtuoso. It enables federated access and integration of data hosted by any ODBC- or JDBC-accessible RDBMS, RDF Store, XML database, or Document (Free Text)-oriented Content Management System. In addition, it facilitates integration with Web Services (SOAP-based SOA RPCs or REST-fully accessible Web Resources).
In the most basic sense, you shouldn't need to upgrade your existing database engine version simply because your current DBMS and Data Access Driver combo isn't compatible with ODBC-compliant desktop tools such as Microsoft Access, Crystal Reports, BusinessObjects, Impromptu, or other ODBC-, JDBC-, ADO.NET-, or OLE DB-compliant applications. Simply place Virtuoso in front of your so-called "legacy database," and let it deliver the compliance levels sought by these tools.
In addition, it's important to note that today's enterprise, through application evolution, company mergers, or acquisitions, is often faced with disparately-structured data residing in any number of line-of-business-oriented data silos. Compounding the problem is the exponential growth of user-generated data via new social media-oriented collaboration tools and platforms. For companies to cost-effectively harness the opportunities accorded by the increasing intersection between line-of-business applications and social media, virtualization of data silos must be achieved, and this virtualization must be delivered in a manner that doesn't prohibitively compromise performance or completely undermine security at either the enterprise or personal level. Again, this is what you get by simply installing Virtuoso.
The VDBMS may be used in a variety of ways, depending on the data access and integration task at hand. Examples include:
You can make a single ODBC, JDBC, ADO.NET, OLE DB, or XMLA connection to multiple ODBC- or JDBC-accessible RDBMS data sources, concurrently, with the ability to perform intelligent distributed joins against externally-hosted database tables. For instance, you can join internal human resources data against internal sales and external stock market data, even when the HR team uses Oracle, the Sales team uses Informix, and the Stock Market figures come from Ingres!
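The "one connection, many databases" idea can be sketched in a few lines. This is only an analogy: it uses SQLite's `ATTACH DATABASE` from Python's standard library as a stand-in, not Virtuoso's own table-attachment mechanism, and the table names, column names, and figures are made up for illustration.

```python
import sqlite3


def federated_join_demo():
    """Join tables living in two separate databases through one connection."""
    # "main" plays the role of the HR database.
    conn = sqlite3.connect(":memory:")
    # ATTACH adds a second, independent database under the schema name "sales".
    conn.execute("ATTACH DATABASE ':memory:' AS sales")

    conn.execute("CREATE TABLE main.employees (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE sales.orders (employee_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO main.employees VALUES (?, ?)",
        [(1, "Alice"), (2, "Bob")],
    )
    conn.executemany(
        "INSERT INTO sales.orders VALUES (?, ?)",
        [(1, 100.0), (1, 50.0), (2, 75.0)],
    )

    # A single query spans both databases; the client sees one data source.
    rows = conn.execute(
        "SELECT e.name, SUM(o.amount) "
        "FROM main.employees AS e "
        "JOIN sales.orders AS o ON o.employee_id = e.id "
        "GROUP BY e.name ORDER BY e.name"
    ).fetchall()
    conn.close()
    return rows
```

In the scenario described above, the two schemas would instead be remote Oracle and Informix tables, and the join would be planned by the virtual database layer rather than by SQLite.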
You can construct RDF Model-based Conceptual Views atop Relational Data Sources. This is about generating HTTP-based Entity-Attribute-Value (E-A-V) graphs using data culled "on the fly" from native or external data sources (Relational Tables/Views, XML-based Web Services, or User Defined Types).
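To make the Entity-Attribute-Value idea concrete, here is a minimal sketch of turning one relational row into E-A-V triples named with HTTP URIs. The base URI, the `#this` suffix, and the URI layout are assumptions chosen for illustration; this is not how Virtuoso's RDF Views are actually declared.

```python
from urllib.parse import quote

# Hypothetical base URI for the data space; not taken from the original post.
BASE = "http://example.com/data"


def row_to_triples(table, key_column, row):
    """Map one relational row (a dict) to (entity, attribute, value) triples."""
    # The primary key value names the entity.
    subject = f"{BASE}/{quote(table)}/{quote(str(row[key_column]))}#this"
    triples = []
    for column, value in row.items():
        # The key is already encoded in the subject; NULLs carry no statement.
        if column == key_column or value is None:
            continue
        predicate = f"{BASE}/schema/{quote(table)}#{quote(column)}"
        triples.append((subject, predicate, value))
    return triples
```

For example, `row_to_triples("employees", "id", {"id": 7, "name": "Alice"})` yields a single triple whose subject is `http://example.com/data/employees/7#this`.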
You can also derive RDF Model-based Conceptual Views from Web Resource transformations "on the fly" -- the Virtuoso Sponger (RDFizing middleware component) enables you to generate RDF Model Linked Data via a RESTful Web Service or within the process pipeline of the SPARQL query engine (i.e., you simply use the URL of a Web Resource in the FROM clause of a SPARQL query).
It's important to note that Views take the form of HTTP links that serve as both Data Source Names and Data Source Addresses. This enables you to query and explore relationships across entities (i.e., People, Places, and other Real World Things) via HTTP clients (e.g., Web Browsers) or directly via SPARQL Query Language constructs transmitted over HTTP.
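As a rough sketch of "SPARQL constructs transmitted over HTTP", the snippet below builds a query that names a Web Resource in its FROM clause and encodes it as a GET request URL using the standard `query` parameter of the SPARQL Protocol. The endpoint address and the resource URL are placeholders; no request is actually sent.

```python
from urllib.parse import urlencode


def build_sparql_request(endpoint, resource_url, limit=10):
    """Return (query_text, request_url) for a SPARQL-over-HTTP GET request."""
    query = (
        "SELECT ?s ?p ?o\n"
        f"FROM <{resource_url}>\n"   # the Web Resource acts as the named data source
        "WHERE { ?s ?p ?o }\n"
        f"LIMIT {int(limit)}"
    )
    # The SPARQL Protocol carries the query text in the "query" parameter.
    return query, f"{endpoint}?{urlencode({'query': query})}"


# Placeholder values: a local endpoint and an arbitrary page URL.
query, url = build_sparql_request("http://localhost:8890/sparql", "http://example.org/page")
```

Fetching `url` with any HTTP client would then return the matching bindings, provided the endpoint is willing to retrieve and transform the named resource.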
As an alternative to RDF, Virtuoso can expose ADO.NET Entity Frameworks-based Conceptual Views over Relational Data Sources. It achieves this by generating Entity Relationship graphs via its native ADO.NET Provider, exposing all externally attached ODBC- and JDBC-accessible data sources. In addition, the ADO.NET Provider supports direct access to Virtuoso's native RDF database engine, eliminating the need for resource intensive Entity Frameworks model transformations.
When trying to understand HTTP based Linked Data, especially if you're well versed in DBMS technology use (User, Power User, Architect, Analyst, DBA, or Programmer) think:
Remember, the need for Data Access & Integration technology is the by-product of the following realities:
In a nutshell, the AWS Cloud infrastructure simplifies the process of generating Federated presence on the Internet and/or World Wide Web. Remember, centralized networking models always end up creating data silos, in some context, ultimately! :-)
]]>The observations above triggered a discussion thread on Twitter that involved: @edsu, @iand, and moi. Naturally, it morphed into a live demonstration of: human vs machine, interpretation of claims expressed in the RDF graph.
It showcases (in Man vs Machine style) the issue of unambiguously discerning the meaning of the owl:sameAs claim expressed in the LCSH Linked Data Space.
From the Linked Data perspective, it may spook a few people to see owl:sameAs values such as: "info:lc/authorities/sh95000541", that cannot be de-referenced using HTTP.
It may confuse a few people or user agents that see URI de-referencing as not necessarily HTTP specific, thereby attempting to de-reference the URI.URN on the assumption that it's associated with a "handle system", for instance.
It may even confuse RDFizer / RDFization middleware that use owl:sameAs as a data provider attribution mechanism via hint/nudge URI values derived from original content / data URI.URLs that de-reference to nothing e.g., an original resource URI.URL plus "#this" which produces URI.URN-URL -- think of this pattern as "owl:shameAs" in a sense :-)
Simply bring OWL reasoning (inference rules and reasoners) into the mix, thereby negating human dialogue about interpretation which ultimately unveils a mesh of orthogonal view points. Remember, OWL is all about infrastructure that ultimately enables you to express yourself clearly i.e., say what you mean, and mean what you say.
The SPARQL queries against the Graph generated and automatically populated by the Sponger reveal -- without human intervention -- that "info:lc/authorities/sh95000541" is just an alternative name for <http://id.loc.gov/authorities/sh95000541#concept>, and that the graph produced by LCSH is self-describing enough for an OWL reasoner to figure this all out courtesy of the owl:sameAs property :-).
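What the reasoner does with owl:sameAs can be approximated in a few lines: treat the property as symmetric and transitive, and group names into equivalence classes. The sketch below is a toy union-find over plain tuples, not an OWL reasoner, and the HTTP URI in the usage note is an assumed form of the LCSH concept identifier.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_aliases(triples, name):
    """Return every name that is owl:sameAs-equivalent to `name` (itself included)."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            # sameAs is symmetric and transitive, so merge the two classes.
            parent[find(s)] = find(o)

    root = find(name)
    return {n for n in list(parent) if find(n) == root}
```

Given one triple stating that `http://id.loc.gov/authorities/sh95000541#concept` is owl:sameAs `info:lc/authorities/sh95000541`, asking for the aliases of either name returns both, without anyone having to argue about what the publisher meant.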
Hopefully, this post also provides a simple example of how OWL facilitates "Reasonable Linked Data".
Robin:
Web 3.0 is fundamentally about the World Wide Web becoming a structured database equipped with a formal data model (RDF, which is a moniker for an Entity-Attribute-Value with Classes & Relationships based Graph Model), a query language, and a protocol for handling diverse data representation requirements via negotiation.
Web 3.0 is about a Web that facilitates serendipitous discovery of relevant things, thereby making serendipitous discovery quotient (SDQ), rather than search engine optimization (SEO), the critical success factor that drives how resources get published on the Web.
Personally, I believe we are on the cusp of a major industry inflection re. how we interact with data hosted in computing spaces. In a nutshell, the conceptual model interaction based on real-world entities such as people, places, and other things (including abstract subject matter) will usurp traditional logical model interaction based on rows and columns of typed and/or untyped literal values exemplified by relational data access and management systems.
Labels such as "Web 3.0", "Linked Data", and "Semantic Web", are simply about the aforementioned model transition playing out on the World Wide Web and across private Linked Data Webs such as Intranets & Extranets, as exemplified by the emergence of the "Master Data Management" label/buzzword.
As was the case with Web Services re. Web 2.0, there is a critical piece of infrastructure driving the evolution in question, and in this case it comes down to the evolution of Hyperlinking.
We now have a new and complementary variant of Hyperlinking, commonly referred to as "Hyperdata", that sits alongside "Hypertext". Hyperdata, when used in conjunction with HTTP-based URIs as Data Source Names (or Identifiers), delivers a potent and granular data access mechanism scoped down to the datum (object or record) level, which is much different from the document (record or entity container) level linkage that Hypertext accords.
In addition, the incorporation of HTTP into this new and enhanced granular Data Source Naming mechanism also addresses past challenges relating to separation of data, data representation, and data transmission protocols -- remember XDR woes familiar to all sockets level programmers -- courtesy of in-built content negotiation. Hence, via a simple HTTP GET -- against a Data Source Name exposed by a Hyperdata link -- I can negotiate (from client or server sides) the exact representation of the description (entity-attribute-value graph) of an Entity / Data Object / Resource, dispatched by a data server.
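Here is a minimal sketch of the server side of that negotiation: given the client's `Accept` header and the representations the server can produce, pick the one to dispatch. It is a simplified reading of HTTP's q-value rules (no partial wildcards such as `text/*`), and the function name is mine, not part of any library.

```python
def negotiate(accept_header, available):
    """Pick the media type from `available` that best matches an Accept header."""
    prefs = []
    for position, part in enumerate(accept_header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for f in fields[1:]:
            if f.startswith("q="):
                try:
                    q = float(f[2:])
                except ValueError:
                    q = 0.0
        prefs.append((q, position, media_type))

    # Highest q first; ties keep the order in which the client listed them.
    for q, _, media_type in sorted(prefs, key=lambda t: (-t[0], t[1])):
        if q == 0:
            continue  # q=0 means "not acceptable"
        if media_type in available:
            return media_type
        if media_type == "*/*" and available:
            return available[0]
    return None
```

A browser sending `text/html` and an RDF-aware agent sending `application/rdf+xml` would thus receive different representations of the same entity description from the same URI.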
For example, this is how a description of entity "Me" ends up being available in (X)HTML or RDF document representations (as you will observe when you click on that link to my Personal URI).
The foundation of what I describe above comes from:
Some live examples from DBpedia:
Today, I revisited the same article -- and to my shock and horror -- my comments do not exist (note: the site did accept my comments yesterday!). Even more frustrating for me, I now have to expend time I don't have re-writing my comments due to the depth and danger of the inaccuracies in this post re. RDF in general.
Please look into what happened to my comments. It's too early for me to conclude that subjective censorship is at play on the Web -- which isn't a hard-copy journalistic style of platform where editors get away with such shenanigans. The Web is a sticky database, and outer joining is well and truly functional (meaning: exclusion and omission ultimately come back to bite via full outer join query results against the Web DB).
By the way, if you publish the comments I made to the post (yesterday), I will add a note to this post, accordingly.
Yes! David just confirmed to me via Twitter that this is yet another comment system related issue and absolutely no intent to censor etc. His words Twervatim :-)
For sake of clarity, I've itemized the inaccuracies and applied my correction comments (inline) accordingly:
Inaccuracy #1:
Resource Description Framework (RDF), a part of the XML story, provides interoperability between applications that exchange information.
Correction #1:
RDF and XML are not inextricably linked in any way. RDF is part Data Model (EAV/CR style Graph) with associated markup and data serialization formats that include: N3, Turtle, TriX, RDF/XML etc.
Inaccuracy #2:
RDF uses XML to define a foundation for processing metadata and to provide a standard metadata infrastructure for both the Web and the enterprise.
Correction #2:
RDF/XML is an XML based markup and data serialization format. As a markup language it can be used for creating RDF model records/statements (using Subject, Predicate, Object or Entity, Attribute, Value). As a serialization format, it provides a mechanism for marshaling RDF data across data managers and data consumers.
Inaccuracy #3:
The difference between the two is that XML is used to transport data using a common format, while RDF is layered on top of XML defining a broad category of data.
Correction #3:
See earlier corrections above.
Inaccuracy #4:
When the XML data is declared to be of the RDF format, applications are then able to understand the data without understanding who sent it.
Correction #4:
You do not declare data to be of RDF format. RDF isn't a format; it is a data model (as stated above). You can "up lift" or map data from XML to RDF (hierarchical to graph model mapping). Likewise, you can "down shift" or map data from RDF to XML (example: SPARQL SELECT query patterns "down shift" to SPARQL Results XML, which isn't RDF/XML, while keeping access to graphs via URIs or Entity Identifiers that reside within the serialization).
Inaccuracy #5:
RDF extends the XML model and syntax to be specified for describing either resources or a collection of information. (XML points to a resource in order to scope and uniquely identify a set of properties known as the schema.).
Correction #5:
See earlier comments.
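The distinction between the RDF data model and its serialization formats running through the corrections above can be shown with a small sketch: one in-memory list of triples, written out once in a non-XML syntax (N-Triples) and once in an XML syntax (TriX). The `Literal` marker class and the sample triple are illustrative assumptions, and the writers handle only URIs and plain literals.

```python
import xml.etree.ElementTree as ET

TRIX_NS = "http://www.w3.org/2004/03/trix/trix-1/"


class Literal(str):
    """Marks a plain literal value; bare str objects are treated as URIs."""


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            escaped = o.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def to_trix(triples):
    """Serialize the same tuples as a TriX (XML) document string."""
    root = ET.Element("TriX", {"xmlns": TRIX_NS})
    graph = ET.SubElement(root, "graph")
    for s, p, o in triples:
        triple = ET.SubElement(graph, "triple")
        ET.SubElement(triple, "uri").text = s
        ET.SubElement(triple, "uri").text = p
        tag = "plainLiteral" if isinstance(o, Literal) else "uri"
        ET.SubElement(triple, tag).text = str(o)
    return ET.tostring(root, encoding="unicode")


# One statement in the model; two unrelated syntaxes on the wire.
triples = [(
    "http://example.com/people/alice#this",
    "http://xmlns.com/foaf/0.1/name",
    Literal("Alice"),
)]
```

The list of tuples is the data; XML shows up only when one particular writer is chosen, which is the point of Corrections #1 and #2.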
The single accurate paragraph in this ebiz article lies right at the end and it states the following:
"I've always thought RDF has been underutilized for data integration, and it's really an old standard. Now that we're focused on both understanding and integrating data, perhaps RDF should make a comeback."
Depicted below is a top-down view of the data access and data management value chain. The term "apex" simply indicates value primacy, which takes the form of a data access API based entry point into a DBMS realm -- aligned to an underlying data model. Examples of data access APIs include: Native Call Level Interfaces (CLIs), ODBC, JDBC, ADO.NET, OLE-DB, XMLA, and Web Services.
See: AVF Pyramid Diagram.
The degree to which ad-hoc views of data managed by a DBMS can be produced and dispatched to relevant data consumers (e.g. people), without compromising concurrency, data durability, and security, determines the "Agility Value Factor" (AVF) of a given DBMS. Remember, agility as the cornerstone of environmental adaptation is as old as the concept of evolution, and intrinsic to all pursuits of primacy.
In simpler business oriented terms, look at AVF as the degree to which DBMS technology affects the ability to effectively implement "Market Leadership Discipline" along the following pathways: innovation, operational excellence, or customer intimacy.
Historically, at least since the late '80s, the RDBMS genre of DBMS has consistently offered the highest AVF relative to other DBMS genres en route to primacy within the value pyramid. The desire to improve on paper reports and spreadsheets is basically what DBMS technology has fundamentally addressed to date, even though conceptual level interaction with data has never been its forte.
See: RDBMS Primacy Diagram.
For more than 10 years -- at the very least -- the limitations of the traditional RDBMS in the realm of conceptual level interaction with data across diverse data sources and schemas (enterprise, Web, and Internet) have been crystal clear to many RDBMS technology practitioners, as indicated by some of the quotes excerpted below:
"Future of Database Research is excellent, but what is the future of data?"
"..it is hard for me to disagree with the conclusions in this report. It captures exactly the right thoughts, and should be a must read for everyone involved in the area of databases and database research in particular." -- Dr. Anant Jhingran, CTO, IBM Information Management Systems, commenting on the 2007 RDBMS technology retreat attended by a number of key DBMS technology pioneers and researchers.
"One size fits all: A concept whose time has come and gone
- They are direct descendants of System R and Ingres and were architected more than 25 years ago
- They are advocating "one size fits all"; i.e. a single engine that solves all DBMS needs.
-- Prof. Michael Stonebraker, one of the founding fathers of the RDBMS industry.
Until this point in time, the requisite confluence of "circumstantial pain" and "open standards" based technology required to enable an objective "compare and contrast" of RDBMS engine virtues and viable alternatives hasn't occurred. Thus, the RDBMS has retained its position of primacy, albeit on a "one size fits all" basis.
As mentioned earlier, we are in the midst of an economic crisis that is ultimately about a consistent inability to connect dots across a substrate of interlinked data sources that transcend traditional data access boundaries with high doses of schematic heterogeneity. Ironically, in an era of the dot-com, we haven't been able to make meaningful connections between relevant "real-world things" that extend beyond primitive data hosted in database tables and content management style document containers; we've struggled to achieve this in the most basic sense, let alone evolve our ability to connect in line with the exponential rate at which the Internet & Web are spawning "universes of discourse" (data spaces) that emanate from user activity (within the enterprise and across the Internet & Web). In a nutshell, we haven't been able to upgrade our interaction with data such that "conceptual models" and resulting "context lenses" (or facets) become concrete; by this I mean real-world entity interaction making its way into the computer realm, as opposed to the impedance we all suffer today when we transition from conceptual model interaction (real-world) to logical model interaction (when dealing with RDBMS based data access and data management).
Here are some simple examples of what I can only best describe as: "critical dots unconnected", resulting from an inability to interact with data conceptually:
Government (Globally) - Financial regulatory bodies couldn't effectively discern that a Credit Default Swap is an Insurance policy in all but literal name. As a result, the cost of an unregulated insurance policy laid the foundation for exacerbating the toxicity of fatally flawed mortgage-backed securities. Put simply: a flawed insurance policy was the fallback on a toxic security that financiers found exotic based on superficial packaging.
Enterprises - Banks still don't understand that capital really does exist in tangible and intangible forms, with the intangible being the variant that is inherently dynamic. For example, a tech company's intellectual capital far exceeds the value of fixtures, fittings, and buildings, but you'd be amazed to find that in most cases this vital asset has no significant value when banks get down to the nitty-gritty of debt collateral; instead, a buffer of flawed securitization has occurred atop a borderline static asset class covering the aforementioned buildings, fixtures, and fittings.
In the general enterprise arena, IT executives continued to "rip and replace" existing technology without ever effectively addressing the timeless inability to connect data across disparate data silos generated by internal enterprise applications, let alone the broader need to mesh data from the inside with external data sources. No correlations were made between the growth of buzzwords and the compounding nature of data integration challenges. It's 2009, and only a minuscule number of executives dare fantasize about being anywhere within distance of the "relevant information at your fingertips" vision.
Looking more holistically at data interaction in general, whether you interact with data in the enterprise space (i.e., at work) or on the Internet or Web, you ultimately are delving into a mishmash of disparate computer systems, applications, services (Web or SOA), and databases (of the RDBMS variety in a majority of cases) associated with a plethora of disparate schemas. Yet even today "rip and replace" is still the norm pushed by most vendors, pitting one monoculture against another as exemplified by irrelevances such as FOSS/LAMP vs. Commercial or Web vs. Enterprise, when none of this matters if the data access and integration issues aren't recognized, let alone addressed (see: Applications are Like Fish and Data Like Wine).
Like the current credit-crunch, exponential growth of data originating from disparate application databases and associated schemas, within shrinking processing time frames, has triggered a rethinking of what defines data access and data management value today en route to an inevitable RDBMS downgrade within the value pyramid.
There have been many attempts to address real-world modeling requirements across the broader DBMS community from Object Databases to Object-Relational Databases, and more recently the emergence of simple Entity-Attribute-Value model DBMS engines. In all cases failure has come down to the existence of one or more of the following deficiencies, across each potential alternative:
A common characteristic shared by all post-relational database management systems (from Object Relational to pure Object) is an orientation towards variations of EAV/CR based data models. Unfortunately, all efforts in the EAV/CR realm have typically suffered from at least one of the deficiencies listed above. In addition, the same "one DBMS model fits all" approach that lies at the heart of the RDBMS downgrade also exists in the EAV/CR realm.
The RDBMS is not going away (ever), but its era of primacy -- by virtue of its placement at the apex of the data access and data management value pyramid -- is over! I make this bold claim for the following reasons:
Based on the above, it is crystal clear that a different kind of DBMS -- one with higher AVF relative to the RDBMS -- needs to sit atop today's data access and data management value pyramid. The characteristics of this DBMS must include the following:
Quick recap: I am not saying that RDBMS engine technology is dead or obsolete. I am simply stating that the era of RDBMS primacy within the data access and data management value pyramid is over.
The problem domain (conceptual model views over heterogeneous data sources) at the apex of the aforementioned pyramid has simply evolved beyond the natural capabilities of the RDBMS, which is rooted in "Closed World" assumptions re. data definition, access, and management. The need to maintain domain based conceptual interaction with data is now palpable at every echelon within our "Global Village" - Internet, Web, Enterprise, Government etc.
It is my personal view that an EAV/CR model based DBMS, with support for the seven items enumerated above, can trigger the long anticipated RDBMS downgrade. Such a DBMS would be inherently multi-model because you would need the best of RDBMS and EAV/CR model engines in a single product, with in-built support for HTTP and other Internet protocols in order to effectively address data representation and serialization issues.
Examples of contemporary EAV/CR frameworks that provide concrete conceptual layers for data access and data management currently include:
The frameworks above provide the basis for a revised AVF pyramid, as depicted below, that reflects today's data access and management realities i.e., an Internet & Web driven global village comprised of interlinked distributed data objects, compatible with "Open World" assumptions.
See: New EAV/CR Primacy Diagram.As depicted below, a top-down view of the data access and data management value chain. The term: apex, simply indicates value primacy, which takes the form of a data access API based entry point into a DBMS realm -- aligned to an underlying data model. Examples of data access APIs include: Native Call Level Interfaces (CLIs), ODBC, JDBC, ADO.NET, OLE-DB, XMLA, and Web Services.
The degree to which ad-hoc views of data managed by a DBMS can be produced and dispatched to relevant data consumers (e.g. people), without compromising concurrency, data durability, and security, collectively determine the "Agility Value Factor" (AVF) of a given DBMS. Remember, agility as the cornerstone of environmental adaptation is as old as the concept of evolution, and intrinsic to all pursuits of primacy.
In simpler business oriented terms, look at AVF as the degree to which DBMS technology affects the ability to effectively implement "Market Leadership Discipline" along the following pathways: innovation, operation excellence, or customer intimacy.
Historically, at least since the late '80s, the RDBMS genre of DBMS has consistently offered the highest AVF relative to other DBMS genres en route to primacy within the value pyramid. The desire to improve on paper reports and spreadsheets is basically what DBMS technology has fundamentally addressed to date, even though conceptual level interaction with data has never been its forte.
For more then 10 years -- at the very least -- limitations of the traditional RDBMS in the realm of conceptual level interaction with data across diverse data sources and schemas (enterprise, Web, and Internet) has been crystal clear to many RDBMS technology practitioners, as indicated by some of the quotes excerpted below:
"Future of Database Research is excellent, but what is the future of data?"
"..it is hard for me to disagree with the conclusions in this report. It captures exactly the right thoughts, and should be a must read for everyone involved in the area of databases and database research in particular."-- Dr. Anant Jingran, CTO, IBM Information Management Systems, commenting on the 2007 RDBMS technology retreat attended by a number of key DBMS technology pioneers and researchers.
"One size fits all: A concept whose time has come and gone
- They are direct descendants of System R and Ingres and were architected more than 25 years ago
- They are advocating "one size fits all"; i.e. a single engine that solves all DBMS needs.
-- Prof. Michael Stonebreaker, one of the founding fathers of the RDBMS industry.
Until this point in time, the requisite confluence of "circumstantial pain" and "open standards" based technology required to enable an objective "compare and contrast" of RDBMS engine virtues and viable alternatives hasn't occurred. Thus, the RDBMS has endured it position of primacy albeit on a "one size fits all basis".
As mentioned earlier, we are in the midst of an economic crisis that is ultimately about a consistent inability to connect dots across a substrate of interlinked data sources that transcend traditional data access boundaries with high doses of schematic heterogeneity. Ironically, in a era of the dot-com, we haven't been able to make meaningful connections between relevant "real-world things" that extend beyond primitive data hosted database tables and content management style document containers; we've struggled to achieve this in the most basic sense, let alone evolve our ability to connect inline with the exponential rate at which the Internet & Web are spawning "universes of discourse" (data spaces) that emanate from user activity (within the enterprise and across the Internet & Web). In a nutshell, we haven't been able to upgrade our interaction with data such that "conceptual models" and resulting "context lenses" (or facets) become concrete; by this I mean: real-world entity interaction making its way into the computer realm as opposed to the impedance we all suffer today when we transition from conceptual model interaction (real-world) to logical model interaction (when dealing with RDBMS based data access and data management).
Here are some simple examples of what I can only best describe as: "critical dots unconnected", resulting from an inability to interact with data conceptually:
Government (Globally) -Financial regulatory bodies couldn't effectively discern that a Credit Default Swap is an Insurance policy in all but literal name. And in not doing so the cost of an unregulated insurance policy laid the foundation for exacerbating the toxicity of fatally flawed mortgage backed securities. Put simply: a flawed insurance policy was the fallback on a toxic security that financiers found exotic based on superficial packaging.
Enterprises -Banks still don't understand that capital really does exists in tangible and intangible forms; with the intangible being the variant that is inherently dynamic. For example, a tech companies intellectual capital far exceeds the value of fixture, fittings, and buildings, but you be amazed to find that in most cases this vital asset has not significant value when banks get down to the nitty gritty of debt collateral; instead, a buffer of flawed securitization has occurred atop a borderline static asset class covering the aforementioned buildings, fixtures, and fittings.
In the general enterprise arena, IT executives continued to "rip and replace" existing technology without ever effectively addressing the timeless inability to connect data across disparate data silos generated by internal enterprise applications, let alone the broader need to mesh data from the inside with external data sources. No correlations made between the growth of buzzwords and the compounding nature of data integration challenges. It's 2009 and only a miniscule number of executives dare fantasize about being anywhere within distance of the: relevant information at your fingertips vision.
Looking more holistically at data interaction in general, whether you interact with data in the enterprise space (i.e., at work) or on the Internet or Web, you ultimately are delving into a mishmash of disparate computer systems, applications, service (Web or SOA), and databases (of the RDBMS variety in a majority of cases) associated with a plethora of disparate schemas. Yes, but even today "rip and replace" is still the norm pushed by most vendors; pitting one mono culture against another as exemplified by irrelevances such as: FOSS/LAMP vs Commercial or Web vs. Enterprise, when none of this matters if the data access and integration issues are recognized let alone addressed (see: Applications are Like Fish and Data Like Wine).
Like the current credit-crunch, exponential growth of data originating from disparate application databases and associated schemas, within shrinking processing time frames, has triggered a rethinking of what defines data access and data management value today en route to an inevitable RDBMS downgrade within the value pyramid.
There have been many attempts to address real-world modeling requirements across the broader DBMS community from Object Databases to Object-Relational Databases, and more recently the emergence of simple Entity-Attribute-Value model DBMS engines. In all cases failure has come down to the existence of one or more of the following deficiencies, across each potential alternative:
A common characteristic shared by all post-relational DBMS management systems (from Object Relational to pure Object) is an orientation towards variations of EAV/CR based data models. Unfortunately, all efforts in the EAV/CR realm have typically suffered from at least one of the deficiencies listed above. In addition, the same "one DBMS model fits all" approach that lies at the heart of the RDBMS downgrade also exists in the EAV/CR realm.
The RDBMS is not going away (ever), but its era of primacy -- by virtue of its placement at the apex of the data access and data management value pyramid -- is over! I make this bold claim for the following reasons:
Based on the above, it is crystal clear that a different kind of DBMS -- one with higher AVF relative to the RDBMS -- needs to sit atop today's data access and data management value pyramid. The characteristics of this DBMS must include the following:
Quick recap, I am not saying that RDBMS engine technology is dead or obsolete. I am simply stating that the era of RDBMS primacy within the data access and data management value pyramid is over.
The problem domain (conceptual model views over heterogeneous data sources) at the apex of the aforementioned pyramid has simply evolved beyond the natural capabilities of the RDBMS which is rooted in "Closed World" assumptions re., data definition, access, and management. The need to maintain domain based conceptual interaction with data is now palpable at every echelon within our "Global Village" - Internet, Web, Enterprise, Government etc.
It is my personal view that an EAV/CR model based DBMS, with support for the seven items enumerated above, can trigger the long anticipated RDBMS downgrade. Such a DBMS would be inherently multi-model because you would need the best of RDBMS and EAV/CR model engines in a single product, with in-built support for HTTP and other Internet protocols in order to effectively address data representation and serialization issues.
Examples of contemporary EAV/CR frameworks that provide concrete conceptual layers for data access and data management currently include:
The frameworks above provide the basis for a revised AVF pyramid, as depicted below, that reflects today's data access and management realities i.e., an Internet & Web driven global village comprised of interlinked distributed data objects, compatible with "Open World" assumptions.
What is our "Search" and "Find" demonstration about? It is about how you use the "Description" of "Things" to unambiguously locate things in a database at Web Scale.
To our perpetual chagrin, we are trying to demonstrate an engine -- not UI prowess -- but the immediate response is to jump to the UI aesthetics.
Google, Yahoo, etc. offer a simple input form for full text search patterns, and they have a processing window for completing full text searches across Web content indexed on their servers. Once the search patterns are processed, you get a page-ranked result set (basically a collection of Web pages that claims: we found N pages out of a document corpus of about M indexed pages).
Note: the estimate aspect of traditional search results is like "advertising small print": the user lives with the illusion that all possible documents on the Web (or even Internet) have been searched, whereas in reality 25% of the possible total is a major stretch, since the Web and Internet are fractal, scale-free networks, inherently growing at exponential rates "ad infinitum" across boundless dimensions of human comprehension.
The power of Linked Data ultimately comes down to the fact that the user constructs the path to what they seek via the properties of the "Things" in question. The routes are not hardwired since URI de-referencing (follow your nose pattern) is available to Linked Data aware query engines and crawlers.
We are simply trying to demonstrate how you can combine the best of full text search with the best of structured querying while reusing familiar interaction patterns from Google/Yahoo. Thus, you start with full text search, get all the entities associated with the pattern, then use the entity types or entity properties to find what you seek.
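The interaction pattern just described (text search first, then narrowing by entity type or property) can be sketched in a few lines of Python. This is only an illustration over a tiny in-memory list of triples; the `ex:` identifiers, the sample data, and the function names are all hypothetical and say nothing about how the demo service itself is implemented.

```python
from collections import Counter

# Hypothetical in-memory triples: (entity, property, value).
TRIPLES = [
    ("ex:glenn1", "rdf:type", "foaf:Person"),
    ("ex:glenn1", "foaf:name", "Glenn McDonald"),
    ("ex:glenn1", "foaf:interest", "Semantic Web"),
    ("ex:glenn2", "rdf:type", "foaf:Person"),
    ("ex:glenn2", "foaf:name", "Glenn McDonald"),
    ("ex:glenn2", "foaf:interest", "Cricket"),
    ("ex:post9", "rdf:type", "sioc:Post"),
    ("ex:post9", "dc:title", "Interview with Glenn McDonald"),
]

def text_search(pattern):
    """Step 1: return entities with any value containing the pattern."""
    needle = pattern.lower()
    return {s for s, p, o in TRIPLES if needle in o.lower()}

def type_facets(entities):
    """Step 2: count the matched entities per rdf:type."""
    return Counter(o for s, p, o in TRIPLES
                   if s in entities and p == "rdf:type")

def filter_by(entities, prop, value):
    """Step 3: keep only entities that have the given property value."""
    return {s for s, p, o in TRIPLES
            if s in entities and p == prop and o == value}

hits = text_search("glenn mcdonald")          # three entities mention the name
facets = type_facets(hits)                    # two Persons, one Post
people = filter_by(hits, "rdf:type", "foaf:Person")
match = filter_by(people, "foaf:interest", "Semantic Web")
```

The text pattern alone cannot tell the two people apart; the property filter in the last step is what disambiguates them.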
You state in your post:
"To state the obvious caveat, the claim OpenLink is making about this demo is not that it delivers better search-term relevance, therefore the ranking of searching results is not the main criteria on which it is intended to be assessed."
Correct.
"On the other hand, one of the things they are bragging about is that their server will automatically cut off long-running queries. So how do you like your first page of results?".
Not exactly correct. We are performing aggregates using a configurable interactive time factor. Example: tell me how many entities of type: Person, with interest: Semantic Web, exist in this database within 2 seconds. Also understand that you could retry the same query and get different numbers within the same interactive time factor. It isn't your basic "query cut-off".
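To make the distinction from a plain "query cut-off" more concrete, here is a minimal Python sketch of counting within a time budget. It is an assumption-laden illustration, not Virtuoso's implementation: the function name `count_within`, the injectable `clock` parameter, and the sample data are all hypothetical.

```python
import time

def count_within(items, predicate, budget_seconds, clock=time.monotonic):
    """Count matching items until the time budget runs out.

    Returns (count, complete). When complete is False the budget expired
    before every item was examined, so the count is only a lower bound.
    """
    deadline = clock() + budget_seconds
    count = 0
    for item in items:
        if clock() >= deadline:
            return count, False
        if predicate(item):
            count += 1
    return count, True

# Hypothetical data: three people interested in the Semantic Web, one not.
people = [{"type": "Person", "interest": "Semantic Web"}] * 3 + \
         [{"type": "Person", "interest": "Chess"}]

def is_match(entity):
    return entity["type"] == "Person" and entity["interest"] == "Semantic Web"

total, complete = count_within(people, is_match, budget_seconds=2.0)
```

The caller always gets an answer inside the budget, together with a flag saying whether it is exact. A retry under different load can get further through the data in the same budget, which is why the numbers may differ between runs.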
"And on the other other hand, the big claim OpenLink is making about this demo is that the aggregate experience of using it is better than the aggregate experience of using "traditional" search. So go ahead, use it. If you can."
Yes, "Microsoft" was a poor example for sure, the example could have been pattern: "glenn mcdonald", which should demonstrate the fundamental utility of what we are trying to demonstrate i.e., entity disambiguation courtesy of entity properties and/or entity type filtering.
Compare Google's results for Glenn McDonald with those from our demo (which disambiguates "Glenn McDonald" via associated properties and/or types), assuming we both agree that your Web Site or Blog Home isn't the center of your entity graph or personal data space (i.e., data about you); getting your home page at the top of the Google page rank therefore offers limited value, in reality.
What are we bragging about? A little more than what you attempt to explain. Yes, we are showing that we can find stuff within a processing window, but understand the following:
I hope I've clarified what's going on with our demo? If not, pose your challenge via examples and I will respond with solutions or simply cry out loud: "no mas!".
As for your "Mac OS X Leopard" comments, I can only say this: I emphasized that this is a demo, the data is pretty old, and the input data has issues (i.e., some of the input data is bad, as your example shows). The purpose of this demo is not the text per se; it's about the size of the data corpus and faceted querying. We are going to have the entire LOD Cloud loaded into the real thing, and in addition to that our Sponger Middleware will be enabled, and then you can take issue with data quality as per your reference to "Cyndi Lauper" (btw - it takes one property filter to find information about her quickly using "dbpprop:name" after filtering for properties with text values).
Of all things, this demo had nothing to do with UI and Information presentation aesthetics. It was all about combining full text search and structured queries (sparql behind the scenes) against a huge data corpus en route to solving challenges associated with faceted browsing over large data sets. We have built a service that resides inside Virtuoso. The Service is naturally of the "Web Service" variety and can be used from any consumer / client environment that speaks HTTP (directly or indirectly).
To be continued ...
]]>A data access driver/provider that provides conceptual entity oriented access to RDBMS data managed by Virtuoso. Naturally, it also uses Virtuoso's in-built virtual / federated database layer to provide access to ODBC and JDBC accessible RDBMS engines such as: Oracle (7.x to latest), SQL Server (4.2 to latest), Sybase, IBM Informix (5.x to latest), IBM DB2, Ingres (6.x to latest), Progress (7.x to OpenEdge), MySQL, PostgreSQL, Firebird, and others using our ODBC or JDBC bridge drivers.
It delivers an Entity-Attribute-Value + Classes & Relationships model over disparate data sources that are materialized as .NET Entity Framework Objects, which are then consumable via ADO.NET Data Object Services, LINQ for Entities, and other ADO.NET data consumers.
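As a rough illustration of what an Entity-Attribute-Value + Classes view over relational data looks like, the following Python sketch turns one relational row into EAV triples. It is not the provider's mechanism; the function name `row_to_eav`, the `Customer` table, and the row contents are hypothetical.

```python
def row_to_eav(table, key_column, row):
    """Turn one relational row (a dict) into Entity-Attribute-Value triples."""
    entity = f"{table}/{row[key_column]}"    # entity identifier from table + key
    triples = [(entity, "type", table)]      # the "Class" part of EAV/CR
    for column, value in row.items():
        # Skip the key itself and NULLs: EAV only records known values.
        if column != key_column and value is not None:
            triples.append((entity, column, value))
    return triples

row = {"id": 7, "name": "Acme", "country": "US", "fax": None}
triples = row_to_eav("Customer", "id", row)
```

Each resulting triple is one statement about the entity `Customer/7`, independent of the table layout it came from.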
The provider is fully integrated into Visual Studio 2008 and delivers the same "ease of use" offered by Microsoft's own SQL Server provider, but across Virtuoso, Oracle, Sybase, DB2, Informix, Ingres, Progress (OpenEdge), MySQL, PostgreSQL, Firebird, and others. The same benefits also apply uniformly to Entity Frameworks compatibility.
Bearing in mind that Virtuoso is a multi-model (hybrid) data manager, this also implies that you can use .NET Entity Frameworks against all data managed by Virtuoso. Remember, Virtuoso's SQL channel is a conduit to Virtuoso's core; thus, RDF (courtesy of SPASQL as already implemented re. Jena/Sesame/Redland providers), XML, and other data forms stored in Virtuoso also become accessible via .NET's Entity Frameworks.
You can choose which entity oriented data access model works best for you: RDF Linked Data & SPARQL or .NET Entity Frameworks & Entity SQL. Either way, Virtuoso delivers a commercial grade, high-performance, secure, and scalable solution.
Note: When working with external or 3rd party databases, simply use the Virtuoso Conductor to link the external data source into Virtuoso. Once linked, the remote tables will simply be treated as though they are native Virtuoso tables leaving the virtual database engine to handle the rest. This is similar to the role the Microsoft JET engine played in the early days of ODBC, so if you've ever linked an ODBC data source into Microsoft Access, you are ready to do the same using Virtuoso.
In recent times I've stumbled across Master Data Management (MDM), which is all about entities that provide holistic views of enterprise data (or what I call: Context Lenses). I've also stumbled across emerging tensions in the .NET realm between Linq to Entities and Linq to SQL, where in either case the fundamental issue comes down to the optimal path to "Conceptual Level Access" over the "Logical Level" when dealing with data access in the .NET realm.
Strangely, the emerging realms of RDF Linked Data, MDM, and .NET's Entity Frameworks remain disconnected.
Another oddity is the obvious, but barely acknowledged, blurring of the lines between the "traditional enterprise employee" and the "individual Web netizen". The fusion between these entities is one of the most defining characteristics of how the Web is reshaping the data landscape.
At the current time, I tend to crystallize my data access world view under the moniker: YODA ("You" Oriented Data Access), based on the following:
Now, if re-labeling can confuse me when applied to a realm I've been intimately involved with for eons (internet time), I don't want to imagine what it does for others who aren't as intimately involved with the important data access and data integration realms.
On the more refreshing side, the article does shed some light on the potency of RDF and OWL when applied to the construction of conceptual views of heterogeneous data sources.
"How do you know that data coming from one place calculates net revenue the same way that data coming from another place does? You’ve got people using the same term for different things and different terms for the same things. How do you reconcile all of that? That’s really what semantic integration is about."
BTW - I discovered this article via another titled: Understanding Integration And How It Can Help with SOA, that covers SOA and Integration matters. Again, in this piece I feel the gradual realization of the virtues that RDF, OWL, and RDF Linked Data bring to bear in the vital realm of data integration across heterogeneous data silos.
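The "net revenue" problem quoted above (same term for different things, different terms for the same thing) can be sketched as a mapping from source-local field names to shared terms. This Python example is only an illustration; the source names, field names, and `ex:` terms are hypothetical, and in practice such mappings would be expressed with RDF and OWL rather than a dictionary.

```python
# Hypothetical mapping from (source, local field name) to a shared term.
TERM_MAP = {
    ("sales_db", "net_rev"): "ex:netRevenue",
    ("finance_db", "revenue_after_returns"): "ex:netRevenue",
    # Same local name as in sales_db, but a different meaning.
    ("finance_db", "net_rev"): "ex:grossRevenueLessTax",
}

def to_shared_terms(source, record):
    """Rename a record's fields to shared terms.

    Unmapped fields are kept under a source-qualified name so that
    nothing is silently merged.
    """
    out = {}
    for field, value in record.items():
        term = TERM_MAP.get((source, field), f"{source}:{field}")
        out[term] = value
    return out

a = to_shared_terms("sales_db", {"net_rev": 120})
b = to_shared_terms(
    "finance_db",
    {"revenue_after_returns": 118, "net_rev": 131, "region": "EU"},
)
```

After mapping, both records expose `ex:netRevenue`, while the misleadingly named `net_rev` from the second source ends up under a different term.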
A number of events, at the micro and macro economic levels, are forcing attention back to the issue of productive use of existing IT resources. The trouble with the aforementioned quest is that it ultimately unveils the global IT affliction known as: heterogeneous data silos, and the challenges of pain alleviation, that have been ignored forever or approached inadequately as clearly shown by the rapid build up of SOA horror stories in the data integration realm.
Data Integration via conceptualization of heterogeneous data sources, resulting in concrete conceptual layer data access and management, remains the greatest and most potent application of technologies associated with the "Semantic Web" and/or "Linked Data" monikers.
The significant problems we face cannot be solved at the same level of thinking we were at when we created them.
This quote also applies to the current global financial mess because the essence of this crisis remains inextricably linked to dependency on outdated "closed world" systems.
We have a global human network that depends on systems driven by, and confined to, data silos! Every time you hear a CEO, Government Official, work colleague, neighbor, sibling, or relative tell you they didn't see it coming, just remember:
There won't be a depression because we can't afford one. Just like we couldn't afford to continue with the manner in which our systems work today. Unlike the '30s, we all know that there are no absolute safe havens right now, we have enough information at our disposal to eventually understand (post panic) that stuffing the mattress isn't an option (even government bonds won't cut it, ditto money market accounts).
Take a deep breath and tell traditional media to "shut up". As per usual, the traditional mass media wants to have it both ways by stoking the panic and maxing out on the frenzy with reckless abandon. If there is a time to appreciate the blogosphere and quality journalism, it's now.
Anyway, as the saying goes: "It's always darkest before dawn", and as bizarre as this may sound in some quarters, things will ultimately change for the better. It just so happened that a really big cane was required in order for us to change our dysfunctional ways :-(
I recently wrote a post about "zero based cognition" that sought to bring attention to the power of "Human Thought" in relation to value creation.
Innovative creation and dissemination of value is how we will eventually get out of the current mess (as we've done in the past). The predictability of the aforementioned reality is significantly increased by the sheer link density and resulting "network effects" potential of the Internet and World Wide Web. Our ability to "connect the dots" as part of our value creation, dissemination, and consumption processing pipelines is what will ultimately separate the winners from the losers (individuals, enterprises, nations).
In typical style, Henry walks you through his point of view using simple but powerful illustrations. Here is a key statement in his post that really struck me:
"In order to be able to have a mental theory one needs to be able to understand that other people may have a different view of the world. On a narrow three dimensional understanding of 'view', this reveals itself in that people at different locations in a room will see different things. One person may be able to see a cat behind a tree that will be hidden to another. In some sense though these two views can easily be merged into a coherent description."
Opaque Web pages (e.g., generated by "Semantic Technology inside" offerings that will not expose or share data entity URIs), irrespective of how smart the underlying page generation and visualization technology may be, are fundamentally autistic and counterintuitive as we move toward a Web of Linked Data.
Preoccupation with the "V" aspect of the M-V-C trinity is inadvertently compounding the problem of digital autism on the Web. Unbeknownst to the purveyors of data silos and proprietary service lock-in, digital autism on the Web ultimately implies Web business model autism.
]]>What strikes me the most is how sharing his findings acts as a serendipitous connector to related insights and points of view that ultimately create deeper shared knowledge about the core subject matter, courtesy of the Web hosted Blogosphere.
Jana: What are the benefits you see to the business community in adopting semantic technology?
Me: Exposure and exploitation of an untapped treasure trove of interlinked data, information, and knowledge across disparate IT infrastructure via conceptual entry points (Entity IDs / URIs / Data Source Names) that I refer to as "Context Lenses".
Jana: Do you think these benefits are great enough for businesses to adopt the changes?
Me: Yes, infrastructural heterogeneity is a fact of corporate life (growth, mergers, acquisitions etc). Any technology that addresses these challenges is extremely important and valuable. Put differently, the opportunity costs associated with IT infrastructural heterogeneity remain high!
Jana: How large do you think this impact will actually be?
Me: Huge. Enterprises have been aware of their data, information, and knowledge treasure troves for eons. Tapping into these via a materialization of the "information at your fingertips" vision is something they've simply been waiting to pursue, without any platform lock-in, for as long as I've been in this industry.
Jana: I’ve heard, from contacts in the Bay Area, that they are skeptical of how large this impact of semantic technology will actually be on the web itself, but that the best uses of the technology are for fields such as medical information, or as you mentioned, geo-spatial data.
Me: Unfortunately, those people aren't connecting the Semantic Web and open access to heterogeneous data sources, or the intrinsic value of holistic exploration of entity based data networks (aka Linked Data).
Jana: Are semantic technologies going to be part of the web because of people championing the cause or because it is actually a necessary step?
Me: Linked Data technology on the Web is a vital extension of the current Web. Semantic Technology without the "Web" component, or what I refer to as "Semantics Inside only" solutions, simply offer little or no value as Web enhancements based on their incongruence with the essence of the Web i.e., "Open Linkage" and no Silos! A nice looking Silo is still a Silo.
Jana: In the early days of the web, there was an explosion of new websites, due to the ease of learning HTML, from a business to a person to some crackpot talking about aliens. Even today, CSS and XHTML are not so difficult to learn that a determined person can’t learn them from W3C or other tutorials easily. If OWL becomes the norm for websites, what do you think the effects will be on the web? Do you think it is easy enough to learn that it will be readily adopted as part of the standard toolkit for web developers for businesses?
Me: Correction, learning HTML had nothing to do with the Web's success. The value proposition of the Web simply reached critical mass and you simply couldn't afford to not be part of it. The easiest route to joining the Web juggernaut was a Web Page hosted on a Web Site. The question right now is: what's the equivalent driver for the Linked Data Web bearing in mind the initial Web bootstrap. My answer is simply this: Open Data Access i.e., getting beyond the data silos that have inadvertently emerged from Web 2.0.
Jana: Following the same theme, do you think this will lead to an internet full of corporate-controlled websites, with sites only written by developers rather than individuals?
Me: Not at all, we will have an Internet owned by its participants i.e., You and the agents that work on your behalf.
Jana: So, you are imagining technologies such as Drupal or Wordpress, that allow users to manage sites without a great deal of knowledge of the nuts and bolts of current web technologies?
Me: Not at all! I envisage simple forms that provide conduits to powerful meshes of interlinked data spaces associated with Web users.
Jana: Given all of the buzz, and my own familiarity with ontology, I am just very curious if the semantic web is truly necessary?
Me: This question is no different than saying: I hear the Web is becoming a Database, and I wonder if a Data Dictionary is necessary, or even if access to structured data is necessary. It's also akin to saying: I accept "Search" as my only mechanism for Web interaction even though in reality, I really want to be able to "Find" and "Process" relevant things at a quicker rate than I do today, relative to the amount of information, and information processing time, at my disposal.
Jana: Will it be worth it to most people to go away from the web in its current form, with keyword searches on sites like Google, to a richer and more interconnected internet with potentially better search technology?
Me: As stated above, we need to add "Find" to the portfolio of functions we seek to perform against the Web. "Finding" and "Searching" are mutually inclusive pursuits at different ends of an activity spectrum.
Jana: For our more technical readers, I have a few additional questions: If no standardization comes about for mapping relational databases to domain ontologies, how do you see that as influencing the decisions about adoption of semantic technology by businesses? After all, the success of technology often lives or dies on its ease of adoption.
Me: Standardization of RDBMS to RDF Mapping is not the critical success factor here (of course it would be nice). As stated earlier, the issue of data integration that arises from IT infrastructural heterogeneity has been with decision makers in the enterprise for ever. The problem is now seeping into the broader consumer realm via Web ubiquity. The mistakes made in the enterprise realm are now playing out in the consumer Web realm. In both realms the critical success factors are:
The LinqToRdf project is about binding LINQ to RDF. It sits atop Joshua Tauberer's C# based Semantic Web/RDF library, which has been out there for a while and works across Microsoft .NET and its open source variant "Mono".
Historically, the Semantic Web realm has been dominated by RDF frameworks such as Sesame, Jena and Redland; which by their Open Source orientation, predominantly favor non-Windows platforms (Java and Linux). Conversely, Microsoft's .NET frameworks have sought to offer Conceptualization technology for heterogeneous Logical Data Sources via .NET's Entity Frameworks and ADO.NET, but without any actual bindings to RDF.
Interestingly, believe it or not, .NET already has a data query language that shares a number of similarities with SPARQL, called Entity-SQL, and a very innovative programming language called LINQ; that offers a blend of constructs for natural data access and manipulation across relational (SQL), hierarchical (XML), and graph (Object) models without the traditional object language->database impedance tensions of the past.
With regards to all of the above, we've just released a mini white paper that covers the exploitation of RDF-based Linked Data using .NET via LINQ. The paper offers an overview of LinqToRdf, plus enhancements we've contributed to the project (available in LinqToRdf v0.8). The paper includes real-world examples that tap into a MusicBrainz powered Linked Data Space, the Music Ontology, the Virtuoso RDF Quad Store, Virtuoso Sponger Middleware, and our RDFization Cartridges for MusicBrainz.
Enjoy!]]>The Web Universal Plug and Play (WUPnP) Cheatsheet:
Essentially, if you build an application and use the technologies suggested in the ‘glue section’ then your web application/service (whether it’s front-end or back-end) will fit into many many other web applications/services… and therefore also more manageable for the future! This is WUPnP.
Key technologies for making your services/applications as sticky as possible:
Web-based plug and play fun!
"(Via Daniel Lewis.)
]]>When the DBpedia & Yago integration took place last year (around WWW2007, Banff) there was a small but costly omission: nobody sought to load the Yago Class Hierarchy into Virtuoso's Inference Engine :-(
Anyway, the Class Hierarchy has now been loaded into Virtuoso's inference engine (as Virtuoso Inference Rules) and the following queries are now feasible using the live Virtuoso based DBpedia instance hosted by OpenLink Software:
-- Find all Fiction Books associated with a property "dbpedia:name" that has literal value: "The Lord of the Rings" .
DEFINE input:inference "http://dbpedia.org/resource/inference/rules/yago#"
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/property/>
PREFIX yago: <http://dbpedia.org/class/yago/>
-- Variant of query with Virtuoso's Full Text Index extension via the bif:contains function/magic predicate
DEFINE input:inference "http://dbpedia.org/resource/inference/rules/yago#"
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/property/>
PREFIX yago: <http://dbpedia.org/class/yago/>
SELECT DISTINCT ?s ?n
FROM <http://dbpedia.org>
WHERE {
?s a yago:Fiction106367107 .
?s dbpedia:name ?n .
?n bif:contains 'Lord and Rings'
}
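For readers who want to send a query like the one above from a program, here is a small Python sketch that builds a SPARQL Protocol GET URL without sending it. The endpoint address `http://dbpedia.org/sparql` and the helper name `sparql_get_url` are assumptions made for this illustration; only the `query` parameter is taken from the SPARQL Protocol itself.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

QUERY = """
DEFINE input:inference "http://dbpedia.org/resource/inference/rules/yago#"
PREFIX dbpedia: <http://dbpedia.org/property/>
PREFIX yago: <http://dbpedia.org/class/yago/>
SELECT DISTINCT ?s ?n
FROM <http://dbpedia.org>
WHERE {
  ?s a yago:Fiction106367107 .
  ?s dbpedia:name ?n .
  ?n bif:contains 'Lord and Rings'
}
"""

def sparql_get_url(endpoint, query):
    """Build a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})

# Assumed endpoint address; nothing is sent over the network here.
url = sparql_get_url("http://dbpedia.org/sparql", QUERY)

# Decoding the URL again recovers the exact query text.
round_trip = parse_qs(urlsplit(url).query)["query"][0]
```

The resulting URL can be fetched with any HTTP client; the result format is normally negotiated through the request's Accept header.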
-- Retrieve all individual instances of the Fiction Class, which should include all Books.
DEFINE input:inference "http://dbpedia.org/resource/inference/rules/yago#"
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/property/>
PREFIX yago: <http://dbpedia.org/class/yago/>
Note: you can also move the inference pragmas to the Virtuoso Server side, i.e., place the inference rules in a server instance config file, thereby negating the need to place "define input:inference 'http://dbpedia.org/resource/inference/rules/yago#'" pragmas directly in your SPARQL queries.
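Why loading the class hierarchy matters can be shown with a toy subclass lookup in Python. This is a sketch of the general idea of subclass inference, not of Virtuoso's inference engine; the `ex:` class names, the sample instances, and the function names are hypothetical.

```python
# Hypothetical fragment of a class hierarchy: child -> parent.
SUBCLASS_OF = {"ex:Novel": "ex:Book", "ex:Book": "ex:Fiction"}

# Hypothetical instance data: entity -> asserted type.
TYPES = {"ex:lotr": "ex:Novel", "ex:dune": "ex:Book", "ex:atlas": "ex:Map"}

def superclasses(cls):
    """Return cls plus every ancestor reachable through SUBCLASS_OF."""
    seen = [cls]
    while seen[-1] in SUBCLASS_OF:
        parent = SUBCLASS_OF[seen[-1]]
        if parent in seen:  # guard against cycles in the hierarchy
            break
        seen.append(parent)
    return seen

def instances_of(cls, infer=True):
    """List entities typed as cls, optionally including subclass instances."""
    return sorted(
        entity for entity, asserted in TYPES.items()
        if (cls in superclasses(asserted) if infer else asserted == cls)
    )

without_rules = instances_of("ex:Fiction", infer=False)
with_rules = instances_of("ex:Fiction")
```

Without the hierarchy, a query for the parent class finds nothing because no entity is typed with it directly; with the hierarchy applied, instances of every subclass are returned.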
1995 (and the early 90’s) must have been a visionaries time of dreaming… most of their dreams are happening today.
Watch Steve Jobs (then of NeXT) discuss what he thinks will be popular in 1996 and beyond at OpenStep Days 1995:
Heres a spoiler:
The thing that OpenStep propose is:
What Steve was suggesting was one of the beginnings of the Data Web! Yep, Portable Distributed Objects and Enterprise Objects Framework was one of the influences of the Semantic Web / Linked Data Web…. not surprising as Tim Berners-Lee designed the initial web stack on a NeXT computer!
I’m going to spend a little time this evening figuring out how much ‘distributed objects’ stuff has been taken from the OpenStep stuff into the Objective-C + Cocoa environment. (<- I guess I must be quite geeky ;-))
"(Via Daniel Lewis.)
]]>Unfortunately, the cost of completing ZDNet's unwieldy signup process simply exceeded the benefits of dropping my comments in their particular space :-( Thus, I'll settle for a trackback ping instead.
What follows is the cut and paste of my intended comment contributions to Paul's post.
Paul,
As discussed earlier this week during our podcast session, commercialization of Semantic Web technology shouldn't be a mercurial matter at this stage in the game :-) It's all about looking at how it provides value :-)
From the Linked Data angle, the ability to produce, dispatch, and exploit "Context" across an array of "Perspectives" from a plethora of disparate data sources on the Web and/or behind corporate firewalls, offers immense commercial value.
Yahoo's Searchmonkey effort will certainly bring clarity to some of the points I made during the podcast re. the role of URIs as "value consumption tickets" (Data Services are exposed via URIs). There has to be a trigger (in user space) that compels Web users to seek broader, or simply varied, perspectives as a response to data encountered on the Web. Yahoo! is about to put this light on in a big way (imho).
The "self annotating" nature of the Web is what ultimately drives the manifestation of the long awaited Semantic Web. I believe I postulated about "Self Annotation & the Semantic Web" in a number of prior posts which, by the way, should be DataRSS compatible right now due to Yahoo's support of OpenSearch Data Providers (which this Blog Space has been for eons).
Today, we have many communities adding structure to the Web (via their respective tools of preference) without explicitly realizing what they are contributing. Every RSS/Atom feed, Tag, Weblog, Shared Bookmark, Wikiword, Microformat, Microformat++ (eRDF or RDFa), GRDDL stylesheet, RDFizer, etc. is a piece of structured data.
Finally, the different communities are all finding ways to work together (thank heavens!) and the results are going to be cataclysmic when it all plays out :-)
Data, Structure, and Extraction are the keys to the Semantic Life! First you get the Data in a container (information resource), and then you add Structure to the information resource (RSS, Atom, microformats, RDFa, eRDF, SIOC, FOAF, etc.). Once you have Structure, RDFization (i.e., transformation to Linked Data) is a cinch thanks to RDF Middleware (as per earlier RDF middleware posts).
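The Data, Structure, Extraction sequence can be sketched in Python for a single feed item. This is a deliberately simplified illustration of RDFization, not the behavior of any particular RDF middleware; the function name `rdfize_item`, the sample item, and the choice of predicates are assumptions.

```python
def rdfize_item(item):
    """Turn a structured feed item (a dict) into subject-predicate-object triples."""
    subject = item["link"]  # the item's URL doubles as its identifier
    triples = [
        (subject, "rdf:type", "sioc:Post"),
        (subject, "dc:title", item["title"]),
    ]
    for tag in item.get("tags", []):
        triples.append((subject, "dc:subject", tag))
    return triples

# Hypothetical feed item, already parsed out of an RSS/Atom container.
item = {
    "link": "http://example.org/blog/1",
    "title": "Hello",
    "tags": ["rdf", "web"],
}
triples = rdfize_item(item)
```

Because the structure is already present in the feed, extraction is mostly a matter of choosing an identifier for the item and a predicate for each field.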
]]>ReadWriteWeb via Alex Iskold's post have delivered another iteration of their "Guide to Semantic Technologies".
If you look at the title of this post (and their article) they seem to be accurately providing a guide to Semantic Technologies, so no qualms there. If on the other hand, this is supposed to be a guide to the "Semantic Web" as prescribed by TimBL, then they are completely missing the essence of the whole subject, and demonstrably so I may add, since the entities "ReadWriteWeb" and "Alex Iskold" are only describable today via the attributes of the documents they publish, i.e., their respective blogs and hosted blog posts.
Preoccupation with Literal objects, as described above, implies we can only take what "ReadWriteWeb" and "Alex Iskold" say "Literally" (grep, regex, and XPath/XQuery are the only tools for searching deeper in this Literal realm). We have no sense of what makes them tick or where they come from, no history (bar "About Page" blurb), and no data connections beyond anchored text (more pointers to opaque data sources) in posts and blogrolls. The only connection between this post and them is my deliberate use of the same literal text in the Title of this post.
TimBL's "Semantic Web" vision is about the production, consumption, and sharing of Data Objects via HTTP based Identifiers called URIs/IRIs (Hyperdata Links / Linked Data). It's how we use the Web as a Distributed Database where (as Jim Hendler once stated with immense clarity): I can point to records (entity instances) in your database (aka Data Space) from mine. Which is to say that if we can all point to data entities/objects (not just data entities of type "Document") using these Location, Value, and Structure independent Object Identifiers (courtesy of HTTP), we end up with a much more powerful Web, and one that is closer to the "Federated and Open" nature of the Web.
As I stated in a prior post, if you or your platform of choice aren't producing de-referenceable URIs for your data objects, you may be Semantic (this data model predates the Web), but there is no "World Wide Web" in what you are doing.
I am Kingsley Idehen, a Person who authors this weblog. I also share bookmarks gathered over the years across an array of subjects via my bookmark data space. I also subscribe to a number of RSS/Atom/RDF feeds, which I share via my feeds subscription data space. Of course, all of these data sources have Tags which are collectively exposed via my weblog tag-cloud, feeds subscriptions tag-cloud, and bookmarks tag-cloud data spaces.
As I don't like repeating myself, and I hate wasting my time or the time of others, I simply share my Data Space (a collection of all of my purpose specific data spaces) via the Web so that others (friends, family, employees, partners, customers, project collaborators, competitors, co-opetitors, etc.) can intentionally or serendipitously discover relevant data en route to creating new information (perspectives) that is hopefully exposed to others via the Web.
Bottom line: the Semantic Web is about adding the missing "Open Data Access & Connectivity" feature to the current Document Web (we have to go beyond regex, grep, XPath, XQuery, full text search, and other literal scraping approaches). The Linked Data Web of de-referenceable data object URIs is the critical foundation layer that makes this feasible.
Remember, it's not about "Applications"; it's about Data, and actually freeing Data from the "tyranny of Applications". Unfortunately, applications inadvertently create silos (esp. on the Web), since entity data modeling, open data access, and other database technology realm matters remain of secondary interest to many application developers.
Final comment: RDF facilitates Linked Data on the Web, but not all RDF is endowed with de-referenceable URIs (a major source of confusion and misunderstanding). Thus, you can have RDF Data Source Providers that simply project RDF data silos via Web Services APIs, if the RDF output emanating from a Web Service doesn't provide out-bound pathways to other data via de-referenceable URIs. Of course, the same also applies to Widgets that present you with all the things they've discovered without exposing de-referenceable URIs for each item.
BTW - my final comments above aren't in any way incongruent with devising successful business models for the Web. As you may or may not know, OpenLink is not only a major platform provider for the Semantic Web (expressed in our UDA, Virtuoso, OpenLink Data Spaces, and OAT products), we are also actively seeding Semantic Web (tribe: Linked Data, of course) startups. For instance, Zitgist, which now has Mike Bergman as its CEO alongside Frederick Giasson as CTO. Of course, I cannot do Zitgist justice via a footnote in a blog post, so I will expand further in a separate post.
Yes, integration is hard, but I do profoundly believe that what's been happening on the Web over the last 10 or so years also applies to the Enterprise, and by this I absolutely do not mean "Enterprise 2.0" since "2.0" and productive agility do not compute in my realm of discourse.
Large collections of RSS feeds, Wikiwords, Shared Bookmarks, Discussion Forums, etc., when disconnected at the data level (i.e. hosted in pages with no access to the "data behind"), simply offer information deluge and inertia (there are only so many hours for processing opaque information sources in a given day).
Enterprises fundamentally need to process information efficiently as part of a perpetual assessment of their relative competitive Strengths, Weaknesses, Opportunities, and Threats (SWOT) in existing and/or future markets. Historically, IT acquisitions have run counter to the aforementioned quest for "Agility" due to the predominance of the "rip and replace" approach to technology acquisition, which repeatedly creates and perpetuates information silos across Application, Database, Operating System, and Development Environment boundaries. The sequence of events typically occurs as follows:
In the early to mid 90's (pre ubiquitous Web), programming language, operating system, and development framework independence inside the enterprise was technically achievable via ODBC (due to its platform independence). That said, DBMS specific ODBC channels alone couldn't address the holistic requirements associated with Conceptual Views of disparate data sources, hence the need for Data Access Virtualization via Virtual Database Engine technology.
Just as is the case on the Web today, with the emergence of the "Linked Data" meme, enterprises now have a powerful mechanism for exploiting the Data Integration benefits associated with generating Data Objects, endowed with HTTP based IDs (URIs), from disparate data sources.
Conceptualizing access to data exposed via Database APIs, SOA based Web Services (SOAP style Web Services), Web 2.0 APIs (REST style Web Services), XML Views of SQL Data (SQLX), pure XML, etc. is the problem area addressed by RDF aware middleware (RDFizers, e.g. Virtuoso Sponger).
Here are examples of what SQL Rows exposed as RDF Data Objects (identified using HTTP based URIs) would look like outside or behind a corporate firewall:
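As a rough illustration of the idea (not the Virtuoso implementation), the sketch below maps each row of a SQL table to an HTTP based identifier and emits one subject/predicate/object triple per column. The base URI, table name, and the `rows_to_triples` helper are all assumptions made up for this example.

```python
import sqlite3

# Hypothetical base URI for an enterprise data space (an assumption for illustration).
BASE = "http://data.example.com/northwind"


def rows_to_triples(conn, table, key_column):
    """Yield (subject, predicate, object) triples for every row of `table`."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        # Each row gets its own HTTP based identifier built from its key value.
        subject = f"{BASE}/{table}/{record[key_column]}#this"
        for column, value in record.items():
            if column != key_column and value is not None:
                yield (subject, f"{BASE}/schema/{table}#{column}", value)


# A throwaway in-memory table standing in for a corporate SQL data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES ('ALFKI', 'Alfreds Futterkiste', 'Berlin')")

triples = list(rows_to_triples(conn, "customers", "id"))
for triple in triples:
    print(triple)
```

The point of the sketch is only that the row, not the page it might be rendered in, becomes the thing a URI names; whether that URI resolves inside or outside the firewall is a deployment decision.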
What's Good for the Web Goose (Personal Data Space URIs) is good for the Enterprise Gander (Enterprise Data Space URIs).
Daniel Lewis also penned an interesting post in response to Ian's, that actually triggered this post.
I think definition time has long expired re. the Web's many interaction dimensions, evolutionary stages, and versions.
On my watch it's simply demo / dog-food time. Or as Dan Brickley states: Just Show It.
Below, I've created a tabulated view of the various lanes on the Web's Information Super Highway. Of course, this is a Linked Data demo should you be interested in the universe of data exposed via the links embedded in this post :-)
| | 1.0 | 2.0 | 3.0 |
| Desire | Information Creation & Retrieval | Information Creation, Retrieval, and Extraction | Distillation of Data from Information |
| | Information Linkage (Hypertext) | Information Mashing (Mash-ups) | Linked Data Meshing (Hyperdata) |
| Enabling Protocol | HTTP | HTTP | |
| Markup | | | |
| Basic Data Unit | Resource (Data Object) of type "Document" | Resource (Data Object) of type "Document" | Resource (Data Object) that may be one of a variety of Types: Person, Place, Event, Music etc. |
| Basic Data Unit Identity | Resource URL (Web Data Object Address) | Resource URL (Web Data Object Address) | Unique Identifier (URI) that is independent of actual Resource (Web Data Object) Address. Note: An Identifier by itself has no utility beyond Identifying a place around which actual data may be clustered. |
| Query or Search | Full Text Search patterns | Full Text Search patterns | Structured Querying via SPARQL |
| Deployment | Web Server (Document Server) | Web Server + Web Services Deployment modules | Web Server + Linked Data Deployment modules (Data Server) |
| Auto-discovery | <link rel="alternate"..> | <link rel="alternate"..> | <link rel="alternate"..> or <link rel="meta"..>, basic and/or transparent content negotiation |
| Target User | Humans | Humans & Text extraction and manipulation oriented agents (Scrapers) | Agents with varying degrees of data processing intelligence and capacity |
| Serendipitous Discovery Quotient (SDQ) | Low | Low | High |
| Pain | Information Opacity | Information Silos | Data Graph Navigability (Quality) |
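The "Auto-discovery" row of the table can be sketched in a few lines: an agent either spots a `<link rel="alternate">` or `<link rel="meta">` element in a page, or asks for structured data directly through the HTTP Accept header (basic content negotiation). This is a minimal sketch only; the sample page, the example.org URI, and the helper names are assumptions, and no request is actually sent.

```python
from html.parser import HTMLParser
from urllib.request import Request


class AlternateLinkFinder(HTMLParser):
    """Collect (type, href) pairs from <link rel="alternate"> or <link rel="meta">."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and ("alternate" in rel or "meta" in rel):
            self.links.append((attributes.get("type"), attributes.get("href")))


def linked_data_request(uri, accept="application/rdf+xml, text/html;q=0.5"):
    """Build (but do not send) an HTTP GET that prefers RDF over HTML."""
    return Request(uri, headers={"Accept": accept})


# Hypothetical page head advertising an RDF representation of the same resource.
page = (
    '<html><head><title>A post</title>'
    '<link rel="alternate" type="application/rdf+xml" href="/about.rdf">'
    '</head><body></body></html>'
)
finder = AlternateLinkFinder()
finder.feed(page)
print(finder.links)

request = linked_data_request("http://example.org/dataspace/person/alice")
print(request.get_method(), request.get_header("Accept"))
```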
OpenLink Data Spaces (ODS) now officially supports:
- Attention Profiling Markup Language (APML).
- Meaning of a Tag (MOAT) in conjunction with Simple Knowledge Organisation System (SKOS) and Social-Semantic Cloud of Tags (SCOT).
- OAuth - an open authorization protocol
Which means that OpenLink Data Spaces supports all of the main standards being discussed in the DataPortability Interest Group!
APML Example:
All users of ODS automatically get a dynamically created APML file, for example: APML profile for Kingsley Idehen
The URI for an APML profile is: http://myopenlink.net/dataspace/<ods-username>/apml.xml
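The URI pattern above can be filled in mechanically. The snippet below is a small sketch of that substitution; the function name and the username "demo" are placeholders invented for the example, not real accounts.

```python
from urllib.parse import quote

# URI pattern quoted in the post; only the username part varies.
APML_TEMPLATE = "http://myopenlink.net/dataspace/{username}/apml.xml"


def apml_profile_uri(username):
    """Return the APML profile URI for an ODS username (percent-encoded)."""
    return APML_TEMPLATE.format(username=quote(username, safe=""))


# "demo" is a placeholder username used purely for illustration.
profile_uri = apml_profile_uri("demo")
print(profile_uri)
```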
Meaning of a Tag Example:
All users of ODS automatically have tag cloud information embedded inside their SIOC file, for example: SIOC for Kingsley Idehen on the Myopenlink.net installation of ODS.
But even better, MOAT has been implemented in the ODS Tagging System. This has been demonstrated in a recent test blog post by my colleague Mitko Iliev. The blog post comes up on the tag search: http://myopenlink.net/dataspace/imitko/weblog/Mitko%27s%20Weblog/tag/paris
Which can be put through the OpenLink Data Browser:
OAuth Example:
OAuth Tokens and Secrets can be created for any ODS application. To do this:
- you can log in to MyOpenlink.net beta service, the Live Demo ODS installation, an EC2 instance, or your local installation
- then go to ‘Settings’
- and then you will see ‘OAuth Keys’
- you will then be able to choose the applications that you have instantiated and generate the token and secret for that app.
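Once a token and secret exist, a client typically uses them to sign its requests. The sketch below shows the HMAC-SHA1 signing step as described in OAuth Core 1.0, using only the standard library. It is not ODS-specific code: the URL, keys, secrets, and the `sign_request` helper are all made-up placeholders, and a real client would also handle duplicate parameters and send the signature in an Authorization header.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value):
    """Percent-encode as OAuth 1.0 expects: only unreserved characters stay as-is."""
    return quote(str(value), safe="-._~")


def sign_request(method, url, params, consumer_secret, token_secret=""):
    """Return the base64 HMAC-SHA1 signature for a request (simplified sketch)."""
    # Simplification: parameters are sorted by raw name, which matches the spec
    # for plain ASCII names such as the oauth_* parameters used below.
    normalized = "&".join(f"{pct(k)}={pct(v)}" for k, v in sorted(params.items()))
    base_string = "&".join([method.upper(), pct(url), pct(normalized)])
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


# Every value below is a placeholder, not a real key, token, or endpoint.
params = {
    "oauth_consumer_key": "demo-key",
    "oauth_token": "demo-token",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1200000000",
    "oauth_nonce": "abc123",
    "oauth_version": "1.0",
}
signature = sign_request(
    "GET", "http://example.org/api/resource", params,
    "demo-consumer-secret", "demo-token-secret",
)
print(signature)
```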
Related Document (Human) Links
- OpenLink Data Spaces Official Page
- OpenLink Software Page
- OpenLink Data Spaces Wikipedia Page
- Attention Profiling Markup Language Project Website
- Meaning of a Tag Project Website
- Simple Knowledge Organisation Systems Project Website
- Social-Semantic Cloud of Tags Project Website
- OAuth Protocol Website
- DataPortability.org Website
- Semantically Interlinked Online Communities Project Website
Remember (as per my most recent post about ODS), ODS is about unobtrusive fusion of Web 1.0, 2.0, and 3.0+ usage and interaction patterns. Thanks to a lot of recent standardization in the Semantic Web realm (e.g. SPARQL), we now employ the MOAT, SKOS, and SCOT ontologies as vehicles for Structured Tagging.
This is how we take a key Web 2.0 feature (think 2D in a sense) and bend it over to create a Linked Data Web (Web 3.0) experience unobtrusively (see earlier posts re. Dimensions of Web). Thus, nobody has to change how they tag or where they tag; just expose ODS to the URLs of your Web 2.0 tagged content and it will produce URIs (Structured Data Object Identifiers) and a linked data graph for your Tags Data Space (née Tag Cloud). ODS will construct a graph which exposes tag subject association, tag concept alignment / intended meaning, and tag frequencies, and which ultimately delivers "relative disambiguation" of intended Tag Meaning (i.e. you can easily discern the tagger's meaning via the Tag's actual Data Space, which is associated with the tagger). In a nutshell, the dynamics of relevance matching, ranking, and the like change immensely without futile, timeless debates about matters such as:
We can just get on with demonstrating Linked Data value using what exists on the Web today. This is the approach we are deliberately taking with ODS.
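To make "relative disambiguation" a little more concrete, here is a toy sketch, not the ODS implementation, that groups tagging events per tagger and records both tag frequency and intended meaning. The taggers, resource URLs, and the `tag_data_space` helper are invented for illustration; the meaning URIs merely stand in for whatever concept identifiers a tagger would choose.

```python
from collections import Counter, defaultdict

# Hypothetical tagging events: (tagger, tagged resource URL, tag, intended meaning URI).
taggings = [
    ("alice", "http://example.org/post/1", "paris", "http://dbpedia.org/resource/Paris"),
    ("alice", "http://example.org/post/2", "paris", "http://dbpedia.org/resource/Paris"),
    ("bob", "http://example.org/photo/9", "paris", "http://dbpedia.org/resource/Paris_Hilton"),
]


def tag_data_space(events):
    """Return per-tagger tag frequencies and the meanings each tagger attached."""
    frequencies = Counter()
    meanings = defaultdict(set)
    for tagger, _resource, tag, meaning in events:
        frequencies[(tagger, tag)] += 1
        meanings[(tagger, tag)].add(meaning)
    return frequencies, meanings


frequencies, meanings = tag_data_space(taggings)
for (tagger, tag), count in sorted(frequencies.items()):
    print(tagger, tag, count, sorted(meanings[(tagger, tag)]))
```

The same literal tag resolves to different meanings depending on whose Data Space it lives in, which is the disambiguation the paragraph above describes.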
Tip: This post is best viewed via an RDF aware User Agent (e.g. a Browser or Data Viewer). I say this because the permalink of this post is a URI in a Linked Data Space (My Blog) comprised of more data than meets the eye (i.e. what you see when you read this post via a Document Web Browser) :-)
]]>There are quite a few reasons to use OpenLink Data Spaces (ODS). Here are 10 of the reasons why I use ODS:
- Its native support of DataPortability Recommendations such as RSS, Atom, APML, Yadis, OPML, Microformats, FOAF, SIOC, OpenID and OAuth.
- Its native support of Semantic Web Technologies such as: RDF and SPARQL/SPARUL for querying.
- Everything in ODS is an Object with its own URI; this is due to the underlying Object-Relational Architecture provided by Virtuoso.
- It has all the social media components that you could need, including: blogs, wikis, social networks, feed readers, CRM and a calendar.
- It is expandable by installing pre-configured components (called VADs), or by re-configuring a LAMP application to use Virtuoso. Some examples of current VADs include: MediaWiki, WordPress and Drupal.
- It works with external webservices such as: Facebook, del.icio.us and Flickr.
- Everything within OpenLink Data Spaces is Linked Data, which provides more meaningful information than just plain structural information. This meaningful information could be used for complex inferencing systems, as ODS can be seen as a Knowledge Base.
- ODS builds bridges between the existing static-document based web (aka ‘Web 1.0‘), the more dynamic, services-oriented, social and/or user-orientated webs (aka ‘Web 2.0‘) and the web which we are just going into, which is more data-orientated (aka ‘Web 3.0’ or ‘Linked Data Web’).
- It is fully supportive of Cloud Computing, and can be installed on Amazon EC2.
- It's released free under the GNU General Public License (GPL). [note]However, it is technically dual licensed, as it sits on top of the Virtuoso Universal Server, which has both Commercial and GPL licensing.[/note]
The features above collectively provide users with a Linked Data Junction Box that may reside within corporate intranets or "out in the clouds" (Internet). You can consume, share, and publish data in a myriad of formats using a plethora of protocols, without any programming. ODS is simply about exposing the data from your Web 1.0, 2.0, 3.0 application interactions in structured form, with Linking, Sharing, and ultimately Meshing (not Mashing) in mind.
Note: Although ODS is equipped with a broad array of Web 2.0 style Applications, you do not need to use native ODS apps in order to exploit its power. It binds to anything that supports the relevant protocols and data formats.
]]>Jason recently moved to Massachusetts, which led to me pinging him about our earlier blogosphere encounter and the emergence of a Data Portability Community. I also informed him about the fact that TimBL, myself, and a number of other Semantic Web technology enthusiasts frequently meet on the 2nd Tuesday of each month at the MIT hosted Cambridge Semantic Web Gatherings to discuss, demonstrate, and debate all aspects of the Semantic Web. Luckily (for both of us), Jason attended the last event, and we got to meet each other in person.
Following our face to face meeting in Cambridge, a number of follow-on conversations ensued covering Linked Data and practical applications of the Semantic Web vision. Jason writes about our exchanges in a recent post titled: The Semantic Web. His passion for Data Portability enabled me to use OpenID and FOAF integration to connect the Semantic Web and Data Portability via the Linked Data concept.
During our conversations, Jason also alluded to the fact that he had already encountered OpenLink Software while working with our ODBC Drivers (part of our UDA product family) for IBM Informix (Single-Tier or Multi-Tier Editions) a few years ago (interesting random connection).
As I've stated in the past, I've always felt that the Semantic Web vision will materialize by way of a global epiphany. The countdown to this inevitable event started at the birth of the blogosphere, ironically. It accelerated more recently through the emergence of Web 2.0 and Social Networking, even more ironically :-)
The blogosphere started the process of Data Space coalescence via RSS/Atom based semi-structured data enclaves; Web 2.0 propagated Web Service usage en route to creating service provider controlled data and information silos; Social Networking brought attention to the fact that User Generated Data wasn't actually owned or controlled by the Data Creators, etc.
The emergence of "Data Portability" has created a palatable moniker for a clearly defined, and slightly easier to understand, problem: the meshing of Data and Identity in cyberspace, i.e. individual points of presence in cyberspace, in the form of "Personal Data Spaces in the Clouds" (think: doing really powerful stuff with .name domains). In a sense, this is the critical inflection point between the document centric "Web of Linked Documents" and the data centric "Web of Linked Data". There is absolutely no other way to solve this problem in a manner that alleviates the imminent challenges presented by information overload -- resulting from the exponential growth of user generated data across the Internet and enterprise Intranets.
]]>Here goes:
In addition, in one week, courtesy of the Web and the UK Semantic Web Gatherings in Bristol and Oxford, I discovered, interviewed, and employed Daniel :-) Imagine how long this would have taken to pull off via the Document Web, assuming I would even discover Daniel.
As with all things these days, the Web and Internet change everything, which includes talent discovery and recruitment.
A Global Social Graph that is a mesh of Linked Data enables the process of recruitment, marketing, and other elements of business management to be condensed down to sending powerful beams across the aforementioned Graph :-) The only variable pieces are the traversal paths exposed to your beam via the beam's entry point URI. In my case, I have a single URI that exposes a Graph of critical paths for the Blogosphere (i.e. data spaces of RSS/Atom Feeds). Thus, I can discover if your profile matches the requirements associated with an opening at OpenLink Software (most of the time) before you do :-)
BTW - I just noticed that John Breslin described ODS as social-graph++ in his recent post, titled: Tales from the SIOC-o-sphere, part 6. In a funny way, this reminds me of a post about ODS from the early blogosphere days about platforms and Weblog APIs (circa 2003), when ODS was exposed via the Blog Platform realm of Virtuoso.
]]>"The phrase Open Social implies portability of personal and social data. That would be exciting but there are entirely different protocols underway to deal with those ideas. As some people have told me tonight, it may have been more accurate to call this "OpenWidget" - though the press wouldn't have been as good. We've been waiting for data and identity portability - is this all we get?"
[Source: Read/Write Web's Commentary & Analysis of Google's OpenSocial API]
..Perhaps the world will read the terms of use of the API, and realize this is not an open API; this is a free API, owned and controlled by one company only: Google. Hopefully, the world will remember another time when Google offered a free API and then pulled it. Maybe the world will also take a deeper look and realize that the functionality is dependent on Google hosted technology, which has its own terms of service (including adding ads at the discretion of Google), and that building an OpenSocial application ties Google into your application, and Google into every social networking site that buys into the Dream. Hopefully the world will remember. Unlikely, though, as such memories are typically filtered in the Great Noise....
[Source: Poignant commentary excerpt from Shelley Powers' Blog (as always)]
The "Semantic Data Web" vision has always been about "Data & Identity" portability across the Web. It's been that and more from day one.
In a nutshell, we continue to exhibit varying degrees of Cognitive Dissonance re the following realities:
The Data Web is about Presence over Eyeballs due to the following realities:
This is why we need to inject a mesh of Linked Data into the existing Web. This is what the often misunderstood vision of the "Semantic Data Web" or "Web of Data" or "Web of Structured Data" is all about.
As stated earlier (point 10 above), "Data is forever" and there is only more of it to come! Sociality and associated Social Networking oriented solutions are at best a speck in the Web's ocean of data once you comprehend this reality.
Note: I am writing this post as an early implementor of GData and an implementor of RDF Linked Data technology and a "Web Purist".
OpenSocial implementation and support across our relevant product families: Virtuoso (i.e. the Sponger Middleware for RDF component), OpenLink Data Spaces (Data Space Controller / Services), and the OpenLink Ajax Toolkit (i.e. OAT Widgets and Libraries), is a triviality now that the OpenSocial APIs are public.
The concern I have, and the problem that remains mangled in the vast realms of Web Architecture incomprehension, is the fact that GData and GData based APIs cannot deliver Structured Linked Data in line with the essence of the Web without introducing "lock-in" that ultimately compromises the "Open Purity" of the Web. Facebook and Google's OpenSocial response to the Facebook juggernaut (i.e. open variant of the Facebook Activity Dashboard and Social Network functionality realms, primarily), are at best icebergs in the ocean we know as the "World Wide Web". The nice and predictable thing about icebergs is that they ultimately melt into the larger ocean :-)
On a related note, I had the pleasure of attending the W3C's RDF and DBMS Integration Workshop last week. The event was well attended by organizations with knowledge, experience, and a vested interest in addressing the issues associated with exposing non-RDF data (e.g. SQL) as RDF, and the imminence of data and/or information overload, covered in different ways via the following presentations:
Jon Udell recently penned a post titled: The Fourth Platform. The post arrives at a spookily coincidental time (this happens quite often between Jon and me, as demonstrated last year during our podcast; the "Fourth" in his Innovators Podcast series).
The platform that Jon describes is "Cloud Based" and comprised of Storage and Computation. I would like to add Data Access and Management (native and virtual) under the fourth platform banner with the end product called: "Cloud based Data Spaces".
As I write, we are releasing a Virtuoso AMI (Amazon Image) labeled: virtuoso-dataspace-server. This edition of Virtuoso includes the OpenLink Data Spaces Layer and all of the OAT applications we've been developing for a while.
There's more to come!
]]>First off, I am going to focus on the Semantic Data Web aspect of the overall Semantic Web vision (a continuum) as this is what we have now. I am also writing this post as a deliberate contribution to the discourse swirling around the real topic: Semantic Web Value Proposition.
We are in the early stages of the long anticipated Knowledge Economy. That being the case, it would be safe to assume that information access, processing, and dissemination are of utmost importance to individuals and organizations alike. You don't produce knowledge in a vacuum! Likewise, you can't produce Information in a vacuum; you need Data.
Increasingly, Blogs, Wikis, Shared Bookmarks, Photo Galleries, Discussion Forums, Shared Calendars, and the like have become invaluable tools for individual and organizational participation in Web enabled global discourse (where a lot of knowledge is discovered). These tools are typically associated with Web 2.0, implying Read-Write access via Web Services, centralized application hosting, and data lock-in (silos).
The reality expressed above is a recipe for "Information Overload" and the complete annihilation of one's effective pursuit and exploitation of knowledge due to "Time Scarcity" (note: disconnecting is not an option). Information abundance is inversely related to available processing time (for humans in particular). In my case, for instance, I was actively subscribed to over 500 RSS feeds in 2003. As of today, I've simply stopped counting, and that's just my Weblog Data Space. Add to that all of the Discussions I track across blogs, wikis, message boards, mailing lists, traditional Usenet discussion forums, and the like, and I think you get the picture.
Beyond information overload, Web 2.0 data is "Semi-Structured" by way of its dominant data containers ((X)HTML, RSS, Atom documents and data streams, etc.), which lack semantics that formally expose individual data items as distinct entities endowed with unambiguous naming / identification, descriptive attributes (a type of property/predicate), and relationships (a type of property/predicate).
Solution: Devise a standard for Structured Data Semantics that is compatible with the Web Information BUS.
Produce structured data (entities, entity types, entity relationships) from Web 1.0 and Web 2.0 resources that already exist on the Web such that individual entities, their attributes, and relationships are accessible and discernible to software agents (machines).
Once the entities are individually exposed, the next requirement is a mechanism for selective access to these entities i.e. a query language.
Semantic Data Web Technologies that facilitate the solution described above include:
Structured Data Standards: Use of URIs or IRIs for uniquely identifying physical things (HTML Documents, Image Files, Multimedia Files, etc.) and abstract things (People, Places, Music, etc.).
Entity Access & Querying: SPARQL Query Language - the SQL analog of the Semantic Data Web that enables query constructs that target named entities, entity attributes, and entity relationships.
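To give a feel for what "query constructs that target named entities, entity attributes, and entity relationships" means, here is a toy triple-pattern matcher. It is only a sketch of the idea behind a single SPARQL triple pattern, not SPARQL itself; the example.org URIs, the short predicate names, and the `match` function are assumptions made for illustration.

```python
def match(triples, pattern):
    """Return one variable-binding dict per triple that fits the pattern.

    Pattern terms starting with "?" are variables; anything else must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if isinstance(term, str) and term.startswith("?"):
                # A variable seen twice in one pattern must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Hypothetical entity descriptions: each entity is named by a URI.
ALICE = "http://example.org/people/alice#this"
BOB = "http://example.org/people/bob#this"
triples = [
    (ALICE, "name", "Alice"),
    (ALICE, "knows", BOB),
    (BOB, "name", "Bob"),
]

# Roughly comparable to the SPARQL pattern: ?who knows ?friend
friends = match(triples, ("?who", "knows", "?friend"))
print(friends)
```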
Organizations are rife with a plethora of business systems that are built atop a myriad of database engines, sourced from a variety of DBMS vendors. A typical organization would have a different database engine, from a specific DBMS vendor, underlying critical business applications such as: Human Resource Management (HR), Customer Relationship Management (CRM), Accounting, Supply Chain Management, etc. In a nutshell, you have DBMS Engine and DBMS Schema heterogeneity permeating the IT infrastructure of organizations on a global scale, making Data & Information Integration the biggest headache across all IT driven organizations.
Solution: Alleviation of the pain (costs) associated with Data & Information Integration.
Semantic Data Web offerings: A dexterous data model (RDF) that enables the construction of conceptual views of disparate data sources across an organization based on existing web architecture components such as HTTP and URIs.
Existing middleware solutions that facilitate the exposure of SQL DBMS data as RDF based Structured Data include:
BTW - There is an upcoming W3C Workshop covering the integration of SQL and RDF data.
The Semantic Data Web is here; its value delivery vehicle is the URI. The URI is a conduit to Interlinked Structured Data (RDF based Linked Data) derived from existing data sources on the World Wide Web alongside data continuously injected into the Web by organizations world wide. Ironically, the Semantic Data Web is the only platform that crystallizes the "Information at Your Fingertips" vision without development environment, operating system, application, or database lock-in. You simply click on a Linked Data URI and the serendipitous exploration and discovery of data commences.
The unobtrusive emergence of the Semantic Data Web is a reflection of the soundness of the underlying Semantic Web vision.
If you are excited about Mash-ups then your are a Semantic Web enthusiast and benefactor in the making, because you only "Mash" (brute force data extraction and interlinking) because you can't "Mesh" (natural data extraction and interlinking). Likewise, if you are a social-networking, open social-graph, or portable social-network enthusiast, then you are also a Semantic Data Web benefactor and enthusiasts, because your "values" (yes, the values associated with the properties that define you e.g your interests etc) are the fundamental basis for portable, open, social-networking, which is what the Semantic Data Web hands to you on a platter without compromise (i.e. data lock-in or loss of data ownership).
Some practical examples of Semantic Data Web prowess:
On a different, but related, thread, Mike Bergman recently penned a post titled: What is the Structured Web?. Both of these public contributions shed light on the "Information BUS" essence of the World Wide Web by describing the evolving nature of the payload shuttled by the BUS.
Middleware infrastructure for shuttling "Information" between endpoints using a messaging protocol.
The Web is the dominant Information BUS within the Network Computer we know as the "Internet". It uses HTTP to shuttle information payloads between "Data Sources" and "Information Consumers", which is what happens when we interact with the Web via User Agents / Clients (e.g., Browsers).
HTTP transported streams of contextualized data. Hence the terms "Information Resource" and "Non Information Resource" that you encounter when reading material related to httpRange-14 and Web Architecture. For example, an (X)HTML document is a specific data context (representation) that enables us to perceive, or comprehend, a data stream originating from a Web Server as a Web Page. On the other hand, if the payload lacks contextualized data, a fundamental Web requirement, then the resource is referred to as a "Non Information" resource. Of course, there is really no such thing as a "Non Information" resource, but with regards to Web Architecture, it's the short way of saying: "the Web Transmits Information only". That said, I prefer to refer to these "Non Information" resources as "Data Sources", a term well understood in the world of Data Access Middleware (ODBC, JDBC, OLEDB, ADO.NET etc.) and Database Management Systems (Relational, Object-Relational, Object etc.).
Examples of Information Resource and Data Source URIs:
Explanation: The Information Resource is a conduit to the Entity identified by the Data Source URI (an entity in my RDF Data Space that is the Subject or Object of one or more Triple based Statements). The triples in question can be represented as an RDF resource when transmitted over the Web via an Information Resource that takes the form of a SPARQL REST Service URL or a Physical RDF based Information Resource URL.
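To make the distinction a little more concrete, here is a minimal Python sketch. It assumes one common convention (a "hash" URI) for telling the two kinds of URI apart; the URI itself is hypothetical and is not one of the examples from my Data Space. The full URI names the Entity (Data Source), while the part before the "#" is the Information Resource address that a client actually requests over HTTP.

```python
from urllib.parse import urldefrag

# Hypothetical Data Source URI: it names an entity, not a document.
entity_uri = "http://example.org/dataspace/person/alice#this"

# An HTTP client never sends the fragment to the server, so stripping it
# yields the Information Resource URL that is actually requested.
document_url, fragment = urldefrag(entity_uri)

print(document_url)  # address of the description (Information Resource)
print(fragment)      # local name of the entity within that description
```

This is only one way of relating the two; redirect based schemes serve the same purpose without relying on fragments.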
Prior to the emergence of the Semantic Data Web, the payloads shuttled across the Web Information BUS consisted primarily of the following:
The Semantic Data Web simply adds RDF to the payload formats shuttled across the Web Information BUS. RDF addresses formal data structure, which XML doesn't cover since it is semi-structured (distinct data entities aren't formally discernible). In a nutshell, an RDF payload is basically a conceptual model database packaged as an Information Resource. It comprises granular data items called "Entities" that expose fine-grained property values, individual and/or group characteristics (attributes), and relationships (associations) with other Entities.
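As a rough illustration of "Entities" with attributes and relationships, the Python sketch below models a tiny RDF-style payload as plain (Subject, Predicate, Object) tuples and groups them by Subject. All URIs and property names are made up for the example, and a real payload would use full predicate URIs and a proper RDF serialization.

```python
from collections import defaultdict

# Hypothetical triples (Subject, Predicate, Object); every URI is invented.
triples = [
    ("http://example.org/alice#this", "name", "Alice"),
    ("http://example.org/alice#this", "knows", "http://example.org/bob#this"),
    ("http://example.org/bob#this", "name", "Bob"),
]

def describe(triples):
    """Group triples by subject: entity -> {property: [values]}."""
    entities = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        entities[subject][predicate].append(obj)
    return entities

entities = describe(triples)
alice = entities["http://example.org/alice#this"]

# "name" is an attribute; "knows" is a relationship to another Entity.
print(dict(alice))
```

Note how the value of "knows" is itself an Entity URI, which is what makes distinct data entities formally discernible.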
The Web is in the final stages of the 3rd phase of its evolution, a phase characterized by the shuttling of structured data payloads (RDF) alongside less data oriented payloads (HTML, XHTML, XML etc.). As you can see, Linked Data and Structured Data are both terms used to describe the addition of more data centric payloads to the Web. Thus, you could view the process of creating a Structured Web of Linked Data as follows:
The Semantic Data Web is an evolution of the current Web (an Information Space) that adds structured data payloads (RDF) to current, less data oriented, structured payloads (HTML, XHTML, XML, and others).
The Semantic Data Web is increasingly seen as an inevitability because it's rapidly reaching the point of critical mass (i.e., network effect kick-in). As a result, Data Web emphasis is moving away from "What is the Semantic Data Web?" to "How will the Semantic Data Web make our globally interconnected village an even better place?", relative to the contributions accrued from the Web thus far. Remember, the initial "Document Web" (Web 1.0) bootstrapped because of the benefits it delivered to blurb-style content publishing (remember the term electronic brochure-ware?). Likewise, in the case of the "Services Web" (Web 2.0), the bootstrap occurred because it delivered platform independence to Web Application Developers, enabling them to expose application logic behind Web Services. It is my expectation that the Data Integration prowess of the Data Web will create a value exchange realm for data architects and other practitioners from the database and data access realms.
A vital component of the new Virtuoso release is the finalization of our SQL to RDF mapping functionality -- enabling the declarative mapping of SQL Data to RDF. Additional technical insight covering other new features (delivered and pending) is provided by Orri Erling, as part of a series of post-Banff posts.
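For readers who want a feel for what "declarative mapping of SQL Data to RDF" means in general, here is a small Python sketch using the standard library's sqlite3 module. It is not Virtuoso's mapping language or API; the table, the MAPPING dictionary, and all URIs are hypothetical, and the point is only that the mapping is data (a description of tables, keys, and predicates) rather than hand-written conversion code.

```python
import sqlite3

# Hypothetical declarative mapping: which table, which key column builds
# the subject URI, and which column maps to which predicate URI.
MAPPING = {
    "table": "customers",
    "key": "id",
    "subject_template": "http://example.org/crm/customer/{id}#this",
    "columns": {
        "name": "http://example.org/schema/name",
        "city": "http://example.org/schema/city",
    },
}

def rows_to_triples(conn, mapping):
    """Turn each SQL row into (subject, predicate, object) triples."""
    cols = [mapping["key"], *mapping["columns"]]
    query = (
        f"SELECT {', '.join(cols)} FROM {mapping['table']} "
        f"ORDER BY {mapping['key']}"
    )
    triples = []
    for row in conn.execute(query):
        record = dict(zip(cols, row))
        subject = mapping["subject_template"].format(id=record[mapping["key"]])
        for column, predicate in mapping["columns"].items():
            triples.append((subject, predicate, record[column]))
    return triples

# Tiny in-memory database standing in for an enterprise SQL source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Acme Ltd', 'Boston')")

triples = rows_to_triples(conn, MAPPING)
for triple in triples:
    print(triple)
```

Changing what gets exposed as RDF then means editing the mapping, not the code that walks the rows.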
A majority of the world's data (especially in the enterprise realm) resides in SQL Databases. In addition, Open Access to the data residing in said databases remains the biggest challenge to enterprises for the following reasons:
Enterprises have known from the beginning of modern corporate times that data access, discovery, and manipulation capabilities are inextricably linked to the "Real-time Enterprise" nirvana (hence my use of 0.0 before this becomes 3.0).
In my experience, as someone who has operated in the data access and data integration realms since the late '80s, I've painfully observed enterprises pursue, but unsuccessfully attain, full control over enterprise data (the prized asset of any organization) such that data, information, and knowledge workers are just a click away from commencing coherent platform and database independent data drill-downs and/or discovery that transcend intranet, internet, and extranet boundaries -- serendipitous interaction with relevant data, without compromise!
Okay, situation analysis done, we move on...
At our most recent (12th June) monthly Semantic Web Gathering, I unveiled to TimBL and a host of other attendees a simple, but powerful, demonstration of how Linked Data, as an aspect of the Semantic Data Web, can be applied to enterprise data integration challenges.
The vision of data, information, or knowledge at your fingertips is nigh! Thanks to the infrastructure provided by the Semantic Data Web (URIs, RDF Data Model, variety of RDF Serialization Formats[1][2][3], and Shared Data Dictionaries / Schemas / Ontologies [1][2][3][4][5]) it's now possible to Virtualize enterprise data from the Physical Storage Level, through the Logical Data Management Levels (Relational), up to a Concrete Conceptual Model (Graph) without operating system, development environment or framework, or database engine lock-in.
We produce a shared ontology for the CRM and Business Reporting Domains. I hope this experiment clarifies how this is quite achievable by converting XML Schemas to RDF Data Dictionaries (RDF Schemas or Ontologies). Stay tuned :-)
Also watch TimBL amplify and articulate Linked Data value in a recent interview.
To deliver a mechanism that facilitates the crystallization of this reality is a contribution of boundless magnitude (as we shall all see in due course). Thus, it is easy to understand why even "her majesty", the queen of England, simply had to get in on the act and appoint TimBL to the "British Order of Merit" :-)
Note: All of the demos above now work with IE & Safari (a "remember what Virtuoso is epiphany") by simply putting Virtuoso's DBMS hosted XSLT engine to use :-) This also applies to my earlier collection of demos from the Hello Data Web and other Data Web & Linked Data related demo style posts.
]]>Play Date: What is that thing on the Wall?
My Son: Security Alarm.
Play Date: How does it work?
My Son: If you click on that top button and then open the door, I will have to enter a code when we come back in or the alarm will go off.
Play Date: What is the code?
My Son: I can't tell you that!
Play Date: Why not?
My Son: You might come and steal something from our house!
Play Date: No I won't!
My Son: Well, you might tell someone that might come and steal something from our house! Or that person could tell someone who could tell someone that would steal from our house.
LOL!! of course! At the same time wondering, how come a majority of adults don't quite see the need for granular access to Web Data in a manner that enables computers and humans to collectively arrive at similar decisions?
Putting Data in context en route to producing actionable knowledge is a transient endeavor that engages a myriad of human senses. We demonstrate comprehension of this fact in our daily existence as social creatures (at a very early age as depicted above). That said, we seem to forget this fact when engaging the Web: If we can't see it then it can't be valuable.
BTW - I just received a ping about the "Sensory Web" (which is just another way of describing a Data Driven Web experience from my vantage point.)
In the popular M-V-C pattern you don't see the "M", but the "M" will kill you if you get it wrong (it is the FORCE)! Come to think of it, the pattern could have been coined: V-C-M or C-M-V, but isn't for obvious reasons :-)
RDF is the vehicle that enables us to tap into the Data aspect of the Web. We started off with pages of blurb linked via hypertext (Web 1.0) and then looked to "Keywords" for some kind of data access; we then isolated some "Verbs" and discovered another dimension of Web Interaction (Web 2.0) but looked to these "Verbs" for data access which left us with Mashups; and now we are starting to extract "Nouns" and "Adjectives" from sentences (Subject, Predicate, Object - Triples) associated with resources on the Web (Data Web / Web 3.0 / Semantic Web Layer 1) which provides a natural data access substrate for Meshups (natural joining of disparate data from a plethora of data sources) while providing the foundation layer for the Semantic Web.
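A Meshup can be sketched in a few lines of Python, assuming (hypothetically) that two independent sources describe the same thing using the same URI. The source names, property names, and values below are invented; the point is that the join needs no per-source extraction logic, because the shared Subject URI does the interlinking.

```python
# Two hypothetical sources that happen to describe the same entity URI.
crm = [("http://example.org/id/acme", "accountManager", "J. Smith")]
accounting = [("http://example.org/id/acme", "outstandingBalance", 1200)]

# "Meshing" is simply the union of the triples from both sources.
merged = set(crm) | set(accounting)

# Everything known about the entity, regardless of where it came from.
about_acme = {
    predicate: obj
    for subject, predicate, obj in merged
    if subject == "http://example.org/id/acme"
}
print(about_acme)
```

A Mashup, by contrast, would need custom code per source just to work out that both records refer to the same company.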
For those who need use-cases that demonstrate tangible value re. the Semantic Web, here are some projects to note courtesy of the Semantic Web Education and Outreach (SWEO) interest group: