Kingsley Idehen's Blog Data Space -- search results for "rdf middleware"
Source: http://www.openlinksw.com:443/weblog/public/search.vspx?blogid=127&q=rdf%20middleware&type=text&output=html (retrieved Fri, 29 Mar 2024 16:35:15 GMT) -- Kingsley Uyi Idehen <kidehen@openlinksw.com>

Frederick Giasson penned an interesting post earlier today that highlighted the RDF Middleware services offered by Triplr and the Virtuoso Sponger.

Some Definitions (as per usual):

RDF Middleware (as defined in this context) is about producing RDF from non RDF Data Sources. This implies that you can use non RDF Data Sources (e.g. (X)HTML Web Pages, (X)HTML Web Pages hosting Microformats, and even Web Services such as those from Google, Del.icio.us, Flickr, etc.) as Semantic Web Data Source URIs (pointers to RDF Data).

In this post I would like to provide a similar perspective on this ability to treat non RDF data sources as RDF, from an RDF Browser perspective.

First off, what's an RDF Browser?

An RDF Browser is a piece of technology that enables you to browse RDF Data Sources by way of Data Link Traversal. The key difference between this approach and traditional browsing is that Data Links are typed (they possess inherent meaning and context), whereas traditional links are untyped (although universally we have been trained to treat them as links to blurb in the form of (X)HTML pages, or what is popularly called "Web Content").

There are a number of RDF Browsers that I am aware of (note: pop me a message directly or by way of a comment to this post if you have a browser that I am unaware of), and they include (in order of creation and availability):

  1. Tabulator
  2. DISCO - Hyperdata Browser
  3. OpenLink Ajax Toolkit's RDF Browser (a component of the OAT Javascript Toolkit)

Each of the browsers above can consume the services of Triplr or the Virtuoso Sponger en route to unveiling RDF data that is traversable via URI dereferencing (HTTP GETing the data exposed by the Data Pointer). Thus, you can cut & paste the following into each of the aforementioned RDF Browsers (a minimal sketch of the underlying HTTP interaction appears after the lists below):

  1. Triplr's RDF Data (Triples) extractions from Dan Connolly's Home Page
  2. The Virtuoso Sponger's RDF Data (Triples) extractions from Dan Connolly's Home Page

Since we are all time challenged (naturally!) you can also just click on these permalinks for the OAT RDF Browser demos:

  1. Permalink for Triplr's RDF Data (Triples) extractions from Dan Connolly's Home Page
  2. Permalink for the Virtuoso Sponger's RDF Data (Triples) extractions from Dan Connolly's Home Page
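
In HTTP terms, the Data Link Traversal described above boils down to a plain GET against the data pointer. A minimal sketch using curl (the URI placeholder is hypothetical -- any of the Triplr or Sponger URLs above will do; the Accept header assumes the data source honors RDF content negotiation):

  curl -H "Accept: application/rdf+xml" <rdf-data-source-uri>
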
RDF Browsers & RDF Data Middleware
Permalink: http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1172 -- Sun, 29 Apr 2007 18:59:05 GMT -- Kingsley Uyi Idehen <kidehen@openlinksw.com>
Note: this is an updated version of a previously unpublished blog post.

Continuing from our recent Podcast conversation, Jon Udell sheds further insight into the essence of our conversation via a “Strategic Developer” column article titled: Accessing the web of databases.

Below, I present an initial dump of a DataSpace FAQ that hopefully sheds light on the DataSpace vision espoused during my podcast conversation with Jon.

What is a DataSpace?

A moniker for Web-accessible atomic containers that manage and expose Data, Information, Services, Processes, and Knowledge.

What would you typically find in a Data Space? Examples include:

  • Raw Data - SQL, HTML, XML (raw), XHTML, RDF, etc.

  • Information (Data In Context) - XHTML (various microformats), Blog Posts (in RSS, Atom, RSS-RDF formats), Subscription Lists (OPML, OCS, etc.), Social Networks (FOAF, XFN, etc.), and many other forms of applied XML.
  • Web Services (Application/Service Logic) - REST or SOAP based invocation of application logic for context sensitive and controlled data access and manipulation.
  • Persisted Knowledge - Information in actionable context that is also available in transient or persistent forms expressed using a Graph Data Model. A modern knowledgebase would more than likely have RDF as its Data Language, RDFS as its Schema Language, and OWL as its Domain Definition (Ontology) Language. Actual Domain, Schema, and Instance Data would be serialized using formats such as RDF/XML, N3, Turtle, etc.

How do Data Spaces and Databases differ?
Data Spaces are fundamentally problem-domain-specific database applications. They offer functionality that you would instinctively expect of a database (e.g. ACID data management) with the additional benefit of being data model and query language agnostic. Data Spaces are for the most part DBMS Engine and Data Access Middleware hybrids, in the sense that ownership and control of data is inherently loosely-coupled.

How do Data Spaces and Content Management Systems differ?
Data Spaces are inherently more flexible; they support multiple data models and data representation formats. Content management systems do not possess the same degree of data model and data representation dexterity.

How do Data Spaces and Knowledgebases differ?
A Data Space cannot dictate the perception of its content. For instance, what I may consider as knowledge relative to my Data Space may not be the case for a remote client that interacts with it from a distance. Thus, defining my Data Space purely as a Knowledgebase introduces constraints that reduce its broader effectiveness to third-party clients (applications, services, users, etc.). A Knowledgebase is based on a Graph Data Model, resulting in significant impedance for clients that are built around alternative models. To reiterate: Data Spaces support multiple data models.

What Architectural Components make up a Data Space?

  • ORDBMS Engine - for Data Modeling agility (via complex purpose specific data types and data access methods), Data Atomicity, Data Consistency, Transaction Isolation, and Durability (aka ACID).

  • Virtual Database Engine - for creating a single view of, and access point to, heterogeneous SQL, XML, Free Text, and other data. This is all about Virtualization at the Data Access Level.
  • Web Services Platform - enabling controlled access and manipulation (via application, service, or protocol logic) of Virtualized or Disparate Data. This layer handles the decoupling of functionality from monolithic wholes for function specific invocation via Web Services using either the SOAP or REST approach.

Where do Data Spaces fit into the Web's rapid evolution?
They are an essential part of the burgeoning Data Web / Semantic Web. In short, they will take us from data “Mash-ups” (combining web accessible data that exists without integration and repurposing in mind) to “Mesh-ups” (combining web accessible data that exists with integration and repurposing in mind).

Where can I see a DataSpace along the lines described, in action?

Just look at my blog, and take the journey as follows:

What about other Data Spaces?

There are several, and I will attempt to categorize them along the lines of the query methods available:
Type 1 (Free Text Search over HTTP):
Google, MSN, Yahoo!, Amazon, eBay, and most Web 2.0 plays.

Type 2 (Free Text Search and XQuery/XPath over HTTP)
A few blogs and Wikis (Jon Udell's and a few others)

Type 3 (RDF Data Sets and SPARQL Queryable):
Type 4 (Generic Free Text Search, OpenSearch, GData, XQuery/XPath, and SPARQL):
Points of Semantic Web presence such as the Data Spaces at:

What About Data Space aware tools?

  • OpenLink Ajax Toolkit - provides Javascript Control level binding to Query Services such as XMLA for SQL, GData for Free Text, OpenSearch for Free Text, and SPARQL for RDF, in addition to service specific Web Services (Web 2.0 hosted solutions that expose service specific APIs)
  • Semantic Radar - a Firefox Extension
  • PingTheSemanticWeb - the Semantic Web's equivalent of Web 2.0's weblogs.com
  • PiggyBank - a Firefox Extension

Data Spaces and Web of Databases
Permalink: http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1030 -- Mon, 04 Sep 2006 22:58:56 GMT -- Kingsley Uyi Idehen <kidehen@openlinksw.com>
What is SPARQL?

A declarative query language from the W3C for querying structured propositional data (in the form of 3-tuple [triples] or 4-tuple [quads] records) stored in a deductive database (colloquially referred to as triple or quad stores in Semantic Web and Linked Data parlance).

SPARQL is inherently platform independent. Like SQL, the query language and the backend database engine are distinct. Database clients capture SPARQL queries which are then passed on to compliant backend databases.

Why is it important?

Like SQL for relational databases, it provides a powerful mechanism for accessing and joining data across one or more data partitions (named graphs identified by IRIs). The aforementioned capability also enables the construction of sophisticated Views, Reports (HTML or those produced in native form by desktop productivity tools), and data streams for other services.

Unlike SQL, SPARQL includes result serialization formats and an HTTP based wire protocol. Thus, the ubiquity and sophistication of HTTP is integral to SPARQL i.e., client side applications (user agents) only need to be able to perform an HTTP GET against a URL en route to exploiting the power of SPARQL.
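
For example, assuming the public DBpedia endpoint, the standard SPARQL Protocol "query" parameter carries the URL-encoded query text, so a complete round trip is just one HTTP GET:

  curl "http://dbpedia.org/sparql?query=SELECT+%2A+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D+LIMIT+10"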

How do I use it, generally?

  1. Locate a SPARQL endpoint (DBpedia, LOD Cloud Cache, Data.Gov, URIBurner, others); or
  2. Install a SPARQL compliant database server (quad or triple store) on your desktop, workgroup server, data center, or cloud (e.g., Amazon EC2 AMI)
  3. Start the database server
  4. Execute SPARQL queries via the SPARQL endpoint (as illustrated below).
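
As a minimal illustration of step 4, assuming the public DBpedia endpoint: pasting the query below into the endpoint's query form returns up to ten distinct entity types (classes) from the data set:

  SELECT DISTINCT ?type WHERE { ?s a ?type } LIMIT 10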

How do I use SPARQL with Virtuoso?

What follows is a very simple guide for using SPARQL against your own instance of Virtuoso:

  1. Software Download and Installation
  2. Data Loading from Data Sources exposed at Network Addresses (e.g. HTTP URLs) using very simple methods
  3. Actual SPARQL query execution via SPARQL endpoint.

Installation Steps

  1. Download Virtuoso Open Source or Virtuoso Commercial Editions
  2. Run the installer (if using the Commercial Edition or the Windows Open Source Edition; otherwise follow the build guide)
  3. Follow post-installation guide and verify installation by typing in the command: virtuoso -? (if this fails check you've followed installation and setup steps, then verify environment variables have been set)
  4. Start the Virtuoso server using the command: virtuoso-start.sh
  5. Verify you have a connection to the Virtuoso Server via the command: isql localhost (assuming you're using default DB settings), or the command: isql localhost:1112 (assuming the demo database), or go to your browser and type in: http://<virtuoso-server-host-name>:[port]/conductor (e.g. http://localhost:8889/conductor for the default DB, or http://localhost:8890/conductor if using the Demo DB)
  6. Go to SPARQL endpoint which is typically -- http://<virtuoso-server-host-name>:[port]/sparql
  7. Run a quick sample query (since the database always has system data in place): SELECT DISTINCT * WHERE {?s ?p ?o} LIMIT 50

Troubleshooting

  1. Ensure environment settings are set and functional -- if using Mac OS X or Windows, you don't have to worry about this: just start and stop your Virtuoso server using the native OS services applets
  2. If using the Open Source Edition, follow the getting started guide -- it covers PATH and startup directory location re. starting and stopping Virtuoso servers.
  3. Sponging (HTTP GETs against external Data Sources) within SPARQL queries is disabled by default. You can enable this feature by assigning "SPARQL_SPONGE" privileges to user "SPARQL". Note, more sophisticated security exists via WebID based ACLs.

Data Loading Steps

  1. Identify an RDF based structured data source of interest -- a file that contains 3-tuple / triples available at an address on a public or private HTTP based network
  2. Determine the Address (URL) of the RDF data source
  3. Go to your Virtuoso SPARQL endpoint and type in the following SPARQL query (a worked example follows this list): DEFINE get:soft "replace" SELECT DISTINCT * FROM <RDFDataSourceURL> WHERE {?s ?p ?o}
  4. All the triples in the RDF resource (data source accessed via URL) will be loaded into the Virtuoso Quad Store (using RDF Data Source URL as the internal quad store Named Graph IRI) as part of the SPARQL query processing pipeline.
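
As a worked example of steps 3 and 4 (using Tim Berners-Lee's public FOAF profile document as the RDF data source; any RDF document URL works the same way):

  DEFINE get:soft "replace"
  SELECT DISTINCT * FROM <http://www.w3.org/People/Berners-Lee/card> WHERE {?s ?p ?o}

Once loaded, subsequent queries can scope to the same URL as a Named Graph IRI without the pragma:

  SELECT * FROM <http://www.w3.org/People/Berners-Lee/card> WHERE {?s ?p ?o} LIMIT 25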

Note: the data source URL doesn't even have to be RDF based -- which is where the Virtuoso Sponger Middleware comes into play (download and install the VAD installer package first) since it delivers the following features to Virtuoso's SPARQL engine:

  1. Transformation of data from non RDF data sources (file content, hypermedia resources, web services output etc..) into RDF based 3-tuples (triples)
  2. Cache Invalidation Scheme Construction -- subsequent queries will not require the define get:soft "replace" pragma, except when you want to forcefully override the cache.
  3. If you have very large data sources like DBpedia etc. from CKAN, simply use our bulk loader.

SPARQL Endpoint Discovery

Public SPARQL endpoints are emerging at an ever increasing rate. Thus, we've set up a DNS lookup service that provides access to a large number of SPARQL endpoints. Of course, this doesn't cover all existing endpoints, so if your endpoint is missing please ping me.

Here are a collection of commands for using DNS-SD to discover SPARQL endpoints:

  1. dns-sd -B _sparql._tcp sparql.openlinksw.com -- browse for service instances
  2. dns-sd -Z _sparql._tcp sparql.openlinksw.com -- output results in Zone File format

Related

  1. Using HTTP from Ruby -- you can simply construct SPARQL Protocol URLs
  2. Using SPARQL Endpoints via Ruby -- Ruby example using DBpedia endpoint
  3. Interactive SPARQL Query By Example (QBE) tool -- provides a graphical user interface (as is common in SQL realm re. query building against RDBMS engines) that works with any SPARQL endpoint
  4. Other methods of loading RDF data into Virtuoso
  5. Virtuoso Sponger -- architecture and how it turns a wide variety of non RDF data sources into SPARQL accessible data
  6. Using OpenLink Data Explorer (ODE) to populate Virtuoso -- locate a resource of interest; click on a bookmarklet or use context menus (if using ODE extensions for Firefox, Safari, or Chrome); and you'll have SPARQL accessible data automatically inserted into your Virtuoso instance.
  7. W3C's SPARQLing Data Access Ingenuity -- an older generic SPARQL introduction post
  8. Collection of SPARQL Query Examples -- GoodRelations (Product Offers), FOAF (Profiles), SIOC (Data Spaces -- Blogs, Wikis, Bookmarks, Feed Collections, Photo Galleries, Briefcase/DropBox, AddressBook, Calendars, Discussion Forums)
  9. Collection of Live SPARQL Queries against LOD Cloud Cache -- simple and advanced queries.
Simple Virtuoso Installation & Utilization Guide for SPARQL Users (Update 5)
Permalink: http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1647 -- Wed, 19 Jan 2011 15:43:35 GMT -- Kingsley Uyi Idehen <kidehen@openlinksw.com>
What is URIBurner?

A service from OpenLink Software, available at: http://uriburner.com, that enables anyone to generate structured descriptions -- on the fly -- for resources that are already published to HTTP based networks. These descriptions exist as hypermedia resource representations where links are used to identify:

  • the entity (data object or datum) being described,
  • each of its attributes, and
  • each of its attribute values (optionally).

The hypermedia resource representation outlined above is what is commonly known as an Entity-Attribute-Value (EAV) Graph. The use of generic HTTP scheme based Identifiers is what distinguishes this type of hypermedia resource from others.
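
To make that concrete, a single statement from such an EAV graph might look as follows in Turtle (the entity URI and value shown are hypothetical, with the attribute drawn from the FOAF vocabulary):

  <http://example.org/xyz.html#this> <http://xmlns.com/foaf/0.1/name> "xyz" .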

Why is it Important?

The virtues (dual-pronged serendipitous discovery) of publishing HTTP based Linked Data across public (World Wide Web) or private (Intranets and/or Extranets) networks are rapidly becoming clearer to everyone. That said, the nuance-laced nature of Linked Data publishing presents significant challenges to most. Thus, for Linked Data to really blossom, the process of publishing needs to be simplified i.e., "just click and go" (for human interaction) or REST-ful orchestration of HTTP CRUD (Create, Read, Update, Delete) operations between Client Applications and Linked Data Servers.

How Do I Use It?

In a similar vein to the role played by FeedBurner with regards to Atom and RSS feed generation during the early stages of the Blogosphere, it enables anyone to publish Linked Data bearing hypermedia resources on an HTTP network. Thus, its usage covers two profiles: Content Publisher and Content Consumer.

Content Publisher

The steps that follow cover all you need to do:

  • place a <link> tag within your HTTP based hypermedia resource (e.g., within the <head> section of an HTML document)
  • use a URL via the @href attribute value to identify the location of the structured description of your resource, in this case it takes the form: http://linkeddata.uriburner.com/about/id/{scheme-or-protocol}/{your-hostname-or-authority}/{your-local-resource}
  • for human visibility, you may consider associating a button (as you do with Atom and RSS) with the URL above.

That's it! The discoverability (Serendipitous Discovery Quotient, or SDQ) of your content has just multiplied significantly; its structured description is now part of the Linked Data Cloud with a reference back to your site (which is now a bona fide HTTP based Linked Data Space).

Examples

HTML+RDFa based representation of a structured resource description:

<link rel="describedby" title="Resource Description (HTML)"type="text/html" href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>

JSON based representation of a structured resource description:

<link rel="describedby" title="Resource Description (JSON)" type="application/json" href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>

N3 based representation of a structured resource description:

<link rel="describedby" title="Resource Description (N3)" type="text/n3" href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>

RDF/XML based representations of a structured resource description:

<link rel="describedby" title="Resource Description (RDF/XML)" type="application/rdf+xml" href="http://linkeddata.uriburner.com/about/id/http/example.org/xyz.html"/>

Content Consumer

As an end-user, obtaining a structured description of any resource published to an HTTP network boils down to the following steps:

  1. go to: http://uriburner.com
  2. drag the Page Metadata Bookmarklet link to your Browser's toolbar
  3. whenever you encounter a resource of interest (e.g. an HTML page) simply click on the Bookmarklet
  4. you will be presented with an HTML representation of a structured resource description (i.e., identifier of the entity being described, its attributes, and its attribute values will be clearly presented).

Examples

If you are a developer, you can simply perform an HTTP request (from your development environment of choice) using any of the URL patterns presented below (a concrete instantiation follows the patterns):

HTML:
  • curl -I -H "Accept: text/html" http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}

JSON:

  • curl -I -H "Accept: application/json" http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}
  • curl http://linkeddata.uriburner.com/about/data/json/{scheme}/{authority}/{local-path}

Notation 3 (N3) and Turtle:

  • curl -I -H "Accept: text/n3" http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}
  • curl http://linkeddata.uriburner.com/about/data/n3/{scheme}/{authority}/{local-path}
  • curl -I -H "Accept: text/turtle" http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}
  • curl http://linkeddata.uriburner.com/about/data/ttl/{scheme}/{authority}/{local-path}

RDF/XML:

  • curl -I -H "Accept: application/rdf+xml" http://linkeddata.uriburner.com/about/id/{scheme}/{authority}/{local-path}
  • curl http://linkeddata.uriburner.com/about/data/xml/{scheme}/{authority}/{local-path}
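
For instance, instantiating the Turtle pattern above with the example resource used earlier in this post:

  curl http://linkeddata.uriburner.com/about/data/ttl/http/example.org/xyz.html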

Conclusion

URIBurner is a "deceptively simple" solution for cost-effective exploitation of HTTP based Linked Data meshes. It doesn't require any programming or customization en route to immediately realizing its virtues.

If you like what URIBurner offers, but prefer to leverage its capabilities within your own domain -- such that resource description URLs reside in your domain -- all you have to do is perform the following steps:

  1. download a copy of Virtuoso (for local desktop, workgroup, or data center installation) or
  2. instantiate Virtuoso via the Amazon EC2 Cloud
  3. enable the Sponger Middleware component via the RDF Mapper VAD package (which includes cartridges for over 30 different resource types)

When you install your own URIBurner instances, you also have the ability to perform customizations that increase resource description fidelity in line with your specific needs. All you need to do is develop a custom extractor cartridge and/or meta cartridge.

Related:

URIBurner: Painless Generation & Exploitation of Linked Data (Update 1 - Demo Links Added)
Permalink: http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1613 -- Thu, 11 Mar 2010 15:16:34 GMT -- Kingsley Uyi Idehen <kidehen@openlinksw.com>
Motivation for this post arose from a series of Twitter exchanges between Tony Hirst and me, in relation to his blog post titled: So What Is It About Linked Data that Makes it Linked Data™?

At the end of the marathon session, it was clear to me that a blog post was required for future reference, at the very least :-)

What is Linked Data?

"Data Access by Reference" mechanism for Data Objects (or Entities) on HTTP networks. It enables you to Identify a Data Object and Access its structured Data Representation via a single Generic HTTP scheme based Identifier (HTTP URI). Data Object representation formats may vary; but in all cases, they are hypermedia oriented, fully structured, and negotiable within the context of a client-server message exchange.

Why is it Important?

Information makes the world tick!

Information doesn't exist without data to contextualize it.

Information is inaccessible without a projection (presentation) medium.

All information (without exception, when produced by humans) is subjective. Thus, to truly maximize the innate heterogeneity of collective human intelligence, loose coupling of our information and associated data sources is imperative.

How is Linked Data Delivered?

Linked Data is exposed to HTTP networks (e.g. World Wide Web) via hypermedia resources bearing structured representations of data object descriptions. Remember, you have a single Identifier abstraction (generic HTTP URI) that embodies: Data Object Name and Data Representation Location (aka URL).

How are Linked Data Object Representations Structured?

A structured representation of data exists when an Entity (Datum), its Attributes, and its Attribute Values are clearly discernible. In the case of a Linked Data Object, structured descriptions take the form of a hypermedia based Entity-Attribute-Value (EAV) graph -- where each Entity, its Attributes, and its Attribute Values (optionally) are identified using Generic HTTP URIs.

Examples of structured data representation formats (content types) associated with Linked Data Objects include:

  • text/html
  • text/turtle
  • text/n3
  • application/json
  • application/rdf+xml
  • Others

How Do I Create Linked Data oriented Hypermedia Resources?

You mark up resources by expressing distinct entity-attribute-value statements (basically, these are 3-tuple records) using a variety of notations:

  • (X)HTML+RDFa,
  • JSON,
  • Turtle,
  • N3,
  • TriX,
  • TriG,
  • RDF/XML, and
  • Others (for instance, you can use Atom data format extensions to model an EAV graph, as per the OData initiative from Microsoft) -- a minimal RDFa sketch follows this list.
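
For instance, the first notation above embeds such statements directly in markup. A minimal (X)HTML+RDFa fragment of that kind might look as follows (the names and URIs are purely illustrative):

  <div xmlns:foaf="http://xmlns.com/foaf/0.1/" about="#me">
    <span property="foaf:name">Jane Doe</span> knows
    <a rel="foaf:knows" href="http://example.org/john#me">John Doe</a>.
  </div>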

You can achieve this task using any of the following approaches:

  • Notepad
  • WYSIWYG Editor
  • Transformation of Database Records via Middleware
  • Transformation of XML based Web Services output via Middleware
  • Transformation of other Hypermedia Resources via Middleware
  • Transformation of non Hypermedia Resources via Middleware
  • Use a platform that delivers all of the above.

Practical Examples of What Linked Data Objects Enable

  • Describe Who You Are, What You Offer, and What You Need via your structured profile, then leave your HTTP network to perform the REST (serendipitous discovery of relevant things)
  • Identify (via map overlay) all items of interest within a 2km+ radius of my current location (this could include vendor offerings or services sought by existing or future customers)
  • Share the latest and greatest family photos with family members *only* without forcing them to signup for Yet Another Web 2.0 service or Social Network
  • No repetitive signup and username and password based login sequences per Web 2.0 or Mobile Application combo
  • Going beyond imprecise Keyword Search to the new frontier of Precision Find - for example, find Data Objects associated with the keywords: Tiger, while enabling the seeker to disambiguate across the "Who", "What", "Where", "When" dimensions (with negation capability)
  • Determine how two Data Objects are Connected - person to person, person to subject matter etc. (LinkedIn outside the walled garden)
  • Use any resource address (e.g blog or bookmark URL) as the conduit into a Data Object mesh that exposes all associated Entities and their social network relationships
  • Apply patterns (social dimensions) above to traditional enterprise data sources in combination (optionally) with external data without compromising security etc.

How Do OpenLink Software Products Enable Linked Data Exploitation?

Our data access middleware heritage (which spans 16+ years) has enabled us to assemble a rich portfolio of coherently integrated products that enable cost-effective evaluation and utilization of Linked Data, without writing a single line of code, or exposing you to the hidden but extensive admin and configuration costs. Post-installation, the benefits of Linked Data simply materialize (along the lines described above).

Our main Linked Data oriented products include:

  • OpenLink Data Explorer -- visualizes Linked Data or Linked Data transformed "on the fly" from hypermedia and non hypermedia data sources
  • URIBurner -- a "deceptively simple" solution that enables the generation of Linked Data "on the fly" from a broad collection of data sources and resource types
  • OpenLink Data Spaces -- a platform for enterprises and individuals that enhances distributed collaboration via Linked Data driven virtualization of data across its native and/or 3rd party content managers for: Blogs, Wikis, Shared Bookmarks, Discussion Forums, Social Networks etc
  • OpenLink Virtuoso -- a secure and high-performance native hybrid data server (Relational, RDF-Graph, Document models) that includes in-built Linked Data transformation middleware (aka. Sponger).

Related

Revisiting HTTP based Linked Data (Update 1 - Demo Video Links Added)
Permalink: http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1611 -- Mon, 08 Mar 2010 14:59:37 GMT -- Kingsley Uyi Idehen <kidehen@openlinksw.com>
Situation Analysis

Since the beginning of the modern IT era, each period of innovation has inadvertently introduced its fair share of Data Silos. The driving force behind this anomaly remains an overemphasis on the role of applications when selecting problem solutions. Unfortunately, most solution selecting decision makers remain oblivious to the fact that most applications are architecturally monolithic; i.e., they fail to separate the following five layers that are critical to all solutions:

  1. Data Unit (Datum or Data Object) Identity,
  2. Data Storage/Persistence,
  3. Data Access,
  4. Data Representation, and
  5. Data Presentation/Visualization.

The rise of the Internet, and its exponentially-growing user-friendly enclave known as the World Wide Web, is bringing the intrinsic costs of the monolithic application architecture anomaly to bear -- in manners unanticipated by many. For example, the emergence of network-oriented solutions across the realms of Enterprise 2.0-based Collaboration and Web 2.0-based Software-as-a-Service (SaaS), combined with the overarching influence of Social Media, are producing more heterogeneously-structured and disparately-located data sources than people can effectively process.

As is often the case, a variety of problem and product monikers have emerged for the data access and integration challenges outlined above. Contemporary examples include Enterprise Information Integration, Master Data Management, and Data Virtualization. Labeling aside, the fundamental issues of the unresolved Data Integration challenge boil down to the following:

  • Data Model Heterogeneity
  • Data Quality (Cleanliness)
  • Semantic Variance across Contexts (e.g., weights and measures).

Effectively solving today's data integration challenges requires a move away from monolithic application architecture to loosely-coupled, network-centric application architectures. Basically, we need a ubiquitous network-centric application protocol that lends itself to loosely-coupled across-the-wire orchestration of data interactions. In short, this will be what revitalizes the art of application development and deployment.

The World Wide Web is built around a network application protocol called HTTP. This protocol intrinsically separates the five layers listed earlier, thereby enabling:

  • Use of Generic HTTP URIs as Data Object (Entity) Identifiers;
  • Identifier Co-reference, such that multiple Data Object Identifiers may reference the same Data Object;
  • Use of the Entity-Attribute-Value Model to describe Data Objects using real world modeling friendly conceptual graphs;
  • Use of HTTP URLs to Identify Locations of Resources that bear (host) Data Object Descriptions (Representations);
  • Data Access mechanism for retrieving Data Object Representations from persistent or transient storage locations.

What is Virtuoso?

A server uniquely designed to address today's escalating Data Access and Integration challenges without compromising performance, security, or platform independence. At its core lies an unrivaled commitment to industry standards, combined with unique technology innovation that transcends erstwhile distinct realms such as SQL, RDF, XML, and Web Services.

When Virtuoso is installed and running, HTTP-based Data Objects are automatically created as a by-product of its powerful data virtualization, transcending data sources and data representation formats. The benefits of such power extend across user profiles, as summarized below.

Product Benefits Summary

  • Enterprise Agility — Virtuoso lets you mix-&-match best-of-class combinations of Operating Systems, Programming Environments, Database Engines and Data-Access Middleware when building or tweaking your IS infrastructure, without the typical impedance of vendor-lock-in.
  • Data Model Dexterity — By supporting multiple protocols and data models in a single product, Virtuoso protects you against costly vulnerabilities such as: perennial acquisition and accumulation of expensive data model specific DBMS products that still operate on the fundamental principle of: proprietary technology lock-in, at a time when heterogeneity continues to intrinsically define the information technology landscape.
  • Cost-effectiveness — By providing a single point of access (and single-sign-on, SSO) to a plethora of Web 2.0-style social networks, Web Services, and Content Management Systems, and by using Data Object Identifiers as units of Data Virtualization that become the focal points of all data access, Virtuoso lowers the cost to exploit emerging frontiers such as socially-enhanced enterprise collaboration.
  • Speed of Exploitation — Virtuoso provides the ability to rapidly assemble 360-degree conceptual views of data, across internal line-of-business application (CRM, ERP, ECM, HR, etc.) data and/or external data sources, whether these are unstructured, semi-structured, or fully structured.

Bottom line, Virtuoso delivers unrivaled flexibility and scalability, without compromising performance or security.

Related

OpenLink Virtuoso - Product Value Proposition Overview
Permalink: http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1609 -- Sat, 27 Feb 2010 17:46:36 GMT -- Kingsley Uyi Idehen <kidehen@openlinksw.com>
In recent times a lot of the commentary and focus re. Virtuoso has centered on the RDF Quad Store and Linked Data. What sometimes gets overlooked is the sophisticated Virtual Database Engine that provides the foundation for all of Virtuoso's data integration capabilities.

In this post I provide a brief re-introduction to this essential aspect of Virtuoso.

What is it?

This component of Virtuoso is known as the Virtual Database Engine (VDBMS). It provides transparent high-performance and secure access to disparate data sources that are external to Virtuoso. It enables federated access and integration of data hosted by any ODBC- or JDBC-accessible RDBMS, RDF Store, XML database, or Document (Free Text)-oriented Content Management System. In addition, it facilitates integration with Web Services (SOAP-based SOA RPCs or REST-fully accessible Web Resources).

Why is it important?

In the most basic sense, you shouldn't need to upgrade your existing database engine version simply because your current DBMS and Data Access Driver combo isn't compatible with ODBC-compliant desktop tools such as Microsoft Access, Crystal Reports, BusinessObjects, Impromptu, or other ODBC-, JDBC-, ADO.NET-, or OLE DB-compliant applications. Simply place Virtuoso in front of your so-called "legacy database," and let it deliver the compliance levels sought by these tools.

In addition, it's important to note that today's enterprise, through application evolution, company mergers, or acquisitions, is often faced with disparately-structured data residing in any number of line-of-business-oriented data silos. Compounding the problem is the exponential growth of user-generated data via new social media-oriented collaboration tools and platforms. For companies to cost-effectively harness the opportunities accorded by the increasing intersection between line-of-business applications and social media, virtualization of data silos must be achieved, and this virtualization must be delivered in a manner that doesn't prohibitively compromise performance or completely undermine security at either the enterprise or personal level. Again, this is what you get by simply installing Virtuoso.

How do I use it?

The VDBMS may be used in a variety of ways, depending on the data access and integration task at hand. Examples include:

Relational Database Federation

You can make a single ODBC, JDBC, ADO.NET, OLE DB, or XMLA connection to multiple ODBC- or JDBC-accessible RDBMS data sources, concurrently, with the ability to perform intelligent distributed joins against externally-hosted database tables. For instance, you can join internal human resources data against internal sales and external stock market data, even when the HR team uses Oracle, the Sales team uses Informix, and the Stock Market figures come from Ingres!
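
A sketch of what such a distributed join can look like once the remote tables are attached (the qualified table and column names below are hypothetical; attached remote tables simply appear under local names):

  SELECT e.full_name, SUM(o.order_total) AS total_sales
  FROM   ORA_HR.employees e
  JOIN   INF_SALES.orders o ON o.employee_id = e.employee_id
  GROUP  BY e.full_name;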

Conceptual Level Data Access using the RDF Model

You can construct RDF Model-based Conceptual Views atop Relational Data Sources. This is about generating HTTP-based Entity-Attribute-Value (E-A-V) graphs using data culled "on the fly" from native or external data sources (Relational Tables/Views, XML-based Web Services, or User Defined Types).

You can also derive RDF Model-based Conceptual Views from Web Resource transformations "on the fly" -- the Virtuoso Sponger (RDFizing middleware component) enables you to generate RDF Model Linked Data via a RESTful Web Service or within the process pipeline of the SPARQL query engine (i.e., you simply use the URL of a Web Resource in the FROM clause of a SPARQL query).

It's important to note that Views take the form of HTTP links that serve as both Data Source Names and Data Source Addresses. This enables you to query and explore relationships across entities (i.e., People, Places, and other Real World Things) via HTTP clients (e.g., Web Browsers) or directly via SPARQL Query Language constructs transmitted over HTTP.

Conceptual Level Data Access using ADO.NET Entity Frameworks

As an alternative to RDF, Virtuoso can expose ADO.NET Entity Frameworks-based Conceptual Views over Relational Data Sources. It achieves this by generating Entity Relationship graphs via its native ADO.NET Provider, exposing all externally attached ODBC- and JDBC-accessible data sources. In addition, the ADO.NET Provider supports direct access to Virtuoso's native RDF database engine, eliminating the need for resource intensive Entity Frameworks model transformations.

Related

Re-introducing the Virtuoso Virtual Database Engine
Permalink: http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1608 -- Wed, 17 Feb 2010 21:46:53 GMT -- Kingsley Uyi Idehen <kidehen@openlinksw.com>
Situation Analysis:

Dr. Dre is one of the artists in the Linked Data Space we host for the BBC. He is also referenced in music oriented data spaces such as DBpedia, MusicBrainz and Last.FM (to name a few).

Challenge:

How do I obtain a holistic view of the entity "Dr. Dre" across the BBC, MusicBrainz, and Last.FM data spaces? We know the BBC published Linked Data, but what about Last.FM and MusicBrainz? Both of these data spaces only expose XML or JSON data via REST APIs.

Solution:

Simple 3 step Linked Data Meshup courtesy of Virtuoso's in-built RDFizer Middleware "the Sponger" (think ODBC Driver Manager for the Linked Data Web) and its numerous Cartridges (think ODBC Drivers for the Linked Data Web).

Steps:

  1. Go to Last.FM and search using pattern: Dr. Dre (you will end up with this URL: http://www.last.fm/music/Dr.+Dre)
  2. Go to the Virtuoso powered BBC Linked Data Space home page and enter: http://bbc.openlinksw.com/about/html/http://www.last.fm/music/Dr.+Dre
  3. Go to the BBC Linked Data Space home page and type full text pattern (using default tab): Dr. Dre, then view Dr. Dre's metadata via the Statistics Link.

What Happened?

The following took place:

  1. Virtuoso Sponger sent an HTTP GET to Last.FM
  2. Distilled the "Artist" entity "Dr. Dre" from the page, and made a Linked Data graph
  3. Inverse Functional Property and sameAs reasoning handled the Meshup (augmented graph from a conjunctive query processing pipeline)
  4. Links for "Dr. Dre" across BBC (sameAs), Last.FM (seeAlso), via DBpedia URI.

The new enhanced URI for Dr. Dre now provides a rich holistic view of the aforementioned "Artist" entity. This URI is usable anywhere on the Web for Linked Data Conduction :-)

Related (as in NearBy)

BBC Linked Data Meshup In 3 Steps
Permalink: http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1560 -- Fri, 12 Jun 2009 20:38:34 GMT -- Kingsley Uyi Idehen <kidehen@openlinksw.com>
While exploring the Subject Headings Linked Data Space (LCSH) recently unveiled by the Library of Congress, I noticed that the URI for the subject heading: World Wide Web, exposes an "owl:sameAs" link to resource URI: "info:lc/authorities/sh95000541" -- in fact, a URI.URN that isn't HTTP protocol scheme based.

The observations above triggered a discussion thread on Twitter that involved: @edsu, @iand, and moi. Naturally, it morphed into a live demonstration of: human vs machine, interpretation of claims expressed in the RDF graph.

What makes this whole thing interesting?

It showcases (in Man vs Machine style) the issue of unambiguously discerning the meaning of the owl:sameAs claim expressed in the LCSH Linked Data Space.

Perspectives & Potential Confusion

From the Linked Data perspective, it may spook a few people to see owl:sameAs values such as "info:lc/authorities/sh95000541" that cannot be de-referenced using HTTP.

It may confuse a few people or user agents that see URI de-referencing as not necessarily HTTP specific, thereby attempting to de-reference the URI.URN on the assumption that it's associated with a "handle system", for instance.

It may even confuse RDFizer / RDFization middleware that uses owl:sameAs as a data provider attribution mechanism -- via hint/nudge URI values derived from original content/data URI.URLs that de-reference to nothing, e.g., an original resource URI.URL plus "#this", which produces a URI.URN-URL. Think of this pattern as "owl:shameAs" in a sense :-)

Unambiguously Discerning Meaning

Simply bring OWL reasoning (inference rules and reasoners) into the mix, thereby negating human dialogue about interpretation which ultimately unveils a mesh of orthogonal view points. Remember, OWL is all about infrastructure that ultimately enables you to express yourself clearly i.e., say what you mean, and mean what you say.

Path to Clarity (using Virtuoso, its in-built Sponger Middleware, and Inference Engine):

  1. GET the data into the Virtuoso Quad store -- what the sponger does via its URIBurner Service (while following designated predicates such as owl:sameAs in case they point to other mesh-able data sources)
  2. Query the data in Quad Store with "owl:sameAs" inference rules enabled
  3. Repeat the last step with the inference rules excluded.

Actual SPARQL Queries:
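
A minimal sketch of steps 2 and 3, assuming Virtuoso's input:same-as pragma (the graph IRI shown is an assumption about what the Sponger used as the Named Graph):

  DEFINE input:same-as "yes"
  SELECT ?p ?o
  FROM <http://id.loc.gov/authorities/sh95000541>
  WHERE { <info:lc/authorities/sh95000541> ?p ?o }

With the pragma, properties asserted about the HTTP based subject heading URI are also returned for its "info:" alias; re-running the same query without the DEFINE line (step 3) returns only what is directly asserted about the "info:" URI.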

Observations:

The SPARQL queries against the Graph generated and automatically populated by the Sponger reveal -- without human intervention -- that "info:lc/authorities/sh95000541" is just an alternative name for <http://id.loc.gov/authorities/sh95000541#concept>, and that the graph produced by LCSH is self-describing enough for an OWL reasoner to figure this all out courtesy of the owl:sameAs property :-).

Hopefully, this post also provides a simple example of how OWL facilitates "Reasonable Linked Data".

Related

Library of Congress & Reasonable Linked Data
Permalink: http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1556 -- Wed, 06 May 2009 18:26:15 GMT -- Kingsley Uyi Idehen <kidehen@openlinksw.com>
What is it?

A pre-installed edition of Virtuoso for Amazon's EC2 Cloud platform.

What does it offer?

From a Web Entrepreneur perspective it offers:
  1. Low cost entry point to a game-changing Web 3.0+ (and beyond) platform that combines SQL, RDF, XML, and Web Services functionality
  2. Flexible variable cost model (courtesy of EC2 DevPay) tightly bound to revenue generated by your services
  3. Delivers federated and/or centralized model flexibility for your SaaS based solutions
  4. Simple entry point for developing and deploying sophisticated database driven applications (SQL or RDF Linked Data Web oriented)
  5. Complete framework for exploiting OpenID, OAuth (including Role enhancements) that simplifies exploitation of these vital Identity and Data Access technologies
  6. Easily implement RDF Linked Data based Mail, Blogging, Wikis, Bookmarks, Calendaring, Discussion Forums, Tagging, Social-Networking as Data Space (data containers) features of your application or service offering
  7. Instant alleviation of challenges (e.g. service costs and agility) associated with Data Portability and Open Data Access across Web 2.0 data silos
  8. LDAP integration for Intranet / Extranet style applications.

From the DBMS engine perspective it provides you with one or more pre-configured instances of Virtuoso that enable immediate exploitation of the following services:

  1. RDF Database (a Quad Store with SPARQL & SPARUL Language & Protocol support)
  2. SQL Database (with ODBC, JDBC, OLE-DB, ADO.NET, and XMLA driver access)
  3. XML Database (XML Schema, XQuery/XPath, XSLT)
  4. Full Text Indexing.

From a Middleware perspective it provides:

  1. RDF Views (Wrappers / Semantic Covers) over SQL, XML, and other data sources accessible via SOAP or REST style Web Services
  2. Sponger Service for converting non RDF information resources into RDF Linked Data "on the fly" via a large collection of pre-installed RDFizer Cartridges.

From the Web Server Platform perspective, it provides an alternative to LAMP stack components such as MySQL and Apache by offering:

  1. HTTP Web Server
  2. WebDAV Server
  3. Web Application Server (includes PHP runtime hosting)
  4. SOAP or REST style Web Services Deployment
  5. RDF Linked Data Deployment
  6. SPARQL (SPARQL Query Language) and SPARUL (SPARQL Update Language) endpoints
  7. Virtuoso Hosted PHP packages for MediaWiki, Drupal, Wordpress, and phpBB3 (just install the relevant Virtuoso Distro. Package).

From the general System Administrator's perspective it provides:

  1. Online Backups (Backup Set dispatched to S3 buckets, FTP, or HTTP/WebDAV server locations)
  2. Synchronized Incremental Backups to Backup Set locations
  3. Backup Restore from Backup Set location (without exiting to EC2 shell).

Higher level user oriented offerings include:

  1. OpenLink Data Explorer front-end for exploring the burgeoning Linked Data Web
  2. Ajax based SPARQL Query Builder (iSPARQL) that enables SPARQL Query construction by Example
  3. Ajax based SQL Query Builder (QBE) that enables SQL Query construction by Example.

For Web 2.0 / 3.0 users, developers, and entrepreneurs, it includes Distributed Collaboration Tools & Social Media realm functionality, courtesy of ODS, comprising:

  1. Point of presence on the Linked Data Web that meshes your Identity and your Data via URIs
  2. System generated Social Network Profile & Contact Data via FOAF
  3. System generated SIOC (Semantically Interconnected Online Community) Data Space (that includes a Social Graph) exposing all your Web data in RDF Linked Data form
  4. System generated OpenID and automatic integration with FOAF
  5. Transparent Data Integration across Facebook, Digg, LinkedIn, FriendFeed, Twitter, and any other Web 2.0 data space equipped with RSS / Atom support and/or REST style Web Services
  6. In-built support for SyncML which enables data synchronization with Mobile Phones.

How Do I Get Going with It?

Introducing Virtuoso Universal Server (Cloud Edition) for Amazon EC2
Permalink: http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1489 -- Fri, 28 Nov 2008 21:06:02 GMT -- Kingsley Uyi Idehen <kidehen@openlinksw.com>
It is getting clearer by the second that Master Data Management and RDF based Linked Data are two realms separated by a common desire to provide "Entity Oriented Data Access" to heterogeneous data sources (within the enterprise and/or across the World Wide Web).

Here is how I see Linked Data providing tangible value to MDM tools vendors and users:

  1. Open access to Entities across MDM instances served up by different MDM solutions acting as Linked Data publishers (i.e., expose MDM Entities as RDF resources endowed with de-referencable URIs, thereby enabling Hyperdata-style linking)
  2. Use of RDF-ization middleware to hook disparate data sources (SQL, XML, and other data sources) into existing MDM packages (i.e., the MDM solutions become consumers of RDF Linked Data).

Of course, Virtuoso was designed and developed to deliver the above from day one (circa 1998 re. the core, and 2005 re. the use of RDF for the final mile).

Related

Master Data Management (MDM) & RDF based Linked Data
Permalink: http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1482 -- Wed, 05 Nov 2008 23:19:02 GMT -- Kingsley Uyi Idehen <kidehen@openlinksw.com>
Runtime hosting is a functionality realm of Virtuoso that is sometimes easily overlooked. In this post I want to provide a simple, no-hassles HOWTO guide for installing Virtuoso on Windows (32 or 64 Bit), Mac OS X (Universal or Native 64 Bit), and Linux (32 or 64 Bit). The installation guide also covers the instantiation of phpBB3 as verification of the Virtuoso hosted PHP 3.5 runtime.

What are the benefits of PHP Runtime Hosting?

Like Apache, Virtuoso is a bona-fide Web Application Server for PHP based applications. Unlike Apache, Virtuoso is also the following:

  • a Hybrid Native DBMS Engine (Relational, RDF-Graph, and Document models) that is accessible solely via industry standard interfaces
  • a Virtual DBMS or Master Data Manager (MDM) that virtualizes heterogeneous data sources (ODBC, JDBC, Web Services, Hypermedia Resources, Non Hypermedia Resources)
  • an RDF Middleware solution for RDF-ization of non RDF resources across the Web and enterprise Intranets and/or Extranets (in the form of Cartridges for data exposed via REST or SOA oriented SOAP interfaces)
  • an RDF Linked Data Server (meaning it can deploy RDF Linked Data based on its native and/or virtualized data)

As result of the above, when you deploy a PHP application using Virtuoso, you inherit the following benefits:

  1. Use of PHP-iODBC for in-process communication with Virtuoso
  2. Easy generation of RDF Linked Data Views atop the SQL schemas of PHP applications
  3. Easy deployment of RDF Linked Data from virtualized data sources
  4. Less LAMP monoculture (*there is no such thing as virtuous monoculture*) when dealing with PHP based Web applications.

As indicated in prior posts, producing RDF Linked Data from the existing Web, where a lot of content is deployed by PHP based content managers, should simply come down to RDF Views over the SQL schemas, and deployment/publishing of those RDF Views in RDF Linked Data form. In a nutshell, this is what Virtuoso delivers via its PHP runtime hosting and pre-packaged VADs (Virtuoso Application Distribution packages), for popular PHP based applications such as: phpBB3, Drupal, WordPress, and MediaWiki.

In addition to the RDF Linked Data deployment, we've also taken the traditional LAMP installation tedium out of the typical PHP application deployment process. For instance, you don't have to rebuild PHP 3.5 (32 or 64 Bit) on Windows, Mac OS X, or Linux to get going; simply install Virtuoso, then select a VAD package for the relevant application, and you're set. If the application of choice isn't pre-packaged by us, simply install as you would when using Apache, which comes down to situating the PHP files in your Web structure under the Web Application's root directory.

Installation Guide

  1. Download the Virtuoso installer for Windows (32 Bit msi file or 64 Bit msi file), Mac OS X (Universal Binary dmg file), or instantiate the Virtuoso EC2 AMI (*search for the pattern: "Virtuoso" when using the Firefox extension for EC2; the AMI ID is currently ami-7c31d515, name: virtuoso-test/virtuoso-cloud-beta-9-i386.manifest.xml, for the latest cut*)
  2. Run the installer (or download the movies using the links in the related section below)
  3. Go to the Virtuoso Conductor (*which will show up at the end of the installation process* or go to http://localhost:8890/conductor)
  4. Go to the "Admin" tab within the (X)HTML based UI and select the "Packages" sub-menu item (a Tab)
  5. Pick phpBB3 (or any other pre-packaged PHP app) and then click on "Install/Upgrade"
  6. Then watch one of my silent movies, or read the initial startup guides for Virtuoso hosted phpBB3, Drupal, WordPress, or MediaWiki.

Related

At the current time, I've only provided links to ZIP files containing the Virtuoso installation "silent movies". This approach is a short-term solution to some of my current movie publishing challenges re. YouTube and Vimeo -- where the compressed output hasn't been of acceptable visual quality. Once resolved, I will publish much more "Multimedia Web" friendly movies :-)

Virtuoso, PHP Runtime Hosting: phpBB, Wordpress, Drupal, MediaWiki, and Linked Data
Permalink: http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1461 -- Fri, 26 Mar 2010 01:19:59 GMT -- Kingsley Uyi Idehen <kidehen@openlinksw.com>
RDF-ization is a term used by the Semantic Web community to describe the process of generating RDF from non RDF Data Sources such as (X)HTML, Weblogs, Shared Bookmark Collections, Photo Galleries, Calendars, Contact Managers, Feed Subscriptions, Wikis, and other information resource collections.

If the RDF generated, results in an entity-to-entity level network (graph) in which each entity is endowed with a de-referencable HTTP based ID (a URI), we end up with an enhancement to the Web that adds Hyperdata linking across extracted entities, to the existing Hypertext based Web of linked documents (pages, images, and other information resource types). Thus, I can use the same URL linking mechanism to reference a broader range of "Things" i.e., documents, things that documents are about, or things loosely associated with documents.

The Virtuoso Sponger is an example of an RDF Middleware solution from OpenLink Software. It's an in-built component of the Virtuoso Universal Server, and deployable in many forms e.g., Software as Service (SaaS) or traditional software installation. It delivers RDF-ization services via a collection of Web information resource specific Cartridges/Providers/Drivers covering Wikipedia, Freebase, CrunchBase, WikiCompany, OpenLibrary, Digg, eBay, Amazon, RSS/Atom/OPML feed sources, XBRL, and many more.

RDF-ization alone doesn't ensure valuable RDF based Linked Data on the Web. The process of producing RDF Linked Data is ultimately about the art of effectively describing resources with an eye for context.

RDF-ization Processing Steps

  1. Entity Extraction
  2. Vocabulary/Schema/Ontology (Data Dictionary) mapping
  3. HTTP based Proxy URI generation
  4. Linked Data Cloud Lookups (e.g., perform a UMBEL lookup to add "isAbout" fidelity to the graph, then look up DBpedia and other LOD instance data enclaves for identical individuals and connect them via "owl:sameAs")
  5. RDF Linked Data Graph projection that uses the description of the container information resource to expose the URIs of the distilled entities.
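
As an illustration of step 3, the proxy URIs minted for extracted entities can follow the hypermedia pattern used by the URIBurner service discussed elsewhere on this blog (the source document and fragment shown here are hypothetical):

  http://linkeddata.uriburner.com/about/id/http/example.org/page.html#{entity-name}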

The animation that follows illustrates the process (5,000 feet view), from grabbing resources via HTTP GET, to injecting RDF Linked Data back into the Web cloud:

Note: the Shredder is a Generic Cartridge, so you would have one of these per data source type (information resource type).

What is Linked Data oriented RDF-ization?
Permalink: http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1453 -- Tue, 07 Oct 2008 21:35:24 GMT -- Kingsley Uyi Idehen <kidehen@openlinksw.com>
All enterprises run IS/MIS/EIS systems that are supposed to enable optimized exploitation of data, information, and knowledge. Unfortunately, applications, services (SOAP or REST), database engines, middleware, operating systems, programming languages, development frameworks, network protocols, network topologies, or some other piece of infrastructure, eventually lay claim (possessively) to the data.

Courtesy of Linked Data, we are now able to extend the "document to document" linking mechanism of the Web (Hypertext Linking) to more granular "entity to entity" level linking. And in doing so, we have a layer of abstraction that in one swoop alleviates all of the infrastructure oriented data access impediments of yore. I know this sounds simplistic, but rest assured: imbibing Linked Data's value proposition is really just that simple, once you engage solutions (e.g. Virtuoso) that enable you to deploy Linked Data across your enterprise.

Example:

Microsoft ACCESS, SQL Server, and Virtuoso all use the Northwind SQL DB Schema as the basis of the demonstration database shipped with each DBMS product. This schema is comprised of common IS/MIS entities that include: Customers, Contacts, Orders, Products, Employees etc.

What we all really want, as data, information, and knowledge consumers and/or dispatchers, is to be no more than a single "mouse click" away from relevant data/information/knowledge access and/or exploration. Even better (but not always so obvious), we also want anyone in our network (company, division, department, cube-cluster) to inherit these data access efficiencies.

In this example, the Web Page about the Customer "ALFKI" provides me with a myriad of exploration and data access paths, e.g., when I click on the foaf:primaryTopic property value link.

This simple example, via a single Web Page, should put to rest any doubts about the utility of Linked Data. Of course this is an old demo, but this time around the UI is minimalist, as my prior attempts skipped a few steps, i.e., they started from within a Linked Data explorer/browser.

Important note: I haven't exported the SQL into an RDF data warehouse; I am converting the SQL into RDF Linked Data on the fly, which has two fundamental benefits (see the query sketch after this list):

  1. No vulnerability to changes in the source DBMS
  2. Superior performance over the RDF warehouse since the source schema is SQL based and I can leverage the optimization of the underlying SQL engine when translating between SPARQL and SQL.
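
To make the "on the fly" point concrete, here's a minimal sketch of querying such a live SQL-to-RDF mapping via SPARQL, using Python's SPARQLWrapper library; the endpoint URL and Northwind vocabulary URIs are placeholder assumptions, not the demo's actual identifiers:

    from SPARQLWrapper import SPARQLWrapper, JSON

    endpoint = SPARQLWrapper("http://demo.example.com/sparql")  # hypothetical endpoint
    endpoint.setQuery("""
        PREFIX northwind: <http://example.com/schemas/northwind#>
        SELECT ?customer ?companyName
        WHERE { ?customer a northwind:Customer ;
                          northwind:companyName ?companyName . }
        LIMIT 10
    """)
    endpoint.setReturnFormat(JSON)

    # Each binding carries a dereferenceable customer URI; the SPARQL-to-SQL
    # translation happens inside the engine, against the live tables.
    for row in endpoint.query().convert()["results"]["bindings"]:
        print(row["customer"]["value"], row["companyName"]["value"])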

Enjoy!

Related

  1. Requirements for Relational to RDF Mapping
  2. Handling Graph Transitivity in a SQL/RDF Hybrid Engine
  3. How Virtuoso handles the Web Aspects of Linked Data Queries.
]]>
Business Value of Linked Data (Enterprise Angle)? http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1437Thu, 11 Sep 2008 19:52:48 GMT22008-09-11T15:52:48.000050-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>

Ubiquity from Mozilla Labs provides an alternative entry point for experiencing the "Controller" aspect of the Web's natural compatibility with the MVC development pattern. As I've noted (in various posts), Web Services, as practiced by the REST-oriented Web 2.0 community or the SOAP-oriented SOA community within the enterprise, are fundamentally about the "Controller" aspect of MVC.

Ubiquity provides a command-line interface for direct invocation of Web Services. For instance, in our case, we can expose Virtuoso's in-built RDF Middleware ("Sponger") and Linked Data deployment services via a single command of the form: describe-resource <url>
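
Under the hood, the command simply asks the middleware for an RDF description of a resource over HTTP. Here's a minimal Python sketch of the equivalent direct invocation; it assumes a Sponger exposed as a REST service at a /proxy endpoint, and the exact path and parameter name may differ on your instance:

    import urllib.parse
    import urllib.request

    def describe_resource(instance, resource_url):
        # Ask the RDF middleware to fetch the resource and distill its description.
        sponger = ("http://%s/proxy?url=" % instance) + urllib.parse.quote(resource_url, safe="")
        request = urllib.request.Request(sponger, headers={"Accept": "application/rdf+xml"})
        with urllib.request.urlopen(request) as response:
            return response.read()

    # Example (replace with a real Virtuoso instance host):
    # print(describe_resource("demo.example.com",
    #                         "http://www.w3.org/People/Berners-Lee/card")[:300])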

To experience this neat addition to Firefox you need to do the following:

  1. Download and install the Ubiquity Extension for Firefox
  2. Subscribe to the OpenLink Command for Resource Description
  3. Press CTRL+Space (Windows / Linux) or Option+Space (Mac OS X)
  4. Type in: describe-resource <a-web-resource-url>

How to unsubscribe

At the current time, you need to do this if you've installed commands using Ubiquity 0.1.0 and seek to use newer versions of the same commands after upgrading to Ubiquity 0.1.1.
  1. To unsubscribe, type "about:ubiquity" into the browser's address bar
  2. Click on the unsubscribe links associated with your command subscription list

Enjoy!

]]>
Linked Data, Ubiquity Commands, and Resource Descriptions (Update 3)http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1430Mon, 08 Sep 2008 13:00:51 GMT72008-09-08T09:00:51-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
Here are a few descriptions of pages covering Google's Chrome browser:

As per usual, this is part post and part Linked Data demo. This time around, I am showcasing Proxy/Wrapper-based dereferenceable URIs and a new "Page Description" feature that showcases the capabilities of Virtuoso's in-built RDFization Middleware. Also note, the resource descriptions (RDF) are presented using an HTML page.

]]>
What's Up with Chrome?http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1429Thu, 04 Sep 2008 12:39:02 GMT22008-09-04T08:39:02.000014-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
After reading Bengee's interview with CrunchBase, I decided to knock up a quick interview remix as part of my usual attempt to add to the developing discourse.

CrunchBase: When we released the CrunchBase API, you were one of the first developers to step up and quickly released a CrunchBase Sponger Cartridge. Can you explain what a CrunchBase Sponger Cartridge is?
Me: A Sponger Cartridge is a data access driver for Web Resources that plugs into our Virtuoso Universal Server (a DBMS and Linked Data Web Server combo, amongst other things). It uses the internal structure of a resource, and/or a web service associated with a resource, to materialize an RDF based Linked Data graph that essentially describes the resource via its properties (Attributes & Relationships), as sketched conceptually below.
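
For illustration, here's a conceptual sketch (Python, rdflib) of the cartridge idea: take a resource's structured payload from its associated web service and re-project it as an RDF graph. The field names and proxy URI pattern are hypothetical, and the real cartridge runs inside Virtuoso:

    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import FOAF, RDF

    def materialize_company_graph(company, payload):
        # The real cartridge would first fetch `payload` from the web service
        # associated with the resource (e.g. the CrunchBase API); a sample
        # dict is passed in here so the sketch is self-contained.
        g = Graph()
        # Mint a dereferenceable proxy URI for the entity, then describe it
        # via its attributes and relationships.
        subject = URIRef("http://demo.example.com/about/rdf/%s#this" % company)
        g.add((subject, RDF.type, FOAF.Organization))
        g.add((subject, FOAF.name, Literal(payload["name"])))        # hypothetical field
        g.add((subject, FOAF.homepage, URIRef(payload["homepage"]))) # hypothetical field
        return g

    sample = {"name": "OpenLink Software", "homepage": "http://www.openlinksw.com/"}
    print(materialize_company_graph("openlink", sample).serialize(format="turtle"))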

CrunchBase: And what inspired you to create it?
Me: Bengee built a new space with your data, and we've built a space on the fly from your data which still resides in your domain. Either solution extols the virtues of Linked Data, i.e., the ability to explore relationships across data items with high degrees of serendipity (also colloquially known as the "follow-your-nose" pattern in Semantic Web circles).
Bengee posted a notice to the Linking Open Data Community's public mailing list announcing his effort. Bearing in mind the fact that we've been using middleware to mesh the realms of Web 2.0 and the Linked Data Web for a while, it was a no-brainer to knock something up based on the conceptual similarities between Wikicompany and CrunchBase. In a sense, a quadrant of orthogonality is what immediately came to mind re. Wikicompany, CrunchBase, Bengee's RDFization efforts, and ours.
Bengee created an RDF based Linked Data warehouse based on the data exposed by your API, which is exposed via the Semantic CrunchBase data space. In our case we've taken the "RDFization on the fly" approach, which produces a transient Linked Data View of the CrunchBase data exposed by your APIs. Our approach is in line with our world view: all resources on the Web are data sources, and the Linked Data Web is about incorporating HTTP into the naming scheme of these data sources so that the conventional URL-based hyperlinking mechanism can be used to access a structured description of a resource, which is then transmitted using a range of negotiable representation formats. In addition, based on the fact that we house and publish a lot of Linked Data on the Web (e.g. DBpedia, PingTheSemanticWeb, and others), we've also automatically meshed CrunchBase data with related data in DBpedia and Wikicompany.

CrunchBase: Do you know of any apps that are using CrunchBase Cartridge to enhance their functionality?
Me: Yes, the OpenLink Data Explorer which provides CrunchBase site visitors with the option to explore the Linked Data in the CrunchBase data space. It also allows them to "Mesh" (rather than "Mash") CrunchBase data with other Linked Data sources on the Web without writing a single line of code.

CrunchBase: You have been immersed in the Semantic Web movement for a while now. How did you first get interested in the Semantic Web?
Me: We saw the Semantic Web as a vehicle for standardizing conceptual views of heterogeneous data sources via context lenses (URIs). In 1998, as part of our strategy to expand our business beyond the development and deployment of ODBC, JDBC, and OLE-DB data providers, we decided to build a Virtual Database Engine (see: Virtuoso History), and in doing so we sought a standards based mechanism for the conceptual output of the data virtualization effort. As of the time of the seminal unveiling of the Semantic Web in 1998 we were clear about two things, in relation to the effects of the Web and Internet data management infrastructure inflections: 1) Existing DBMS technology had reached its limits 2) Web Servers would ultimately hit their functional limits. These fundamental realities compelled us to develop Virtuoso with an eye to leveraging the Semantic Web as a vehicle for completing its technical roadmap.

CrunchBase: Can you put into layman’s terms exactly what RDF and SPARQL are and why they are important? Do they only matter for developers or will they extend past developers at some point and be used by website visitors as well?
Me: RDF (Resource Description Framework) is a Graph based Data Model that facilitates resource description using the Subject, Predicate, and Object principle. Associated with the core data model, as part of the overall framework, are a number of markup languages for expressing your descriptions (just as you express presentation markup semantics in HTML or document structure semantics in XML) that include: RDFa (simple extension of HTML markup for embedding descriptions of things in a page), N3 (a human friendly markup for describing resources), RDF/XML (a machine friendly markup for describing resources).
SPARQL is the query language associated with the RDF Data Model, just as SQL is the query language associated with the Relational Database Model. Thus, when you have RDF based structured and linked data on the Web, you can query against the Web using SPARQL just as you would against an Oracle/SQL Server/DB2/Informix/Ingres/MySQL/etc. DBMS using SQL. That's it in a nutshell.
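
Here's a minimal sketch of that pairing using Python's rdflib: three Subject/Predicate/Object statements describe two resources, and a SPARQL query then retrieves data from the resulting graph (all names are illustrative):

    from rdflib import Graph, Literal, Namespace

    ex = Namespace("http://example.com/")
    g = Graph()

    # Three statements (triples), each a Subject / Predicate / Object.
    g.add((ex.kingsley, ex.worksFor, ex.OpenLink))
    g.add((ex.kingsley, ex.name, Literal("Kingsley")))
    g.add((ex.OpenLink, ex.name, Literal("OpenLink Software")))

    # SPARQL plays the role against the graph that SQL plays against tables.
    results = g.query("""
        PREFIX ex: <http://example.com/>
        SELECT ?name WHERE { ?person ex:worksFor ex:OpenLink ; ex:name ?name . }
    """)
    for row in results:
        print(row.name)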

CrunchBase: On your website you wrote about "RDF and SPARQL as productivity boosters in everyday web development". Can you elaborate on why you believe that to be true?
Me: I think the ability to discern a formal description of anything via its discrete properties is of immense value re. productivity, especially when the capability in question results in a graph of Linked Data that isn't confined to a specific host operating system, database engine, application or service, programming language, or development framework. RDF Linked Data is about infrastructure for the true materialization of the "Information at Your Fingertips" vision of yore. Even though it's taken the emergence of RDF Linked Data to make the aforementioned vision tractable, the comprehension of the vision's intrinsic value has been clear for a very long time. Most organizations and/or individuals are quite familiar with the adage "Knowledge is Power"; well, there isn't any knowledge without accessible Information, and there isn't any accessible Information without accessible Data. The Web has always been grounded in accessibility to data (albeit via compound container documents called Web Pages).
Bottom line, RDF based Linked Data is about Open Data access by reference using URIs (HTTP based Entity IDs / Data Object IDs / Data Source Names), and as I said earlier, the intrinsic value is pretty obvious bearing in mind the costs associated with integrating disparate and heterogeneous data sources -- across intranets, extranets, and the Internet.

CrunchBase: In his definition of Web 3.0, Nova Spivack proposes that the Semantic Web, or Semantic Web technologies, will be the force behind much of the innovation that will occur during Web 3.0. Do you agree with Nova Spivack? What role, if any, do you feel the Semantic Web will play in Web 3.0?
Me: I agree with Nova. But I see Web 3.0 as a phase within the Semantic Web innovation continuum. Web 3.0 exists because Web 2.0 exists. Both of these Web versions express usage and technology focus patterns. Web 2.0 is about the use of Open Source technologies to fashion Web Services that are ultimately used to drive proprietary Software as a Service (SaaS) style solutions. Web 3.0 is about the use of "Smart Data Access" to fashion a new generation of Linked Data aware Web Services and solutions that exploit the federated nature of the Web to maximum effect; proprietary branding will simply be conveyed via quality of data (cleanliness, context fidelity, and comprehension of privacy) exposed by URIs.

Here are some examples of the CrunchBase Linked Data Space, as projected via our CrunchBase Sponger Cartridge:

  1. Amazon.com
  2. Microsoft
  3. Google
  4. Apple
]]>
Crunchbase & Semantic Web Interview (Remix - Update 1)http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1424Thu, 28 Aug 2008 00:35:15 GMT32008-08-27T20:35:15-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
At OpenLink, we've been investigating LinqToRdf, an exciting project from Andrew Matthews that seeks to expose the Semantic Web technology space to the large community of .NET developers.

The LinqToRdf project is about binding LINQ to RDF. It sits atop Joshua Tauberer's C# based Semantic Web/RDF library, which has been out there for a while and works across Microsoft .NET and its open source variant, "Mono".

Historically, the Semantic Web realm has been dominated by RDF frameworks such as Sesame, Jena, and Redland, which, by their Open Source orientation, predominantly favor non-Windows platforms (Java and Linux). Conversely, Microsoft's .NET frameworks have sought to offer Conceptualization technology for heterogeneous Logical Data Sources via .NET's Entity Framework and ADO.NET, but without any actual bindings to RDF.

Interestingly, believe it or not, .NET already has a data query language that shares a number of similarities with SPARQL, called Entity-SQL, and a very innovative programming language called LINQ, which offers a blend of constructs for natural data access and manipulation across relational (SQL), hierarchical (XML), and graph (Object) models without the traditional object-language-to-database impedance tensions of the past.

With regards to all of the above, we've just released a mini white paper that covers the exploitation of RDF-based Linked Data using .NET via LINQ. The paper offers an overview of LinqToRdf, plus enhancements we've contributed to the project (available in LinqToRdf v0.8). The paper includes real-world examples that tap into a MusicBrainz powered Linked Data Space, the Music Ontology, the Virtuoso RDF Quad Store, Virtuoso Sponger Middleware, and our RDFization Cartridges for MusicBrainz.

Enjoy!]]>
.NET, LINQ, and RDF based Linked Data (Update 2)http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1408Fri, 08 Aug 2008 12:54:01 GMT42008-08-08T08:54:01.000002-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
As the Linked Data meme continues on its quest to unravel the mysteries of the Semantic Web vision, it's quite gratifying to see that data virtualization comprehension: creating "Conceptual Views" into logically organized "Disparate & Heterogeneous Data Sources" via "Context Lenses", is taking shape, as illustrated in the "note-to-self" post by David Provost.

Virtualization of heterogeneous data sources is only achievable if you have a dexterous data model based "Bus" into which the data sources are plugged. RDF has offered such a model for a long time.

When heterogeneous data sources are plugged into an RDF based integration bus, e.g., customer records sourced from a variety of tables across a plethora of databases, you can only end up with true value if the emergent entities from such an effort are coherently linked and (de)referenceable; which is what Linked Data's fundamental preoccupation with dereferenceable URIs is all about. Of course, even when you have all of the above in place, you also need to be able to construct "Context Lenses", i.e., context driven views of the Linked Data Mesh (or Linked Data Spaces).

Additional Diagrams:

1. Clients of the RDF Bus
2. RDF Bus Server plugins: Scripts that emit RDF
3. RDF Bus Servers: RDF Data Managers (Triple or Quad Stores)
4. RDF Bus Servers: Relational to RDF Mappers (RDF Views, Semantic Covers etc.)
5. RDF Bus Server plugins: XML to RDF Mappers
6. RDF Bus Server plugins: GRDDL based XSLT stylesheets that emit RDF
7. RDF Bus Server plugins: Intelligent RDF Middleware

]]>
Time for Context Lenses (Update)http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1405Mon, 04 Aug 2008 15:24:50 GMT32008-08-04T11:24:50.000001-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
ODBC delivers open data access (by reference) to a broad range of enterprise databases via a 'C' based API. Thanks to the iODBC and unixODBC projects, ODBC is available across a broad range of platforms beyond Windows.

ODBC identifies data sources using Data Source Names (DSNs).

WODBC (Web Open Database Connectivity) delivers open data access to Web Databases / Data Spaces. The Data Source Naming scheme: URI or IRI, is HTTP based thereby enabling data access by reference via the Web.

ODBC DSNs bind ODBC client applications to Tables, Views, and Stored Procedures.

WODBC DSNs bind you to a Data Space (e.g. my FOAF based Profile Page, where you can use the "Explore Data Tab" to look around if you are a human visitor) or a specific Entity within a Data Space (i.e., the Person Entity "Me").

ODBC Drivers are built using APIs (DBMS Call Level Interfaces) provided by DBMS vendors. Thus, a DBMS vendor can choose not to release an API, or do so selectively, for competitive advantage or market disruption purposes (it's happened!).

WODBC Drivers are also built using APIs (Web Services associated with a Web Data Space). These drivers are also referred to as RDF Middleware or RDFizers. The "Web" component of WODBC ensures openness: you publish Data with URIs from your Linked Data Server and that's it; your data space or specific data entities are live and accessible (by reference) over the Web!
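
A side-by-side sketch of the analogy, in Python; the DSN, table, and URI below are placeholders, and pyodbc is just one of several ODBC bindings that could play the client role:

    import urllib.request

    import pyodbc

    # ODBC: a Data Source Name binds the client to tables/views in one DBMS.
    connection = pyodbc.connect("DSN=Northwind;UID=demo;PWD=demo")  # placeholder DSN
    for row in connection.execute("SELECT CustomerID, CompanyName FROM Customers"):
        print(row.CustomerID, row.CompanyName)

    # "WODBC": an HTTP URI names an entity in a Web Data Space; dereferencing
    # it (with content negotiation) returns that entity's description.
    request = urllib.request.Request(
        "http://demo.example.com/dataspace/person/kidehen#this",  # placeholder URI
        headers={"Accept": "application/rdf+xml"},
    )
    print(urllib.request.urlopen(request).read()[:500])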

So we have come full circle (or cycle): the Web is becoming more of a structured database every day! What's new is old, and what's old is new!

Data Access is everything; without "Data" there is no information or knowledge. Without "Data" there's no notion of vitality, purpose, or value.

URIs make or break everything in the Linked Data Web just as ODBC DSNs do within the enterprise.

I've deliberately left JDBC, ADO.NET, and OLE-DB out of this piece due to their respective programming-language and framework specificity. None of these mechanisms match the platform availability breadth of ODBC.

The Web as a true M-V-C pattern is now crystallizing. The "M" (Model) component of M-V-C is finally rising to the realm of broad attention courtesy of the "Linked Data" meme and the "Semantic Web" vision.

By the way, M-V-C lines up nicely with Web 1.0 (Web Forms / Pages), Web 2.0 (Web Services based APIs), and Web 3.0 (Data Web, Web of Data, or Linked Data Web) :-)

]]>
ODBC & WODBC Comparisonhttp://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1364Tue, 20 May 2008 19:46:11 GMT12008-05-20T15:46:11-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
Unfortunately, I could only spend 4 days at the recent WWW2008 event in Beijing (I departed the morning following the Linked Data Workshop), so I couldn't take my slot on the "Commercializing the Semantic Web" panel. Anyway, thanks to the Web I can still inject my points of view into the broad Web based discourse. Well, so I hoped, when I attempted to post a comment to Paul Miller's ZDNet domain hosted blog thread titled: Commercialising the Semantic Web.

Unfortunately, the cost of completing ZDNet's unwieldy signup process simply exceeded the benefits of dropping my comments in their particular space :-( Thus, I'll settle for a trackback ping instead.

What follows is the cut and paste of my intended comment contributions to Paul's post.

Paul,

As discussed earlier this week during our podcast session, commercialization of Semantic Web technology shouldn't be a mercurial matter at this stage in the game :-) It's all about looking at how it provides value :-)

From the Linked Data angle, the ability to produce, dispatch, and exploit "Context" across an array of "Perspectives" from a plethora of disparate data sources on the Web and/or behind corporate firewalls, offers immense commercial value.

Yahoo's Searchmonkey effort will certainly bring clarity to some of the points I made during the podcast re. the role of URIs as "value consumption tickets" (Data Services are exposed via URIs). There has to be a trigger (in user space) that compels Web users to seek broader, or simply varied, perspectives as a response to data encountered on the Web. Yahoo! is about to put this light on in a big way (imho).

The "self annotating" nature of the Web is what ultimately drives the manifestation of the long awaited Semantic Web. I believe I postulated about "Self Annotation & the Semantic Web" in a number of prior posts which, by the way, should be DataRSS compatible right now due to Yahoo's support of OpenSearch Data Providers (which this Blog Space has been for eons).

Today, we have many communities adding structure to the Web (via their respective tools of preference) without explicitly realizing what they are contributing. Every RSS/Atom feed, Tag, Weblog, Shared Bookmark, Wikiword, Microformat, Microformat++ (eRDF or RDFa), GRDDL stylesheet, RDFizer, etc., is a piece of structured data.

Finally, the different communities are all finding ways to work together (thank heavens!) and the results are going to be cataclysmic when it all plays out :-)

Data, Structure, and Extraction are the keys to the Semantic Life! First you get the Data in a container (information resource), and then you add Structure to the information resource (RSS, Atom, microformats, RDFa, eRDF, SIOC, FOAF, etc.); once you have Structure, RDFization (i.e. transformation to Linked Data) is a cinch thanks to RDF Middleware (as per earlier RDF middleware posts).

]]>
Commercializing the Semantic Webhttp://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1363Sun, 18 May 2008 14:58:26 GMT12008-05-18T10:58:26.000003-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
Wordpress is a Weblog platform comprised of the following:

  1. User Interface - PHP
  2. Application Logic - PHP
  3. Data Storage (SQL RDBMS) - MySQL via PHP-MySQL
  4. Application Server - Apache

In the form above (the norm), Wordpress data can be injected into the Linked Data Web via RDFization middleware such as the Virtuoso Sponger (built into all Virtuoso instances) or Triplr. The downside of this approach is that the blog owner doesn't necessarily possess full control over their contributions to the emerging Giant Global Graph of Linked Data.

Another route to Linked Data exposure is via Virtuoso's Metaschema Language for producing RDF Views over ODBC/JDBC accessible Data Sources, which enables the following setup:

  1. User Interface - PHP
  2. Application Logic - PHP
  3. Data Storage (SQL RDBMS) - MySQL via the PHP-MySQL data access interface
  4. Virtual Database linkage of MySQL Tables into Virtuoso
  5. RDF View generated over the Virtual SQL Tables
  6. Application Server - Virtuoso which provides Linked Data Deployment such that RDF Linked Data is exposed when requested by Web User Agents.

Alternatively, you can also exploit Virtuoso as the SQL DBMS, RDF DBMS, Application Server, and Linked Data Deployment platform:

  1. User Interface - PHP
  2. Application Logic - PHP
  3. Data Storage (SQL RDBMS) - Virtuoso via PHP-ODBC data access interface (* ODBC is Virtuoso's native SQL CLI/API *)
  4. RDF View generated over the Native SQL Tables
  5. Application Server - Virtuoso which provides Linked Data Deployment such that RDF Linked Data is exposed when requested by Web User Agents (e.g. OpenLink RDF Browser, Zitgist Data Viewer, DISCO Hyperdata Browser, and Tabulator).

Benefits?

  • Each user account gets a proper Linked Data URI (ID) that can be meshed/smushed with other IDs (so you can add data from this new blog space to other linked data sources associated with your other URIs/IDs)
  • Each post gets a proper URI
  • All data is now query-able via SPARQL
  • Discoverability increases exponentially (without a drop in relevance in either direction, i.e., discovering or being discovered)

How Do I map the WordPress SQL Schema to RDF using Virtuoso?

  • Determine the RDF Schema or Ontologies that define the Classes for which you will be producing instance data (e.g. SIOC and FOAF)
  • Declare URI/IRI generator functions (*special Virtuoso functions*)
  • Use SPARQL Graph patterns to apply URI/IRI generator functions to Tables, Views, Table-valued Stored Procedures, and Query Resultsets as part of the RDBMS to RDF mapping (a conceptual sketch follows this list)
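
To show the shape of the outcome only, here's a hypothetical Python sketch that materializes triples by hand from the stock WordPress wp_posts table. Virtuoso's Meta Schema Language declares the equivalent mapping inside the DBMS, so treat this as an illustration of the idea rather than of its syntax:

    import sqlite3  # in-memory stand-in for the MySQL connection

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, RDF

    SIOC = Namespace("http://rdfs.org/sioc/ns#")
    BASE = "http://demo.example.com/dataspace/"  # hypothetical Linked Data host

    def post_uri(post_id):
        # The URI/IRI generator function: one stable, dereferenceable ID per row.
        return URIRef("%spost/%s" % (BASE, post_id))

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE wp_posts (ID INTEGER, post_title TEXT)")
    db.execute("INSERT INTO wp_posts VALUES (1, 'Hello world!')")

    g = Graph()
    for post_id, title in db.execute("SELECT ID, post_title FROM wp_posts"):
        g.add((post_uri(post_id), RDF.type, SIOC.Post))
        g.add((post_uri(post_id), DCTERMS.title, Literal(title)))

    print(g.serialize(format="turtle"))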

Read the Meta Schema Language guide or simply apply our "WordPress SQL Schema to RDF" script to your Virtuoso hosted instance. Of course, there are other mappings that cover other PHP applications deployed via Virtuoso:

Live Demos?

]]>
Adding Wordpress Blogs into the Linked Data Web using Virtuosohttp://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1333Thu, 10 Apr 2008 16:33:05 GMT42008-04-10T12:33:05.000003-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
John Schmidt, from Informatica, penned an interesting post titled: IT Doesn't Matter - Integration Does.

Yes, integration is hard, but I do profoundly believe that what's been happening on the Web over the last 10 or so years also applies to the Enterprise, and by this I absolutely do not mean "Enterprise 2.0" since "2.0" and productive agility do not compute in my realm of discourse.

Large collections of RSS feeds, Wikiwords, Shared Bookmarks, Discussion Forums, etc., when disconnected at the data level (i.e. hosted in pages with no access to the "data behind"), simply offer information deluge and inertia (there are only so many hours for processing opaque information sources in a given day).

Enterprises fundamentally need to process information efficiently as part of a perpetual assessment of their relative competitive Strengths, Weaknesses, Opportunities, and Threats (SWOT) in existing and/or future markets. Historically, IT acquisitions have run counterintuitively to the aforementioned quest for "Agility", due to the predominance of the "rip and replace" approach to technology acquisition that repeatedly creates and perpetuates information silos across Application, Database, Operating System, and Development Environment boundaries. The sequence of events typically occurs as follows:

  1. applications are acquired on a problem by problem basis
  2. back-end application databases are discovered once ad-hoc information views are sought by information workers
  3. back-end database disparity across applications is discovered once holistic views are sought by knowledge workers (typically domain experts).

In the early to mid 90's (pre ubiquitous Web), operating system, programming language, and development framework independence inside the enterprise was technically achievable via ODBC (due to its platform independence). That said, DBMS specific ODBC channels alone couldn't address the holistic requirements associated with Conceptual Views of disparate data sources, hence the need for Data Access Virtualization via Virtual Database Engine technology.

Just as is the case on the Web today, with the emergence of the "Linked Data" meme, enterprises now have a powerful mechanism for exploiting the Data Integration benefits associated with generating Data Objects from disparate data sources, endowed with HTTP based IDs (URIs).

Conceptualizing access to data exposed via Database APIs, SOA based Web Services (SOAP style), Web 2.0 APIs (REST style), XML Views of SQL Data (SQLX), pure XML, etc., is the problem area addressed by RDF aware middleware (RDFizers, e.g. the Virtuoso Sponger).

Here are examples of what SQL Rows exposed as RDF Data Objects (identified using HTTP based URIs) would look like outside or behind a corporate firewall:

What's Good for the Web Goose (Personal Data Space URIs) is good for the Enterprise Gander (Enterprise Data Space URIs).

Related

]]>
Linked Data is vital to Enterprise Integration driven Agilityhttp://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1325Sat, 22 Mar 2008 18:13:41 GMT22008-03-22T14:13:41.000002-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
In response to the ReadWriteWeb piece titled: Semantic Web: What is the Killer App?, by Alex Iskold:

Information overload and Data Portability are two of the most pressing and imminent challenges affecting every individual connected to the global village exposed by the Internet and World Wide Web. I wrote an earlier post titled: Why We Need Linked Data that shed light on frequently overlooked realities about the Document Web.

The real Killer application of the Semantic Web (imho) is Linked Data (or Hyperdata), just as the killer application of the Document Web was Linked Documents (Hyperlinks). Linked Data enables human users (indirectly) and software agents (directly in response to human instruction) to traverse Web Data Spaces (Linked Data enclaves within the Giant Global Graph).

Semantic Web applications (conduits between humans and agents) that take advantage of Linked Data include:

DBpedia - General Knowledge sourced from Wikipedia and a host of other Linked Data Spaces.

Various Linked Data Browsers: Zitgist Data Viewer, OpenLink RDF Browser, DISCO Browser, and TimBL's Tabulator.

zLinks - Linked Data Lookup technology for Web Content Publishing systems (note: more to come on this in a future post).

OpenLink Data Spaces - a solution for Data Portability via a Linked Data Junction Box for Web 1.0 ((X)HTML Document Webs), 2.0 (XML Web Services based Content Publishing, Content Syndication, and Aggregation), and 3.0 (Linked Data) Data Spaces. Thus, via my URI (when viewed through a Linked Data Browser/Viewer) you can traverse my Data Space (i.e my Linked Data Graph) generated by the following activities:

    Blog Posts publishing
    My RSS & Atom Content Subscriptions (what used to be called a "Blogroll")
    My Bookmarks (from my Desktop and Del.icio.us)
    and other things I choose to share with the public via the Web

Virtuoso - a Universal Server Platform that includes RDF Data Management, RDFization Middleware, SQL-RDF Mapping, RDF Linked Data Deployment, alongside a hybrid/multi-model, virtual/federated data service in a single product offering.

BTW - There is a Linked Data Workshop at this year's World Wide Web conference. Also note the Healthcare & Life Sciences Workshop, which is a related Linked Data technology and Semantic Web best practices realm. ]]>
Semantic Web Killer Application?http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1293Tue, 05 Feb 2008 01:32:42 GMT92008-02-04T20:32:42.000003-05:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
As 2007 came to a close, I repeatedly mulled over the idea of putting together the usual "year in review" and a set of predictions for the coming year. Anyway, the more I pondered, the smaller the list became. While pondering (as 2008 rolled around), the Blogosphere was set ablaze with Robert Scoble's announcement of his account suspension by Facebook. Of course, many chimed in expressing views on either side of the ensuing debate: who is right -- Scoble or Facebook? The more I assimilated the views expressed about this event, the more ironic I found the general discourse, for the following reasons:

  1. Web 2.0 is fundamentally about Web Services as the prime vehicle for interactions across "points of Web presence"
  2. Facebook is a Web 2.0 hosted service for social networking that provides Web Services APIs for accessing data in the Facebook data space. You have to do so "on the fly", within clearly defined constraints, i.e. you can interact with data across your social network via Facebook APIs, but you cannot cache the data (perform an export style dump of the data)
  3. Facebook is a main driver of the term "social graph", but their underlying data model is relational, and the Web Services response (the data you get back) doesn't return a data graph; instead it returns a tree (i.e., XML)
  4. Scoble's had a number of close encounters with Linked Data Web | Semantic Data Web | Web 3.0 aficionados in various forms throughout 2007, but still doesn't quite make the connection between Web Services APIs as part of a processing pipeline that includes structured data extraction from XML data en route to producing Data Graphs comprised of Data Objects (Entities) endowed with: Unique Identifiers, Classification or Categorization schemes, Attributes, and Relationships prescribed by one or more shared Data Dictionaries/Schemas/Ontologies
  5. A global information bus that exposes a Linked Data mesh comprised of Data Objects, Object Attributes, and Object Relationships across "points of Web presence" is what TimBL described in 1998 (Semantic Web Roadmap) and more recently in 2007 (Giant Global Graph)
  6. The Linked Data mesh (i.e., the Linked Data Web or GGG) is anchored by the use of HTTP to mint Location, Structure, and Value independent Object Identifiers called URIs or IRIs. In addition, the Linked Data Web is also equipped with a query language, protocol, and results serialization format (for XML and JSON) called SPARQL.

So, unlike Scoble, I am able to make my Facebook Data portable without violating Facebook rules (no data caching outside Facebook realm) by doing the following:

  1. Use an RDFizer for Facebook to convert XML response data from Facebook Web Services into RDF "on the fly" (sketched in code after this list)
  2. Ensure that my RDF is comprised of Object Identifiers that are HTTP based and thereby dereferenceable (i.e. I can use SPARQL to unravel the Linked Data Graph in my Facebook data space)
  3. The act of data dereferencing enables me to expose my Facebook Data as Linked Data associated with my Personal URI
  4. This interaction only occurs via my data space, and in all cases the interactions with data work via my RDFizer middleware (e.g. the Virtuoso Sponger), which talks directly to Facebook Web Services.
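
Here's a minimal Python sketch of step 1: take a tree-shaped web-service response and re-express it, on the fly, as an RDF graph keyed by a dereferenceable HTTP URI. The element names and proxy URI pattern are hypothetical, and the real Facebook REST API's key/signature handling is omitted:

    import xml.etree.ElementTree as ET

    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import FOAF, RDF

    def rdfize_profile(xml_bytes, proxy_base="http://demo.example.com/about/"):
        # Parse the tree-shaped (XML) response...
        root = ET.fromstring(xml_bytes)
        uid = root.findtext("uid")    # hypothetical element names
        name = root.findtext("name")

        # ...and re-express it as a graph anchored by an HTTP identifier.
        g = Graph()
        person = URIRef("%sfacebook/%s#this" % (proxy_base, uid))
        g.add((person, RDF.type, FOAF.Person))
        g.add((person, FOAF.name, Literal(name)))
        return g

    sample = b"<user><uid>123</uid><name>Kingsley</name></user>"
    print(rdfize_profile(sample).serialize(format="turtle"))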

In a nutshell, my Linked Data Space enables you to reference data in my data space via Object Identifiers (URIs), and in some cases the Object IDs and Graphs are constructed on the fly via RDFization middleware.

Here are my URIs that provide different paths to my Facebook Data Space:

To conclude, 2008 is clearly the inflection year during which we will finally unshackle Data and Identity from the confines of "Web Data Silos" by leveraging the HTTP, SPARQL, and RDF induced virtues of Linked Data.

Related Posts:

  1. 2008 and the Rise of Linked Data
  2. Scoble Right, Wrong, and Beyond
  3. Scoble interviewing TimBL (note to Scoble: re-watch your interview since he made some specific points about Linked Data and URIs that you need to grasp)
  4. Prior Blog posts in this Blog Data Space that include the literal patterns: Scoble, Semantic Web
]]>
2008, Facebook Data Portability, and the Giant Global Graph of Linked Datahttp://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1289Mon, 07 Jan 2008 16:44:42 GMT32008-01-07T11:44:42.000007-05:00Kingsley Uyi Idehen <kidehen@openlinksw.com>

"The phrase Open Social implies portability of personal and social data. That would be exciting but there are entirely different protocols underway to deal with those ideas. As some people have told me tonight, it may have been more accurate to call this "OpenWidget" - though the press wouldn't have been as good. We've been waiting for data and identity portability - is this all we get?"
[Source: Read/Write Web's Commentary & Analysis of Google's OpenSocial API]

..Perhaps the world will read the terms of use of the API, and realize this is not an open API; this is a free API, owned and controlled by one company only: Google. Hopefully, the world will remember another time when Google offered a free API and then pulled it. Maybe the world will also take a deeper look and realize that the functionality is dependent on Google hosted technology, which has its own terms of service (including adding ads at the discretion of Google), and that building an OpenSocial application ties Google into your application, and Google into every social networking site that buys into the Dream. Hopefully the world will remember. Unlikely, though, as such memories are typically filtered in the Great Noise....

[Source: Poignant commentary excerpt from Shelley Powers' Blog (as always)]

The "Semantic Data Web" vision has always been about "Data & Identity" portability across the Web. It's been that and more from day one.

In a nutshell, we continue to exhibit varying degrees of Cognitive Dissonance re the following realities:

  1. The Network is the Computer (Internet/Intranet/Extranet depending on your TCP/IP usage scenarios)
  2. The Web is the OS (ditto), and it provides a communications subsystem (Information BUS) comprised of
    • URIs (a pointer system for identifying, accessing, and manipulating data)
  3. HTTP based Interprocess communications (i.e., Web Apps are processes once you discard the HTML UI and interact with the application logic containers called "Web Services" behind the pages) ultimately hit data
  4. Web Data is best Modeled as a Graph (RDF, Containers/Items/Item Types, Property & Value Pairs associated with something, and other labels)
  5. Networks are Graphs and vice versa
  6. Social Networks are graphs where nodes are connected via social connectors ( [x]--knows-->[y] )
  7. The Web is a Graph that exposes a People and Data Network (to the degree we allude to humans not being data containers i.e. just nodes in a network, otherwise we are talking about a Data Network)
  8. Data access and manipulation depends inherently on canonical Data Access mechanisms such as Data Source Identifiers / Names (time-tested practice in various DBMS realms)
  9. Data is forever, it is the basis of Information, and it is increasing exponentially due to proliferation of Web Services induced user activities (User Generated Content)
  10. Survival, Vitality, Longevity, Efficiency, Productivity, etc., all depend on our ability to process data effectively in a shrinking time continuum, where Data and/or Information overload is the alternative.

The Data Web is about Presence over Eyeballs due to the following realities:

  1. Eyeballs are input devices for a DNA based processing system (Humans). The aforementioned processing system can reason very well, but simply cannot effectively process masses of data or information
  2. Widgets offer little value long term re. the imminent data and information overload dilemma, ditto Web pages (however pretty), and any other Eyeballs-only centric Web Apps
  3. Computers (machines) are equipped with inorganic (non DNA) based processing power; they are equipped to process huge volumes of data and/or information, but they cannot reason
  4. To be effective in the emerging frontier comprised of a Network Computer and a Web OS, we need an effective mechanism that makes best use of the capabilities possessed by humans and machines, by shifting the focus to creation and interaction with points of "Data Web Presence" that openly expose "Structured Linked Data".

This is why we need to inject a mesh of Linked Data into the existing Web. This is what the often misunderstood vision of the "Semantic Data Web" or "Web of Data" or "Web of Structured Data" is all about.

As stated earlier (point 10 above), "Data is forever" and there is only more of it to come! Sociality and associated Social Networking oriented solutions are at best a speck in the Web's ocean of data once you comprehend this reality.

Note: I am writing this post as an early implementor of GData and an implementor of RDF Linked Data technology and a "Web Purist".

OpenSocial implementation and support across our relevant product families: Virtuoso (i.e. the Sponger Middleware for RDF component), OpenLink Data Spaces (Data Space Controller / Services), and the OpenLink Ajax Toolkit (i.e. OAT Widgets and Libraries), is a triviality now that the OpenSocial APIs are public.

The concern I have, and the problem that remains mangled in the vast realms of Web Architecture incomprehension, is the fact that GData and GData based APIs cannot deliver Structured Linked Data in line with the essence of the Web without introducing "lock-in" that ultimately compromises the "Open Purity" of the Web. Facebook, and Google's OpenSocial response to the Facebook juggernaut (i.e. an open variant of the Facebook Activity Dashboard and Social Network functionality realms, primarily), are at best icebergs in the ocean we know as the "World Wide Web". The nice and predictable thing about icebergs is that they ultimately melt into the larger ocean :-)

On a related note, I had the pleasure of attending the W3C's RDF and DBMS Integration Workshop last week. The event was well attended by organizations with knowledge, experience, and a vested interest in addressing the issues associated with exposing non-RDF data (e.g. SQL) as RDF, and the imminence of data and/or information overload, covered in different ways via the following presentations: ]]>
Reminder: Why We Need Linked Data!http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1267Fri, 02 Nov 2007 22:52:34 GMT52007-11-02T18:52:34-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
I've written extensively on the subject of Data Spaces in relation to the Data Web for a while. I've also written sparingly about OpenLink Data Spaces (a Data Web Platform built using Virtuoso). On the other hand, I haven't shed much light on installation and deployment of OpenLink Data Spaces.

Jon Udell recently penned a post titled: The Fourth Platform. The post arrives at a spookily coincidental time (this happens quite often between Jon and me, as demonstrated last year during our podcast, the "Fourth" in his Innovators Podcast series).

The platform that Jon describes is "Cloud Based" and comprised of Storage and Computation. I would like to add Data Access and Management (native and virtual) under the fourth platform banner with the end product called: "Cloud based Data Spaces".

As I write, we are releasing a Virtuoso AMI (Amazon Machine Image) labeled: virtuoso-dataspace-server. This edition of Virtuoso includes the OpenLink Data Spaces layer and all of the OAT applications we've been developing for a while.

What Benefits Does this offer?

  1. Personal Data Spaces in the Cloud - a place where you can control and consolidate data across your Blogs, Wikis, RSS/Atom Feed Subscriptions, Shared Bookmarks, Shared Calendars, Discussion Threads, Photo Galleries etc
  2. All the data in your Data Space is SPARQL or GData accessible.
  3. All of the data in your Personal Data Space is Linked Data from the get go. Each Item of data is URI addressable
  4. SIOC support - your Blogs, Wikis, Bookmarks etc.. are based on the SIOC ontology for Semantically Interlinking Online Communities (think: Open social-graph++)
  5. FOAF support - your FOAF Profile page provides a URI that is an in-road to all Data in your Data Space.
  6. OpenID support - your Personal Data Space ID is usable wherever OpenID is supported. OpenID and FOAF are integrated as per latest FOAF specs
  7. Two-way Integration with Facebook - You can access your Data Space from Facebook or access Facebook from your Data Space
  8. Unified Storage - The WebDAV based filesystem provides Cloud Storage that's integrated with Amazon S3; It also exposes all of your Data Space data via a traditional filesystem UI (think virtual Spotlight); You can also mount this drive to your local filesystem via your native operating system's WebDAV support
  9. SyncML - you can sync calendar and contact details with your Data Space in the cloud from your Mobile phone.
  10. A practical Semantic Data Web solution - based on Web Infrastructure and doesn't require you to do anything beyond exposing URIs for data in your Data Spaces.

EC2-AMI Details:

    AMI ID: ami-e2ca2f8b
    Manifest file: virtuoso-images/virtuoso-dataspace-server.manifest.xml

Installation Guide (a programmatic launch alternative is sketched after these steps):

  1. Get an Amazon Web Services (AWS) account
  2. Signup for S3 and EC2 services
  3. Install the EC2 plugin for Firefox
  4. Start the EC2 plugin
  5. Locate the row containing AMI ID ami-7c31d515 / Manifest virtuoso-test/virtuoso-cloud-beta-9-i386.manifest.xml (sort using the AMI ID or Manifest columns, or search on the pattern: virtuoso, due to name flux)
  6. Start the Virtuoso Data Space Server AMI
  7. Wait 4-5 minutes (*it takes a few minutes to create the pre-configured Linux Image*)
  8. Connect to http://your-ec2-instance-cname:8890/ and log in with user/password dba/dba
  9. Go to the Admin UI (Virtuoso Conductor) and change the PWDs for the 'dba' and 'dav' accounts (*Important!*)
  10. Give the "SPARQL" user "SPARQL_UPDATE" privileges (required if you want to exploit the in-built Sponger Middleware)
  11. Click on the ODS (OpenLink Data Spaces) link to start a Personal Edition of OpenLink Data Spaces (or go to: http://your-ec2-instance-cname/dataspace/ods/index.html)
  12. Log in using the username and password credentials for the 'dav' account (or register a new user; note: OpenID is an option here also), then create a Data Space Application Instance by clicking on a Data Space App tab
  13. Import data from your existing Web 2.0 style applications into OpenLink Data Spaces e.g. subscribe to a few RSS/Atom feeds via the "Feeds Manager" application or import some Bookmarks using the "Bookmarks" application
  14. Then look at the imported data in Linked Data form via your ODS generated URIs based on the patterns: http://your-ec2-instance-cname/dataspace/person/your-ods-id#this (URI for You the Person), http://your-ec2-instance-cname/dataspace/person/your-ods-id (FOAF File URI), http://your-ec2-instance-cname/dataspace/your-ods-id (SIOC File URI)
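
As a programmatic alternative to the EC2 Firefox plugin steps above, here's a minimal sketch using the boto library; it assumes AWS credentials are configured in your environment, and uses the AMI ID quoted earlier:

    import boto.ec2

    # Connect using credentials from the environment / boto config.
    conn = boto.ec2.connect_to_region("us-east-1")
    reservation = conn.run_instances("ami-e2ca2f8b", key_name="my-keypair",
                                     instance_type="m1.small")
    instance = reservation.instances[0]

    # Once running, the instance's public DNS name is the
    # "your-ec2-instance-cname" used in the steps above.
    print(instance.id, instance.state)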

Using the OpenLink Ajax Toolkit (OAT) from your Data Space instance

Install the OAT VAD package via the Admin UI and then apply the URI patterns below within your browser:
  1. http://your-ec2-instance-cname:8890/oatdemo - Entire OAT Demo Collection
  2. http://your-ec2-instance-cname:8890/rdfbrowser - RDF Browser
  3. http://your-ec2-instance-cname:8890/isparql - SPARQL Query Builder (iSPARQL)
  4. http://your-ec2-instance-cname:8890/qbe - SQL Query Builder (iSQL)
  5. http://your-ec2-instance-cname:8890/formdesigner - Forms Builder (for building Meshups based on RDF, SQL, or Web Services Data Sources)
  6. http://your-ec2-instance-cname:8890/dbdesigner - SQL DB Schema Designer (note: a Visual SQL-RDF Mapper is also on its way)
  7. http://your-ec2-instance-cname:8890/DAV/JS/ - To view the OAT Tree (there are some experimental demos that are missing from the main demo app, etc.)

There's more to come!

]]>
Fourth Platform: Data Spaces in The Cloud (Update)http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1261Sun, 26 Oct 2008 21:59:33 GMT202008-10-26T17:59:33-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
The motivation behind this post is a response to the Read/WriteWeb post titled: Semantic Web: Difficulties with the Classic Approach.

First off, I am going to focus on the Semantic Data Web aspect of the overall Semantic Web vision (a continuum) as this is what we have now. I am also writing this post as a deliberate contribution to the discourse swirling around the real topic: Semantic Web Value Proposition.

Situation Analysis

We are in the early stages of the long anticipated Knowledge Economy. That being the case, it would be safe to assume that information access, processing, and dissemination are of utmost importance to individuals and organizations alike. You don't produce knowledge in a vacuum! Likewise, you can't produce Information in a vacuum; you need Data.

The Semantic Data Web's value to Individuals

Problem:

Increasingly, Blogs, Wikis, Shared Bookmarks, Photo Galleries, Discussion Forums, Shared Calendars, and the like have become invaluable tools for individual and organizational participation in Web enabled global discourse (where a lot of knowledge is discovered). These tools are typically associated with Web 2.0, implying Read-Write access via Web Services, centralized application hosting, and data lock-in (silos).

The reality expressed above is a recipe for "Information Overload" and complete annihilation of one's effective pursuit and exploitation of knowledge due to "Time Scarcity" (note: disconnecting is not an option). Information abundance is inversely related to available processing time (for humans in particular). In my case for instance, I was actively subscribed to over 500+ RSS feeds in 2003. As of today, I've simply stopped counting, and that's just my Weblog Data Space. Then add to that all of the Discussions I track across Blogs, Wikis, message boards, mailing lists, traditional Usenet discussion forums, and the like, and I think you get the picture.

Beyond information overload, Web 2.0 data is "Semi-Structured" by way of its dominant data containers ((X)HTML, RSS, Atom documents and data streams, etc.) lacking semantics that formally expose individual data items as distinct entities, endowed with unambiguous naming / identification, descriptive attributes (a type of property/predicate), and relationships (a type of property/predicate).

Solution:

Devise a standard for Structured Data Semantics that is compatible with the Web Information BUS.

Produce structured data (entities, entity types, entity relationships) from Web 1.0 and Web 2.0 resources that already exist on the Web, such that individual entities, their attributes, and relationships are accessible and discernible to software agents (machines).

Once the entities are individually exposed, the next requirement is a mechanism for selective access to these entities i.e. a query language.

Semantic Data Web Technologies that facilitate the solution described above include:

Structured Data Standards:
    RDF - Data Model for structured data
    RDF/XML - A serialization format for RDF based structured data
    N3 / Turtle - more human friendly serialization formats for RDF based structured data
Entity Exposure & Generation:
    GRDDL - enables association between XHTML pages and XSLT stylesheets that facilitates loosely coupled "on the fly" extraction of RDF from non RDF documents
    RDFa - enables document publishers or viewers (i.e those repurposing or annotating) to embed structured data into existing XHTML documents
    eRDF - another option for embedding structured RDF data within (X)HTML documents
    RDF Middleware - typically incorporating GRDDL, RDFa, eRDF, and custom extraction and mapping as part of a structured data production pipeline
Entity Naming & Identification:

Use of URIs or IRIs for uniquely identifying physical resources (HTML Documents, Image Files, Multimedia Files, etc.) and abstract things (People, Places, Music, etc.).

Entity Access & Querying:

    SPARQL Query Language - the SQL analog of the Semantic Data Web, which enables query constructs that target named entities, entity attributes, and entity relationships (a runnable sketch follows)
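
Here's a runnable sketch of that querying step against the public DBpedia SPARQL endpoint (the query shape itself is just illustrative):

    from SPARQLWrapper import SPARQLWrapper, JSON

    dbpedia = SPARQLWrapper("http://dbpedia.org/sparql")
    dbpedia.setQuery("""
        SELECT ?property ?value
        WHERE { <http://dbpedia.org/resource/Semantic_Web> ?property ?value . }
        LIMIT 10
    """)
    dbpedia.setReturnFormat(JSON)

    # Each row is one (predicate, object) pair describing the named entity.
    for row in dbpedia.query().convert()["results"]["bindings"]:
        print(row["property"]["value"], "->", row["value"]["value"])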

The Semantic Data Web's value to Organizations

Problem:

Organizations are rife with a plethora of business systems that are built atop a myriad of database engines, sourced from a variety of DBMS vendors. A typical organization would have a different database engine, from a specific DBMS vendor, underlying critical business applications such as: Human Resource Management (HR), Customer Relationship Management (CRM), Accounting, Supply Chain Management etc. In a nutshell, you have DBMS Engines, and DBMS Schema heterogeneity permeating the IT infrastructure of organizations on a global scale, making Data & Information Integration the biggest headache across all IT driven organizations.

Solution:

Alleviation of the pain (costs) associated with Data & Information Integration.

Semantic Data Web offerings:

A dexterous data model (RDF) that enables the construction of conceptual views of disparate data sources across an organization based on existing web architecture components such as HTTP and URIs.

Existing middleware solutions that facilitate the exposure of SQL DBMS data as RDF based Structured Data include:

BTW - There is an upcoming W3C Workshop covering the integration of SQL and RDF data.

Conclusion

The Semantic Data Web is here, and its value delivery vehicle is the URI. The URI is a conduit to Interlinked Structured Data (RDF based Linked Data) derived from existing data sources on the World Wide Web, alongside data continuously injected into the Web by organizations worldwide. Ironically, the Semantic Data Web is the only platform that crystallizes the "Information at Your Fingertips" vision without development environment, operating system, application, or database lock-in. You simply click on a Linked Data URI and the serendipitous exploration and discovery of data commences.

The unobtrusive emergence of the Semantic Data Web is a reflection of the soundness of the underlying Semantic Web vision.

If you are excited about Mash-ups, then you are a Semantic Web enthusiast and benefactor in the making, because you only "Mash" (brute force data extraction and interlinking) because you can't "Mesh" (natural data extraction and interlinking). Likewise, if you are a social-networking, open social-graph, or portable social-network enthusiast, then you are also a Semantic Data Web benefactor and enthusiast, because your "values" (yes, the values associated with the properties that define you, e.g. your interests) are the fundamental basis for portable, open social-networking, which is what the Semantic Data Web hands to you on a platter without compromise (i.e. data lock-in or loss of data ownership).

Some practical examples of Semantic Data Web prowess:
    DBpedia (*note: I deliberately use DBpedia URIs in my posts where I would otherwise have used a Wikipedia article URI*)
]]>
Semantic Web Value Propositionhttp://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1254Fri, 21 Sep 2007 12:05:07 GMT32007-09-21T08:05:07.000009-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
Ivan Herman just posted another nice example of practical RDFa usage in a blog post titled: Yet Another RDFa Processor. In his post, Ivan exposes a URI for his FOAF-in-RDFa file.

Since I am aggressively tracking RDFa developments, I decided to quickly view Ivan's FOAF-in-RDFa file via the OpenLink RDF Browser. The full implications are best understood when you click on each of the Browser's Tabs -- each providing a different perspective on this interesting addition to the Semantic Data Web (note the Fresnel Tab, which demonstrates declarative UI templating using N3).

What's Going on Here?

The OpenLink RDF Browser is a Rich Internet Application built using OAT (OpenLink Ajax Toolkit). In my case, I am deploying the RDF Browser from a Virtuoso instance, which implies that the Browser is able to use the Virtuoso Sponger Middleware (exposed as a REST Service at the Virtuoso instance endpoint: /proxy), which includes an RDFa Cartridge comprising a metadata extractor and an RDF Schema / OWL Ontology mapper. That's it!
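
For the curious, the interaction pattern is simple enough to sketch in Python: hand the middleware endpoint a page URL and parse the RDF it returns. The host name below is a placeholder, and the url query parameter is my assumption about the /proxy service's calling convention - check the Virtuoso documentation for the actual interface:

    # Hedged sketch: ask RDF-producing middleware for triples extracted
    # from an ordinary web page. Host and parameter name are assumptions.
    from urllib.parse import quote
    from rdflib import Graph

    page = "http://www.example.com/foaf.html"       # any RDFa-bearing page
    sponger = "http://example.com/proxy?url=" + quote(page, safe="")

    g = Graph()
    g.parse(sponger)                                # fetch and parse returned RDF
    print(len(g), "triples extracted")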

]]>
Yet Another RDFa Demohttp://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1249Tue, 05 Feb 2008 01:44:37 GMT22008-02-04T20:44:37.000009-05:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
I now have the first cut of a Facebook application called: Dynamic Linked Data Pages.

What is a Dynamic Linked Data Page (DLD)?

A dynamically generated Web Page comprising Semantic Data Web style data links (formally typed links) and traditional Document Web links (generic links lacking type specificity).

Linked Data Pages will ultimately enable Facebook users to inject their public data into the Semantic Data Web as RDF based Linked Data. For instance, my Facebook Profile & Photo Albums data is now available as RDF, without paying a cent of RDF handcrafting tax, thanks to the Virtuoso Sponger (middleware for producing RDF from non RDF data sources), which is now equipped with a new RDFizer Cartridge for the Facebook Query Language (FQL) and RESTful Web Service.

Demo Notes:

When you click on a link in DLD pages, you will be presented with a lookup that exposes the different interaction options associated with a given URI. Examples include:

  1. Explore - find attributes and relationships that apply to the clicked URI
  2. Dereference - get the attributes of the clicked URI
  3. Bookmark - store the URI for subsequent use, e.g. meshing with other URIs from across the Web
  4. (X)HTML Page Open - traditional Document Web link (i.e. just opens another Web document as per usual)

Remember, the Facebook URLs (links to web pages) are being converted, on the fly, into RDF based Structured Data (a graph model database), i.e. Entity Sets that possess formally defined characteristics (attributes) and associations (relationships).
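
Here is a hedged, hypothetical sketch in Python of what such RDFization amounts to: a Web Service response (JSON below; the field names and URI scheme are illustrative, not the actual FQL schema) re-expressed as triples about an entity, using the rdflib package and the FOAF vocabulary:

    # Hypothetical sketch: RDFize a profile record returned by a REST API.
    # Field names and URI scheme are illustrative, not the real FQL schema.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import FOAF, RDF

    profile = {"uid": "12345", "name": "Jane Doe", "friends": ["67890"]}

    EX = Namespace("http://example.com/facebook/")  # hypothetical URI scheme
    me = URIRef(EX["person/" + profile["uid"]])

    g = Graph()
    g.add((me, RDF.type, FOAF.Person))              # entity with a formal type
    g.add((me, FOAF.name, Literal(profile["name"])))         # attribute
    for fid in profile["friends"]:                           # relationships
        g.add((me, FOAF.knows, URIRef(EX["person/" + fid])))

    print(g.serialize(format="turtle"))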

Dynamic Linked Data Pages

  1. My facebook Profile
  2. My facebook Photo Album

Saved RDF Browser Sessions

  1. My facebook Profile
  2. My facebook Photo Album

Saved SPARQL Query Definitions

  1. My facebook Profile Query
  2. My facebook Photo Album Query
]]>
Injecting Facebook Data into the Semantic Data Webhttp://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1237Wed, 11 Feb 2009 12:40:11 GMT22009-02-11T07:40:11-05:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
Chris Bizer, Richard Cyganiak, and Tom Heath have just published a Linked Data Publishing Tutorial that provides a guide to the mechanics of Linked Data injection into the Semantic Data Web.

On a different, but related, thread, Mike Bergman recently penned a post titled: What is the Structured Web?. Both of these public contributions shed light on the "Information BUS" essence of the World Wide Web by describing the evolving nature of the payload shuttled by the BUS.

What is an Information BUS?

Middleware infrastructure for shuttling "Information" between endpoints using a messaging protocol.

The Web is the dominant Information BUS within the Network Computer we know as the "Internet". It uses HTTP to shuttle information payloads between "Data Sources" and "Information Consumers" - which is what happens when we interact with the Web via User Agents / Clients (e.g. Browsers).

What are Web Information Payloads?

HTTP transported streams of contextualized data. Hence the terms "Information Resource" and "Non Information Resource" encountered when reading material related to http-range-14 and Web Architecture. For example, an (X)HTML document is a specific data context (representation) that enables us to perceive, or comprehend, a data stream originating from a Web Server as a Web Page. On the other hand, if the payload lacks contextualized data, a fundamental Web requirement, then the resource is referred to as a "Non Information" resource. Of course, there is really no such thing as a "Non Information" resource, but with regards to Web Architecture, it's the short way of saying: "the Web transmits Information only". That said, I prefer to refer to these "Non Information" resources as "Data Sources", a term well understood in the world of Data Access Middleware (ODBC, JDBC, OLEDB, ADO.NET, etc.) and Database Management Systems (Relational, Object-Relational, Object, etc.).

Examples of Information Resource and Data Source URIs:

Explanation: The Information Resource is a conduit to the Entity identified by the Data Source URI (an entity in my RDF Data Space that is the Subject or Object of one or more Triple based Statements). The triples in question can be represented as an RDF resource when transmitted over the Web via an Information Resource that takes the form of a SPARQL REST Service URL or a physical RDF based Information Resource URL.
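
The conduit relationship can be observed from Python with nothing more than an HTTP request. The entity URI below follows DBpedia's publishing pattern and is used purely for illustration: a client that asks for RDF at the entity (Data Source) URI is steered, via content negotiation and redirection, to the Information Resource that actually carries the triples.

    # Illustrative sketch: dereference an entity URI, asking for RDF.
    # The URI follows DBpedia's publishing pattern; used for illustration.
    import urllib.request

    entity = "http://dbpedia.org/resource/Paris"    # Data Source (entity) URI
    req = urllib.request.Request(entity, headers={"Accept": "application/rdf+xml"})
    with urllib.request.urlopen(req) as resp:       # redirect followed automatically
        print("Information Resource:", resp.geturl())     # the conduit document
        print("Content-Type:", resp.headers["Content-Type"])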

What about Structured Data?

Prior to the emergence of the Semantic Data Web, the payloads shuttled across the Web Information BUS consisted primarily of the following:

  1. HTML - Web Resource with presentation focused structure (Web 1.0's dominant payload form)
  2. XML - Web Resource with structure that separates presentation and data (Web 2.0's dominant payload form).

The Semantic Data Web simply adds RDF to the payload formats shuttled across the Web Information BUS. RDF addresses formal data structure, which XML doesn't cover since XML is semi-structured (distinct data entities aren't formally discernible). In a nutshell, an RDF payload is basically a conceptual model database packaged as an Information Resource. It comprises granular data items called "Entities" that expose fine grained property values, individual and/or group characteristics (attributes), and relationships (associations) with other Entities.
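
Here is a hedged sketch of such a payload, built in Python with the rdflib package (the entities and vocabulary are hypothetical): two Entities, each with attributes, linked by a relationship, packaged as a serialized RDF document.

    # Sketch: an RDF payload as a tiny conceptual-model database.
    # Entities, attributes, and vocabulary below are hypothetical.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF, RDFS

    EX = Namespace("http://example.com/music/")
    album = URIRef(EX["album/kind-of-blue"])
    artist = URIRef(EX["artist/miles-davis"])

    g = Graph()
    g.add((album, RDF.type, EX.Album))
    g.add((album, RDFS.label, Literal("Kind of Blue")))  # attribute
    g.add((album, EX.by, artist))                        # relationship
    g.add((artist, RDF.type, EX.Artist))
    g.add((artist, RDFS.label, Literal("Miles Davis")))

    print(g.serialize(format="xml"))   # the Information Resource payload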

Where is this all headed?

The Web is in the final stages of the 3rd phase of its evolution - a phase characterized by the shuttling of structured data payloads (RDF) alongside less data oriented payloads (HTML, XHTML, XML, etc.). As you can see, Linked Data and Structured Data are both terms used to describe the addition of more data centric payloads to the Web. Thus, you could view the process of creating a Structured Web of Linked Data as follows:

  1. Identify or Create Structured Data Sources
  2. Name these Data Sources using Data Source URIs
  3. Expose Structured Data Sources to the Web as Linked Data using Information Resource (conduit) URIs (a minimal publishing sketch follows this list)
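
A minimal, hypothetical sketch of step 3 in Python using the Flask package: entity (Data Source) URIs answer with a 303 "See Other" redirect to the Information Resource documents that serve the triples - the publishing pattern the tutorial above describes. The paths and file layout are assumptions.

    # Hypothetical sketch: publish entity URIs that 303-redirect to the
    # Information Resource documents carrying their RDF descriptions.
    from flask import Flask, redirect

    app = Flask(__name__)

    @app.route("/resource/<name>")                 # Data Source (entity) URI
    def entity(name):
        # The entity itself (a person, place, etc.) can't be transmitted,
        # so point the client at a document about it instead.
        return redirect("/data/%s.rdf" % name, code=303)

    @app.route("/data/<name>.rdf")                 # Information Resource URI
    def document(name):
        return app.send_static_file("%s.rdf" % name)   # pre-generated RDF/XML

    if __name__ == "__main__":
        app.run()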

Conclusions

The Semantic Data Web is an evolution of the current Web (an Information Space) that adds structured data payloads (RDF) to current, less data oriented, structured payloads (HTML, XHTML, XML, and others).

The Semantic Data Web is increasingly seen as an inevitability because it's rapidly reaching the point of critical mass (i.e. network effect kick-in). As a result, Data Web emphasis is moving away from "What is the Semantic Data Web?" to "How will the Semantic Data Web make our globally interconnected village an even better place?", relative to the contributions accrued from the Web thus far. Remember, the initial "Document Web" (Web 1.0) bootstrapped because of the benefits it delivered to blurb-style content publishing (remember the term electronic brochure-ware?). Likewise, in the case of the "Services Web" (Web 2.0), the bootstrap occurred because it delivered platform independence to Web Application Developers - enabling them to expose application logic behind Web Services. It is my expectation that the Data Integration prowess of the Data Web will create a value exchange realm for data architects and other practitioners from the database and data access realms.

Related Items

  1. Mike Bergman's post about Semi-Structured Data
  2. My Posts covering Structured and Un-Structured Containers
]]>
Linked Data & The Web Information BUShttp://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1231Wed, 08 Aug 2007 22:26:55 GMT52007-08-08T18:26:55-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
Mike Bergman has written a very detailed article about OpenLink Software and its product portfolio that basically answers the question: What has OpenLink been Up To?

As the company's founder, it was quite compelling to read a third party article that accurately navigates and articulates the depth of work that we've undertaken since that seminal moment in 1997 when we decided to extend our product portfolio beyond the Universal Data Access Drivers family.

Of course I also take this opportunity to slip in another Semantic Data Web demo :-) Thus, take a look at this mother of all blog posts from Mike via the following:

  1. OpenLink RDF Browser Session
  2. Dynamic Data Web Page

Note: In both cases above, you use the "Explore" or "Dereference" options of the Data Link (typed hyperlink) to traverse the RDF data that has been materialized "on the fly" courtesy of Virtuoso's in-built RDF Middleware (called the Sponger).

BTW - I am assembling a collection of interesting DBpedia based Dynamic pages that showcase the depth of knowledge available from Wikipedia. If you're a current or future technology entrepreneur (or VC trying to grok the Semantic Web) then you certainly need to look at:

  1. Venture Capital
  2. Venture Capital Firms
  3. Venture Capitalists
  4. Entrepreneurs By Nationality
]]>
What's OpenLink Software been Up To?http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1187Tue, 05 Feb 2008 01:47:40 GMT22008-02-04T20:47:40.000001-05:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
Web Data Spaces

Now that broader understanding of the Semantic Data Web is emerging, I would like to revisit the issue of "Data Spaces".

A Data Space is a place where Data resides. It isn't inherently bound to a specific Data Model (Concept Oriented, Relational, Hierarchical, etc.). Neither is it implicitly an access point to Data, Information, or Knowledge (the perception is purely determined through the experiences of the user agents interacting with the Data Space).

A Web Data Space is a Web accessible Data Space.

Real world example:

Today we increasingly perform one or more of the following tasks as part of our professional and personal interactions on the Web:

  1. Blog via many service providers or personally managed weblog platforms
  2. Create Event Calendars via Upcoming.com and Eventful
  3. Maintain and participate in Social Networks (e.g. Facebook, Orkut, MySpace)
  4. Create and Participate in Discussions (note: when you comment on blogs or wikis for instance, you are participating in, or creating, a conversation)
  5. Track news by subscribing to RSS 1.0, RSS 2.0, or Atom Feeds
  6. Share Bookmarks & Tags via Del.icio.us and other Services
  7. Share Photos via Flickr
  8. Buy, Review, or Search for books via Amazon
  9. Participate in auctions via eBay
  10. Search for data via Google (of course!)

John Breslin has a nice animation depicting the creation of Web Data Spaces that drives home the point.

Web Data Space Silos

Unfortunately, what isn't as obvious to many netizens is the fact that each of the activities above results in the creation of data that is put into some context by you, the user. Even worse, you eventually realize that the service providers aren't particularly willing to, or capable of, giving you unfettered access to your own data. Of course, this isn't always by design, as the infrastructure behind the service can make this a nightmare from security and/or load balancing perspectives. Irrespective of cause, we end up creating our own "Data Spaces" all over the Web without a coherent mechanism for accessing and meshing them.

What are Semantic Web Data Spaces?

Data Spaces on the Web that provide granular access to RDF Data.

What's OpenLink Data Spaces (ODS) About?

Short History

In anticipation of the "Web Data Silo" challenge (an issue that we tackled within internal enterprise networks for years), we commenced development (circa 2001) of a distributed collaborative application suite called OpenLink Data Spaces (ODS). The project was never released to the public, since the problems associated with the deliberate or inadvertent creation of Web Data silos hadn't really materialized (silos only emerged in concrete form after the emergence of the Blogosphere and Web 2.0). In addition, there wasn't a clear standard Query Language for the RDF based Web Data Model (i.e. the SPARQL Query Language didn't exist).

Today, ODS is delivered as a packaged solution (in Open Source and Commercial flavors) that alleviates the pain associated with Data Space Silos that exist on the Web and/or behind corporate firewalls. In either scenario, ODS simply allows you to create Open and Secure Data Spaces (via its suite of applications) that expose data via SQL, RDF, and XML oriented data access and data management technologies. Of course it also enables you to integrate transparently with existing 3rd party data space generators (Blogs, Wikis, Shared Bookmarks, Discussion services, etc.) by supporting industry standards that cover:

  1. Content Publishing - Atom, Moveable Type, MetaWeblog, Blogger protocols
  2. Content Syndication Formats - RSS 1.0, RSS 2.0, Atom, OPML etc.
  3. Data Management - SQL, RDF, XML, Free Text
  4. Data Access - SQL, SPARQL, GData, Web Services (SOAP or REST styles), WebDAV/HTTP
  5. Semantic Data Web Middleware - GRDDL, XSLT, SPARQL, XPath/XQuery, HTTP (Content Negotiation) for producing RDF from non RDF Data ((X)HTML, Microformats, XML, Web Services Response Data, etc.) - a GRDDL-style transformation is sketched after this list
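
As a rough sketch of the GRDDL-style step in item 5, in Python with the lxml package (the file names are hypothetical, and a real GRDDL processor discovers the transformation from the document's profile declaration rather than hard-coding it):

    # Hedged GRDDL-style sketch: apply an XSLT transformation to an (X)HTML
    # document to produce RDF/XML. File names are hypothetical.
    import lxml.etree as etree

    doc = etree.parse("profile.xhtml")                   # source (X)HTML document
    transform = etree.XSLT(etree.parse("hcard2rdf.xsl")) # transformation sheet
    rdf = transform(doc)                                 # resulting RDF/XML tree
    print(etree.tostring(rdf, pretty_print=True).decode())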

Thus, by installing ODS on your Desktop, Workgroup, Enterprise, or public Web Server, you end up with a very powerful solution for creating an Open Data access oriented presence on the "Semantic Data Web" without incurring any of the typically assumed "RDF Tax".

Naturally, ODS is built atop Virtuoso and of course it exploits Virtuoso's feature-set to the max. It's also beginning to exploit functionality offered by the OpenLink Ajax Toolkit (OAT).

]]>
Semantic Web Data Spaceshttp://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1185Fri, 13 Apr 2007 22:19:29 GMT12007-04-13T18:19:29.000001-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
Scobleizer's had a Semantic Web Epiphany but can't quite nail down what he's discovered in layman's prose :-)

Well, I'll have a crack at helping him out, i.e. defining the Semantic Data Web in simple terms with linked examples :-)

Tip: Watch the recent TimBL video interview re. the Semantic Data Web before, during, or after reading this post.

Here goes!

The popular Web is a "Web of Documents". The Semantic Data Web is a "Web of Data". Going down a level, the popular web connects documents across the web via hyperlinks, while the Semantic Data Web connects data on the web via hyperlinks. Next level: hyperlinks on the popular web have no inherent meaning (they lack context beyond "there is another document"), whereas hyperlinks on the Semantic Data Web have inherent meaning (they possess context: "there is a Book" or "there is a Person" or "this is a piece of Music", etc.).
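
To make "typed link" concrete, here is a hedged sketch in Python with the rdflib package (all URIs and the vocabulary are hypothetical): the HTML anchor says only "there is another document", while the RDF statements name both the kind of thing linked to and the nature of the connection.

    # Untyped Document Web link vs. typed Data Web link (hypothetical URIs).
    from rdflib import Graph, Namespace, URIRef
    from rdflib.namespace import RDF

    # Document Web: <a href="http://example.com/moby-dick.html">Moby Dick</a>
    # tells a machine nothing beyond "there is another document".

    EX = Namespace("http://example.com/ns#")
    page = URIRef("http://example.com/review.html")
    book = URIRef("http://example.com/id/book/moby-dick")

    g = Graph()
    g.add((book, RDF.type, EX.Book))   # "there is a Book"
    g.add((page, EX.reviews, book))    # the link itself is typed: "reviews"
    print(g.serialize(format="turtle"))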

Very simple example:

Click the traditional web document URLs for Dan Connolly and Tim Berners-Lee. Then attempt to discern how they are connected. Of course you will see some obvious connections by reading the text, but you won't easily discern other data driven connections. Basically, this is no different to reading about either individual in a print journal, bar the ability to click on hyperlinks that open up other pages. The Data Extraction process remains labour intensive :-(

Repeat the exercise using the traditional web document URLs as Data Web URIs: this time around, paste the hyperlinks above into an RDF aware Browser (in this case the OpenLink RDF Browser). Note, we are making a subtle but critical change, i.e. the URLs are now being used as Semantic Data Web URIs (a small-big-deal kind of thing).

If you're impatient or simply strapped for time (aren't we all these days), simply take a look at these links:

  1. Dan Connolly (DanC) RDF Browser Session permalink
  2. Tim Berners-Lee (TimBL) RDF Browser Session permalink
  3. TimBL and DanC combined RDF Browser Session permalink

Note: There are other RDF Browsers out there such as:

  1. Tabulator
  2. DISCO
  3. Objectviewer

All of these RDF Browsers (or User Agents) demonstrate the same core concepts in subtly different ways.

If I haven't lost you, proceed to a post I wrote a few weeks ago titled: Hello Data Web (Take 3 - Feel the "RDF" Force).

If you've made it this far, simply head over to DBpedia for a lot of fun :-)

Note re. my demos: we make use of SVG in our RDF Browser, which makes it incompatible with IE (6 or 7) and Safari. That said, Firefox (1.5+), Opera 9.x, WebKit (Open Source Safari), and Camino work fine.

Note to Scoble:

All the Blogs, Wikis, Shared Bookmarks, Image Galleries, Discussion Forums and the like are Semantic Web Data Spaces. The great thing about all of this is that, through RSS 2.0's wild popularity, the Blogosphere has done what I postulated a while back: the Semantic Web would be self-annotating, and so it has come to be :-)

To prove the point above: paste your blog's URL into the OpenLink RDF Browser and see it morph into a Semantic Data Web URI (a pointer to Web Data that you've created) once you click the "Query" button (click on the TimeLine tab for full effect). The same applies to del.icio.us, Flickr, Googlebase, and basically any REST style Web Service, as per my RDF Middleware post.
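
Here is a hedged sketch in Python (using the feedparser and rdflib packages; the feed URL is hypothetical) of why RSS 2.0 payloads morph so readily: each feed item already carries enough context - a permalink, a title, an author - to become triples.

    # Hedged sketch: RDFize an RSS 2.0 feed's items. Feed URL is hypothetical.
    import feedparser
    from rdflib import Graph, Literal, Namespace, URIRef

    DC = Namespace("http://purl.org/dc/elements/1.1/")

    feed = feedparser.parse("http://example.com/blog/rss.xml")
    g = Graph()
    for entry in feed.entries:
        post = URIRef(entry.link)                      # permalink as entity URI
        g.add((post, DC.title, Literal(entry.title)))  # feed metadata -> attribute
        if "author" in entry:
            g.add((post, DC.creator, Literal(entry.author)))

    print(g.serialize(format="turtle"))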

Lazy Semantic Web Callout:

If you're a good animator (pro or hobbyist), please produce an animation of a document going through a shredder. The strips that emerge from the shredder represent the granular data that was once the whole document. The same thing is happening on the Web right now, we are putting photocopies of (X)HTML documents through the shredder (in a good way) en route to producing granular items of data that remain connected to the original copy while developing new and valuable connections to other items of Web Data.

That's it!

]]>
Describing the Semantic Data Web (Take 3)http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1180Fri, 13 Apr 2007 21:15:42 GMT32007-04-13T17:15:42-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>
Danny Ayers responds, via his post titled Sampling, to Stefano Mazzocchi's post about Data Integration using Semantic Web Technologies.

"There is a potential problem with republication of transformed data, in that right away there may be inconsistency with the original source data. Here provenance tracking (probably via named graphs) becomes a must-have. The web data space itself can support very granular separation. Whatever, data integration is a hard problem. But if you have a uniform language for describing resources, at least it can be possible."

Alex James also chimes in with valuable insights in his post: Sampling the global data model, where he concludes:

"Exactly we need to use projected views, or conceptual models. '

See a projected view can be thought of as a conceptual model that has some mapping to a *sampling* of the global data model.

The benefits of introducing this extra layer are many and varied: Simplicity, URI predictability, Domain Specificity and the ability to separate semantics from lower level details like data mapping.

Unfortunately if you look at today’s ORMs you will quickly notice that they simply map directly from Object Model to Data Model in one step.

This naïve approach provides no place to manage the mapping to a conceptual model that sampling the world’s data requires.

What we need to solve the problems Stefano sees is to bring together the world of mapping and semantics. And the place they will meet is simply the Conceptual Model."

Data Integration challenges arise because the following facts hold true all of the time (whether we like it or not):

  1. Data Heterogeneity is a fact of life at the intranet and internet levels
  2. Data is rarely clean
  3. Data Integration prowess is ultimately measured by pain alleviation
  4. At some point human participation is required, but the trick is to move human activity up the value chain
  5. Glue code size and Data Integration success are inversely related
  6. Data Integration is best addressed via "M" rather than "C" (if we use the MVC pattern as a guide; "V" is dead on arrival for the scrapers out there)

In 1997 we commenced the Virtuoso Virtual DBMS Project, which morphed into the Virtuoso Universal Server: a fusion of DBMS functionality and Middleware functionality in a single product. The goal of this undertaking remains alleviation of the costs associated with Data Integration challenges by virtualizing Data at the Logical and Conceptual Layers.

The Logical Data Layer has been concrete for a while (e.g. Relational DBMS Engines); what hasn't reached the mainstream is the Concrete Conceptual Model, but this is changing fast courtesy of the activity taking place in the realm of RDF.

RDF provides an Open and Standards compliant vehicle for developing and exploiting Concrete Conceptual Data Models that ultimately move the Human aspect of the "Data Integration alleviation quest" higher up the value chain.

]]>
RDF based Integration Challenges (update)http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1174Fri, 30 Mar 2007 23:35:35 GMT12007-03-30T19:35:35-04:00Kingsley Uyi Idehen <kidehen@openlinksw.com>