OpenLink Software
Burlington, United States
Take N: Yet Another OpenLink Data Spaces Introduction [ Kingsley Uyi Idehen ]

Problem:

Your Life, Profession, Web, and Internet do not need to become mutually exclusive due to "information overload".

Solution:

A platform or service that delivers a point of online presence that embodies the fundamental separation of: Identity, Data Access, Data Representation, Data Presentation, by adhering to Web and Internet protocols.

How:

Typical post-installation (Local or Cloud) task sequence:

  1. Identify myself (happens automatically by way of registration)
  2. If in an LDAP environment, import accounts or associate system with LDAP for account lookup and authentication
  3. Identify Online Accounts (by fleshing out profile) which also connects system to online accounts and their data
  4. Use Profile for granular description (Biography, Interests, WishList, OfferList, etc.)
  5. Optionally upstream or downstream data to and from my online accounts
  6. Create content Tagging Rules
  7. Create rules for associating Tags with formal URIs
  8. Create automatic Hyperlinking Rules for reuse when new content is created (e.g. Blog posts)
  9. Exploit Data Portability virtues of RSS, Atom, OPML, RDFa, RDF/XML, and other formats for imports and exports
  10. Automatically tag imported content
  11. Use function-specific helper application UIs for domain-specific data generation, e.g. AddressBook (optionally use vCard import), Calendar (optionally use iCalendar import), Email, File Storage (use WebDAV mount with copy and paste or HTTP GET), Feed Subscriptions (optionally import RSS/Atom/OPML feeds), Bookmarking (optionally import bookmark.html or XBEL), etc.
  12. Optionally enable "Conversation" feature (today: Social Media feature) across the relevant application domains (manage conversations under the covers using NNTP, the standard for this functionality realm)
  13. Generate HTTP based Entity IDs (URIs) for every piece of data in this burgeoning data space
  14. Use REST based APIs to perform CRUD tasks against my data (local and remote) (SPARQL, GData, Ubiquity Commands, Atom Publishing)
  15. Use OpenID, OAuth, FOAF+SSL, FOAF+SSL+OpenID for accessing data elsewhere
  16. Use OpenID, OAuth, FOAF+SSL, FOAF+SSL+OpenID for Controlling access to my data (Self Signed Certificate Generation, Browser Import of said Certificate & associated Private Key, plus persistence of Certificate to FOAF based profile data space in "one click")
  17. Have a simple UI for arbitrary Entity-Attribute-Value or Subject-Predicate-Object data annotation and creation, since you can't pre-model an "Open World" where the only constant is data flow
  18. Have my Personal URI (Web ID) as the single entry point for controlled access to my HTTP accessible data space
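
To make steps 6 through 8 a little more concrete, here is a minimal Python sketch of tagging rules, tag-to-URI associations, and automatic hyperlinking. It is not how OpenLink Data Spaces implements these features; the rule patterns, the DBpedia URIs, and the function names are all assumptions chosen for illustration.

```python
import re

# Hypothetical tagging rules (step 6): a keyword pattern and the tag it assigns.
TAGGING_RULES = [
    (re.compile(r"\bsparql\b", re.IGNORECASE), "sparql"),
    (re.compile(r"\blinked data\b", re.IGNORECASE), "linked-data"),
]

# Hypothetical tag-to-URI associations (step 7); the URIs are illustrative only.
TAG_URIS = {
    "sparql": "http://dbpedia.org/resource/SPARQL",
    "linked-data": "http://dbpedia.org/resource/Linked_data",
}


def tag_content(text):
    """Return the sorted tags whose rule pattern occurs in the text."""
    return sorted(tag for pattern, tag in TAGGING_RULES if pattern.search(text))


def hyperlink(text):
    """Wrap every phrase matched by a rule in an anchor to the tag's URI (step 8)."""

    def link(match):
        phrase = match.group(0)
        for pattern, tag in TAGGING_RULES:
            if pattern.fullmatch(phrase) and tag in TAG_URIS:
                return f'<a href="{TAG_URIS[tag]}">{phrase}</a>'
        return phrase  # no URI associated with this tag: leave the text alone

    # One combined pass, so text inside freshly inserted anchors is never rescanned.
    combined = re.compile(
        "|".join(pattern.pattern for pattern, _ in TAGGING_RULES), re.IGNORECASE
    )
    return combined.sub(link, text)


post = "SPARQL is the query language of Linked Data."
print(tag_content(post))
print(hyperlink(post))
```

The single combined substitution pass is a deliberate choice: applying the rules one after another could match keywords inside the URLs of links that were just inserted.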

I've just outlined a snippet of the capabilities of the OpenLink Data Spaces platform, which is built using OpenLink Virtuoso and architected to deliver open, platform-independent, multi-model data access and data management across heterogeneous data sources.

All you need to remember is your URI when seeking to interact with your data space.

Related

  1. Get Yourself a URI (Web ID) in 5 Minutes or Less!
  2. Various posts over the years about Data Spaces
  3. Future of Desktop Post
  4. Simplify My Life Post by Bengee Nowack
# PermaLink Comments [0]
04/22/2009 14:46 GMT Modified: 04/22/2009 15:32 GMT
Simple Compare & Contrast of Web 1.0, 2.0, and 3.0 (Update 1) [ Kingsley Uyi Idehen ]

Here is a tabulated "compare and contrast" of Web usage patterns 1.0, 2.0, and 3.0.

  | Web 1.0 | Web 2.0 | Web 3.0
Simple Definition | Interactive / Visual Web | Programmable Web | Linked Data Web
Unit of Presence | Web Page | Web Service Endpoint | Data Space (named structured data enclave)
Unit of Value Exchange | Page URL | Endpoint URL for API | Resource / Entity / Object URI
Data Granularity | Low (HTML) | Medium (XML) | High (RDF)
Defining Services | Search | Community (Blogs to Social Networks) | Find
Participation Quotient | Low | Medium | High
Serendipitous Discovery Quotient | Low | Medium | High
Data Referencability Quotient | Low (Documents) | Medium (Documents) | High (Documents and their constituent Data)
Subjectivity Quotient | High | Medium (from A-list bloggers to select source and partner lists) | Low (everything is discovered via URIs)
Transclusence | Low | Medium (Code driven Mashups) | High (Data driven Meshups)
What You See Is What You Prefer (WYSIWYP) | Low | Medium | High (negotiated representation of resource descriptions)
Open Data Access (Data Accessibility) | Low | Medium (Silos) | High (no Silos)
Identity Issues Handling | Low | Medium (OpenID) | High (FOAF+SSL)
Solution Deployment Model | Centralized | Centralized with sprinklings of Federation | Federated with function specific Centralization (e.g. Lookup hubs like LOD Cloud or DBpedia)
Data Model Orientation | Logical (Tree based DOM) | Logical (Tree based XML) | Conceptual (Graph based RDF)
User Interface Issues | Dynamically generated static interfaces | Dynamically generated interfaces with semi-dynamic interfaces (courtesy of XSLT or XQuery/XPath) | Dynamic Interfaces (pre- and post-generation) courtesy of self-describing nature of RDF
Data Querying | Full Text Search | Full Text Search | Full Text Search + Structured Graph Pattern Query Language (SPARQL)
What Each Delivers | Democratized Publishing | Democratized Journalism & Commentary (Citizen Journalists & Commentators) | Democratized Analysis (Citizen Data Analysts)
Star Wars Edition Analogy | Star Wars (original fight for decentralization via rebellion) | Empire Strikes Back (centralization and data silos make comeback) | Return of the JEDI (FORCE emerges and facilitates decentralization from "Identity" all the way to "Open Data Access" and "Negotiable Descriptive Data Representation")
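
The "Data Querying" row is perhaps the easiest one to illustrate with code. Below is a toy Python sketch contrasting a full text search with a structured graph pattern match of the kind SPARQL performs. It is an illustration of the idea only, not a SPARQL engine; the triples, prefixes, and function names are made up for the example.

```python
# Toy triples; every identifier here is invented for illustration.
TRIPLES = [
    ("ex:kingsley", "foaf:name", "Kingsley Idehen"),
    ("ex:kingsley", "foaf:knows", "ex:orri"),
    ("ex:orri", "foaf:name", "Orri Erling"),
    ("ex:post1", "dc:creator", "ex:orri"),
    ("ex:post1", "dc:title", "ESWC 2008"),
]


def full_text_search(term):
    """Web 1.0 / 2.0 style: return every triple whose text mentions the term."""
    return [t for t in TRIPLES if term.lower() in " ".join(t).lower()]


def match(patterns, binding=None):
    """Yield variable bindings that satisfy all triple patterns (a basic graph pattern).

    Terms starting with '?' are variables; anything else must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in TRIPLES:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, candidate)


# "Titles of posts created by someone Kingsley knows" as a graph pattern.
query = [
    ("ex:kingsley", "foaf:knows", "?person"),
    ("?post", "dc:creator", "?person"),
    ("?post", "dc:title", "?title"),
]
print([b["?title"] for b in match(query)])
print(len(full_text_search("orri")))
```

The full text search can only report which statements mention "orri"; the graph pattern follows the relationships between entities, which is what makes "Find" different from "Search".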

Naturally, I am not expecting everyone to agree with me. I am simply making my contribution to what will remain fascinating discourse for a long time to come :-)

# PermaLink Comments [1]
03/14/2009 14:20 GMT Modified: 04/29/2009 13:21 GMT
Introducing Virtuoso Universal Server (Cloud Edition) for Amazon EC2 [ Kingsley Uyi Idehen ]

What is it?

A pre-installed edition of Virtuoso for Amazon's EC2 Cloud platform.

What does it offer?

From a Web Entrepreneur perspective it offers:
  1. Low cost entry point to a game-changing Web 3.0+ (and beyond) platform that combines SQL, RDF, XML, and Web Services functionality
  2. Flexible variable cost model (courtesy of EC2 DevPay) tightly bound to revenue generated by your services
  3. Delivers federated and/or centralized model flexibility for your SaaS based solutions
  4. Simple entry point for developing and deploying sophisticated database driven applications (SQL or RDF Linked Data Web oriented)
  5. Complete framework for exploiting OpenID, OAuth (including Role enhancements) that simplifies exploitation of these vital Identity and Data Access technologies
  6. Easily implement RDF Linked Data based Mail, Blogging, Wikis, Bookmarks, Calendaring, Discussion Forums, Tagging, Social-Networking as Data Space (data containers) features of your application or service offering
  7. Instant alleviation of challenges (e.g. service costs and agility) associated with Data Portability and Open Data Access across Web 2.0 data silos
  8. LDAP integration for Intranet / Extranet style applications.

From the DBMS engine perspective it provides you with one or more pre-configured instances of Virtuoso that enable immediate exploitation of the following services:

  1. RDF Database (a Quad Store with SPARQL & SPARUL Language & Protocol support)
  2. SQL Database (with ODBC, JDBC, OLE-DB, ADO.NET, and XMLA driver access)
  3. XML Database (XML Schema, XQuery/XPath, XSLT, Full Text Indexing)
  4. Full Text Indexing.

From a Middleware perspective it provides:

  1. RDF Views (Wrappers / Semantic Covers) over SQL, XML, and other data sources accessible via SOAP or REST style Web Services
  2. Sponger Service for converting non RDF information resources into RDF Linked Data "on the fly" via a large collection of pre-installed RDFizer Cartridges.

From the Web Server Platform perspective it provides an alternative to LAMP stack components such as MySQL and Apache by offering:

  1. HTTP Web Server
  2. WebDAV Server
  3. Web Application Server (includes PHP runtime hosting)
  4. SOAP or REST style Web Services Deployment
  5. RDF Linked Data Deployment
  6. SPARQL (SPARQL Query Language) and SPARUL (SPARQL Update Language) endpoints
  7. Virtuoso Hosted PHP packages for MediaWiki, Drupal, Wordpress, and phpBB3 (just install the relevant Virtuoso Distro. Package).
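
As a rough idea of what talking to a SPARQL endpoint looks like, here is a minimal Python sketch that builds a SPARQL Protocol GET request URL. The `query` and `default-graph-uri` parameter names come from the SPARQL Protocol; the endpoint address, graph URI, and function name are assumptions for illustration, so substitute the address of your own instance.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sparql_get_url(endpoint, query, default_graph=None):
    """Build a SPARQL Protocol GET URL for the given endpoint and query text."""
    params = [("query", query)]
    if default_graph:
        params.append(("default-graph-uri", default_graph))
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and graph; replace with the ones your instance exposes.
ENDPOINT = "http://example.org/sparql"
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

url = sparql_get_url(ENDPOINT, QUERY, default_graph="http://example.org/graph")
print(url)

# Round-trip check: the query text survives URL encoding intact.
print(parse_qs(urlsplit(url).query)["query"][0] == QUERY)
```

The resulting URL can be handed to any HTTP client; the sketch stops short of sending the request so that it runs without a live endpoint.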

From the general System Administrator's perspective it provides:

  1. Online Backups (Backup Set dispatched to S3 buckets, FTP, or HTTP/WebDAV server locations)
  2. Synchronized Incremental Backups to Backup Set locations
  3. Backup Restore from Backup Set location (without exiting to EC2 shell).

Higher level user oriented offerings include:

  1. OpenLink Data Explorer front-end for exploring the burgeoning Linked Data Web
  2. Ajax based SPARQL Query Builder (iSPARQL) that enables SPARQL Query construction by Example
  3. Ajax based SQL Query Builder (QBE) that enables SQL Query construction by Example.

For Web 2.0 / 3.0 users, developers, and entrepreneurs it offers Distributed Collaboration Tools & Social Media realm functionality, courtesy of ODS, that includes:

  1. Point of presence on the Linked Data Web that meshes your Identity and your Data via URIs
  2. System generated Social Network Profile & Contact Data via FOAF
  3. System generated SIOC (Semantically Interconnected Online Community) Data Space (that includes a Social Graph) exposing all your Web data in RDF Linked Data form
  4. System generated OpenID and automatic integration with FOAF
  5. Transparent Data Integration across Facebook, Digg, LinkedIn, FriendFeed, Twitter, and any other Web 2.0 data space equipped with RSS / Atom support and/or REST style Web Services
  6. In-built support for SyncML which enables data synchronization with Mobile Phones.

How Do I Get Going with It?

# PermaLink Comments [0]
11/28/2008 19:27 GMT Modified: 11/28/2008 16:06 GMT
Where Are All the RDF-based Semantic Web Applications? [ Kingsley Uyi Idehen ]

In response to the "Semantic Web Technology" application classification scheme espoused by ReadWriteWeb (RWW), emphasized in the post titled: Where are all the RDF-based Semantic Web Apps?, here is my attempt to clarify and reintroduce what OpenLink Software offers (today) in relation to Semantic Web technology.

Our offerings fall into the RWW Top-Down category, which I interpret as: technologies that produce RDF from non-RDF data sources. Our product portfolio comprises the following: Virtuoso Universal Server, OpenLink Data Spaces, OpenLink Ajax Toolkit, and OpenLink Data Explorer (which includes Ubiquity commands).

Virtuoso Universal Server functionality summary:

  1. Generation of RDF Linked Data Views of SQL, XML, and Web Services in general
  2. Deployment of RDF Linked Data
  3. "On the Fly" generation of RDF Linked Data from Document Web information resources (i.e. distillation of entities from their containers e.g. Web pages) via Cartridges / Drivers
  4. SPARQL query language support
  5. SPARQL extensions that bring SPARQL closer to SQL, e.g. Aggregates, Update, Insert, Delete, and Named Graph support (i.e. use of logical names to partition RDF data within Virtuoso's multi-model DBMS engine)
  6. Inference Engine (currently in use re. DBpedia via Yago and UMBEL)
  7. Hosts and exposes data from Drupal, Wordpress, MediaWiki, phpBB3 as RDF Linked Data via in-built support for PHP runtime
  8. Available as an EC2 AMI
  9. etc..
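
To give a feel for item 1, here is a small Python sketch that exposes the rows of a SQL table as RDF-style triples, with one HTTP URI per record. Note that this is a hand-rolled illustration using SQLite, not Virtuoso's RDF Views mechanism; the base URI, the URI layout, and the function name are assumptions.

```python
import sqlite3

# Hypothetical base URI under which records and columns are named.
BASE = "http://example.org/"


def rows_as_triples(conn, table, key):
    """Yield (subject, predicate, object) triples for every row of a table.

    Each row becomes a subject URI built from its key; each remaining
    non-NULL column becomes one predicate/object pair.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}#this"
        for column, value in record.items():
            if column != key and value is not None:
                yield (subject, f"{BASE}schema/{table}#{column}", value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Alice', 'Burlington')")

for triple in rows_as_triples(conn, "customer", "id"):
    print(triple)
```

The relational data stays where it is; the triples are computed on demand, which is the essence of a view as opposed to an export.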

OpenLink Data Spaces functionality summary:

  1. Simple mechanism for Linked Data Web enabling yourself by giving you an HTTP based User ID (a de-referencable URI) that is linked to a FOAF based Profile page and OpenID
  2. Binds all your data sources (blogs, wikis, bookmarks, photos, calendar items, etc.) to your URI so you can "Find" things by only remembering your URI
  3. Makes your profile page and personal URI the focal point of Linked Data Web presence
  4. Delivers Data Portability (using data access by value or data access by reference) across data silos (e.g. Web 2.0 style social networks)
  5. Allows you to make annotations about anything in your own Data Space(s) on the Web without exposure to RDF markup
  6. A Briefcase feature that provides a WebDAV driven RDF Linked Data variant of functionality seen in Mac OS X Spotlight and WinFS with the addition of SPARQL compliance
  7. Automatically generates RDFa in its (X)HTML pages
  8. Blog, Wiki, WebDAV File Server, Shared Bookmarks, Calendar, and other applications that look and feel like Web 2.0 counterparts but emit RDF Linked Data amongst a plethora of data exchange formats
  9. Available as an EC2 AMI
  10. etc..

OpenLink Ajax Toolkit functionality summary:

  1. Provides binding to SQL, RDF, XML, and Web Services via Ajax Database Connectivity Layer (you only need an ODBC, JDBC, OLE-DB, ADO.NET, XMLA Driver, or Web Service on the backend for dynamic data access from Javascript)
  2. All controls are Ajax Database Connectivity bound (widgets get their data from Ajax Database Connectivity data sources)
  3. Bundled with Virtuoso and ODS installations.
  4. etc.

OpenLink Data Explorer functionality summary:

  1. Distills entities associated with information resource style containers (e.g. Web Pages or files) as RDF Linked Data
  2. Exposes the RDF based Linked Data graph associated with information resources (see the Linked Data behind Web pages)
  3. Ubiquity commands for invoking the above
  4. Available as a Hosted Service or Firefox Extension
  5. Bundled with Virtuoso and ODS installations
  6. etc.

Note:

Of course you could have simply looked up OpenLink Software's FOAF based Profile page (*note the Linked Data Explorer tab*), or simply passed the FOAF profile page URL to a Linked Data aware client application such as: OpenLink Data Explorer, Zitgist Data Viewer, Marbles, and Tabulator, and obtained information. Remember, OpenLink Software is an Entity of Type: foaf:Organization, on the burgeoning Linked Data Web :-)

# PermaLink Comments [3]
10/01/2008 19:09 GMT Modified: 10/02/2008 15:27 GMT
WUPnP Cheatsheet [ Kingsley Uyi Idehen ]

WUPnP Cheatsheet: "

The Web Universal Plug and Play (WUPnP) Cheatsheet:

Essentially, if you build an application and use the technologies suggested in the ‘glue section’ then your web application/service (whether it’s front-end or back-end) will fit into many many other web applications/services… and therefore also more manageable for the future! This is WUPnP.

Key technologies for making your services/applications as sticky as possible:

Web-based plug and play fun!

"

(Via Daniel Lewis.)

# PermaLink Comments [0]
07/28/2008 23:37 GMT Modified: 07/29/2008 13:06 GMT
Response to: Where's the Killer Semantic Web Application (Update #2) [ Kingsley Uyi Idehen ]

As is often the case these days, it's much easier to drop a blog post than it is to make a simple comment in an "old media" style data space :-(

My use of "old media" implies: a place that still seeks subscriber data (no OpenID etc..), for the umpteenth time, as the toll fee for discourse development and participation on the Web.

Anyway, here is what I attempted to post as a comment to Dan Grigorovici's post titled: Where is the Semantic Web Killer App?

Dan,

An intriguing post to say the least :-)

"Linked Data" and "Semantic Web" aren't synonymous; they are simply connected, infrastructure DNA-wise. You can have "Semantic Web" style graphs (i.e., RDF Data) and not have "Linked Data" as per Linked Data deployment tenets and best practices, which is a very important point.

I've stated repeatedly, the "Linked Data" emphasis has more to do with focusing on a point of crystallization within the larger "Semantic Web" vision, so here is a quick recap:

What is Linked Data?

A term coined by TimBL that describes an application of HTTP to the time-tested process of "Data Access by Reference". "Linked Data" adds vital items to the "Data Access by Reference" pattern that have been erstwhile unattainable:

  • The use of Data Source Naming scoped to Database / Data Container Records, as opposed to Tables, Views, Stored Procedures, Databases, and other Record Container tuple collections. Example: in ODBC / JDBC, a Data Source Name's scope stops at the Table / View level. In the Linked Data realm you get an added layer of granularity due to record-level name scope.
  • Incorporation of HTTP into the Data Source Naming scheme, which injects the expanse of the Web into the Data Access Range of the Data Source Name (i.e. a Named Record); so you can reference a record's description directly via HTTP which is simply a major deal (to put things mildly).

So we have HTTP based URIs as the Data Source Names for a "Linked Data Web", i.e., a Web of inter-connected Data Source Names that de-emphasize the importance of their host containers (Compound Documents / Information Resources).
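
Here is a minimal Python sketch of what "referencing a record's description directly via HTTP" can look like from the client side. It only prepares the request; the record URI, the `#this` fragment convention, the `text/turtle` media type, and the function name are assumptions for illustration.

```python
from urllib.request import Request


def describe_request(record_uri, media_type="text/turtle"):
    """Prepare an HTTP GET for the description of a named record.

    The fragment (if any) names the record itself; only the document part of
    the URI is sent over HTTP, and the Accept header states which
    representation of the description the client prefers.
    """
    document_uri = record_uri.split("#", 1)[0]
    return Request(document_uri, headers={"Accept": media_type})


# Hypothetical HTTP based Data Source Name for a single record.
request = describe_request("http://example.org/customer/1#this")
print(request.full_url)
print(request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup; that step is left out so the sketch runs without network access.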

The business case or value proposition of "Linked Data" is synonymous with the value proposition of data access technologies such as ODBC, JDBC, ADO.NET, OLE-DB, XMLA, and others (enterprise or consumer) in relation to the Individual and Enterprise pursuit of agility, in a realm where data is growing exponentially and the maximum processing time in a single day remains 24 hrs. Data Access & Data Integration are timeless challenges due to the following constants:

  • Structured Data Schema Heterogeneity - we will always model the same things differently
  • Dirtiness of Data within Structured Data Containers - we are error prone due to laziness / sloppiness, time constraints, and the inherent limitation of our DNA based CPUs when dealing with large volumes of data.

Note: The line between the Enterprise & Individuals continues to blur by the second. This is something I covered during my Linked Data Planet keynote, which, like most things I put on the Web (via this blog data space), is a live and practical demonstration of the virtues of Linked Data courtesy of RDFa, the Bibliographic Ontology, and dereferencable URIs (i.e. HTTP based Data Source Names for Documents and the Entities they host).

# PermaLink Comments [0]
06/26/2008 18:28 GMT Modified: 07/19/2008 15:50 GMT
ESWC 2008 [ Virtuso Data Space Bot ]

Yrjänä Rankka and I attended ESWC2008 on behalf of OpenLink.

We were invited at the last minute to give a Linked Open Data talk at Paolo Bouquet's Identity and Reference workshop. We also had a demo of SPARQL BI (PPT; other formats coming soon), our business intelligence extensions to SPARQL, as well as joining between relational data mapped to RDF and native RDF data. I was also speaking at the social networks panel chaired by Harry Halpin.

I have gathered a few impressions that I will share in the next few posts (1 - RDF Mapping, 2 - DARQ, 3 - voiD, 4 - Paradigmata). Caveat: This is not meant to be complete or impartial press coverage of the event but rather some quick comments on issues of personal/OpenLink interest. The fact that I do not mention something does not mean that it is unimportant.

The voiD Graph

Linked Open Data was well represented, with Chris Bizer, Tom Heath, ourselves and many others. The great advance for LOD this time around is voiD, the Vocabulary of Interlinked Datasets, a means to describe what in fact is inside the LOD cloud, how to join it with what and so forth. Big time important if there is to be a web of federatable data sources, feeding directly into what we have been saying for a while about SPARQL end-point self-description and discovery. There is reasonable hope of having something by the date of Linked Data Planet in a couple of weeks.

Federating

Bastian Quilitz gave a talk about his DARQ, a federated version of Jena's ARQ.

Something like DARQ's optimization statistics should make their way into the SPARQL protocol as well as the voiD data set description.

We really need federation but more on this in a separate post.

XSPARQL

Axel Polleres et al. had a paper about XSPARQL, a merge of XQuery and SPARQL. While visiting DERI a couple of weeks back and again at the conference, we talked about OpenLink implementing the spec. It is evident that the engines must be in the same process and not communicate via the SPARQL protocol for this to be practical. We could do this. We'll have to see when.

Politically, using XQuery to give expressions and XML synthesis to SPARQL would be fitting. These things are needed anyhow, as surely as aggregation and sub-queries but the latter would not so readily come from XQuery. Some rapprochement between RDF and XML folks is desirable anyhow.

Panel: Will the Sem Web Rise to the Challenge of the Social Web?

The social web panel presented the question of whether the sem web was ready for prime time with data portability.

The main thrust was expressed in Harry Halpin's rousing closing words: "Men will fight in a battle and lose a battle for a cause they believe in. Even if the battle is lost, the cause may come back and prevail, this time changed and under a different name. Thus, there may well come to be something like our semantic web, but it may not be the one we have worked all these years to build if we do not rise to the occasion before us right now."

So, how to do this? Dan Brickley asked the audience how many supported, or were aware of, the latest Web 2.0 things, such as OAuth and OpenID. A few were. The general idea was that research (after all, this was a research event) should be more integrated and open to the world at large, not living at the "outdated pace" of a 3 year funding cycle. Stefan Decker of DERI acquiesced in principle. Of course there is impedance mismatch between specialization and interfacing with everything.

I said that triples and vocabularies existed, that OpenLink had ODS (OpenLink Data Spaces, Community LinkedData) for managing one's data-web presence, but that scale would be the next thing. Rather large scale even, with 100 gigatriples (Gtriples) reached before one even noticed. It takes a lot of PCs to host this, maybe $400K worth at today's prices, without replication. Count 16G ram and a few cores per Gtriple so that one is not waiting for disk all the time.
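
The back-of-the-envelope arithmetic behind these figures can be spelled out in a few lines of Python. The inputs are the numbers quoted above; reading "16G ram" as 16 GB and dividing the hardware budget evenly per Gtriple are assumptions of this sketch.

```python
# Figures quoted in the paragraph above.
GTRIPLES = 100                 # target data set size, in billions of triples
RAM_GB_PER_GTRIPLE = 16        # assumption: "16G ram" means 16 GB per Gtriple
HARDWARE_BUDGET_USD = 400_000  # rough total quoted, without replication


def cluster_estimate(gtriples, ram_gb_per_gtriple, budget_usd):
    """Return (total RAM in GB, implied hardware cost in USD per Gtriple)."""
    total_ram_gb = gtriples * ram_gb_per_gtriple
    cost_per_gtriple = budget_usd / gtriples
    return total_ram_gb, cost_per_gtriple


total_ram_gb, cost_per_gtriple = cluster_estimate(
    GTRIPLES, RAM_GB_PER_GTRIPLE, HARDWARE_BUDGET_USD
)
print(f"Aggregate RAM: {total_ram_gb} GB")
print(f"Implied hardware cost: ${cost_per_gtriple:,.0f} per Gtriple")
```

In other words, the quoted figures imply about 1.6 TB of aggregate RAM and roughly $4,000 of hardware per billion triples.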

The tricks that Web 2.0 silos do with app-specific data structures and app-specific partitioning do not really work for RDF without compromising the whole point of smooth schema evolution and tolerance of ragged data.

So, simple vocabularies, minimal inference, minimal blank nodes. Besides, note that the inference will have to be done at run time, not forward-chained at load time, if only because users will not agree on what sameAs and other declarations they want for their queries. Not to mention spam or malicious sameAs declarations!

As always, there was the question of business models for the open data web and for semantic technologies in general. As we see it, information overload is the factor driving the demand. Better contextuality will justify semantic technologies. Due to the large volumes and complex processing, a data-as-service model will arise. The data may be open, but its query infrastructure, cleaning, and keeping up-to-date, can be monetized as services.

Identity and Reference

For the identity and reference workshop, the ultimate question is metaphysical and has no single universal answer, even though people, ever since the dawn of time and earlier, have occupied themselves with the issue. Consequently, I started with the Genesis quote where Adam called things by nominibus suis, off-hand implying that things would have some intrinsic ontologically-due names. This would be among the older references to the question, at least in widely known sources.

For present purposes, the consensus seemed to be that what would be considered the same as something else depended entirely on the application. What was similar enough to warrant a sameAs for cooking purposes might not warrant a sameAs for chemistry. In fact, complete and exact sameness for URIs would be very rare. So, instead of making generic weak similarity assertions like similarTo or seeAlso, one would choose a set of strong sameAs assertions and have these in effect for query answering if they were appropriate to the granularity demanded by the application.

Therefore sameAs is our permanent companion, and there will in time be malicious and spam sameAs. So, nothing much should be materialized on the basis of sameAs assertions in an open world. For an app-specific warehouse, sameAs can be resolved at load time.
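
A small Python sketch may help show what "having sameAs in effect for query answering" without materializing anything could mean. It is not how Virtuoso implements this; the triples, the example identifiers, and the function names are invented, and a simple union-find stands in for whatever the engine really does.

```python
def canonicalizer(same_as_pairs):
    """Build a function mapping each URI to a canonical member of its sameAs group."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in same_as_pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            # Deterministic choice: the lexically smaller root represents the group.
            parent[max(root_left, root_right)] = min(root_left, root_right)
    return find


def describe(triples, subject, same_as_pairs):
    """Return (predicate, object) pairs for a subject under the chosen sameAs set.

    The sameAs assertions are applied at query time only; the stored triples
    are never rewritten, so another application can choose another set.
    """
    find = canonicalizer(same_as_pairs)
    target = find(subject)
    return sorted((p, o) for s, p, o in triples if find(s) == target)


# Invented data: two names that a cooking app may treat as the same thing.
TRIPLES = [
    ("ex:salt", "ex:taste", "salty"),
    ("ex:NaCl", "ex:molarMass", "58.44"),
]
cooking_same_as = [("ex:salt", "ex:NaCl")]

print(describe(TRIPLES, "ex:salt", cooking_same_as))  # both facts
print(describe(TRIPLES, "ex:salt", []))               # only the direct fact
```

Because nothing is materialized, dropping a spam or malicious sameAs assertion is just a matter of leaving it out of the set passed to the query.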

There was naturally some apparent tension between the Occam camp of entity name services and the LOD camp. I would say that the issue is more a perceived polarity than a real one. People will, inevitably, continue giving things names regardless of any centralized authority. Just look at natural language. But having a dictionary that is commonly accepted for established domains of discourse is immensely helpful.

CYC and NLP

The semantic search workshop was interesting, especially CYC's presentation. CYC is, as it were, the grand old man of knowledge representation. Over the long term, I would like to have support for the CYC inference language inside a database query processor. This would mostly be for repurposing the huge knowledge base to help with search-type queries. If it is for transactions or financial reporting, then queries will be SQL and make little or no use of any sort of inference. If it is for summarization or finding things, the opposite holds. For scaling, the issue is just making correct cardinality guesses for query planning, which is harder when inference is involved. We'll see.

I will also have a closer look at natural language one of these days, quite inevitably, since Zitgist (for example) is into entity disambiguation.

Scale

Garlic gave a talk about their Data Patrol and QDOS. We agree that storing the data for these as triples instead of 1000 or so constantly changing relational tables could well make the difference between next-to-unmanageable and efficiently adaptive.

Garlic probably has the largest triple collection in constant online use to date. We will soon join them with our hosting of the whole LOD cloud and Sindice/Zitgist as triples.

Conclusions

There is a mood to deliver applications. Consequently, scale remains a central, even the principal topic. So for now we make bigger centrally-managed databases. At the next turn around the corner we will have to turn to federation. The point here is that a planetary-scale, centrally-managed, online system can be made when the workload is uniform and anticipatable, but if it is free-form queries and complex analysis, we have a problem. So we move in the direction of federating and charging based on usage whenever the workload is more complex than making simple lookups now and then.

For the Virtuoso roadmap, this changes little. Next we make data sets available on Amazon EC2, as widely promised at ESWC. With big scale also comes rescaling and repartitioning, so this gets additional weight, as does further parallelizing of single user workloads. As it happens, the same medicine helps for both. At Linked Data Planet, we will make more announcements.

06/09/2008 10:02 GMT Modified: 06/11/2008 13:15 GMT
ESWC 2008 [ Orri Erling ]

Yrjänä Rankka and I attended ESWC2008 on behalf of OpenLink.

We were invited at the last minute to give a Linked Open Data talk at Paolo Bouquet's Identity and Reference workshop. We also had a demo of SPARQL BI (PPT; other formats coming soon), our business intelligence extensions to SPARQL, as well as of joining between relational data mapped to RDF and native RDF data. I also spoke at the social networks panel chaired by Harry Halpin.

I have gathered a few impressions that I will share in the next few posts (1 - RDF Mapping, 2 - DARQ, 3 - voiD, 4 - Paradigmata). Caveat: This is not meant to be complete or impartial press coverage of the event but rather some quick comments on issues of personal/OpenLink interest. The fact that I do not mention something does not mean that it is unimportant.

The voiD Graph

Linked Open Data was well represented, with Chris Bizer, Tom Heath, ourselves and many others. The great advance for LOD this time around is voiD, the Vocabulary of Interlinked Datasets, a means of describing what is actually inside the LOD cloud, what can be joined with what, and so forth. This matters a great deal if there is to be a web of federatable data sources, and it feeds directly into what we have been saying for a while about SPARQL end-point self-description and discovery. There is reasonable hope of having something by the date of Linked Data Planet in a couple of weeks.

Federating

Bastian Quilitz gave a talk about his DARQ, a federated version of Jena's ARQ.

Something like DARQ's optimization statistics should make their way into the SPARQL protocol as well as the voiD data set description.
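To make the point concrete, here is a rough sketch of how per-endpoint statistics could feed a federated planner. Everything in it is an assumption made for illustration: the endpoint URLs, the predicate counts, and the `plan` helper are invented, and this is not DARQ's actual algorithm or description format. It only shows that once each endpoint publishes how many triples it holds per predicate, source selection and a first ordering of the patterns become simple lookups.

```python
from dataclasses import dataclass


@dataclass
class EndpointStats:
    url: str
    triples_per_predicate: dict  # predicate -> number of triples at this endpoint


def plan(predicates, endpoints):
    """Pick the endpoints that hold each predicate, then order the
    patterns so that the one with the fewest estimated triples runs first."""
    steps = []
    for predicate in predicates:
        sources = [e for e in endpoints if predicate in e.triples_per_predicate]
        estimate = sum(e.triples_per_predicate[predicate] for e in sources)
        steps.append((predicate, [e.url for e in sources], estimate))
    return sorted(steps, key=lambda step: step[2])


# Hypothetical endpoints and counts, for illustration only.
endpoints = [
    EndpointStats("http://example.org/a/sparql", {"foaf:name": 1000, "foaf:knows": 50000}),
    EndpointStats("http://example.org/b/sparql", {"foaf:name": 200, "dc:title": 3000}),
]

steps = plan(["foaf:knows", "foaf:name", "dc:title"], endpoints)
for predicate, urls, estimate in steps:
    print(predicate, urls, estimate)
```

A real planner would also need selectivity for bound subjects and objects and the cost of shipping intermediate results, which is exactly why such statistics belong in a shared description rather than in each client.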

We really need federation, but more on this in a separate post.

XSPARQL

Axel Polleres et al. had a paper about XSPARQL, a merge of XQuery and SPARQL. While visiting DERI a couple of weeks back and again at the conference, we talked about OpenLink implementing the spec. It is evident that the engines must be in the same process and not communicate via the SPARQL protocol for this to be practical. We could do this. We'll have to see when.

Politically, using XQuery to give expressions and XML synthesis to SPARQL would be fitting. These things are needed in any case, as surely as aggregation and sub-queries, though the latter would not so readily come from XQuery. Some rapprochement between RDF and XML folks is desirable anyhow.

Panel: Will the Sem Web Rise to the Challenge of the Social Web?

The social web panel presented the question of whether the sem web was ready for prime time with data portability.

The main thrust was expressed in Harry Halpin's rousing closing words: "Men will fight in a battle and lose a battle for a cause they believe in. Even if the battle is lost, the cause may come back and prevail, this time changed and under a different name. Thus, there may well come to be something like our semantic web, but it may not be the one we have worked all these years to build if we do not rise to the occasion before us right now."

So, how to do this? Dan Brickley asked the audience how many supported, or were aware of, the latest Web 2.0 things, such as OAuth and OpenID. A few were. The general idea was that research (after all, this was a research event) should be more integrated and open to the world at large, not living at the "outdated pace" of a 3-year funding cycle. Stefan Decker of DERI acquiesced in principle. Of course there is an impedance mismatch between specialization and interfacing with everything.

I said that triples and vocabularies existed, that OpenLink had ODS (OpenLink Data Spaces, Community LinkedData) for managing one's data-web presence, but that scale would be the next thing. Rather large scale even, with 100 gigatriples (Gtriples) reached before one even noticed. It takes a lot of PCs to host this, maybe $400K worth at today's prices, without replication. Count 16 GB of RAM and a few cores per Gtriple so that one is not waiting for disk all the time.
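The back-of-envelope arithmetic behind these figures can be written out as follows. The three inputs come straight from the paragraph above; the derived per-Gtriple budget is simply my division of one by the other, not a quoted price.

```python
GTRIPLES = 100            # target size mentioned above
RAM_GB_PER_GTRIPLE = 16   # rule of thumb mentioned above
BUDGET_USD = 400_000      # rough hardware figure mentioned above, no replication

# Total memory needed to keep the working set off disk.
total_ram_gb = GTRIPLES * RAM_GB_PER_GTRIPLE

# Derived figure (an inference, not a quoted price): hardware budget per Gtriple.
usd_per_gtriple = BUDGET_USD / GTRIPLES

print(f"total RAM: {total_ram_gb} GB")
print(f"budget per Gtriple: ${usd_per_gtriple:,.0f}")
```

That is 1.6 TB of RAM in total and about $4,000 of hardware per Gtriple, before any replication.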

The tricks that Web 2.0 silos do with app-specific data structures and app-specific partitioning do not really work for RDF without compromising the whole point of smooth schema evolution and tolerance of ragged data.

So, simple vocabularies, minimal inference, minimal blank nodes. Besides, note that the inference will have to be done at run time, not forward-chained at load time, if only because users will not agree on what sameAs and other declarations they want for their queries. Not to mention spam or malicious sameAs declarations!
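As a minimal sketch of what run-time sameAs means in practice, assume the stored triples are never rewritten and each query brings along the set of sameAs pairs its user is willing to trust. The URIs, the data, and the helper names below are all invented for illustration; a real engine would do this inside the query processor rather than by scanning a list.

```python
def canonical_map(same_as_pairs):
    """Union-find over the sameAs pairs chosen for this query.
    Each URI maps to the smallest URI in its equivalence class."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)
    return {x: find(x) for x in list(parent)}


def describe(triples, subject, same_as_pairs):
    """Return (predicate, object) pairs for the subject and for everything
    the caller's sameAs set declares identical to it."""
    canon = canonical_map(same_as_pairs)
    target = canon.get(subject, subject)
    return sorted((p, o) for s, p, o in triples if canon.get(s, s) == target)


# Hypothetical data: the store itself is never modified.
triples = [
    ("ex:salt", "ex:usedIn", "ex:soup"),
    ("chem:NaCl", "ex:molarMass", "58.44"),
]
cooking_same_as = [("ex:salt", "chem:NaCl")]  # good enough for cooking
chemistry_same_as = []                        # not good enough for chemistry

print(describe(triples, "ex:salt", cooking_same_as))
print(describe(triples, "ex:salt", chemistry_same_as))
```

Because nothing is materialized, dropping a spam or malicious sameAs is just a matter of leaving it out of the set passed to the next query.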

As always, there was the question of business models for the open data web and for semantic technologies in general. As we see it, information overload is the factor driving the demand. Better contextuality will justify semantic technologies. Due to the large volumes and complex processing, a data-as-service model will arise. The data may be open, but its query infrastructure, cleaning, and keeping up-to-date, can be monetized as services.

Identity and Reference

For the identity and reference workshop, the ultimate question is metaphysical and has no single universal answer, even though people, ever since the dawn of time and earlier, have occupied themselves with the issue. Consequently, I started with the Genesis quote where Adam called things by nominibus suis, off-hand implying that things would have some intrinsic ontologically-due names. This would be among the older references to the question, at least in widely known sources.

For present purposes, the consensus seemed to be that what would be considered the same as something else depended entirely on the application. What was similar enough to warrant a sameAs for cooking purposes might not warrant a sameAs for chemistry. In fact, complete and exact sameness for URIs would be very rare. So, instead of making generic weak similarity assertions like similarTo or seeAlso, one would choose a set of strong sameAs assertions and have these in effect for query answering if they were appropriate to the granularity demanded by the application.

Therefore sameAs is our permanent companion, and there will in time be malicious and spam sameAs. So, nothing much should be materialized on the basis of sameAs assertions in an open world. For an app-specific warehouse, sameAs can be resolved at load time.
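For the app-specific warehouse case, resolving sameAs at load time could look roughly like this. The sketch assumes the warehouse owner has already settled on a trusted alias-to-canonical table; the URIs and the `load` helper are made up for illustration.

```python
def load(triples, canonical):
    """Rewrite subject and object URIs to their canonical form while
    loading; a set collapses triples that become identical."""
    def fix(uri):
        return canonical.get(uri, uri)
    return {(fix(s), p, fix(o)) for s, p, o in triples}


# Hypothetical alias table fixed by the warehouse owner.
canonical = {"src1:Berlin": "ex:Berlin", "src2:Berlin": "ex:Berlin"}

incoming = [
    ("src1:Berlin", "ex:country", "ex:Germany"),
    ("src2:Berlin", "ex:country", "ex:Germany"),
    ("ex:Anna", "ex:livesIn", "src2:Berlin"),
]

store = load(incoming, canonical)
print(sorted(store))
```

The price is that the original names are gone from the store, which is fine when one application owns the data and not fine in an open world.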

There was naturally some apparent tension between the Occam camp of entity name services and the LOD camp. I would say that the issue is more a perceived polarity than a real one. People will, inevitably, continue giving things names regardless of any centralized authority. Just look at natural language. But having a dictionary that is commonly accepted for established domains of discourse is immensely helpful.

CYC and NLP

The semantic search workshop was interesting, especially CYC's presentation. CYC is, as it were, the grand old man of knowledge representation. Over the long term, I would like to see support for the CYC inference language inside a database query processor. This would mostly be for repurposing the huge knowledge base to help with search-type queries. If it is for transactions or financial reporting, then queries will be SQL and make little or no use of any sort of inference. If it is for summarization or finding things, the opposite holds. For scaling, the issue is just making correct cardinality guesses for query planning, which is harder when inference is involved. We'll see.
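To illustrate why inference makes cardinality guesses harder, consider a pattern like "?x is of type Agent". Without inference the planner looks up one counter; with subclass inference it has to walk the class hierarchy first. The hierarchy and counts below are invented, and the `estimate` helper is only a sketch of the idea, not how any particular engine does it.

```python
def subclasses(cls, children):
    """All classes reachable from cls through subclass edges, cls included."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def estimate(cls, counts, children, infer):
    """Estimated number of instances matched by a type pattern on cls."""
    classes = subclasses(cls, children) if infer else {cls}
    return sum(counts.get(c, 0) for c in classes)


# Hypothetical hierarchy (class -> direct subclasses) and asserted instance counts.
children = {"Agent": ["Person", "Organization"], "Person": ["Researcher"]}
counts = {"Agent": 10, "Person": 5000, "Organization": 300, "Researcher": 120}

print(estimate("Agent", counts, children, infer=False))
print(estimate("Agent", counts, children, infer=True))
```

Here the guess moves from 10 to 5430, and even the second number is only an upper bound, since an instance asserted under two classes is counted twice. A plan chosen on the first figure would be badly wrong for the inferred query.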

I will also have a closer look at natural language one of these days, quite inevitably, since Zitgist (for example) is into entity disambiguation.

Scale

Garlic gave a talk about their Data Patrol and QDOS. We agree that storing the data for these as triples instead of 1000 or so constantly changing relational tables could well make the difference between next-to-unmanageable and efficiently adaptive.
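A toy sketch of the difference, with invented identifiers and a hypothetical in-memory `TripleStore` class standing in for a real store: when a source starts reporting an attribute nobody planned for, the triple model takes it as one more row rather than a schema change.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory store: subject -> predicate -> set of objects."""

    def __init__(self):
        self.by_subject = defaultdict(dict)

    def add(self, s, p, o):
        self.by_subject[s].setdefault(p, set()).add(o)

    def describe(self, s):
        return {p: sorted(objs) for p, objs in self.by_subject.get(s, {}).items()}


store = TripleStore()
store.add("ex:person1", "ex:email", "a@example.org")

# A new source introduces an attribute that no table was designed for.
# With relational tables this would be an ALTER TABLE or a new table;
# here it is just another triple.
store.add("ex:person1", "ex:forumHandle", "anon42")

print(store.describe("ex:person1"))
```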

Garlic probably has the largest triple collection in constant online use to date. We will soon join them with our hosting of the whole LOD cloud and Sindice/Zitgist as triples.

Conclusions

There is a mood to deliver applications. Consequently, scale remains a central, even the principal topic. So for now we make bigger centrally-managed databases. At the next turn around the corner we will have to turn to federation. The point here is that a planetary-scale, centrally-managed, online system can be made when the workload is uniform and anticipatable, but if it is free-form queries and complex analysis, we have a problem. So we move in the direction of federating and charging based on usage whenever the workload is more complex than making simple lookups now and then.

For the Virtuoso roadmap, this changes little. Next we make data sets available on Amazon EC2, as widely promised at ESWC. With big scale also comes rescaling and repartitioning, so this gets additional weight, as does further parallelizing of single user workloads. As it happens, the same medicine helps for both. At Linked Data Planet, we will make more announcements.

06/09/2008 13:49 GMT Modified: 06/11/2008 13:15 GMT