OpenLink Software
Burlington, United States

Simple Virtuoso Installation & Utilization Guide for SPARQL Users (Update 5) [ Kingsley Uyi Idehen ]

What is SPARQL?

SPARQL is a declarative query language from the W3C for querying structured propositional data (in the form of 3-tuple [triple] or 4-tuple [quad] records) stored in a deductive database (colloquially referred to as a triple store or quad store in Semantic Web and Linked Data parlance).

SPARQL is inherently platform independent. Like SQL, the query language and the backend database engine are distinct. Database clients capture SPARQL queries which are then passed on to compliant backend databases.

Why is it important?

Like SQL for relational databases, it provides a powerful mechanism for accessing and joining data across one or more data partitions (named graphs identified by IRIs). The aforementioned capability also enables the construction of sophisticated Views, Reports (HTML or those produced in native form by desktop productivity tools), and data streams for other services.

Unlike SQL, SPARQL includes result serialization formats and an HTTP-based wire protocol. Thus, the ubiquity and sophistication of HTTP are integral to SPARQL; i.e., client-side applications (user agents) only need to be able to perform an HTTP GET against a URL to exploit the power of SPARQL.
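To make the "HTTP GET against a URL" point concrete, here is a minimal Python sketch that builds such a URL. The function name and the localhost endpoint address are assumptions chosen for illustration; `query` is the parameter name defined by the SPARQL Protocol.

```python
from urllib.parse import urlencode


def sparql_get_url(endpoint: str, query: str) -> str:
    """Return a SPARQL Protocol GET URL that carries `query` to `endpoint`."""
    # urlencode percent-encodes the query text so it is safe inside a URL.
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Hypothetical local endpoint; substitute your own host and port.
    print(sparql_get_url("http://localhost:8890/sparql",
                         "SELECT * WHERE {?s ?p ?o} LIMIT 5"))
```

Any user agent that can fetch the resulting URL (a browser, curl, or an HTTP library) can act as a SPARQL client.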

How do I use it, generally?

  1. Locate a SPARQL endpoint (DBpedia, LOD Cloud Cache, Data.Gov, URIBurner, others), or
  2. Install a SPARQL compliant database server (quad or triple store) on your desktop, workgroup server, data center, or cloud (e.g., Amazon EC2 AMI)
  3. Start the database server
  4. Execute SPARQL Queries via the SPARQL endpoint.

How do I use SPARQL with Virtuoso?

What follows is a very simple guide for using SPARQL against your own instance of Virtuoso:

  1. Software Download and Installation
  2. Data Loading from Data Sources exposed at Network Addresses (e.g. HTTP URLs) using very simple methods
  3. Actual SPARQL query execution via SPARQL endpoint.

Installation Steps

  1. Download Virtuoso Open Source or Virtuoso Commercial Editions
  2. Run the installer (if using the Commercial Edition or the Windows Open Source Edition; otherwise follow the build guide)
  3. Follow post-installation guide and verify installation by typing in the command: virtuoso -? (if this fails check you've followed installation and setup steps, then verify environment variables have been set)
  4. Start the Virtuoso server using the command: virtuoso-start.sh
  5. Verify you have a connection to the Virtuoso Server via the command: isql localhost (assuming you're using default DB settings) or the command: isql localhost:1112 (assuming demo database), or go to your browser and type in: http://<virtuoso-server-host-name>:[port]/conductor (e.g. http://localhost:8889/conductor for default DB or http://localhost:8890/conductor if using Demo DB)
  6. Go to SPARQL endpoint which is typically -- http://<virtuoso-server-host-name>:[port]/sparql
  7. Run a quick sample query (since the database always has system data in place): select distinct * where {?s ?p ?o} limit 50 .
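The last two steps can also be scripted. The following is a minimal sketch, assuming the endpoint returns the standard SPARQL JSON results format when asked for it via the Accept header; the function names and the localhost address in the comment are illustrative assumptions, not part of Virtuoso.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SAMPLE_QUERY = "SELECT DISTINCT * WHERE {?s ?p ?o} LIMIT 50"


def parse_bindings(results_json: str) -> list:
    """Flatten a SPARQL JSON results document into {variable: value} rows."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> list:
    """Send `query` to `endpoint` with HTTP GET and return the parsed rows."""
    url = endpoint + "?" + urlencode({"query": query})
    request = Request(url, headers={"Accept": "application/sparql-results+json"})
    with urlopen(request, timeout=timeout) as response:
        return parse_bindings(response.read().decode("utf-8"))


# Example (needs a running server; host and port are assumptions):
#   for row in run_query("http://localhost:8890/sparql", SAMPLE_QUERY):
#       print(row)
```

Keeping the parsing separate from the HTTP call makes it easy to check the result handling without a live server.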

Troubleshooting

  1. Ensure environment settings are set and functional -- if using Mac OS X or Windows, you don't have to worry about this; just start and stop your Virtuoso server using the native OS services applets
  2. If using the Open Source Edition, follow the getting started guide -- it covers PATH and startup directory location re. starting and stopping Virtuoso servers.
  3. Sponging (HTTP GETs against external Data Sources) within SPARQL queries is disabled by default. You can enable this feature by assigning "SPARQL_SPONGE" privileges to user "SPARQL". Note, more sophisticated security exists via WebID based ACLs.

Data Loading Steps

  1. Identify an RDF based structured data source of interest -- a file that contains 3-tuple / triples available at an address on a public or private HTTP based network
  2. Determine the Address (URL) of the RDF data source
  3. Go to your Virtuoso SPARQL endpoint and type in the following SPARQL query: DEFINE GET:SOFT "replace" SELECT DISTINCT * FROM <RDFDataSourceURL> WHERE {?s ?p ?o}
  4. All the triples in the RDF resource (data source accessed via URL) will be loaded into the Virtuoso Quad Store (using RDF Data Source URL as the internal quad store Named Graph IRI) as part of the SPARQL query processing pipeline.
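If you load several data sources this way, generating the query text is easy to automate. Below is a minimal sketch that only assembles the query string from step 3 for a given URL; the function name is hypothetical, and the character check is a conservative assumption about what may appear between the angle brackets of a SPARQL IRI.

```python
# Characters that may not appear inside a SPARQL IRI reference (<...>).
_FORBIDDEN = set('<>"{}|^`\\')


def build_sponge_query(source_url: str) -> str:
    """Return the data loading query from step 3 for `source_url`."""
    if not source_url or any(
        ch in _FORBIDDEN or ord(ch) <= 0x20 for ch in source_url
    ):
        raise ValueError("source_url is not usable as a SPARQL IRI: %r" % source_url)
    return (
        'DEFINE get:soft "replace" '
        f"SELECT DISTINCT * FROM <{source_url}> WHERE {{?s ?p ?o}}"
    )


if __name__ == "__main__":
    # example.org is a placeholder, not a real data source.
    print(build_sponge_query("http://example.org/data.rdf"))
```

The resulting text can be pasted into the SPARQL endpoint form or sent over HTTP like any other query.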

Note: the data source URL doesn't even have to be RDF based -- which is where the Virtuoso Sponger Middleware comes into play (download and install the VAD installer package first) since it delivers the following features to Virtuoso's SPARQL engine:

  1. Transformation of data from non RDF data sources (file content, hypermedia resources, web services output etc..) into RDF based 3-tuples (triples)
  2. Cache Invalidation Scheme Construction -- thus, subsequent queries will not require the define get:soft "replace" pragma, except when you want to forcibly override the cache.
  3. If you have very large data sources like DBpedia etc. from CKAN, simply use our bulk loader.

SPARQL Endpoint Discovery

Public SPARQL endpoints are emerging at an ever increasing rate. Thus, we've set up a DNS lookup service that provides access to a large number of SPARQL endpoints. Of course, this doesn't cover all existing endpoints, so if your endpoint is missing please ping me.

Here is a collection of commands for using DNS-SD to discover SPARQL endpoints:

  1. dns-sd -B _sparql._tcp sparql.openlinksw.com -- browse for service instances
  2. dns-sd -Z _sparql._tcp sparql.openlinksw.com -- output results in Zone File format

Related

  1. Using HTTP from Ruby -- you can simply construct SPARQL Protocol URLs to issue SPARQL queries
  2. Using SPARQL Endpoints via Ruby -- Ruby example using DBpedia endpoint
  3. Interactive SPARQL Query By Example (QBE) tool -- provides a graphical user interface (as is common in SQL realm re. query building against RDBMS engines) that works with any SPARQL endpoint
  4. Other methods of loading RDF data into Virtuoso
  5. Virtuoso Sponger -- architecture and how it turns a wide variety of non RDF data sources into SPARQL accessible data
  6. Using OpenLink Data Explorer (ODE) to populate Virtuoso -- locate a resource of interest; click on a bookmarklet or use context menus (if using ODE extensions for Firefox, Safari, or Chrome); and you'll have SPARQL accessible data automatically inserted into your Virtuoso instance.
  7. W3C's SPARQLing Data Access Ingenuity -- an older generic SPARQL introduction post
  8. Collection of SPARQL Query Examples -- GoodRelations (Product Offers), FOAF (Profiles), SIOC (Data Spaces -- Blogs, Wikis, Bookmarks, Feed Collections, Photo Galleries, Briefcase/DropBox, AddressBook, Calendars, Discussion Forums)
  9. Collection of Live SPARQL Queries against LOD Cloud Cache -- simple and advanced queries.
01/16/2011 02:06 GMT Modified: 01/19/2011 10:43 GMT
Virtuoso Linked Data Deployment In 3 Simple Steps [ Kingsley Uyi Idehen ]

Injecting Linked Data into the Web has been a major pain point for those who seek personal, service, or organization-specific variants of DBpedia. Basically, the sequence goes something like this:

  1. You encounter DBpedia or the LOD Cloud Pictorial.
  2. You look around (typically following your nose from link to link).
  3. You attempt to publish your own stuff.
  4. You get stuck.

The problems typically take the following form:

  1. Functionality confusion about the complementary Name and Address functionality of a single URI abstraction
  2. Terminology confusion due to conflation and over-loading of terms such as Resource, URL, Representation, Document, etc.
  3. Inability to find robust tools with which to generate Linked Data from existing data sources such as relational databases, CSV files, XML, Web Services, etc.

To start addressing these problems, here is a simple guide for generating and publishing Linked Data using Virtuoso.

Step 1 - RDF Data Generation

Existing RDF data can be added to the Virtuoso RDF Quad Store via a variety of built-in data loader utilities.

Many options allow you to easily and quickly generate RDF data from other data sources:

  • Install the Sponger Bookmarklet for the URIBurner service. Bind this to your own SPARQL-compliant backend RDF database (in this scenario, your local Virtuoso instance), and then Sponge some HTTP-accessible resources.
  • Convert relational DBMS data to RDF using the Virtuoso RDF Views Wizard.
  • Starting with CSV files, you can
    • Place them at an HTTP-accessible location, and use the Virtuoso Sponger to convert them to RDF or;
    • Use the CSV import feature to import their content into Virtuoso's relational data engine; then use the built-in RDF Views Wizard as with other RDBMS data.
  • Starting from XML files, you can
    • Use Virtuoso's inbuilt XSLT-Processor for manual XML to RDF/XML transformation or;
    • Leverage the Sponger Cartridge for GRDDL, if there is a transformation service associated with your XML data source, or;
    • Let the Sponger analyze the XML data source and make a best-effort transformation to RDF.
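For readers new to RDF, it may help to see what "converting CSV to RDF" amounts to. The sketch below is not how the Sponger or the RDF Views Wizard work; it is a hand-rolled illustration that turns each CSV row into N-Triples statements. The base IRI, the `schema/` path for predicates, and the function names are all assumptions made for the example.

```python
import csv
import io
from urllib.parse import quote


def escape_literal(text: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text: str, base_iri: str, key_column: str) -> list:
    """Map each CSV row to one subject and each non-empty cell to one triple."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{base_iri}{quote(row[key_column], safe='')}>"
        for column, value in row.items():
            # Skip the key itself, empty cells, and overflow cells (column is None).
            if column is None or column == key_column or not value:
                continue
            predicate = f"<{base_iri}schema/{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


if __name__ == "__main__":
    for line in csv_to_ntriples("id,name\n1,Alice\n", "http://example.org/", "id"):
        print(line)
```

The Virtuoso tools listed above do considerably more (datatype detection, ontology mapping, linking), but the output has this same basic shape: one entity per record, one statement per attribute.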

Step 2 - Linked Data Deployment

Install the Faceted Browser VAD package (fct_dav.vad) which delivers the following:

  1. Faceted Browser Engine UI
  2. Dynamic Hypermedia Resource Generator
    • delivers descriptor resources for every entity (data object) in the Native or Virtual Quad Stores
    • supports a broad array of output formats, including HTML+RDFa, RDF/XML, N3/Turtle, NTriples, RDF-JSON, OData+Atom, and OData+JSON.

Step 3 - Linked Data Consumption & Exploitation

The following simple steps allow you, your enterprise, and your customers to consume and exploit your newly deployed Linked Data:

  1. Load a page like this in your browser: http://<cname>[:<port>]/describe/?uri=<entity-uri>
    • <cname>[:<port>] gets replaced by the host and port of your Virtuoso instance
    • <entity-uri> gets replaced by the URI you want to see described -- for instance, the URI of one of the resources you let the Sponger handle.
  2. Follow the links presented in the descriptor page.
  3. If you ever see a blank page with a hyperlink subject name in the About: section at the top of the page, simply add the parameter "&sp=1" to the URL in the browser's Address box, and hit [ENTER]. This will result in an "on the fly" resource retrieval, transformation, and descriptor page generation.
  4. Use the navigator controls to page up and down the data associated with the "in scope" resource descriptor.
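The URL pattern from step 1, including the optional "&sp=1" parameter from step 3, can be assembled programmatically. This is a minimal sketch; the function name is hypothetical, and percent-encoding the entity URI is a defensive assumption on my part, since the pattern above shows the URI inline.

```python
from typing import Optional
from urllib.parse import quote


def describe_url(host: str, entity_uri: str,
                 port: Optional[int] = None, sponge: bool = False) -> str:
    """Build a /describe/ URL for `entity_uri` on a Virtuoso instance."""
    authority = host if port is None else f"{host}:{port}"
    # Encode the entity URI so its own '?', '#', and '&' cannot break the outer URL.
    url = f"http://{authority}/describe/?uri={quote(entity_uri, safe='')}"
    if sponge:
        # Step 3: request "on the fly" retrieval and transformation.
        url += "&sp=1"
    return url


if __name__ == "__main__":
    # Host, port, and entity URI are placeholders.
    print(describe_url("localhost", "http://dbpedia.org/resource/Paris", port=8890))
```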

10/29/2010 18:54 GMT Modified: 11/02/2010 11:55 GMT
The Business of Semantically Linked Data ("SemData") [ Orri Erling ]

I had the opportunity the other day to converse about the semantic technology business proposition in terms of business development. My interlocutor was a business development consultant who had little prior knowledge of this technology but a background in business development inside a large diversified enterprise.

I will here recap some of the points discussed, since these can be of broader interest.

Why is there no single dominant vendor?

The field is young. We can take the relational database industry as a historical precedent. From the inception of the relational database around 1970, it took 15 years for the relational model to become mainstream. "Mainstream" here does not mean dominant in installed base, but does mean something that one tends to include as a component in new systems. The figure of 15 years might repeat with RDF, from around 2000 for the first beginnings to 2015 for routine inclusion in new systems, where applicable.

This does not necessarily mean that the RDF graph data model (or more properly, EAV+CR; Entity-Attribute-Value + Classes and Relationships) will take the place of the RDBMS as the preferred data backbone. This could mean that RDF model serialization formats will be supported as data exchange mechanisms, and that systems will integrate data extracted by semantic technology from unstructured sources. Some degree of EAV storage is likely to be common, but on-line transactional data is guaranteed to stay pure relational, as EAV is suboptimal for OLTP. Analytics will see EAV alongside relational especially in applications where in-house data is being combined with large numbers of outside structured sources or with other open sources such as information extracted from the web.

EAV offerings will become integrated by major DBMS vendors, as is already the case with Oracle. Specialized vendors will exist alongside these, just as is the case with relational databases.

Can there be a positive reinforcement cycle (e.g., building cars creates a need for road construction, and better roads drive demand for more cars)? Or is this an up-front infrastructure investment that governments make for some future payoff or because of science-funding policies?

The Document Web did not start as a government infrastructure initiative. The infrastructure was already built, albeit first originating with the US defense establishment. The Internet became ubiquitous through the adoption of the Web. The general public's adoption of the Web was bootstrapped by all major businesses and media adopting the Web. They did not adopt the Web because they particularly liked it, as it was essentially a threat to the position of media and to the market dominance of big players who could afford massive advertising in these same media. Adopting the Web became necessary because of the prohibitive opportunity cost of not adopting it.

A similar process may take place with open data. For example, in E-commerce, vendors do not necessarily welcome easy-and-automatic machine-based comparison of their offerings against those of their competitors. Publishing data will however be necessary in order to be listed at all. Also, in social networks, we have the identity portability movement which strives to open the big social network silos. Data exchange via RDF serializations, as already supported in many places, is the natural enabling technology for this.

Will the web of structured data parallel the development of web 2.0?

Web 2.0 was about the blogosphere, exposure of web site service APIs, creation of affiliate programs, and so forth. If the Document Web was like a universal printing press, where anybody could publish at will, Web 2.0 was a newspaper, bringing the democratization of journalism, creating the blogger, the citizen journalist. The Data Web will create the Citizen Analyst, the Mini Media Mogul (e.g., social-network-driven coops comprised of citizen journalists, analysts, and other content providers such as video and audio producers and publishers). As the blogosphere became an alternative news source to the big media, the web of data may create an ecosystem of alternative data products. Analytics is no longer a government or big business only proposition.

Is there a specifically semantic market or business model, or will semantic technology be exploited under established business models and merged as a component technology into existing offerings?

We have seen a migration from capital expenses to operating expenses in the IT sector in general, as exemplified by cloud computing's Platform as a Service (PaaS) and Software as a Service (SaaS). It is reasonable to anticipate that this trend will continue to Data as a Service (DaaS). Microsoft OData and Dallas are early examples of this and go towards legitimizing the Data as a Service concept. DaaS is not related to semantic technology per se, but since this will involve integration of data, RDF serializations will be attractive, especially given the takeoff of linked data in general. The data models in OData are also much like RDF, as both stem from EAV+CR, which makes for easy translation and a degree of inherent interoperability.

The integration of semantic technology into existing web properties and business applications will manifest to the end user as increased serendipity. The systems will be able to provide more relevant and better contextualized data for the user's situation. This applies equally to the consumer and business user cases.

Identity virtualization in the forms of WebID and Webfinger — making first-class de-referenceable identifiers of mailto: and acct: schemes — is emerging as a new way to open social network and Web 2.0 data silos.

On the software production side, especially as concerns data integration, the increased schema- and inference-flexibility of EAV will lead to a quicker time to answer in many situations. The more complex the task or the more diverse the data, the higher the potential payoff. Data in cyberspace is mirroring the complexity and diversity of the real world, where heterogeneity and disparity are simply facts of life, and such flexibility is becoming an inescapable necessity.

09/22/2010 14:20 GMT Modified: 09/22/2010 13:44 GMT
Data 3.0 (a Manifesto for Platform Agnostic Structured Data) Update 5 [ Kingsley Uyi Idehen ]

After a long period of trying to demystify and unravel the wonders of standards-compliant structured data access, combined with protocols (e.g., HTTP) that separate:

  1. Identity,
  2. Access,
  3. Storage,
  4. Representation, and
  5. Presentation.

I ended up with what I can best describe as the Data 3.0 Manifesto: a manifesto for standards-compliant access to structured data object (or entity) descriptors.

Some Related Work

Alex James (Program Manager Entity Frameworks at Microsoft) put together something quite similar to this on his Base4 blog (around the Web 2.0 bootstrap time). Sadly, quoting Alex, that post has gone where discontinued blogs and their host platforms go (deep, deep irony here).

It's also important to note that this manifesto is a variant of TimBL's Linked Data Design Issues meme, but totally decoupled from RDF (the data representation formats aspect) and SPARQL which, in my world view, remain implementation details.

Data 3.0 manifesto

  • An "Entity" is the "Referent" of an "Identifier."
  • An "Identifier" SHOULD provide a global, unambiguous, and unchanging (though it MAY be opaque!) "Name" for its "Referent".
  • A "Referent" MAY have many "Identifiers" (Names), but each "Identifier" MUST have only one "Referent".
  • Structured Entity Descriptions SHOULD be based on the Entity-Attribute-Value (EAV) Data Model, and SHOULD therefore take the form of one or more 3-tuples (triples), each comprised of:
    • an "Identifier" that names an "Entity" (i.e., Entity Name),
    • an "Identifier" that names an "Attribute" (i.e., Attribute Name), and
    • an "Attribute Value", which may be an "Identifier" or a "Literal".
  • Structured Descriptions SHOULD be CARRIED by "Descriptor Documents" (i.e., purpose specific documents where Entity Identifiers, Attribute Identifiers, and Attribute Values are clearly discernible by the document's intended consumers, e.g., humans or machines).
  • Structured Descriptor Documents can contain (carry) several Structured Entity Descriptions.
  • Structured Descriptor Documents SHOULD be network accessible via network addresses (e.g., HTTP URLs when dealing with HTTP-based Networks).
  • An Identifier SHOULD resolve (de-reference) to a Structured Representation of the Referent's Structured Description.
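To make the manifesto's vocabulary concrete, here is a minimal sketch of the EAV model in Python. The class names (Identifier, Triple, DescriptorDocument) and the example.org identifiers are assumptions made for illustration only; they are not part of the manifesto or of any particular product.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Identifier:
    """A global name for a referent (hypothetical type; e.g., an HTTP URI)."""
    name: str


# An attribute value is either another Identifier or a plain literal.
Value = Union[Identifier, str, int, float]


@dataclass(frozen=True)
class Triple:
    entity: Identifier     # names the entity being described
    attribute: Identifier  # names the attribute
    value: Value           # an Identifier or a literal


class DescriptorDocument:
    """Carries one or more structured entity descriptions as triples."""

    def __init__(self, address: str) -> None:
        self.address = address  # network address of the document
        self.triples: List[Triple] = []

    def add(self, entity: str, attribute: str, value: Value) -> None:
        self.triples.append(Triple(Identifier(entity), Identifier(attribute), value))

    def describe(self, entity: str) -> Dict[str, List[Value]]:
        """Return the attribute/value pairs recorded for one entity."""
        description: Dict[str, List[Value]] = {}
        for triple in self.triples:
            if triple.entity.name == entity:
                description.setdefault(triple.attribute.name, []).append(triple.value)
        return description


# Illustrative data only: two entity descriptions carried by one document.
doc = DescriptorDocument("http://example.org/people")
doc.add("http://example.org/people#alice", "http://example.org/schema#name", "Alice")
doc.add("http://example.org/people#alice", "http://example.org/schema#knows",
        Identifier("http://example.org/people#bob"))
doc.add("http://example.org/people#bob", "http://example.org/schema#name", "Bob")
```

Calling doc.describe("http://example.org/people#alice") gathers the two statements about that entity. Note that the value of the "knows" attribute is itself an Identifier, which is what lets one description link to another.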

04/16/2010 17:09 GMT Modified: 05/25/2010 17:10 GMT
Revisiting HTTP based Linked Data (Update 1 - Demo Video Links Added) [ Kingsley Uyi Idehen ]

Motivation for this post arose from a series of Twitter exchanges between Tony Hirst and me, in relation to his blog post titled: So What Is It About Linked Data that Makes it Linked Data™?

At the end of the marathon session, it was clear to me that a blog post was required for future reference, at the very least :-)

What is Linked Data?

A "Data Access by Reference" mechanism for Data Objects (or Entities) on HTTP networks. It enables you to Identify a Data Object and Access its structured Data Representation via a single Generic HTTP scheme based Identifier (HTTP URI). Data Object representation formats may vary; but in all cases, they are hypermedia oriented, fully structured, and negotiable within the context of a client-server message exchange.

Why is it Important?

Information makes the world tick!

Information doesn't exist without data to contextualize.

Information is inaccessible without a projection (presentation) medium.

All information (without exception, when produced by humans) is subjective. Thus, to truly maximize the innate heterogeneity of collective human intelligence, loose coupling of our information and associated data sources is imperative.

How is Linked Data Delivered?

Linked Data is exposed to HTTP networks (e.g. World Wide Web) via hypermedia resources bearing structured representations of data object descriptions. Remember, you have a single Identifier abstraction (generic HTTP URI) that embodies: Data Object Name and Data Representation Location (aka URL).

How are Linked Data Object Representations Structured?

A structured representation of data exists when an Entity (Datum), its Attributes, and its Attribute Values are clearly discernible. In the case of a Linked Data Object, structured descriptions take the form of a hypermedia based Entity-Attribute-Value (EAV) graph pictorial -- where each Entity, its Attributes, and its Attribute Values (optionally) are identified using Generic HTTP URIs.

Examples of structured data representation formats (content types) associated with Linked Data Objects include:

  • text/html
  • text/turtle
  • text/n3
  • application/json
  • application/rdf+xml
  • Others
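Because the representation format is negotiable, a client states which of these content types it prefers via the HTTP Accept header. The following is a minimal sketch using Python's standard library; the URI, the helper names, and the preference weights are assumptions chosen for illustration, and the request is only constructed, not sent.

```python
from urllib.request import Request

# Hypothetical preference order: (media type, quality weight).
LINKED_DATA_TYPES = [
    ("text/turtle", 1.0),
    ("application/rdf+xml", 0.9),
    ("application/json", 0.8),
    ("text/html", 0.5),
]


def accept_header(preferences):
    """Build an Accept header value; a weight of 1.0 is the default and is omitted."""
    parts = []
    for media_type, q in preferences:
        parts.append(media_type if q >= 1.0 else f"{media_type};q={q:g}")
    return ", ".join(parts)


def build_request(uri, preferences=LINKED_DATA_TYPES):
    """Prepare (but do not send) a GET request that negotiates a representation."""
    return Request(uri, headers={"Accept": accept_header(preferences)})


# Illustrative identifier only.
request = build_request("http://example.org/data/alice")
print(request.get_header("Accept"))
```

A server that supports content negotiation can answer the same URI with Turtle for this client and with HTML for a browser, which is how one Identifier serves both machines and humans.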

How Do I Create Linked Data oriented Hypermedia Resources?

You mark up resources by expressing distinct entity-attribute-value statements (basically, these are 3-tuple records) using a variety of notations:

  • (X)HTML+RDFa,
  • JSON,
  • Turtle,
  • N3,
  • TriX,
  • TriG,
  • RDF/XML, and
  • Others (for instance you can use Atom data format extensions to model EAV graph as per OData initiative from Microsoft).
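As a rough illustration of what one such statement looks like on the page, the sketch below writes entity-attribute-value records in the simple line-per-statement form of N-Triples, each line of which is also valid Turtle. The function name and the example.org identifiers are assumptions for illustration only.

```python
def to_statement(entity, attribute, value, value_is_identifier=False):
    """Format one entity-attribute-value record as a single statement line."""
    if value_is_identifier:
        # The value names another entity, so it is written as an identifier.
        obj = f"<{value}>"
    else:
        # The value is a literal: escape backslashes, quotes, and line breaks.
        escaped = (
            str(value)
            .replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        obj = f'"{escaped}"'
    return f"<{entity}> <{attribute}> {obj} ."


# Illustrative records only.
lines = [
    to_statement("http://example.org/people#alice",
                 "http://example.org/schema#name", "Alice"),
    to_statement("http://example.org/people#alice",
                 "http://example.org/schema#knows",
                 "http://example.org/people#bob", value_is_identifier=True),
]
print("\n".join(lines))
```

The other notations in the list carry the same three-part statements; they differ in syntax and in how much abbreviation (prefixes, nesting, embedding in HTML) they allow.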

You can achieve this task using any of the following approaches:

  • Notepad
  • WYSIWYG Editor
  • Transformation of Database Records via Middleware
  • Transformation of XML based Web Services output via Middleware
  • Transformation of other Hypermedia Resources via Middleware
  • Transformation of non Hypermedia Resources via Middleware
  • Use a platform that delivers all of the above.

Practical Examples of What Linked Data Objects Enable

  • Describe Who You Are, What You Offer, and What You Need via your structured profile, then leave your HTTP network to perform the REST (serendipitous discovery of relevant things)
  • Identify (via map overlay) all items of interest based on a 2km+ radius of my current location (this could include vendor offerings or services sought by existing or future customers)
  • Share the latest and greatest family photos with family members *only* without forcing them to signup for Yet Another Web 2.0 service or Social Network
  • No repetitive signup and username and password based login sequences per Web 2.0 or Mobile Application combo
  • Going beyond imprecise Keyword Search to the new frontier of Precision Find. For example, find Data Objects associated with the keyword "Tiger", while enabling the seeker to disambiguate across the "Who", "What", "Where", "When" dimensions (with negation capability)
  • Determine how two Data Objects are Connected - person to person, person to subject matter etc. (LinkedIn outside the walled garden)
  • Use any resource address (e.g., a blog or bookmark URL) as the conduit into a Data Object mesh that exposes all associated Entities and their social network relationships
  • Apply patterns (social dimensions) above to traditional enterprise data sources in combination (optionally) with external data without compromising security etc.

How Do OpenLink Software Products Enable Linked Data Exploitation?

Our data access middleware heritage (which spans 16+ years) has enabled us to assemble a rich portfolio of coherently integrated products that enable cost-effective evaluation and utilization of Linked Data, without writing a single line of code, or exposing you to the hidden, but extensive admin and configuration costs. Post installation, the benefits of Linked Data simply materialize (along the lines described above).

Our main Linked Data oriented products include:

  • OpenLink Data Explorer -- visualizes Linked Data or Linked Data transformed "on the fly" from hypermedia and non hypermedia data sources
  • URIBurner -- a "deceptively simple" solution that enables the generation of Linked Data "on the fly" from a broad collection of data sources and resource types
  • OpenLink Data Spaces -- a platform for enterprises and individuals that enhances distributed collaboration via Linked Data driven virtualization of data across its native and/or 3rd party content managers for Blogs, Wikis, Shared Bookmarks, Discussion Forums, Social Networks, etc.
  • OpenLink Virtuoso -- a secure and high-performance native hybrid data server (Relational, RDF-Graph, Document models) that includes in-built Linked Data transformation middleware (aka. Sponger).

03/04/2010 10:16 GMT Modified: 03/08/2010 09:59 GMT