The Sponger is middleware built into the Virtuoso SPARQL Processor that extracts RDF "on the fly" from non-RDF Web data sources.
When an RDF-aware client requests data from a network-accessible resource via the Sponger, the following events occur:
- The Sponger requests the data in RDF form; if RDF is returned, no further transformation happens and the RDF Entities are returned to the client
- If RDF isn't returned, then the Sponger passes the data through a Metadata Extraction Pipeline process (using Metadata Extractors)
- The extracted data is transformed to RDF via a Mapping Pipeline process (RDF is extracted by ontology matching and mapping) that generates RDF Entities (instance data)
- RDF Entities are returned to the client
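The sequence of events above can be sketched in a few lines of Python. This is only an illustrative outline, not the Sponger's actual implementation (which lives inside Virtuoso): the names `sponge`, `Response`, `fetch`, `parse_rdf`, `extractors`, and `mappers` are hypothetical, and the set of RDF media types, as well as the choice to merge the output of all extractors, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

# Assumption: a small, illustrative set of RDF media types.
RDF_MEDIA_TYPES = {"application/rdf+xml", "text/turtle", "text/n3"}

Triple = Tuple[str, str, str]


@dataclass
class Response:
    """Hypothetical stand-in for an HTTP response."""
    content_type: str
    body: str


def sponge(
    url: str,
    fetch: Callable[[str], Response],
    parse_rdf: Callable[[Response], List[Triple]],
    extractors: Iterable[Callable[[Response], Dict[str, str]]],
    mappers: Iterable[Callable[[str, Dict[str, str]], List[Triple]]],
) -> List[Triple]:
    # Step 1: request the data; if RDF comes back, return it untransformed.
    response = fetch(url)
    media_type = response.content_type.split(";")[0].strip().lower()
    if media_type in RDF_MEDIA_TYPES:
        return parse_rdf(response)

    # Step 2: metadata extraction pipeline (assumption: outputs are merged).
    metadata: Dict[str, str] = {}
    for extractor in extractors:
        metadata.update(extractor(response))
    if not metadata:
        return []

    # Step 3: mapping pipeline turns the extracted metadata into RDF Entities.
    triples: List[Triple] = []
    for mapper in mappers:
        triples.extend(mapper(url, metadata))

    # Step 4: the RDF Entities are returned to the client.
    return triples
```

Passing the fetcher, extractors, and mappers in as callables keeps the sketch self-contained and mirrors the pluggable nature of the two pipelines.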
The imported data forms a local cache whose invalidation rules conform to those of traditional HTTP clients (Web browsers). The first data load records the 'expires' header; on each subsequent fetch of the same resource, the current time is compared with the expiration time stored in the local cache. If the source data server does not return an HTTP 'expires' header, the Sponger derives its own invalidation time frame by evaluating the 'date' and 'last-modified' HTTP headers. Irrespective of the path taken, local cache invalidation is driven by an assessment of the current time relative to the recorded expiration time.
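As a rough illustration of this kind of expiry logic, the Python sketch below computes an expiration time from response headers and checks it against the current time. The exact formula the Sponger uses when 'expires' is missing is not stated here, so the sketch assumes the common HTTP-cache heuristic of one tenth of the interval between 'last-modified' and 'date'. The function names, the divisor, and the lowercase header keys are all assumptions made for the example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional

# Assumption: freshness lifetime = (date - last-modified) / 10, a heuristic
# widely used by HTTP caches; the Sponger's own formula may differ.
HEURISTIC_DIVISOR = 10


def _parse_http_date(value: Optional[str]) -> Optional[datetime]:
    """Parse an HTTP date header; return None if missing or malformed."""
    if not value:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def expiration_time(headers: Mapping[str, str], fetched_at: datetime) -> datetime:
    """Return the time at which a cached copy should be considered expired.

    `headers` is assumed to use lowercase keys.
    """
    expires = _parse_http_date(headers.get("expires"))
    if expires is not None:
        return expires  # the server stated the expiration time explicitly

    date = _parse_http_date(headers.get("date")) or fetched_at
    last_modified = _parse_http_date(headers.get("last-modified"))
    if last_modified is None or last_modified > date:
        return date  # nothing to base a lifetime on: treat as expired at once
    return date + (date - last_modified) / HEURISTIC_DIVISOR


def is_expired(expires_at: datetime, now: datetime) -> bool:
    """Cache invalidation check: current time versus recorded expiration."""
    return now >= expires_at
```

Whichever branch produces the expiration time, the final decision is the same comparison of the current time against the recorded value.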
The Schema Mappers are typically XSLT- (e.g., GRDDL and other OpenLink Mapping Schemes) or Virtuoso PL-based. The Metadata Extractors may be developed in Virtuoso PL, C/C++, Java, or any other language that can be integrated into Virtuoso via its server extension APIs.
- Ontology Mapper
- Target Ontology
- Data Extractor, e.g., RDF Cartridge Programmer Guide
- OpenLink Cartridge-Supported Data Sources
- Note: The cartridges_filesystem.vad must be installed for the actual extraction and mapping to occur.
The Sponger is very close to an implementation of cURL exposed as a built-in Virtuoso Web Service (so you can interact with it as you do Triplr).
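Because the Sponger is reachable as a Web Service, a client only needs to build a URL that points at the "/proxy" endpoint and names the resource to be sponged. The Python sketch below shows one way to do that. The host name is a placeholder, and the `url` query parameter is an assumption about the proxy's interface that should be checked against the Sponger Proxy Service documentation.

```python
from urllib.parse import urlencode


def sponger_proxy_url(host: str, resource_url: str) -> str:
    """Build a "/proxy" request URL for a Virtuoso instance.

    Assumption: the target resource is passed in a `url` query parameter.
    """
    return "http://" + host + "/proxy?" + urlencode({"url": resource_url})


# Hypothetical host; substitute the address of a real Virtuoso server.
print(sponger_proxy_url("virtuoso.example.org", "http://fgiasson.com"))
```

The resulting URL can then be dereferenced with any HTTP client, just as one would fetch a page with cURL.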
- Virtuoso Demo Server Proxy - same as using http://fgiasson.com as a URI in the OpenLink Browser (which has built-in support for /proxy)
- A Googlebase Query Service URI for Job vacancies - http://www.google.com/base/feeds/snippets?bq=%20[employer:%20Hewlett-Packard]%20%20[job%20type:full-time]
- Non-RDF URI to demonstrate the Sponger via:
- Kingsley Idehen Weblog Post about RDF Middleware
- Frederick Giasson Weblog Post about RDF Middleware and including examples of "/proxy" usage
- Sponger Proxy Service Documentation
- RDF Mappers Programmers Guide
- Documentation - Dereferencing URIs & Linked Data