Simple Virtuoso Installation & Utilization Guide for SPARQL Users (Update 5)
What is it?
A declarative query language from the W3C for querying structured propositional data (in the form of 3-tuple [triples] or 4-tuple [quads] records) stored in a deductive database (colloquially referred to as a triple or quad store in Semantic Web and Linked Data parlance).
SPARQL is inherently platform independent. Like SQL, the query language and the backend database engine are distinct. Database clients capture SPARQL queries, which are then passed on to compliant backend databases.
Why is it important?
Like SQL for relational databases, it provides a powerful
mechanism for accessing and joining data across one or more data
partitions (named graphs identified by IRIs). The aforementioned
capability also enables the construction of sophisticated Views,
Reports (HTML or those produced in native form by desktop
productivity tools), and data streams for other services.
Unlike SQL, SPARQL includes result serialization formats and an
HTTP based wire protocol. Thus, the ubiquity and sophistication of
HTTP is integral to SPARQL i.e., client side applications (user
agents) only need to be able to perform an HTTP GET against a
URL en route to exploiting the power of
SPARQL.
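To give a concrete sense of how little a client needs, here is a minimal sketch of a SPARQL Protocol request issued as a plain HTTP GET via curl; the public DBpedia endpoint is used purely as an example, and the query string is simply URL-encoded:
curl -H "Accept: application/sparql-results+json" "http://dbpedia.org/sparql?query=SELECT%20*%20WHERE%20%7B%3Fs%20%3Fp%20%3Fo%7D%20LIMIT%2010"
The Accept header picks the result serialization (JSON here); swapping it for application/sparql-results+xml changes nothing else about the request.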
How do I use it, generally?
- Locate a SPARQL endpoint (DBpedia, LOD Cloud
Cache, Data.Gov, URIBurner, others), or;
- Install a SPARQL compliant database server (quad or triple
store) on your desktop, workgroup server, data center, or cloud
(e.g., Amazon EC2 AMI)
- Start the database server
- Execute SPARQL Queries via the SPARQL
endpoint.
How do I use SPARQL with Virtuoso?
What follows is a very simple guide for using SPARQL against
your own instance of Virtuoso:
- Software Download and Installation
- Data Loading from Data Sources exposed at Network Addresses
(e.g. HTTP URLs) using very simple methods
- Actual SPARQL query execution via SPARQL endpoint.
Installation Steps
- Download Virtuoso Open Source or Virtuoso Commercial Editions
- Run installer (if using the Commercial Edition or the Windows Open Source Edition; otherwise follow the build guide)
- Follow post-installation guide and verify installation by
typing in the command: virtuoso -? (if this fails check you've
followed installation and setup steps, then verify environment
variables have been set)
- Start the Virtuoso server using the command:
virtuoso-start.sh
- Verify you have a connection to the Virtuoso Server via the
command: isql localhost (assuming you're using default DB settings)
or the command: isql localhost:1112 (assuming demo database) or
go to your browser and type in:
http://<virtuoso-server-host-name>:[port]/conductor (e.g.
http://localhost:8889/conductor for default DB or
http://localhost:8890/conductor if using Demo DB)
- Go to SPARQL endpoint which is typically --
http://<virtuoso-server-host-name>:[port]/sparql
- Run a quick sample query (since the database always has system
data in place): select distinct * where {?s ?p ?o} limit 50 .
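As a sketch, that same sanity-check query can also be run from the isql command line (assuming the default dba account and the default port 1111) by prefixing it with the SPARQL keyword:
isql localhost:1111 dba <password>
SQL> SPARQL SELECT DISTINCT * WHERE {?s ?p ?o} LIMIT 50;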
Troubleshooting
- Ensure environment settings are set and functional -- if using Mac OS X or Windows, you don't have to worry about this; just start and stop your Virtuoso server using the native OS services applets
- If using the Open Source Edition, follow the getting started guide -- it covers PATH
and startup directory location re. starting and stopping Virtuoso
servers.
- Sponging (HTTP GETs against external Data Sources) within
SPARQL queries is disabled by default. You can enable this feature
by assigning "SPARQL_SPONGE" privileges to user
"SPARQL". Note, more sophisticated security exists via WebID based ACLs.
Data Loading Steps
- Identify an RDF based structured data source of interest -- a
file that contains 3-tuple / triples available at an address on a
public or private HTTP based network
- Determine the Address (URL) of the RDF data source
- Go to your Virtuoso SPARQL endpoint and type in the following
SPARQL query: DEFINE GET:SOFT "replace" SELECT DISTINCT * FROM
<RDFDataSourceURL> WHERE {?s ?p ?o}
- All the triples in the RDF resource (data source accessed via
URL) will be loaded into the Virtuoso Quad Store (using RDF Data
Source URL as the internal quad store Named Graph IRI) as part of
the SPARQL query processing pipeline.
Note: the data source URL doesn't even have to be RDF based --
which is where the Virtuoso Sponger Middleware comes into play
(download and install the VAD installer package first) since it
delivers the following features to Virtuoso's SPARQL engine:
- Transformation of data from non RDF data sources (file content,
hypermedia resources, web services
output etc..) into RDF based 3-tuples (triples)
- Cache Invalidation Scheme Construction -- thus, subsequent queries need not repeat the define get:soft "replace" pragma, except when you want to forcefully override the cache.
- If you have very large data sources, like the DBpedia dumps from CKAN, simply use our bulk loader.
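A minimal bulk-loading sketch via isql, assuming the RDF bulk loader procedures are installed, the dump directory appears in the server's DirsAllowed setting, and the paths and target graph IRI below are placeholders:
SQL> ld_dir ('/data/dbpedia', '*.nt', 'http://dbpedia.org');
SQL> rdf_loader_run ();
SQL> checkpoint;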
SPARQL Endpoint Discovery
Public SPARQL endpoints are emerging at an ever increasing rate. Thus, we've set up a DNS lookup service that provides access to a large number of SPARQL endpoints. Of course, this doesn't cover all existing endpoints, so if your endpoint is missing please ping me.
Here are a collection of commands for using DNS-SD to discover
SPARQL endpoints:
- dns-sd -B _sparql._tcp sparql.openlinksw.com -- browse for
services instances
- dns-sd -Z _sparql._tcp sparql.openlinksw.com -- output results
in Zone File format
Related
- Using HTTP from Ruby -- you can just make SPARQL Protocol URLs re. SPARQL
- Using SPARQL Endpoints via Ruby -- Ruby example using DBpedia endpoint
- Interactive SPARQL Query By Example (QBE) tool -- provides a graphical user interface (as is common in SQL realm re. query building against RDBMS engines) that works with any SPARQL endpoint
- Other methods of loading RDF data into Virtuoso
- Virtuoso Sponger -- architecture and how it turns a wide variety of non RDF data sources into SPARQL accessible data
- Using OpenLink Data Explorer (ODE) to populate Virtuoso -- locate a resource of interest; click on a bookmarklet or use context menus (if using ODE extensions for Firefox, Safari, or Chrome); and you'll have SPARQL accessible data automatically inserted into your Virtuoso instance.
- W3C's SPARQLing Data Access Ingenuity -- an older generic SPARQL introduction post
- Collection of SPARQL Query Examples -- GoodRelations (Product Offers), FOAF (Profiles), SIOC (Data Spaces -- Blogs, Wikis, Bookmarks, Feed Collections, Photo Galleries, Briefcase/DropBox, AddressBook, Calendars, Discussion Forums)
- Collection of Live SPARQL Queries against LOD Cloud Cache -- simple and advanced queries.
01/16/2011 02:06 GMT-0500 | Modified: 01/19/2011 10:43 GMT-0500
Virtuoso Linked Data Deployment 3-Step
Injecting Linked Data into the Web has been a major
pain point for those who seek personal, service, or
organization-specific variants of DBpedia. Basically, the sequence goes
something like this:
- You encounter DBpedia or the LOD Cloud Pictorial.
- You look around (typically following your nose from link to
link).
- You attempt to publish your own stuff.
- You get stuck.
The problems typically take the following form:
- Functionality confusion about the complementary Name and
Address functionality of a single URI abstraction
- Terminology confusion due to conflation and over-loading of
terms such as Resource, URL, Representation, Document, etc.
- Inability to find robust tools with which to generate Linked
Data from existing data sources such as relational databases, CSV
files, XML, Web Services, etc.
To start addressing these problems, here is a simple guide for
generating and publishing Linked Data using Virtuoso.
Step 1 - RDF Data Generation
Existing RDF data can be added to the Virtuoso RDF Quad Store
via a variety of built-in data loader utilities.
Many options allow you to easily and quickly generate RDF data
from other data sources:
- Install the Sponger Bookmarklet for the URIBurner
service. Bind this to your own SPARQL-compliant backend RDF database (in
this scenario, your local Virtuoso instance), and then Sponge some
HTTP-accessible resources.
- Convert relational DBMS data to RDF using the Virtuoso RDF
Views Wizard.
- Starting with CSV files, you can
- Place them at an HTTP-accessible location, and use the Virtuoso
Sponger to convert them to RDF or;
- Use the CSV import feature to import their content into Virtuoso's relational data engine; then use the built-in RDF Views Wizard as with other RDBMS data.
- Starting from XML files, you can
- Use Virtuoso's inbuilt XSLT-Processor for manual XML to RDF/XML
transformation or;
- Leverage the Sponger Cartridge for GRDDL, if there is a transformation service
associated with your XML data source, or;
- Let the Sponger analyze the XML data source and make a
best-effort transformation to RDF.
Step 2 - Linked Data Deployment
Install the Faceted Browser VAD package (fct_dav.vad; see the isql sketch after this list), which delivers the following:
- Faceted Browser Engine UI
- Dynamic Hypermedia Resource Generator
- delivers descriptor resources for every entity (data object) in the Native or
Virtual Quad Stores
- supports a broad array of output formats, including
HTML+RDFa, RDF/XML, N3/Turtle, NTriples,
RDF-JSON, OData+Atom, and OData+JSON.
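A one-line isql installation sketch, assuming the package file sits where the server can read it (the second argument indicates a filesystem rather than DAV location):
SQL> DB.DBA.VAD_INSTALL ('fct_dav.vad', 0);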
Step 3 - Linked Data Consumption & Exploitation
Three simple steps allow you, your enterprise, and your
customers to consume and exploit your newly deployed Linked Data
--
- Load a page like this in your browser (a concrete example follows this list):
http://<cname>[:<port>]/describe/?uri=<entity-uri>
- <cname>[:<port>] gets replaced by the host and port of your Virtuoso instance
- <entity-uri> gets replaced by the URI you want to see described -- for instance, the URI of one of the resources you let the Sponger handle.
- Follow the links presented in the descriptor page.
- If you ever see a blank page with a hyperlink subject name in
the About: section at the top of the page, simply add the parameter
"&sp=1" to the URL in the browser's Address box, and hit
[ENTER]. This will result in an "on the fly" resource retrieval,
transformation, and descriptor page generation.
- Use the navigator controls to page up and down the data
associated with the "in scope" resource descriptor.
Related
10/29/2010 18:54 GMT-0500 | Modified: 11/02/2010 11:57 GMT-0500
Revisiting HTTP based Linked Data (Update 1 - Demo Video Links Added)
Motivation for this post arose from a series of Twitter exchanges between Tony Hirst and me, in relation to his blog post titled: So What Is It About Linked Data that Makes it Linked Data™?
At the end of the marathon session, it was clear to me that a blog post was required for future reference, at the very least :-)
"Data Access by Reference" mechanism for Data
Objects (or Entities) on HTTP networks. It enables you to Identify
a Data Object and Access its structured Data Representation via a
single Generic HTTP scheme based Identifier (HTTP URI). Data Object representation formats may
vary; but in all cases, they are hypermedia oriented, fully structured, and
negotiable within the context of a client-server message
exchange.
Why is it Important?
Information makes the world tick!
Information doesn't exist without data to contextualize.
Information is inaccessible without a projection (presentation)
medium.
All information (without exception, when produced by humans) is
subjective. Thus, to truly maximize the innate heterogeneity of
collective human intelligence, loose coupling of our information
and associated data sources is imperative.
How is Linked Data Delivered?
Linked Data is exposed to HTTP networks (e.g. World Wide Web) via hypermedia resources
bearing structured representations of data object descriptions.
Remember, you have a single Identifier abstraction (generic HTTP
URI) that embodies: Data Object Name and Data Representation
Location (aka URL).
How are Linked Data Object Representations Structured?
A structured representation of data exists when an Entity (Datum), its Attributes, and its
Attribute Values are clearly discernible. In the case of a Linked
Data Object, structured descriptions take the form of a hypermedia
based Entity-Attribute-Value (EAV) graph pictorial
-- where each Entity, its Attributes, and its Attribute Values
(optionally) are identified using Generic HTTP URIs.
Examples of structured data representation formats (content
types) associated with Linked Data Objects include:
- text/html
- text/turtle
- text/n3
- application/json
- application/rdf+xml
- Others
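As a sketch of that negotiability, one and the same entity identifier can be asked for different representations purely via the HTTP Accept header; a DBpedia URI is used as an example, and the -L flag follows the redirect from the entity identifier to the document that describes it:
curl -L -H "Accept: text/turtle" http://dbpedia.org/resource/Paris
curl -L -H "Accept: application/rdf+xml" http://dbpedia.org/resource/Paris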
How Do I Create Linked Data oriented Hypermedia Resources?
You mark up resources by expressing distinct entity-attribute-value statements (basically, these are 3-tuple records) using a variety of notations (a small Turtle sketch follows this list):
- (X)HTML+RDFa,
- JSON,
- Turtle,
- N3,
- TriX,
- TriG,
- RDF/XML, and
- Others (for instance you can use Atom data format extensions to model an EAV graph as per the OData initiative from Microsoft).
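A minimal Turtle sketch of such entity-attribute-value statements (the example.org identifiers are hypothetical; FOAF supplies the attribute names):
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://example.org/about#alice>
    a foaf:Person ;                              # entity type
    foaf:name "Alice" ;                          # attribute with a literal value
    foaf:knows <http://example.org/about#bob> .  # attribute whose value is another entity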
You can achieve this task using any of the following
approaches:
- Notepad
- WYSIWYG Editor
- Transformation of Database Records via Middleware
- Transformation of XML based Web Services
output via Middleware
- Transformation of other Hypermedia Resources via
Middleware
- Transformation of non Hypermedia Resources via Middleware
- Use a platform that delivers all of the above.
Practical Examples of What Linked Data Objects Enable
- Describe Who You Are, What You Offer, and What You Need via
your structured profile, then leave your HTTP network to perform
the REST (serendipitous discovery of relevant things)
- Identify (via map overlay) all items of interest within a 2km+ radius of my current location (this could include vendor offerings or services sought by existing or future customers)
- Share the latest and greatest family photos with family members
*only* without forcing them to signup for Yet Another Web 2.0
service or Social Network
- No repetitive signup and username and password based login
sequences per Web 2.0 or Mobile Application combo
- Going beyond imprecise Keyword Search to the new frontier of Precision Find - for example, find Data Objects associated with the keyword: Tiger, while enabling the seeker to disambiguate across the "Who", "What", "Where", "When" dimensions (with negation capability)
- Determine how two Data Objects are Connected - person to
person, person to subject matter etc. (LinkedIn outside the walled
garden)
- Use any resource address (e.g blog
or bookmark URL) as the conduit into a Data Object mesh that
exposes all associated Entities and their social network
relationships
- Apply patterns (social dimensions) above to traditional
enterprise data sources in combination (optionally) with external
data without compromising security etc.
How Do OpenLink Software Products Enable Linked
Data Exploitation?
Our data access middleware heritage (which spans 16+ years) has
enabled us to assemble a rich portfolio of coherently integrated
products that enable cost-effective evaluation and utilization of
Linked Data, without writing a single line of code, or exposing you
to the hidden, but extensive admin and configuration costs. Post
installation, the benefits of Linked Data simply materialize (along
the lines described above).
Our main Linked Data oriented products include:
- OpenLink Data Explorer -- visualizes Linked Data or Linked Data transformed "on the fly" from hypermedia and non hypermedia data sources
- URIBurner -- a "deceptively simple" solution that enables the generation of Linked Data "on the fly" from a broad collection of data sources and resource types
- OpenLink Data Spaces -- a platform for enterprises and individuals that enhances distributed collaboration via Linked Data driven virtualization of data across its native and/or 3rd party content manager for: Blogs, Wikis, Shared Bookmarks, Discussion Forums, Social Networks etc.
- OpenLink Virtuoso -- a secure and high-performance native hybrid data server (Relational, RDF-Graph, Document models) that includes in-built Linked Data transformation middleware (aka. Sponger).
Related
03/04/2010 10:16 GMT-0500 | Modified: 03/08/2010 09:59 GMT-0500
Linked Data & Socially Enhanced Collaboration (Enterprise or Individual) -- Update 1
Socially enhanced enterprise and individual collaboration is becoming a focal point for a variety of solutions that offer erstwhile distinct content management features across the realms of Blogging, Wikis, Shared Bookmarks, Discussion Forums etc., as part of an integrated platform suite. Recently, Socialtext has caught my attention courtesy of its nice features and benefits page. In addition, I've also found the Mike 2.0 portal immensely interesting and valuable, for those with an enterprise collaboration bent.
Anyway, Socialtext and Mike 2.0 (they aren't identical, and the juxtaposition isn't seeking to imply this) provide nice demonstrations of what socially enhanced collaboration for individuals and/or enterprises is all about:
- Identifying Yourself
- Identifying Others (key contributors, peers,
collaborators)
- Serendipitous Discovery of key contributors, peers, and
collaborators
- Serendipitous Discovery by key contributors, peers, and
collaborators
- Develop and sustain relationships via socially enhanced
professional network hybrid
- Utilize your new "trusted network" (which you've personally
indexed) when seeking help or propagating a meme.
As is typically the case in this emerging realm, the critical issue of discrete "identifiers" (record keys, in a sense) for data items, data containers, and data creators (individuals and groups) is overlooked, albeit unintentionally.
How HTTP based Linked Data Addresses the Identifier Issue
Rather than using platform constrained identifiers such as:
- an email address (a "mailto" scheme identifier),
- a dbms user account,
- an application specific account, or
- OpenID,
it enables you to leverage the platform independence of HTTP scheme Identifiers (Generic URIs), such that Identifiers for:
- You,
- Your Peers,
- Your Groups, and
- Your Activity Generated Data,
simply become conduits into a mesh of HTTP -- referencable and accessible -- Linked Data Objects endowed with a High SDQ (Serendipitous Discovery Quotient). For example, my Personal WebID is all anyone needs to know if they want to explore:
- My Profile (which includes references to data objects
associated with my interests, social-network, calendar, bookmarks
etc.)
- Data generated by my activities across various data spaces (via
data objects associated with my online accounts e.g. Del.icio.us, Twitter, Last.FM)
- Linked Data Meshups via URIBurner (or any other Virtuoso instance) that provide an extended view of my profile
How FOAF+SSL adds Socially aware Security
Even when you reach a point of equilibrium where your daily activities trigger orchestration of CRUD (Create, Read, Update, Delete) operations against Linked Data Objects within your socially enhanced collaboration network, you still have to deal with the thorny issues of security, which include the following:
- Single Sign On,
- Authentication, and
- Data Access Policies.
FOAF+SSL, an application of HTTP based Linked Data, enables you to enhance your Personal HTTP scheme based Identifier (or WebID) via the following steps (performed by a FOAF+SSL compliant platform; a minimal OpenSSL sketch follows this list):
- Imprint the WebID within a self-signed x.509 based public key (certificate) associated with your private key (generated by the FOAF+SSL platform or manually via OpenSSL)
- Store the public key components (modulus and exponent) in your FOAF based profile document, which references your Personal HTTP Identifier as its primary topic
- Leverage the HTTP URL component of the WebID for making the public key components (modulus and exponent) available for x.509 certificate based authentication challenges posed by systems secured by FOAF+SSL (directly) or OpenID (indirectly via FOAF+SSL to OpenID proxy services).
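As a rough sketch of the manual OpenSSL route (file names are placeholders; embedding the WebID in the certificate's subjectAltName additionally requires an extensions/config file, omitted here):
openssl req -x509 -newkey rsa:2048 -days 365 -nodes -keyout webid.key -out webid.crt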
Contrary to conventional experiences with all things PKI (Public
Key Infrastructure) related, FOAF+SSL compliant platforms typically
handle the PKI issues as part of the protocol implementation;
thereby protecting you from any administrative tedium without
compromising security.
Conclusions
Understanding how new technology innovations address long
standing problems, or understanding how new solutions inadvertently
fail to address old problems, provides time tested mechanisms for
product selection and value proposition comprehension that
ultimately save scarce resources such as time and money.
If you want to understand real world problem solution #1 with regard to HTTP based Linked Data, look no further than the issues of secure, socially aware, and platform independent identifiers for data objects that build bridges across erstwhile data silos.
If you want to cost-effectively experience what I've outlined in this post, take a look at OpenLink Data Spaces (ODS), which is a distributed collaboration engine (enterprise or individual) built around the Virtuoso database engine. It simply enhances existing collaboration tools via the following capabilities:
- Addition of Social Dimensions via HTTP based Data Object Identifiers for all Data Items (if missing)
- Ability to integrate across a myriad of Data Source Types rather than a select few, spanning RDBMS Engines, LDAP, Web Services, and various HTTP accessible Resources (Hypermedia or Non Hypermedia content types)
- Addition of FOAF+SSL based authentication
- Addition of FOAF+SSL based Access Control Lists (ACLs) for policy based data access.
Related:
03/02/2010 15:47 GMT-0500 | Modified: 03/03/2010 19:50 GMT-0500
OpenLink Virtuoso - Product Value Proposition Overview
Situation Analysis
Since the beginning of the modern IT era, each period of
innovation has inadvertently introduced its fair share of Data Silos. The driving
force behind this anomaly remains an overemphasis on the role of
applications when selecting problem solutions. Unfortunately, most
solution selecting decision makers remain oblivious to the fact
that most applications are architecturally monolithic; i.e., they
fail to separate the following five layers that are critical to all
solutions:
- Data Unit (Datum or Data Object) Identity,
- Data Storage/Persistence,
- Data Access,
- Data Representation, and
- Data Presentation/Visualization.
The rise of the Internet, and its exponentially-growing
user-friendly enclave known as the World Wide Web, is bringing the intrinsic
costs of the monolithic application architecture anomaly to bear --
in manners unanticipated by many. For example, the emergence of
network-oriented solutions across the realms of Enterprise
2.0-based Collaboration and Web 2.0-based
Software-as-a-Service (SaaS), combined with the overarching
influence of Social Media, are producing more
heterogeneously-structured and disparately-located data sources
than people can effectively process.
As is often the case, a variety of problem and product monikers
have emerged for the data access and integration challenges
outlined above. Contemporary examples include Enterprise Information Integration, Master Data
Management, and Data Virtualization. Labeling aside, the
fundamental issues of the unresolved Data Integration challenge
boil down to the following:
- Data Model Heterogeneity
- Data Quality (Cleanliness)
- Semantic Variance across Contexts (e.g., weights and
measures).
Effectively solving today's data integration challenges requires
a move away from monolithic application architecture to
loosely-coupled, network-centric application architectures.
Basically, we need a ubiquitous network-centric application
protocol that lends itself to loosely-coupled across-the-wire
orchestration of data interactions. In short, this will be what
revitalizes the art of application development and deployment.
The World Wide Web is built around a network application
protocol called HTTP. This protocol intrinsically separates the
five layers listed earlier, thereby enabling:
- Use of Generic HTTP URIs as Data Object (Entity) Identifiers;
- Identifier Co-reference, such that multiple Data Object Identifiers may reference the
same Data Object;
- Use of the Entity-Attribute-Value Model to describe Data
Objects using real world modeling friendly conceptual graphs;
- Use of HTTP URLs to Identify Locations of Resources that bear
(host) Data Object Descriptions (Representations);
- Data Access mechanism for retrieving Data Object
Representations from persistent or transient storage
locations.
OpenLink Virtuoso is uniquely designed to address today's escalating Data Access and Integration challenges without compromising performance, security, or platform independence. At its core lies an unrivaled commitment to industry standards, combined with unique technology innovation that transcends erstwhile distinct realms.
When Virtuoso is installed and running, HTTP-based Data Objects are automatically created as a by-product of its powerful data virtualization, transcending data sources and data representation formats. The benefits of such power extend across a range of usage profiles.
Product Benefits Summary
- Enterprise Agility — Virtuoso lets you mix-&-match best-of-class combinations of Operating Systems, Programming Environments, Database Engines and Data-Access Middleware when building or tweaking your IS infrastructure, without the typical impedance of vendor-lock-in.
- Data Model Dexterity — By supporting multiple protocols and data models in a single product, Virtuoso protects you against costly vulnerabilities such as: perennial acquisition and accumulation of expensive data model specific DBMS products that still operate on the fundamental principle of proprietary technology lock-in, at a time when heterogeneity continues to intrinsically define the information technology landscape.
- Cost-effectiveness — By providing a single point of access (and single-sign-on, SSO) to a plethora of Web 2.0-style social networks, Web Services, and Content Management Systems, and by using Data Object Identifiers as units of Data Virtualization that become the focal points of all data access, Virtuoso lowers the cost to exploit emerging frontiers such as socially-enhanced enterprise collaboration.
- Speed of Exploitation — Virtuoso provides the ability to rapidly assemble 360-degree conceptual views of data, across internal line-of-business application (CRM, ERP, ECM, HR, etc.) data and/or external data sources, whether these are unstructured, semi-structured, or fully structured.
Bottom line, Virtuoso delivers unrivaled flexibility and scalability, without compromising performance or security.
Related
02/26/2010 14:12 GMT-0500 | Modified: 02/27/2010 12:46 GMT-0500
Re-introducing the Virtuoso Virtual Database Engine
In recent times a lot of the commentary and focus re. Virtuoso
has centered on the RDF Quad Store and Linked Data. What sometimes gets overlooked
is the sophisticated Virtual Database Engine that provides the
foundation for all of Virtuoso's data integration
capabilities.
In this post I provide a brief re-introduction to this essential
aspect of Virtuoso.
What is it?
This component of Virtuoso is known as the Virtual Database
Engine (VDBMS). It provides transparent high-performance and secure
access to disparate data sources that are external to Virtuoso. It
enables federated access and integration of data hosted by any
ODBC- or JDBC-accessible RDBMS, RDF Store, XML database, or
Document (Free Text)-oriented Content Management System. In
addition, it facilitates integration with Web Services
(SOAP-based SOA RPCs or REST-fully accessible Web Resources).
Why is it important?
In the most basic sense, you shouldn't need to upgrade your existing database engine version simply because your current DBMS and Data Access Driver combo isn't compatible with ODBC-compliant desktop tools such as Microsoft Access, Crystal Reports, BusinessObjects, Impromptu, or other ODBC-, JDBC-, ADO.NET-, or OLE DB-compliant applications. Simply place Virtuoso in front of your so-called "legacy database," and let it deliver the compliance levels sought by these tools.
In addition, it's important to note that today's enterprise,
through application evolution, company mergers, or acquisitions, is
often faced with disparately-structured data residing in any number
of line-of-business-oriented data silos. Compounding the problem is
the exponential growth of user-generated data via new social
media-oriented collaboration tools and platforms. For companies to
cost-effectively harness the opportunities accorded by the
increasing intersection between line-of-business applications and
social media, virtualization of data silos must be achieved, and
this virtualization must be delivered in a manner that doesn't
prohibitively compromise performance or completely undermine
security at either the enterprise or personal level. Again, this is
what you get by simply installing Virtuoso.
How do I use it?
The VDBMS may be used in a variety of ways, depending on the
data access and integration task at hand. Examples include:
Relational Database Federation
You can make a single ODBC, JDBC, ADO.NET, OLE DB, or XMLA
connection to multiple ODBC- or JDBC-accessible RDBMS data sources,
concurrently, with the ability to perform intelligent distributed
joins against externally-hosted database tables. For instance, you
can join internal human resources data against internal sales and
external stock market data, even when the HR team uses Oracle, the Sales team uses Informix, and the Stock Market figures come
from Ingres!
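A hedged sketch of how such an external table is surfaced inside Virtuoso via isql (the DSN, credentials, and table name are placeholders; see the VDBMS documentation for the exact ATTACH TABLE form supported by your release):
SQL> ATTACH TABLE EMPLOYEES FROM 'OracleHR' USER 'scott' PASSWORD 'tiger';
Once attached, the remote table can be referenced in queries and joins just like a local table.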
Conceptual Level Data Access using the RDF Model
You can construct RDF Model-based Conceptual Views atop
Relational Data Sources. This is about generating HTTP-based
Entity-Attribute-Value (E-A-V) graphs
using data culled "on the fly" from native or external data sources
(Relational Tables/Views, XML-based Web Services, or User Defined
Types).
You can also derive RDF Model-based Conceptual Views from Web
Resource transformations "on the fly" -- the Virtuoso Sponger (RDFizing middleware component)
enables you to generate RDF Model Linked Data via a RESTful Web
Service or within the process pipeline of the SPARQL query engine (i.e., you simply use the
URL of a Web Resource in the FROM clause of a
SPARQL query).
It's important to note that Views take the form of HTTP links
that serve as both Data Source Names and Data Source Addresses.
This enables you to query and explore relationships across entities
(i.e., People, Places, and other Real World Things) via HTTP
clients (e.g., Web Browsers) or directly via SPARQL Query Language
constructs transmitted over HTTP.
Conceptual Level Data Access using ADO.NET Entity Frameworks
As an alternative to RDF, Virtuoso can expose ADO.NET Entity
Frameworks-based Conceptual Views over Relational Data Sources. It
achieves this by generating Entity Relationship graphs via its
native ADO.NET Provider, exposing all externally attached ODBC- and
JDBC-accessible data sources. In addition, the ADO.NET Provider
supports direct access to Virtuoso's native RDF database engine,
eliminating the need for resource intensive Entity Frameworks model
transformations.
Related
02/17/2010 16:38 GMT-0500 | Modified: 02/17/2010 16:46 GMT-0500
What is the DBpedia Project? (Updated)
The recent Wikipedia imbroglio centered around
DBpedia is the fundamental driver for this
particular blog post. At the time of writing,
the DBpedia project definition in Wikipedia
remains unsatisfactory due to the following shortcomings:
- inaccurate and incomplete definition of the Project's What,
Why, Who, Where, When, and How
- inaccurate reflection of project essence, by skewing focus
towards data
extraction and data set dump production, which is at best a quarter
of the project.
Here are some insights on DBpedia, from the perspective of
someone intimately involved with the other three-quarters of the
project.
What is DBpedia?
A live Web accessible RDF
model database (Quad Store) derived from Wikipedia content
snapshots, taken periodically. The RDF database underlies a
Linked Data Space comprised of: HTML (and most recently
HTML+RDFa) based data browser pages and a SPARQL endpoint.
Note: DBpedia 3.4 now exists in snapshot
(warehouse) and Live Editions (currently being hot-staged).
This post is about the snapshot (warehouse) edition; I'll drop a
different post about the DBpedia Live Edition where a new
Delta-Engine covers both extraction and database record
replacement, in realtime.
When was it Created?
As an idea under the moniker "DBpedia" it was conceptualized in
late 2006 by researchers at University of Leipzig (lead by Soren
Auer) and Freie University, Berlin (lead by Chris Bizer). The first public instance of
DBpedia (as described above) was released in February 2007. The
official DBpedia coming out party occurred at WWW2007, Banff,
during the inaugural Linked Data gathering, where it
showcased the virtues and immense potential of TimBL's Linked Data meme.
Who's Behind It?
OpenLink Software (developers of OpenLink Virtuoso and providers of Web Hosting infrastructure), the University of Leipzig, and Freie University, Berlin. In addition, there is a burgeoning community of collaborators and contributors responsible for DBpedia based applications, cross-linked data sets, ontologies (OpenCyc, SUMO, UMBEL, and YAGO), and other utilities. Finally, DBpedia wouldn't be possible without the global content contribution and curation efforts of Wikipedians, a point typically overlooked (albeit inadvertently).
How is it Constructed?
The steps are as follows:
- RDF data set dump preparation via Wikipedia content extraction
and transformation to RDF model data, using the N3 data
representation format - Java and PHP
extraction code produced and maintained by the teams at Leipzig and
Berlin
- Deployment of Linked Data that enables Data browsing and
exploration using any HTTP aware user agent (e.g. basic Web
Browsers) - handled by OpenLink Virtuoso (handled by Berlin via the
Pubby Linked Data Server during the early months of the DBpedia
project)
- SPARQL compliant Quad Store, enabling direct access to database
records via SPARQL (Query language, REST or SOAP Web Service, plus
a variety of query results serialization formats) - OpenLink
Virtuoso since first public release of DBpedia
In a nutshell, there are four distinct and vital components to DBpedia. Thus, DBpedia doesn't exist if all the project offered was a collection of RDF data dumps. Likewise, it doesn't exist without a fully populated SPARQL compliant Quad Store. Last but not least, it doesn't exist if the fully loaded SPARQL compliant Quad Store isn't up to the cocktail of challenges (query load and complexity) presented by live Web database accessibility.
Why is it Important?
It remains a live exemplar for any individual or organization
seeking to publish or exploit HTTP based Linked Data on the
World Wide Web. Its existence continues to
stimulate growth in both density and quality of the burgeoning Web
of Linked Data.
How Do I Use it?
In the most basic sense, simply browse the HTML based resource descriptor pages en route to discovering erstwhile undiscovered relationships that exist across named entities and subject matter concepts / headings. Beyond that, simply look at DBpedia as a master lookup table in a Web hosted distributed database setup, enabling you to mesh your local domain specific details with DBpedia records via structured relations (triples, or 3-tuple records) comprised of HTTP URIs from both realms, e.g., via owl:sameAs relations.
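A minimal sketch of such a bridging relation in Turtle (the example.com URI is a hypothetical local record; the DBpedia URI is real):
@prefix owl: <http://www.w3.org/2002/07/owl#> .
<http://crm.example.com/customer/42#region>
    owl:sameAs <http://dbpedia.org/resource/Boston> .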
What Can I Use it For?
Expanding on the Master-Details point above, you can use its
rich URI corpus to alleviate tedium associated
with activities such as:
- List maintenance - e.g., Countries, States, Companies, Units of
Measurement, Subject Headings etc.
- Tagging - as a complement to existing practices
- Analytical Research - you're only a LINK (URI) away from
erstwhile difficult to attain research data spread across a broad
range of topics
- Closed Vocabulary Construction - rather than commence the
futile quest of building your own closed vocabulary, simply
leverage Wikipedia's human curated vocabulary as our common
base.
Related
01/31/2010 17:43 GMT-0500 | Modified: 09/15/2010 18:10 GMT-0500
Personal and/or Service Specific Linked Data Spaces in the Cloud: DBpedia 3.4
We have just released an Amazon EC2 based public Snapshot of
DBpedia 3.4. Thus, you can now instantiate a
personal and/or service specific variant of the DBpedia 3.4
Linked Data Space. Basically, you can replicate what we
host, within minutes (as opposed to days). In addition, you no
longer need to squabble --on an unpredictable basis with others--
for the infrastructure resources behind DBpedia's public instance,
when using the SPARQL Endpoint, Faceted Search & Find
Services, or HTML Browser Pages etc.
How Does It work?
- Instantiate a Virtuoso EC2 AMI (paid variety, which is aggressively priced at $49.99 for setup and $19.99 per month thereafter)
- Mount the shared DBpedia 3.4 public snapshot
- Start Virtuoso Server
- Start exploiting the DBpedia Linked Data Space.
What Interfaces are exposed?
- SPARQL Endpoint
- Linked Data Viewer Pages (as you see in the public DBpedia instance)
- Faceted Search & Find UI and Web Services (REST or SOAP)
- All the inference rules for UMBEL, SUMO, YAGO, OpenCYC, and DBpedia-OWL data dictionaries
- Type Correlations Between DBpedia and Freebase
Enjoy!
11/16/2009 13:17 GMT-0500 | Modified: 11/16/2009 13:30 GMT-0500