Details

OpenLink Software
Burlington, United States

Subscribe

Post Categories

Recent Articles

Community Member Blogs

Display Settings

articles per page.
order.

Translate

Showing posts in all categories RefreshRefresh
What is the DBpedia Project? (Updated) [ Kingsley Uyi Idehen ]

The recent Wikipedia imbroglio centered around DBpedia is the fundamental driver for this particular blog post. At time of writing this blog post, the DBpedia project definition in Wikipedia remains unsatisfactory due to the following shortcomings:

  1. inaccurate and incomplete definition of the Project's What, Why, Who, Where, When, and How
  2. inaccurate reflection of project essence, by skewing focus towards data extraction and data set dump production, which is at best a quarter of the project.

Here are some insights on DBpedia, from the perspective of someone intimately involved with the other three-quarters of the project.

What is DBpedia?

A live Web accessible RDF model database (Quad Store) derived from Wikipedia content snapshots, taken periodically. The RDF database underlies a Linked Data Space comprised of: HTML (and most recently HTML+RDFa) based data browser pages and a SPARQL endpoint.

Note: DBpedia 3.4 now exists in snapshot (warehouse) and Live Editions (currently being hot-staged). This post is about the snapshot (warehouse) edition, I'll drop a different post about the DBpedia Live Edition where a new Delta-Engine covers both extraction and database record replacement, in realtime.

When was it Created?

As an idea under the moniker "DBpedia" it was conceptualized in late 2006 by researchers at University of Leipzig (lead by Soren Auer) and Freie University, Berlin (lead by Chris Bizer). The first public instance of DBpedia (as described above) was released in February 2007. The official DBpedia coming out party occurred at WWW2007, Banff, during the inaugural Linked Data gathering, where it showcased the virtues and immense potential of TimBL's Linked Data meme.

Who's Behind It?

OpenLink Software (developers of OpenLink Virtuoso and providers of Web Hosting infrastructure), University of Leipzig, and Freie Univerity, Berlin. In addition, there is a burgeoning community of collaborators and contributors responsible DBpedia based applications, cross-linked data sets, ontologies (OpenCyc, SUMO, UMBEL, and YAGO) and other utilities. Finally, DBpedia wouldn't be possible without the global content contribution and curation efforts of Wikipedians, a point typically overlooked (albeit inadvertently).

How is it Constructed?

The steps are as follows:

  1. RDF data set dump preparation via Wikipedia content extraction and transformation to RDF model data, using the N3 data representation format - Java and PHP extraction code produced and maintained by the teams at Leipzig and Berlin
  2. Deployment of Linked Data that enables Data browsing and exploration using any HTTP aware user agent (e.g. basic Web Browsers) - handled by OpenLink Virtuoso (handled by Berlin via the Pubby Linked Data Server during the early months of the DBpedia project)
  3. SPARQL compliant Quad Store, enabling direct access to database records via SPARQL (Query language, REST or SOAP Web Service, plus a variety of query results serialization formats) - OpenLink Virtuoso since first public release of DBpedia

In a nutshell, there are four distinct and vital components to DBpedia. Thus, DBpedia doesn't exist if all the project offered was a collection of RDF data dumps. Likewise, it doesn't exist if you have a SPARQL compliant Quad Store without loaded data sets, and of course it doesn't exist if you have a fully loaded SPARQL compliant Quad Store is up to the cocktail of challenges presented by live Web accessibility.

Why is it Important?

It remains a live exemplar for any individual or organization seeking to publishing or exploit HTTP based Linked Data on the World Wide Web. Its existence continues to stimulate growth in both density and quality of the burgeoning Web of Linked Data.

How Do I Use it?

In the most basic sense, simply browse the HTML pages en route to discovery erstwhile relationships that exist across named entities and subject matter concepts / headings. Beyond that, simply look at DBpedia as a master lookup table in a Web hosted distributed database setup; enabling you to mesh your local domain specific details with DBpedia records via structured relations (triples or 3-tuples records) comprised of HTTP URIs from both realms e.g., owl:sameAs relations.

What Can I Use it For?

Expanding on the Master-Details point above, you can use its rich URI corpus to alleviate tedium associated with activities such as:

  1. List maintenance - e.g., Countries, States, Companies, Units of Measurement, Subject Headings etc.
  2. Tagging - as a compliment to existing practices
  3. Analytical Research - you're only a LINK (URI) away from erstwhile difficult to attain research data spread across a broad range of topics
  4. Closed Vocabulary Construction - rather than commence the futile quest of building your own closed vocabulary, simply leverage Wikipedia's human curated vocabulary as our common base.

Related

# PermaLink Comments [0]
11/22/2009 00:28 GMT Modified: 11/23/2009 07:05 GMT
What is the DBpedia Project? (Updated) [ Kingsley Uyi Idehen ]

The recent Wikipedia imbroglio centered around DBpedia is the fundamental driver for this particular blog post. At time of writing this blog post, the DBpedia project definition in Wikipedia remains unsatisfactory due to the following shortcomings:

  1. inaccurate and incomplete definition of the Project's What, Why, Who, Where, When, and How
  2. inaccurate reflection of project essence, by skewing focus towards data extraction and data set dump production, which is at best a quarter of the project.

Here are some insights on DBpedia, from the perspective of someone intimately involved with the other three-quarters of the project.

What is DBpedia?

A live Web accessible RDF model database (Quad Store) derived from Wikipedia content snapshots, taken periodically. The RDF database underlies a Linked Data Space comprised of: HTML (and most recently HTML+RDFa) based data browser pages and a SPARQL endpoint.

Note: DBpedia 3.4 now exists in snapshot (warehouse) and Live Editions (currently being hot-staged). This post is about the snapshot (warehouse) edition, I'll drop a different post about the DBpedia Live Edition where a new Delta-Engine covers both extraction and database record replacement, in realtime.

When was it Created?

As an idea under the moniker "DBpedia" it was conceptualized in late 2006 by researchers at University of Leipzig (lead by Soren Auer) and Freie University, Berlin (lead by Chris Bizer). The first public instance of DBpedia (as described above) was released in February 2007. The official DBpedia coming out party occurred at WWW2007, Banff, during the inaugural Linked Data gathering, where it showcased the virtues and immense potential of TimBL's Linked Data meme.

Who's Behind It?

OpenLink Software (developers of OpenLink Virtuoso and providers of Web Hosting infrastructure), University of Leipzig, and Freie Univerity, Berlin. In addition, there is a burgeoning community of collaborators and contributors responsible DBpedia based applications, cross-linked data sets, ontologies (OpenCyc, SUMO, UMBEL, and YAGO) and other utilities. Finally, DBpedia wouldn't be possible without the global content contribution and curation efforts of Wikipedians, a point typically overlooked (albeit inadvertently).

How is it Constructed?

The steps are as follows:

  1. RDF data set dump preparation via Wikipedia content extraction and transformation to RDF model data, using the N3 data representation format - Java and PHP extraction code produced and maintained by the teams at Leipzig and Berlin
  2. Deployment of Linked Data that enables Data browsing and exploration using any HTTP aware user agent (e.g. basic Web Browsers) - handled by OpenLink Virtuoso (handled by Berlin via the Pubby Linked Data Server during the early months of the DBpedia project)
  3. SPARQL compliant Quad Store, enabling direct access to database records via SPARQL (Query language, REST or SOAP Web Service, plus a variety of query results serialization formats) - OpenLink Virtuoso since first public release of DBpedia

In a nutshell, there are four distinct and vital components to DBpedia. Thus, DBpedia doesn't exist if all the project offered was a collection of RDF data dumps. Likewise, it doesn't exist if you have a SPARQL compliant Quad Store without loaded data sets, and of course it doesn't exist if you have a fully loaded SPARQL compliant Quad Store is up to the cocktail of challenges presented by live Web accessibility.

Why is it Important?

It remains a live exemplar for any individual or organization seeking to publishing or exploit HTTP based Linked Data on the World Wide Web. Its existence continues to stimulate growth in both density and quality of the burgeoning Web of Linked Data.

How Do I Use it?

In the most basic sense, simply browse the HTML pages en route to discovery erstwhile relationships that exist across named entities and subject matter concepts / headings. Beyond that, simply look at DBpedia as a master lookup table in a Web hosted distributed database setup; enabling you to mesh your local domain specific details with DBpedia records via structured relations (triples or 3-tuples records) comprised of HTTP URIs from both realms e.g., owl:sameAs relations.

What Can I Use it For?

Expanding on the Master-Details point above, you can use its rich URI corpus to alleviate tedium associated with activities such as:

  1. List maintenance - e.g., Countries, States, Companies, Units of Measurement, Subject Headings etc.
  2. Tagging - as a compliment to existing practices
  3. Analytical Research - you're only a LINK (URI) away from erstwhile difficult to attain research data spread across a broad range of topics
  4. Closed Vocabulary Construction - rather than commence the futile quest of building your own closed vocabulary, simply leverage Wikipedia's human curated vocabulary as our common base.

Related

# PermaLink Comments [0]
11/22/2009 00:28 GMT Modified: 11/23/2009 07:05 GMT
What is the DBpedia Project? (Updated) [ Kingsley Uyi Idehen ]

The recent Wikipedia imbroglio centered around DBpedia is the fundamental driver for this particular blog post. At time of writing this blog post, the DBpedia project definition in Wikipedia remains unsatisfactory due to the following shortcomings:

  1. inaccurate and incomplete definition of the Project's What, Why, Who, Where, When, and How
  2. inaccurate reflection of project essence, by skewing focus towards data extraction and data set dump production, which is at best a quarter of the project.

Here are some insights on DBpedia, from the perspective of someone intimately involved with the other three-quarters of the project.

What is DBpedia?

A live Web accessible RDF model database (Quad Store) derived from Wikipedia content snapshots, taken periodically. The RDF database underlies a Linked Data Space comprised of: HTML (and most recently HTML+RDFa) based data browser pages and a SPARQL endpoint.

Note: DBpedia 3.4 now exists in snapshot (warehouse) and Live Editions (currently being hot-staged). This post is about the snapshot (warehouse) edition, I'll drop a different post about the DBpedia Live Edition where a new Delta-Engine covers both extraction and database record replacement, in realtime.

When was it Created?

As an idea under the moniker "DBpedia" it was conceptualized in late 2006 by researchers at University of Leipzig (lead by Soren Auer) and Freie University, Berlin (lead by Chris Bizer). The first public instance of DBpedia (as described above) was released in February 2007. The official DBpedia coming out party occurred at WWW2007, Banff, during the inaugural Linked Data gathering, where it showcased the virtues and immense potential of TimBL's Linked Data meme.

Who's Behind It?

OpenLink Software (developers of OpenLink Virtuoso and providers of Web Hosting infrastructure), University of Leipzig, and Freie Univerity, Berlin. In addition, there is a burgeoning community of collaborators and contributors responsible DBpedia based applications, cross-linked data sets, ontologies (OpenCyc, SUMO, UMBEL, and YAGO) and other utilities. Finally, DBpedia wouldn't be possible without the global content contribution and curation efforts of Wikipedians, a point typically overlooked (albeit inadvertently).

How is it Constructed?

The steps are as follows:

  1. RDF data set dump preparation via Wikipedia content extraction and transformation to RDF model data, using the N3 data representation format - Java and PHP extraction code produced and maintained by the teams at Leipzig and Berlin
  2. Deployment of Linked Data that enables Data browsing and exploration using any HTTP aware user agent (e.g. basic Web Browsers) - handled by OpenLink Virtuoso (handled by Berlin via the Pubby Linked Data Server during the early months of the DBpedia project)
  3. SPARQL compliant Quad Store, enabling direct access to database records via SPARQL (Query language, REST or SOAP Web Service, plus a variety of query results serialization formats) - OpenLink Virtuoso since first public release of DBpedia

In a nutshell, there are four distinct and vital components to DBpedia. Thus, DBpedia doesn't exist if all the project offered was a collection of RDF data dumps. Likewise, it doesn't exist if you have a SPARQL compliant Quad Store without loaded data sets, and of course it doesn't exist if you have a fully loaded SPARQL compliant Quad Store is up to the cocktail of challenges presented by live Web accessibility.

Why is it Important?

It remains a live exemplar for any individual or organization seeking to publishing or exploit HTTP based Linked Data on the World Wide Web. Its existence continues to stimulate growth in both density and quality of the burgeoning Web of Linked Data.

How Do I Use it?

In the most basic sense, simply browse the HTML pages en route to discovery erstwhile relationships that exist across named entities and subject matter concepts / headings. Beyond that, simply look at DBpedia as a master lookup table in a Web hosted distributed database setup; enabling you to mesh your local domain specific details with DBpedia records via structured relations (triples or 3-tuples records) comprised of HTTP URIs from both realms e.g., owl:sameAs relations.

What Can I Use it For?

Expanding on the Master-Details point above, you can use its rich URI corpus to alleviate tedium associated with activities such as:

  1. List maintenance - e.g., Countries, States, Companies, Units of Measurement, Subject Headings etc.
  2. Tagging - as a compliment to existing practices
  3. Analytical Research - you're only a LINK (URI) away from erstwhile difficult to attain research data spread across a broad range of topics
  4. Closed Vocabulary Construction - rather than commence the futile quest of building your own closed vocabulary, simply leverage Wikipedia's human curated vocabulary as our common base.

Related

# PermaLink Comments [0]
11/22/2009 00:28 GMT Modified: 11/23/2009 07:05 GMT
5 Very Important Things to Note about HTTP based Linked Data [ Kingsley Uyi Idehen ]
  1. It isn't World Wide Web Specific (HTTP != World Wide Web)
  2. It isn't Open Data Specific
  3. It isn't about "Free" (Beer or Speech)
  4. It isn't about Markup (so don't expect to grok it via "markup first" approach)
  5. It's about Hyperdata - the use of HTTP and REST to deliver a powerful platform agnostic mechanism for Data Reference, Access, and Integration.

When trying to understand HTTP based Linked Data, especially if you're well versed in DBMS technology use (User, Power User, Architect, Analyst, DBA, or Programmer) think:

  • Open Database Connectivity (ODBC) without operating system, data model, or wire-protocol specificity or lock-in potential
  • Java Database Connectivity (JDBC) without programming language specificity
  • ADO.NET without .NET runtime specificity and .NET bound language specificity
  • OLE-DB without Windows operating system & programming language specificity
  • XMLA without XML format specificity - with Tabular and Multidimensional results formats expressible in a variety of data representation formats.
  • All of the above scoped to the Record rather than Container level, with Generic HTTP scheme URIs associated with each Record, Field, and Field value (optionally)

Remember the need for Data Access & Integration technology is the by product of the following realities:

  1. Human curated data is ultimately dirty, because:
    • our thick thumbs, inattention, distractions, and general discomfort with typing, make typos prevalent
    • database engines exist for a variety of data models - Graph, Relational, Hierarchical;
    • within databases you have different record container/partition names e.g. Table Names;
    • within a database record container you have records that are really aspects of the same thing (different keys exist in a plethora of operational / line of business systems that expose aspects of the same entity e.g., customer data that spans Accounts, CRM, ERP application databases);
    • different field names (one database has "EMP" while another has "Employee") for the same record
    • .
  2. Units of measurement is driven by locale, the UK office wants to see sales in Pounds Sterling while the French office prefers Euros etc.
  3. All of the above is subject to context halos which can be quite granular re. sensitivity e.g. staff travel between locations that alter locales and their roles; basically, profiles matters a lot.

Related

# PermaLink Comments [2]
11/19/2009 14:49 GMT Modified: 11/23/2009 07:04 GMT
5 Very Important Things to Note about HTTP based Linked Data [ Kingsley Uyi Idehen ]
  1. It isn't World Wide Web Specific (HTTP != World Wide Web)
  2. It isn't Open Data Specific
  3. It isn't about "Free" (Beer or Speech)
  4. It isn't about Markup (so don't expect to grok it via "markup first" approach)
  5. It's about Hyperdata - the use of HTTP and REST to deliver a powerful platform agnostic mechanism for Data Reference, Access, and Integration.

When trying to understand HTTP based Linked Data, especially if you're well versed in DBMS technology use (User, Power User, Architect, Analyst, DBA, or Programmer) think:

  • Open Database Connectivity (ODBC) without operating system, data model, or wire-protocol specificity or lock-in potential
  • Java Database Connectivity (JDBC) without programming language specificity
  • ADO.NET without .NET runtime specificity and .NET bound language specificity
  • OLE-DB without Windows operating system & programming language specificity
  • XMLA without XML format specificity - with Tabular and Multidimensional results formats expressible in a variety of data representation formats.
  • All of the above scoped to the Record rather than Container level, with Generic HTTP scheme URIs associated with each Record, Field, and Field value (optionally)

Remember the need for Data Access & Integration technology is the by product of the following realities:

  1. Human curated data is ultimately dirty, because:
    • our thick thumbs, inattention, distractions, and general discomfort with typing, make typos prevalent
    • database engines exist for a variety of data models - Graph, Relational, Hierarchical;
    • within databases you have different record container/partition names e.g. Table Names;
    • within a database record container you have records that are really aspects of the same thing (different keys exist in a plethora of operational / line of business systems that expose aspects of the same entity e.g., customer data that spans Accounts, CRM, ERP application databases);
    • different field names (one database has "EMP" while another has "Employee") for the same record
    • .
  2. Units of measurement is driven by locale, the UK office wants to see sales in Pounds Sterling while the French office prefers Euros etc.
  3. All of the above is subject to context halos which can be quite granular re. sensitivity e.g. staff travel between locations that alter locales and their roles; basically, profiles matters a lot.

Related

# PermaLink Comments [2]
11/19/2009 14:49 GMT Modified: 11/23/2009 07:04 GMT
5 Game Changing Things about the OpenLink Virtuoso + AWS Cloud Combo [ Kingsley Uyi Idehen ]

Here are 5 powerful benefits you can immediately derive from the combination of Virtuoso and Amazon's AWS services (specifically the EC2 and EBS components):

  1. Acquire your own personal or service specific data space in the Cloud. Think DBase, Paradox, FoxPRO, Access of yore, but with the power of Oracle, Informix, Microsoft SQL Server etc.. using a Conceptual, as opposed to solely Logical, model based DBMS (i.e., a Hybrid DBMS Engine for: SQL, RDF, XML, and Full Text)
  2. Ability to share and control access to your resources using innovations like FOAF+SSL, OpenID, and OAuth, all from one place
  3. Construction of personal or organization based FOAF profiles in a matter of minutes; by simply creating a basic DBMS (or ODS application layer) account; and then using this profile to create strong links (references) to all your Data silos (esp. those from the Web 2.0 realm)
  4. Load data sets from the LOD cloud or Sponge existing Web resources (i.e., on the fly data transformation to RDF model based Linked Data) and then use the combination to build powerful lookup services that enrich the value of URLs (think: Web addressable reports holding query results) that you publish
  5. Bind all of the above to a domain that you own (e.g. a .Name domain) so that you have an attribution-friendly "authority" component for resource URLs and Entity URIs published from your Personal Linked Data Space on the Web (or private HTTP network).

In a nutshell, the AWS Cloud infrastructure simplifies the process of generating Federated presence on the Internet and/or World Wide Web. Remember, centralized networking models always end up creating data silos, in some context, ultimately! :-)

# PermaLink Comments [0]
11/18/2009 14:12 GMT Modified: 11/19/2009 15:20 GMT
5 Game Changing Things about the OpenLink Virtuoso + AWS Cloud Combo [ Kingsley Uyi Idehen ]

Here are 5 powerful benefits you can immediately derive from the combination of Virtuoso and Amazon's AWS services (specifically the EC2 and EBS components):

  1. Acquire your own personal or service specific data space in the Cloud. Think DBase, Paradox, FoxPRO, Access of yore, but with the power of Oracle, Informix, Microsoft SQL Server etc.. using a Conceptual, as opposed to solely Logical, model based DBMS (i.e., a Hybrid DBMS Engine for: SQL, RDF, XML, and Full Text)
  2. Ability to share and control access to your resources using innovations like FOAF+SSL, OpenID, and OAuth, all from one place
  3. Construction of personal or organization based FOAF profiles in a matter of minutes; by simply creating a basic DBMS (or ODS application layer) account; and then using this profile to create strong links (references) to all your Data silos (esp. those from the Web 2.0 realm)
  4. Load data sets from the LOD cloud or Sponge existing Web resources (i.e., on the fly data transformation to RDF model based Linked Data) and then use the combination to build powerful lookup services that enrich the value of URLs (think: Web addressable reports holding query results) that you publish
  5. Bind all of the above to a domain that you own (e.g. a .Name domain) so that you have an attribution-friendly "authority" component for resource URLs and Entity URIs published from your Personal Linked Data Space on the Web (or private HTTP network).

In a nutshell, the AWS Cloud infrastructure simplifies the process of generating Federated presence on the Internet and/or World Wide Web. Remember, centralized networking models always end up creating data silos, in some context, ultimately! :-)

# PermaLink Comments [0]
11/18/2009 14:12 GMT Modified: 11/19/2009 15:20 GMT
Personal and/or Service Specific Linked Data Spaces in the Cloud: DBpedia 3.4 [ Kingsley Uyi Idehen ]

We have just released an Amazon EC2 based public Snapshot of DBpedia 3.4. Thus, you can now instantiate a personal and/or service specific variant of the DBpedia 3.4 Linked Data Space. Basically, you can replicate what we host, within minutes (as opposed to days). In addition, you no longer need to squabble --on an unpredictable basis with others-- for the infrastructure resources behind DBpedia's public instance, when using the SPARQL Endpoint, Faceted Search & Find Services, or HTML Browser Pages etc.

How Does It work?

  1. Instantiate a Virtuoso EC2 AMI (paid variety, which is aggressively priced at $49.99 for setup and $19.99 per month thereafter)
  2. Mount the shared DBpedia 3.4 public snapshot
  3. Start Virtuoso Server
  4. Start exploiting the DBpedia Linked Data Space.

What Interfaces are exposed?

  1. SPARQL Endpoint
  2. Linked Data Viewer Pages (as you see in the public DBpedia instance)
  3. Faceted Search & Find UI and Web Services (REST or SOAP)
  4. All the inference rules for UMBEL, SUMO, YAGO, OpenCYC, and DBpedia-OWL data dictionaries
  5. Type Correlations Between DBpedia and Freebase

Enjoy!

# PermaLink Comments [0]
11/16/2009 13:17 GMT Modified: 11/16/2009 13:30 GMT
Conversation with Jon Udell: Are We There Yet Re. Web++ ? [ Kingsley Uyi Idehen ]

Personally, I believe that we've actually reached a watershed moment re. the evolution of the Web from a mesh of Linked Data Containers (Web of Linked Documents) to a mesh of Linked Data Items (entities or real world objects).

The journey towards this watershed moment started with the Semantic Web Project, gained focus and pragmatism via the Linked Data meme, attained substance & credibility via efforts such as DBpedia and the resulting cloud of Open Linked Data Spaces, and finally arrived at the most important destination of all: broad comprehension and coherence, via RDFa.

Over the years, I've chronicled the journey above via entries in this particular data space (my blog) and most recently, via my rapid-fire comments and debates on Twitter (basically hastag #linkeddata account: kidehen).

On a parallel front re. my chronicles, I've periodically had conversations with Jon Udell, who has always provided a coherent sounding board and reconciliation framework for my world views and open data access vision; naturally, this has a lot to do with his holistic grasp of the big picture issues, associated technical details, and special communication prowess :-)

Against this backdrop, I refer you to my most recent podcast conversation with Jon, which is about how the tandem of HTML+RDFa and the GoodRelations vocabulary deliver the critical missing links re. broad comprehension of the Semantic Web vision en route to mass exploitation.

Related

# PermaLink Comments [0]
09/10/2009 11:03 GMT Modified: 09/10/2009 11:32 GMT
The URI, URL, and Linked Data Meme's Generic HTTP URI (Updated) [ Kingsley Uyi Idehen ]

Situation Analysis

As the "Linked Data" meme has gained momentum you've more than likely been on the receiving end of dialog with Linked Open Data community members (myself included) that goes something like this:

"Do you have a URI", "Get yourself a URI", "Give me a de-referencable URI" etc..

And each time, you respond with a URL -- which to the best of your Web knowledge is a bona fide URI. But to your utter confusion you are told: Nah! You gave me a Document URI instead of the URI of a real-world thing or object etc..

What's up with that?

Well our everyday use of the Web is an unfortunate conflation of two distinct things, which have Identity: Real World Objects (RWOs) & Address/Location of Documents (Information bearing Resources).

The "Linked Data" meme is about enhancing the Web by unobtrusively reintroducing its core essence: the generic HTTP URI, a vital piece of Web Architecture DNA. Basically, its about so realizing the full capabilities of the Web as a platform for Open Data Identification, Definition, Access, Storage, Representation, Presentation, and Integration.

What is a Real World Object?

People, Places, Music, Books, Cars, Ideas, Emotions etc..

What is a URI?

A Uniform Resource Identifier. A global identifier mechanism for network addressable data items. Its sole function is Name oriented Identification.

URI Generic Syntax

The constituent parts of a URI (from URI Generic Syntax RFC) are depicted below: Image

What is a URL?

A location oriented HTTP scheme based URI. The HTTP scheme introduces a powerful and inherent duality that delivers:

  1. Resource Address/Location Identifier
  2. Data Access mechanism for an Information bearing Resource (Document, File etc..)

So far so good!

What is an HTTP based URI?

The kind of URI Linked Data aficionados mean when they use the term: URI.

An HTTP URI is an HTTP scheme based URI. Unlike a URL, this kind of HTTP scheme URI is devoid of any Web Location orientation or specificity. Thus, Its inherent duality provides a more powerful level of abstraction. Hence, you can use this form of URI to assign Names/Identifiers to Real World Objects (RWO). Even better, courtesy of the Identity/Address duality of the HTTP scheme, a single URI can deliver the following:

  1. RWO Identfier/Name
  2. RWO Metadata document Locator (courtesy of URL aspect)
  3. Negotiable Representation of the Located Document (courtesy of HTTP's content negotiation feature).

What is Metadata?

Data about Data. Put differently, data that describes other data in a structured manner.

How Do we Model Metadata?

The predominant model for metadata is the Entity-Attribute-Value + Classes & Relationships model (EAV/CR). A model that's been with us since the inception of modern computing (long before the Web).

What about RDF?

The Resource Description Framework (RDF) is a framework for describing Web addressable resources. In a nutshell, its a framework for adding Metadata bearing Information Resources to the current Web. Its comprised of:

  1. Entity-Attribute-Value (aka. Subject-Predictate-Object) plus Classes & Relationships (Data Dictionaries e.g., OWL) metadata model
  2. A plethora of instance data representation formats that include: RDFa (when doing so within (X)HTML docs), Turtle, N3, TriX, RDF/XML etc.

What's the Problem Today?

The ubiquitous use of the Web is primarily focused on a Linked Mesh of Information bearing Documents. URLs rather than generic HTTP URIs are the prime mechanism for Web tapestry; basically, we use URLs to conduct Information -- which is inherently subjective -- instead of using HTTP URIs to conduct "Raw Data" -- which is inherently objective.

Note: Information is "data in context", it isn't the same thing as "Raw Data". Thus, if we can link to Information via the Web, why shouldn't we be able to do the same for "Raw Data"?

How Does the Link Data meme solve the problem?

The meme simply provides a set of guidelines (best practices) for producing Web architecture friendly metadata. Meaning: when producing EAV/CR model based metadata, endow Subjects, their Attributes, and Attribute Values (optionally) with HTTP URIs. By doing so, a new level of Link Abstraction on the Web is possible i.e., "Data Item to Data Item" level links (aka hyperdata links). Even better, when you de-reference a RWO hyperdata link you end up with a negotiated representations of its metadata.

Conclusion

Linked Data is ultimately about an HTTP URI for each item in the Data Organization Hierarchy :-)

Related

  1. History of how "Resource" became part of URI - historic account by TimBL
  2. Linked Data Design Issues Document - TimBL's initial Linked Data Guide
  3. Linked Data Rules Simplified - My attempt at simplifying the Linked Data Meme without SPARQL & RDF distraction
  4. Linked Data & Identity - another related post
  5. The Linked Data Meme's Value Proposition
  6. My Del.icio.us hosted Bookmark Data Space for Identity Schemes
  7. TimBL's Ted Talk re. "Raw Linked Data".
# PermaLink Comments [2]
08/07/2009 14:34 GMT Modified: 10/07/2009 08:02 GMT
 <<     | 1 | 2 | 3 |     >>
Powered by OpenLink Virtuoso Universal Server
Running on Linux platform