Virtuoso Open-Source Wiki
Virtuoso Open-Source, OpenLink Data Spaces, and OpenLink Ajax Toolkit

What is the best method to get a random sample of all triples for a subset of all the resources of a SPARQL endpoint?

The best method to get a random sample of all triples for a subset of all the resources of a SPARQL endpoint is decimation in its original style:

SELECT ?s ?p ?o 
FROM <some-graph>
WHERE 
  { 
    ?s ?p ?o .
    FILTER ( 1 > bif:rnd (10, ?s, ?p, ?o) )
  }

By changing the first argument of bif:rnd() and the left side of the inequality, you can adjust the decimation ratio from 1/10 to any desired value. Note that the SQL optimizer is entitled to evaluate bif:rnd (10) only once, at the beginning of the query. The three additional arguments have values that are known only when a table row is fetched, so bif:rnd (10, ?s, ?p, ?o) is calculated for every row, and each row is returned or ignored independently of the others.
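As an illustration of adjusting the ratio, the sketch below keeps roughly 3 triples out of every 100. It assumes that bif:rnd (n, ...) returns an integer spread uniformly over 0 to n-1, which is consistent with the 1/10 ratio of the query above; <some-graph> is the same placeholder as before.

```sparql
# Decimation ratio of about 3/100 (assumes bif:rnd (100, ...) yields 0..99):
# the filter passes only when the random value is 0, 1, or 2.
SELECT ?s ?p ?o
FROM <some-graph>
WHERE
  {
    ?s ?p ?o .
    FILTER ( 3 > bif:rnd (100, ?s, ?p, ?o) )
  }
```

The sample size is not exact: each row is kept or dropped independently, so the number of rows returned varies from run to run around 3% of the graph.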

However, bif:rnd (10, ?s, ?p, ?o) contains a subtle inefficiency. In an RDF store, graph nodes are stored as numeric IRI IDs, and literal objects can be stored in a separate table. A call to an SQL function needs arguments of traditional SQL datatypes, so the query processor will extract the text of the IRI for each node and the full value of each literal object. That is a significant waste of time. The workaround is:

SPARQL 
SELECT ?s ?p ?o 
FROM <some-graph> 
WHERE 
  { 
    ?s ?p ?o .
    FILTER ( 1 > <SHORT_OR_LONG::bif:rnd> (10, ?s, ?p, ?o) )
  }

This tells the SPARQL front-end to omit redundant conversions of values.
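One way to sanity-check the effective ratio, sketched here on the assumption that the graph is small enough to count in full, is to compare the total number of triples with the number that pass the filter. The variable names ?total and ?sampled are illustrative and do not come from the original queries.

```sparql
# Query 1: total number of triples in the graph.
SELECT (COUNT(*) AS ?total)
FROM <some-graph>
WHERE
  {
    ?s ?p ?o .
  }

# Query 2 (run separately): number of triples kept by the 1/10 decimation.
SELECT (COUNT(*) AS ?sampled)
FROM <some-graph>
WHERE
  {
    ?s ?p ?o .
    FILTER ( 1 > <SHORT_OR_LONG::bif:rnd> (10, ?s, ?p, ?o) )
  }
```

Because the sample is random, ?sampled should only be approximately one tenth of ?total, and it will differ between runs.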

Live Example

The following SPARQL query returns a random sample of dc:description triples from the LOD instance:

SELECT * 
WHERE 
  {
    ?s <http://purl.org/dc/elements/1.1/description> ?o
    FILTER ( 1 > <SHORT_OR_LONG::bif:rnd> (10, ?s, ?o) )
  }
LIMIT 100

View the results of the query execution here.
