Virtuoso Knowledge Graph Guide

Building and Using Knowledge Graphs via Virtuoso

Virtuoso provides an unrivalled platform for secure, high-performance Knowledge Graph construction. Knowledge Graphs are simple to create and use, while the platform addresses the needs of a wide range of user and operator profiles.

End Users

The following steps get you started quickly:

  1. Install Virtuoso using the native installers for Windows, macOS, or Linux; a Docker container; the Nexus repository; or a cloud virtual machine (AWS, Azure, or GCP).
  2. Start the Virtuoso server via the control panels (available for Windows and macOS) or from the command line on Linux, macOS, or Windows.
  3. Populate your Knowledge Graph using any of the following options:
    • Large Language Models (LLMs)
    • ODBC
    • Sponger Middleware Service
    • Mounted WebDAV DET (Dynamic Extension Type) folders
    • SPARQL queries leveraging the Sponger Middleware Service

Large Language Models (LLMs)

This approach assumes that you have installed and configured both the OpenLink AI Layer (OPAL) module and the Virtuoso Data Connectivity Kit for your platform as part of the installation process.

  1. Obtain an API key for your target LLM.
  2. Start the Virtuoso server.
  3. Navigate to the OPAL endpoint, for example: https://{CNAME}/chat; there's also a live instance at https://linkeddata.uriburner.com/chat.
  4. Provide your target LLM’s API key.
  5. Enable the Data Twingler Agent configuration from the OPAL UI.
  6. Issue the prompt: “With sponging enabled, provide a starting point for exploring the knowledge graph at https://virtuoso.openlinksw.com.”
  7. Repeat for other HTTP-accessible documents of interest.
  8. Ask OPAL to explore the Knowledge Graphs associated with the current Virtuoso instance.

Virtuoso Knowledge Graph Generation via ODBC Data Source Connections

This method assumes that you have installed the Virtuoso Data Connectivity Kit for your platform.

  1. Start the Virtuoso server.
  2. Use an ODBC- or JDBC-based application to connect to Virtuoso via Data Source Names (DSNs) installed with the Virtuoso ODBC drivers.
  3. Identify available ODBC DSNs for external databases.
  4. Use the Virtual Database Manager to attach tables from external databases using their respective DSNs.
  5. Use the RDF-based Linked Data Wizard in the Virtuoso Conductor (http://{CNAME}/conductor, e.g., http://localhost:8890/conductor) to guide you from table attachment through to generating virtual Knowledge Graphs that are deployed using Linked Data principles.
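
The ODBC connection in step 2 can also be made programmatically. The Python sketch below is a minimal illustration, assuming the third-party pyodbc package is installed and that a Virtuoso DSN exists; the "Local Virtuoso" DSN name and dba/dba credentials are placeholder assumptions, not fixed values.

```python
def list_tables(dsn: str = "DSN=Local Virtuoso;UID=dba;PWD=dba") -> None:
    """Connect to Virtuoso over ODBC and list the tables visible through
    the connection, which will include any remote tables attached via the
    Virtual Database Manager."""
    # Requires the third-party pyodbc package and a configured Virtuoso DSN;
    # the DSN name and credentials above are illustrative defaults.
    import pyodbc
    with pyodbc.connect(dsn) as conn:
        for row in conn.cursor().tables():
            print(row.table_cat, row.table_schem, row.table_name)

# list_tables()  # uncomment once your DSN is configured
```

Once remote tables have been attached, they appear in this listing alongside Virtuoso's native tables.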

Virtuoso Knowledge Graph Generation via Sponger Middleware

This approach assumes that the Virtuoso Sponger Middleware modules (including transformer and meta cartridges for various document types—HTML enabled by default) are installed.

  1. Start the Virtuoso server.
  2. Identify documents whose content you want to transform into Knowledge Graph data.
  3. Use the following URL pattern in your browser to perform a best-effort transformation of a target document: http://{CNAME}/about/html/{document-url}
    Example: http://localhost:8890/about/html/http/virtuoso.openlinksw.com
    Live Instance Example: https://linkeddata.uriburner.com/about/html/http/virtuoso.openlinksw.com
  4. Repeat for other documents of interest.
  5. Begin using your newly generated Knowledge Graphs.
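
The /about/html/ pattern splices the target document's scheme and host into the proxy URL, replacing the "://" separator with "/". A small Python helper, shown here as an illustrative sketch (the function name is ours, not part of Virtuoso), makes the rewriting explicit:

```python
from urllib.parse import urlsplit

def sponger_url(instance: str, document_url: str) -> str:
    """Build the Sponger 'about/html' proxy URL for a target document:
    http://example.com/page becomes {instance}/about/html/http/example.com/page."""
    parts = urlsplit(document_url)
    return f"{instance}/about/html/{parts.scheme}/{parts.netloc}{parts.path}"

# Mirrors the example above
print(sponger_url("http://localhost:8890", "http://virtuoso.openlinksw.com"))
# -> http://localhost:8890/about/html/http/virtuoso.openlinksw.com
```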

Virtuoso Knowledge Graph Generation via WebDAV DET (Dynamic Extension Type) Folders

This method assumes that the Virtuoso Briefcase module is installed.

  1. Start the Virtuoso server.
  2. Open the Briefcase interface at http://{CNAME}/DAV/home/{username}/. If you have an OpenLink account, you can instead use https://my.openlinksw.com/DAV/home/{username}/, which includes a public folder at https://my.openlinksw.com/DAV/home/{username}/Public.
  3. Create an RDF Sink DET folder, specifying the named graph identifier for the generated Knowledge Graph.
  4. Mount Virtuoso into your local operating system using its built-in WebDAV mounting functionality.
  5. Identify documents whose content you want to transform into Knowledge Graph data.
  6. Drag and drop the selected documents into your mounted WebDAV folder.
  7. Begin using your newly generated Knowledge Graphs via ODBC, JDBC, or SPARQL.
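
As an alternative to mounting the folder in your operating system, documents can be dropped into an RDF Sink folder with a plain WebDAV PUT request. The Python sketch below only builds the request (the rdf_sink folder name, dba/dba credentials, Basic authentication, and file contents are all illustrative assumptions); uncomment the urlopen call to perform the upload against a running server.

```python
import base64
import urllib.request

def rdf_sink_put(base: str, username: str, password: str,
                 folder: str, filename: str, data: bytes) -> urllib.request.Request:
    """Build an HTTP PUT request that drops a document into a WebDAV folder."""
    url = f"{base}/DAV/home/{username}/{folder}/{filename}"
    req = urllib.request.Request(url, data=data, method="PUT")
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req

req = rdf_sink_put("http://localhost:8890", "dba", "dba",
                   "rdf_sink", "doc.html", b"<html>...</html>")
# urllib.request.urlopen(req)  # uncomment to perform the upload
```

Dropping a file this way triggers the same transformation as the drag-and-drop workflow, with the resulting triples written to the named graph configured on the RDF Sink folder.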

SPARQL Queries Leveraging the Sponger Middleware Service

This approach assumes that the Virtuoso Sponger Middleware modules are installed.

  1. Identify documents whose content you want to transform into Knowledge Graph data.
  2. Use the document URL(s) as Named Graph IRIs in the FROM clause of your SPARQL query to perform a best-effort transformation.

Example SPARQL query:

SPARQL
DEFINE get:soft "soft"  # Enables HTTP crawling as part of query execution
SELECT ?type (COUNT(*) AS ?entityCount) (SAMPLE(?entity) AS ?sampleEntity)
FROM <https://virtuoso.openlinksw.com/>
WHERE {
    ?entity a ?type .
}
GROUP BY ?type
ORDER BY DESC(?entityCount)

From an ODBC-compliant application, you can execute the SQL/SPARQL hybrid variant below, as Virtuoso natively supports both SQL and SPARQL:

SQL
SPARQL
DEFINE get:soft "soft"
SELECT ?type (COUNT(*) AS ?entityCount) (SAMPLE(?entity) AS ?sampleEntity)
FROM <https://virtuoso.openlinksw.com/>
WHERE {
    ?entity a ?type .
}
GROUP BY ?type
ORDER BY DESC(?entityCount)

Or embedded within SQL:

SQL
SELECT X.type, X.sampleEntity, X.entityCount
FROM (
    SPARQL
    DEFINE get:soft "soft"
    SELECT ?type (COUNT(*) AS ?entityCount) (SAMPLE(?entity) AS ?sampleEntity)
    FROM <https://virtuoso.openlinksw.com/>
    WHERE {
        ?entity a ?type .
    }
    GROUP BY ?type
    ORDER BY DESC(?entityCount)
) AS X
  3. Repeat for additional documents or add more FROM clauses for new target URLs.
  4. Begin using your generated Knowledge Graphs, for example:
    • Via Virtuoso ODBC DSNs in analytics and business intelligence tools
    • By embedding entity hyperlinks in communications, enabling recipients to click through and explore linked Knowledge Graph data
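
Because Virtuoso's SQL parser accepts a leading SPARQL keyword, the hybrid queries above can be scripted from any ODBC-capable language. The Python sketch below builds the query as a plain string and defers the actual connection; it assumes the third-party pyodbc package and a DSN named VOS, both illustrative.

```python
import textwrap

# SPARQL-in-SQL hybrid: the leading SPARQL keyword tells Virtuoso's SQL
# parser to treat the rest of the statement as a SPARQL query.
query = textwrap.dedent("""\
    SPARQL
    DEFINE get:soft "soft"
    SELECT ?type (COUNT(*) AS ?entityCount)
    FROM <https://virtuoso.openlinksw.com/>
    WHERE { ?entity a ?type }
    GROUP BY ?type
    ORDER BY DESC(?entityCount)""")

def run(dsn: str = "DSN=VOS;UID=dba;PWD=dba") -> None:
    # Requires the third-party pyodbc package and a configured Virtuoso DSN.
    import pyodbc
    with pyodbc.connect(dsn) as conn:
        for row in conn.cursor().execute(query):
            print(row)

# run()  # uncomment once your DSN is configured
```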

Developers

HTTP, ODBC, and JDBC provide direct access to Virtuoso through its native protocol support. The following guides demonstrate how to connect to and interact with Virtuoso's RDF graph store using popular programming languages and libraries. All of the end-user workflows described above can be repeated programmatically using any of these supported protocols.
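
As a minimal illustration of the HTTP route, the Python sketch below issues a SPARQL SELECT against a Virtuoso /sparql endpoint using only the standard library. The localhost URL and the query are illustrative, and the actual request is left commented out so the snippet can be read without a running server.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:8890/sparql"  # default local Virtuoso endpoint

def sparql_select(query: str, endpoint: str = ENDPOINT) -> dict:
    """Run a SPARQL SELECT over HTTP and return the JSON results document."""
    url = endpoint + "?" + urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"})
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

query = "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } } LIMIT 10"
# bindings = sparql_select(query)["results"]["bindings"]  # requires a running server
```

ODBC and JDBC offer equivalent access through SQL with embedded SPARQL, as shown in the hybrid query examples above.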

Java

Jena

This guide provides a focused walkthrough for integrating Virtuoso with Apache Jena. It covers project setup, connecting to the database, and performing essential data operations.

1. Dependency Management (Maven)

Add the official Virtuoso Jena Provider and JDBC driver artifacts from Maven Central to your pom.xml.

XML
<!-- For Jena 4.3.x -->
<dependency>
    <groupId>com.openlinksw</groupId>
    <artifactId>virt_jena_v4_3</artifactId>
    <version>1.35</version>
</dependency>
<dependency>
    <groupId>com.openlinksw</groupId>
    <artifactId>virtjdbc4_3</artifactId>
    <version>3.123</version>
</dependency>
2. Establishing a Connection

Use a standard JDBC connection string to instantiate a VirtGraph, which represents a connection to a named graph in Virtuoso. The log_enable=2 parameter is recommended for optimal write performance.

Java
import virtuoso.jena.driver.*;
import org.apache.jena.rdf.model.Model;

String url = "jdbc:virtuoso://localhost:1111?charset=UTF-8&log_enable=2";
String graphName = "http://example.org/my-graph";

// Create a VirtGraph instance connected to a specific named graph
VirtGraph set = new VirtGraph(graphName, url, "dba", "dba");

// Wrap the graph with the Jena Model API for easy interaction
Model model = new VirtModel(set);

// The 'model' is now ready for Jena API operations
// Remember to close the graph when finished: set.close();
3. Executing a SPARQL SELECT Query

Use the VirtuosoQueryExecutionFactory to run SPARQL queries directly against the Virtuoso engine.

Java
import org.apache.jena.query.*;

String sparqlQuery = "SELECT ?s ?p ?o FROM <http://example.org/my-graph> WHERE { ?s ?p ?o } LIMIT 10";

// QueryExecution is AutoCloseable; close the graph explicitly when done
VirtGraph set = new VirtGraph(url, "dba", "dba");
try (QueryExecution vqe = VirtuosoQueryExecutionFactory.create(sparqlQuery, set)) {
    ResultSet results = vqe.execSelect();
    ResultSetFormatter.out(System.out, results);
} finally {
    set.close();
}
4. Performing Transactional Updates

For write operations, wrap your logic in a transaction and include a retry mechanism to handle potential deadlocks.

Java
// Assume 'vm' is an existing VirtModel and 'm' is a Model with new triples
while(true) {
    try {
        vm.begin().add(m).commit();
        break; // Success
    } catch (Exception e) {
        if (e.getCause() instanceof java.sql.SQLException &&
           ((java.sql.SQLException)e.getCause()).getSQLState().equals("40001")) {
            System.out.println("Deadlock detected, retrying...");
            vm.abort();
            continue; // Retry transaction
        }
        throw e; // Re-throw other exceptions
    }
}
RDF4J

This guide demonstrates how to connect a Java application to Virtuoso using the Eclipse RDF4J framework, the successor to OpenRDF Sesame.

1. Dependency Management (Maven)

Add the Virtuoso RDF4J provider to your pom.xml. You will also need the Virtuoso JDBC driver on your classpath.

XML
<dependency>
    <groupId>com.openlinksw</groupId>
    <artifactId>virt_rdf4j_v3_7</artifactId>
    <version>1.16</version>
</dependency>
2. Establishing a Repository Connection

Instantiate and initialize a VirtuosoRepository, then obtain a connection using a try-with-resources block for automatic cleanup.

Java
import virtuoso.rdf4j.driver.VirtuosoRepository;
import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;

String jdbcUrl = "jdbc:virtuoso://localhost:1111";
Repository virtuosoRepo = new VirtuosoRepository(jdbcUrl, "dba", "dba");
virtuosoRepo.initialize();

try (RepositoryConnection conn = virtuosoRepo.getConnection()) {
    // Perform data operations using the 'conn' object
}

virtuosoRepo.shutDown();
3. Executing a SPARQL Tuple Query

Use the standard RDF4J RepositoryConnection to prepare and evaluate SPARQL queries.

Java
import org.eclipse.rdf4j.query.*;

try (RepositoryConnection conn = virtuosoRepo.getConnection()) {
    String queryString = "SELECT ?s ?p ?o WHERE { GRAPH <http://example.org/graph> { ?s ?p ?o } } LIMIT 10";
    TupleQuery tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString);
    
    try (TupleQueryResult result = tupleQuery.evaluate()) {
        while (result.hasNext()) {
            BindingSet bindingSet = result.next();
            // ... process result bindings
        }
    }
}

Python

RDFLib

This guide covers integrating Virtuoso with RDFLib, Python's primary library for working with RDF. The connection relies on Virtuoso's ODBC driver.

1. Environment and DSN Setup

Ensure you have the Virtuoso ODBC driver installed. The connection from RDFLib requires a Data Source Name (DSN) string. The WideAsUTF16=Y parameter is critical for correct Unicode handling.

TEXT
DSN=VOS;UID=dba;PWD=dba;WideAsUTF16=Y
2. Establishing a Connection

Use the RDFlib plugin system to get the "Virtuoso" store and initialize a ConjunctiveGraph with your DSN.

Python
from rdflib.graph import ConjunctiveGraph
from rdflib.store import Store
from rdflib.plugin import get as plugin

# Load the Virtuoso store plugin
virtuoso_store = plugin("Virtuoso", Store)
store = virtuoso_store("DSN=VOS;UID=dba;PWD=dba;WideAsUTF16=Y")

# Create a graph object backed by the Virtuoso store
graph = ConjunctiveGraph(store)
3. Querying Data

Execute SPARQL queries using the standard graph.query() method. It is essential to call graph.commit() even for read queries to release the underlying database cursor.

Python
results = graph.query("SELECT * WHERE { ?s ?p ?o } LIMIT 10")
for row in results:
    print(row)

# IMPORTANT: Release the database cursor
graph.commit()
4. Writing Data via Bulk Loading

For loading any significant amount of data, using the graph.add() method is inefficient. The recommended approach is to use Virtuoso's native bulk loader, which can be orchestrated from Python.

Python
import subprocess

# This example assumes 'isql' is in the system path
# 1. Register files in a whitelisted directory
subprocess.run(["isql", "VOS", "dba", "dba", "exec=ld_dir('/path/to/data', '*.ttl', 'http://example.org/graph');"], check=True)

# 2. Run the loader
subprocess.run(["isql", "VOS", "dba", "dba", "exec=rdf_loader_run();"], check=True)

# 3. Checkpoint the database to make the load permanent (MANDATORY)
subprocess.run(["isql", "VOS", "dba", "dba", "exec=checkpoint;"], check=True)
LlamaIndex

This guide demonstrates how to build a Knowledge Graph-powered Retrieval-Augmented Generation (RAG) system using LlamaIndex with Virtuoso as the backend graph store.

1. The Value of KGs for RAG Accuracy

Research indicates that grounding Large Language Models in the context of a Knowledge Graph substantially improves question-answering accuracy over complex data compared to querying raw SQL databases directly. The KG provides a high-fidelity semantic map that simplifies query generation for the LLM.

2. Building a KG RAG Pipeline

The process involves using LlamaIndex to extract RDF triples from source documents and storing them in Virtuoso via the SparqlGraphStore. This populated graph can then be queried using natural language.

Python
from llama_index.graph_stores.sparql import SparqlGraphStore
from llama_index.core import KnowledgeGraphIndex, StorageContext
from llama_index.readers.wikipedia import WikipediaReader

# 1. Connect to the Virtuoso Graph Store
# Note: Use the 'sparql-auth' endpoint for write access
graph_store = SparqlGraphStore(
    endpoint_url="http://localhost:8890/sparql-auth",
    user="dba",
    password="dba"
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

# 2. Load data and build the Knowledge Graph Index
documents = WikipediaReader().load_data(pages=['Barbie (film)'])
kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=2
)

# 3. Create a query engine and ask a question
query_engine = kg_index.as_query_engine()
response = query_engine.query("What is the plot of the Barbie movie?")
print(str(response))
LangChain

This guide demonstrates how to use LangChain to build a system that translates natural language questions into SPARQL queries to be executed against a Knowledge Graph.

1. Core Architecture: RAG for SPARQL Generation

The system uses a Retrieval-Augmented Generation (RAG) pattern. When a user asks a question, the system retrieves relevant schema information (classes, properties) and few-shot examples from a vector store. This context is passed to the LLM along with the question, guiding it to generate a valid and accurate SPARQL query.

2. Implementation with GraphSparqlQAChain

The GraphSparqlQAChain is a standard LangChain component for this task. The example below connects to the public DBpedia endpoint and manually provides schema context in the prompt to improve accuracy.

Python
from langchain_openai import ChatOpenAI
from langchain.chains import GraphSparqlQAChain
from langchain_community.graphs import RdfGraph

# 1. Instantiate an LLM and the graph connection
llm = ChatOpenAI(model="gpt-4o", temperature=0)
graph = RdfGraph(query_endpoint="https://dbpedia.org/sparql")

# 2. Create the QA chain
chain = GraphSparqlQAChain.from_llm(llm, graph=graph, verbose=True)

# 3. Formulate a query with inline schema hints (manual RAG)
query = """
Relevant DBpedia Knowledge Graph relationship types (relations):
?movie rdf:type dbo:Film .
?movie dbo:director ?name .

List movies directed by Spike Lee
"""

# 4. Invoke the chain and print the result
result = chain.invoke({"query": query})
print(result["result"])

JavaScript

rdflib.js

This guide demonstrates how to use rdflib.js in a Node.js environment to create, read, and update RDF data stored in Virtuoso via its SPARQL endpoint.

1. Writing Data via SPARQL UPDATE

Use the fetch API to send a POST request with a SPARQL UPDATE query to Virtuoso's /sparql endpoint.

JavaScript
const endpoint = 'http://localhost:8890/sparql';

const updateQuery = `
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
INSERT DATA {
  GRAPH <https://example.com/graph/people> {
    <https://example.com/person/bob> foaf:name "Bob" .
  }
}`;

const response = await fetch(endpoint, {
  method: 'POST',
  headers: { 'Content-Type': 'application/sparql-update' },
  body: updateQuery
});

if (response.ok) {
    console.log('Data inserted successfully.');
}
2. Reading Data via SPARQL SELECT

Send a GET request with a URL-encoded SPARQL SELECT query. Set the Accept header to application/sparql-results+json to receive results as JSON.

JavaScript
const selectQuery = `
SELECT ?name
FROM <https://example.com/graph/people>
WHERE { ?person foaf:name ?name }`;

const url = `http://localhost:8890/sparql?query=${encodeURIComponent(selectQuery)}`;

const response = await fetch(url, {
  headers: { 'Accept': 'application/sparql-results+json' }
});

const json = await response.json();
console.log(JSON.stringify(json, null, 2));
3. Loading Data into an rdflib.js Store

Execute a SPARQL CONSTRUCT query to fetch a graph from Virtuoso and then parse the Turtle response into a local rdflib.js graph object for client-side manipulation.

JavaScript
import * as $rdf from 'rdflib';

const store = $rdf.graph();
const graphUri = 'https://example.com/graph/people';

const constructQuery = `CONSTRUCT { ?s ?p ?o } FROM <${graphUri}> WHERE { ?s ?p ?o }`;
const url = `http://localhost:8890/sparql?query=${encodeURIComponent(constructQuery)}`;

const response = await fetch(url, { headers: { 'Accept': 'text/turtle' } });
const ttlData = await response.text();

$rdf.parse(ttlData, store, graphUri, 'text/turtle');

console.log(`Loaded ${store.statements.length} triples into the local store.`);