Building and Using Knowledge Graphs via Virtuoso
Virtuoso provides an unrivalled platform for secure, high-performance Knowledge Graph construction. It makes Knowledge Graphs deceptively simple to both create and use, while addressing the needs of a wide range of user and operator profiles.
End Users
The following steps get you started quickly:
- Install Virtuoso using a native installer (Windows, macOS, or Linux), a Docker container, the Nexus repository, or a cloud virtual machine (AWS, Azure, or GCP).
- Start the Virtuoso server via the control panels (available for Windows and macOS) or from the command line on Linux, macOS, or Windows.
- Populate your Knowledge Graph using any of the following options:
  - Large Language Models (LLMs)
  - ODBC
  - Sponger Middleware Service
  - Mounted WebDAV DET (Dynamic Extension Type) folders
  - SPARQL queries leveraging the Sponger Middleware Service
Large Language Models (LLMs)
This approach assumes that you have installed and configured both the OpenLink AI Layer (OPAL) module and the Virtuoso Data Connectivity Kit for your platform as part of the installation process.
- Obtain an API key for your target LLM.
- Start the Virtuoso server.
- Navigate to the OPAL endpoint, e.g., https://{CNAME}/chat; there is also a live instance at http://linkeddata.uriburner.com/chat.
- Provide your target LLM’s API key.
- Enable the Data Twingler Agent configuration from the OPAL UI.
- Issue the prompt: “With sponging enabled, provide a starting point for exploring the knowledge graph at https://virtuoso.openlinksw.com.”
- Repeat for other HTTP-accessible documents of interest.
- Ask OPAL to explore the Knowledge Graphs associated with the current Virtuoso instance.
Virtuoso Knowledge Graph Generation via ODBC Data Source Connections
This method assumes that you have installed the Virtuoso Data Connectivity Kit for your platform.
- Start the Virtuoso server.
- Use an ODBC- or JDBC-based application to connect to Virtuoso via Data Source Names (DSNs) installed with the Virtuoso ODBC drivers.
- Identify available ODBC DSNs for external databases.
- Use the Virtual Database Manager to attach tables from external databases using their respective DSNs.
- Use the RDF-based Linked Data Wizard in the Virtuoso Conductor (http://{CNAME}/conductor, e.g., http://localhost:8890/conductor) to guide you from table attachment through to generating virtual Knowledge Graphs that are deployed using Linked Data principles.
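Once tables are attached and mapped, the resulting Knowledge Graph can be queried over the same ODBC channel. The sketch below shows the SPARQL-in-SQL hybrid form Virtuoso accepts over ODBC; the DSN name VOS and the dba credentials are assumptions based on a default local install, and pyodbc must be installed separately.

```python
# Sketch: querying a Virtuoso-hosted Knowledge Graph over ODBC from Python.
# The DSN "VOS" and "dba" credentials are illustrative; adjust for your install.

def hybrid_query(graph_iri: str) -> str:
    # Virtuoso's SQL engine accepts SPARQL when the statement is
    # prefixed with the SPARQL keyword
    return (
        "SPARQL SELECT ?type (COUNT(*) AS ?entityCount) "
        f"FROM <{graph_iri}> WHERE {{ ?entity a ?type }} GROUP BY ?type"
    )

print(hybrid_query("https://virtuoso.openlinksw.com/"))

# With the Virtuoso ODBC driver and pyodbc installed, execution looks like:
# import pyodbc
# with pyodbc.connect("DSN=VOS;UID=dba;PWD=dba") as conn:
#     for row in conn.cursor().execute(hybrid_query("https://virtuoso.openlinksw.com/")):
#         print(row)
```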
Virtuoso Knowledge Graph Generation via Sponger Middleware
This approach assumes that the Virtuoso Sponger Middleware modules (including transformer and meta cartridges for various document types—HTML enabled by default) are installed.
- Start the Virtuoso server.
- Identify documents whose content you want to transform into Knowledge Graph data.
- Use the following URL pattern in your browser to perform a best-effort transformation of a target document: http://{CNAME}/about/html/{document-url}
Example: http://localhost:8890/about/html/http/virtuoso.openlinksw.com
Live Instance Example: https://linkeddata.uriburner.com/about/html/http/virtuoso.openlinksw.com
- Repeat for other documents of interest.
- Begin using your newly generated Knowledge Graphs.
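The Sponger URL pattern above can also be generated programmatically. The helper below is a minimal sketch that folds a document URL's scheme into the Sponger path, matching the examples shown; the instance address is an assumption based on a default local install.

```python
def sponger_url(instance: str, document_url: str) -> str:
    # The Sponger pattern replaces the "://" in the document URL with "/",
    # yielding http://{CNAME}/about/html/{scheme}/{host-and-path}
    scheme, _, rest = document_url.partition("://")
    return f"{instance}/about/html/{scheme}/{rest}"

print(sponger_url("http://localhost:8890", "http://virtuoso.openlinksw.com"))
# -> http://localhost:8890/about/html/http/virtuoso.openlinksw.com
```

Fetching the generated URL (in a browser, or with any HTTP client) triggers the best-effort transformation on the server.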
Virtuoso Knowledge Graph Generation via WebDAV DET (Dynamic Extension Type) Folders
This method assumes that the Virtuoso Briefcase module is installed.
- Start the Virtuoso server.
- Open the Briefcase interface at http://{CNAME}/DAV/home/{username}/ or, if you have an OpenLink Account, https://my.openlinksw.com/DAV/home/{username}, with a public folder at https://my.openlinksw.com/DAV/home/{username}/Public.
- Create an RDF Sink DET folder, specifying the named graph identifier for the generated Knowledge Graph.
- Mount Virtuoso into your local operating system using its built-in WebDAV mounting functionality.
- Identify documents whose content you want to transform into Knowledge Graph data.
- Drag and drop the selected documents into your mounted WebDAV folder.
- Begin using your newly generated Knowledge Graphs via ODBC, JDBC, or SPARQL.
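The drag-and-drop step can equally be performed in code, since an RDF Sink folder is just a WebDAV target that transforms documents on arrival. The sketch below builds such a PUT request with only the Python standard library; the folder name rdf_sink, the dba credentials, and the use of Basic authentication are illustrative assumptions (your instance may require Digest authentication).

```python
import base64
import urllib.request

def rdf_sink_put(base_url, username, folder, filename, turtle_bytes, user, password):
    # Build an authenticated WebDAV PUT request targeting an RDF Sink DET folder;
    # Virtuoso transforms the uploaded document into Knowledge Graph data on arrival.
    url = f"{base_url}/DAV/home/{username}/{folder}/{filename}"
    req = urllib.request.Request(url, data=turtle_bytes, method="PUT")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "text/turtle")
    return req

ttl = b'<https://example.com/person/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .'
req = rdf_sink_put("http://localhost:8890", "dba", "rdf_sink", "bob.ttl", ttl, "dba", "dba")
print(req.get_method(), req.full_url)
# Send with urllib.request.urlopen(req) against a running server
```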
SPARQL Queries Leveraging the Sponger Middleware Service
This approach assumes that the Virtuoso Sponger Middleware modules are installed.
- Identify documents whose content you want to transform into Knowledge Graph data.
- Use the document URL(s) as Named Graph IRIs in the FROM clause of your SPARQL query to perform a best-effort transformation.
Example SPARQL query:
DEFINE get:soft "soft" # Enables HTTP crawling as part of query execution
SELECT ?type (COUNT(*) AS ?entityCount) (SAMPLE(?entity) AS ?sampleEntity)
FROM <https://virtuoso.openlinksw.com/>
WHERE {
?entity a ?type .
}
GROUP BY ?type
ORDER BY DESC(?entityCount)
From an ODBC-compliant application, you can execute the SQL/SPARQL hybrid variant below, as Virtuoso natively supports both SQL and SPARQL:
SPARQL
DEFINE get:soft "soft"
SELECT ?type (COUNT(*) AS ?entityCount) (SAMPLE(?entity) AS ?sampleEntity)
FROM <https://virtuoso.openlinksw.com/>
WHERE {
?entity a ?type .
}
GROUP BY ?type
ORDER BY DESC(?entityCount)
Or embedded within SQL:
SELECT X.type, X.sampleEntity, X.entityCount
FROM (
SPARQL
DEFINE get:soft "soft"
SELECT ?type (COUNT(*) AS ?entityCount) (SAMPLE(?entity) AS ?sampleEntity)
FROM <https://virtuoso.openlinksw.com/>
WHERE {
?entity a ?type .
}
GROUP BY ?type
ORDER BY DESC(?entityCount)
) AS X
- Repeat for additional documents or add more FROM clauses for new target URLs.
- Begin using your generated Knowledge Graphs, for example:
- Via Virtuoso ODBC DSNs in analytics and business intelligence tools
- By embedding entity hyperlinks in communications, enabling recipients to click through and explore linked Knowledge Graph data
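The same sponging query can be issued over HTTP against the /sparql endpoint, which accepts the query and result format as URL parameters. Below is a stdlib-only sketch; the endpoint address assumes a default local install, and the commented-out request requires a running server.

```python
import json
import urllib.parse
import urllib.request

endpoint = "http://localhost:8890/sparql"
query = """DEFINE get:soft "soft"
SELECT ?type (COUNT(*) AS ?entityCount)
FROM <https://virtuoso.openlinksw.com/>
WHERE { ?entity a ?type }
GROUP BY ?type"""

# Encode the query and requested result format as URL parameters
params = urllib.parse.urlencode(
    {"query": query, "format": "application/sparql-results+json"}
)
url = f"{endpoint}?{params}"
print(url[:40])

# With a running server:
# with urllib.request.urlopen(url) as resp:
#     for binding in json.load(resp)["results"]["bindings"]:
#         print(binding["type"]["value"], binding["entityCount"]["value"])
```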
Developers
HTTP, ODBC, and JDBC provide direct access to Virtuoso through its native protocol support. The following guides demonstrate how to connect and interact with Virtuoso's RDF graph store using popular programming languages and libraries. All end-user workflows described above can be repeated programmatically using any of these supported protocols.
Java
Jena
This guide provides a focused walkthrough for integrating Virtuoso with Apache Jena. It covers project setup, connecting to the database, and performing essential data operations.
1. Dependency Management (Maven)
Add the official Virtuoso Jena Provider and JDBC driver artifacts from Maven Central to your pom.xml.
<!-- For Jena 4.3.x -->
<dependency>
<groupId>com.openlinksw</groupId>
<artifactId>virt_jena_v4_3</artifactId>
<version>1.35</version>
</dependency>
<dependency>
<groupId>com.openlinksw</groupId>
<artifactId>virtjdbc4_3</artifactId>
<version>3.123</version>
</dependency>
2. Establishing a Connection
Use a standard JDBC connection string to instantiate a VirtGraph, which
represents a connection to a named graph in Virtuoso. The log_enable=2
parameter is recommended for optimal write performance.
import virtuoso.jena.driver.*;
import org.apache.jena.rdf.model.Model;
String url = "jdbc:virtuoso://localhost:1111?charset=UTF-8&log_enable=2";
String graphName = "http://example.org/my-graph";
// Create a VirtGraph instance connected to a specific named graph
VirtGraph set = new VirtGraph(graphName, url, "dba", "dba");
// Wrap the graph with the Jena Model API for easy interaction
Model model = new VirtModel(set);
// The 'model' is now ready for Jena API operations
// Remember to close the graph when finished: set.close();
3. Executing a SPARQL SELECT Query
Use the VirtuosoQueryExecutionFactory to run SPARQL queries directly against
the Virtuoso engine.
import org.apache.jena.query.*;
String sparqlQuery = "SELECT ?s ?p ?o FROM <http://example.org/my-graph> WHERE { ?s ?p ?o } LIMIT 10";
// QueryExecution is AutoCloseable; close the graph explicitly in a finally block
VirtGraph set = new VirtGraph(url, "dba", "dba");
try (QueryExecution vqe = VirtuosoQueryExecutionFactory.create(sparqlQuery, set)) {
ResultSet results = vqe.execSelect();
ResultSetFormatter.out(System.out, results);
} finally {
set.close();
}
4. Performing Transactional Updates
For write operations, wrap your logic in a transaction and include a retry mechanism to handle potential deadlocks.
// Assume 'vm' is an existing VirtModel and 'm' is a Model with new triples
while(true) {
try {
vm.begin().add(m).commit();
break; // Success
} catch (Exception e) {
if (e.getCause() instanceof java.sql.SQLException &&
((java.sql.SQLException)e.getCause()).getSQLState().equals("40001")) {
System.out.println("Deadlock detected, retrying...");
vm.abort();
continue; // Retry transaction
}
throw e; // Re-throw other exceptions
}
}
RDF4j
This guide demonstrates how to connect a Java application to Virtuoso using the Eclipse RDF4J framework, the successor to OpenRDF Sesame.
1. Dependency Management (Maven)
Add the Virtuoso RDF4J provider to your pom.xml. You will also need the
Virtuoso JDBC driver on your classpath.
<dependency>
<groupId>com.openlinksw</groupId>
<artifactId>virt_rdf4j_v3_7</artifactId>
<version>1.16</version>
</dependency>
2. Establishing a Repository Connection
Instantiate and initialize a VirtuosoRepository, then obtain a connection
using a try-with-resources block for automatic cleanup.
import virtuoso.rdf4j.driver.VirtuosoRepository;
import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;
String jdbcUrl = "jdbc:virtuoso://localhost:1111";
Repository virtuosoRepo = new VirtuosoRepository(jdbcUrl, "dba", "dba");
virtuosoRepo.initialize();
try (RepositoryConnection conn = virtuosoRepo.getConnection()) {
// Perform data operations using the 'conn' object
}
virtuosoRepo.shutDown();
3. Executing a SPARQL Tuple Query
Use the standard RDF4J RepositoryConnection to prepare and evaluate SPARQL
queries.
import org.eclipse.rdf4j.query.*;
try (RepositoryConnection conn = virtuosoRepo.getConnection()) {
String queryString = "SELECT ?s ?p ?o WHERE { GRAPH <http://example.org/graph> { ?s ?p ?o } } LIMIT 10";
TupleQuery tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString);
try (TupleQueryResult result = tupleQuery.evaluate()) {
while (result.hasNext()) {
BindingSet bindingSet = result.next();
// ... process result bindings
}
}
}
Python
RDFLib
This guide covers integrating Virtuoso with RDFlib, Python's primary library for working with RDF. The connection relies on Virtuoso's ODBC driver.
1. Environment and DSN Setup
Ensure you have the Virtuoso ODBC driver installed. The connection from RDFlib requires a
Data Source Name (DSN) string. The WideAsUTF16=Y parameter is critical for
correct Unicode handling.
DSN=VOS;UID=dba;PWD=dba;WideAsUTF16=Y
2. Establishing a Connection
Use the RDFlib plugin system to get the "Virtuoso" store and initialize a
ConjunctiveGraph with your DSN.
from rdflib.graph import ConjunctiveGraph
from rdflib.store import Store
from rdflib.plugin import get as plugin
# Load the Virtuoso store plugin
virtuoso_store = plugin("Virtuoso", Store)
store = virtuoso_store("DSN=VOS;UID=dba;PWD=dba;WideAsUTF16=Y")
# Create a graph object backed by the Virtuoso store
graph = ConjunctiveGraph(store)
3. Querying Data
Execute SPARQL queries using the standard graph.query() method. It is
essential to call graph.commit() even for read queries to release the
underlying database cursor.
results = graph.query("SELECT * WHERE { ?s ?p ?o } LIMIT 10")
for row in results:
print(row)
# IMPORTANT: Release the database cursor
graph.commit()
4. Writing Data via Bulk Loading
For loading any significant amount of data, using the graph.add() method is
inefficient. The recommended approach is to use Virtuoso's native bulk loader, which can
be orchestrated from Python.
import subprocess
# This example assumes 'isql' is in the system path
# 1. Register files in a whitelisted directory
subprocess.run(["isql", "VOS", "dba", "dba", "exec=ld_dir('/path/to/data', '*.ttl', 'http://example.org/graph');"], check=True)
# 2. Run the loader
subprocess.run(["isql", "VOS", "dba", "dba", "exec=rdf_loader_run();"], check=True)
# 3. Checkpoint the database to make the load permanent (MANDATORY)
subprocess.run(["isql", "VOS", "dba", "dba", "exec=checkpoint;"], check=True)
LlamaIndex
This guide demonstrates how to build a Knowledge Graph-powered Retrieval-Augmented Generation (RAG) system using LlamaIndex with Virtuoso as the backend graph store.
1. The Value of KGs for RAG Accuracy
Research confirms that grounding Large Language Models in the context of a Knowledge Graph dramatically improves question-answering accuracy over complex data compared to querying raw SQL databases directly. The KG provides a high-fidelity semantic map that simplifies query generation for the LLM.
2. Building a KG RAG Pipeline
The process involves using LlamaIndex to extract RDF triples from source documents and
storing them in Virtuoso via the SparqlGraphStore. This populated graph can
then be queried using natural language.
from llama_index.graph_stores.sparql import SparqlGraphStore
from llama_index.core import KnowledgeGraphIndex, StorageContext
from llama_index.readers.web import WikipediaReader
# 1. Connect to the Virtuoso Graph Store
# Note: Use the 'sparql-auth' endpoint for write access
graph_store = SparqlGraphStore(
endpoint_url="http://localhost:8890/sparql-auth",
user="dba",
password="dba"
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)
# 2. Load data and build the Knowledge Graph Index
documents = WikipediaReader(html_to_text=True).load_data(pages=['Barbie (film)'])
kg_index = KnowledgeGraphIndex.from_documents(
documents,
storage_context=storage_context,
max_triplets_per_chunk=2
)
# 3. Create a query engine and ask a question
query_engine = kg_index.as_query_engine()
response = query_engine.query("What is the plot of the Barbie movie?")
print(str(response))
LangChain
This guide demonstrates how to use LangChain to build a system that translates natural language questions into SPARQL queries to be executed against a Knowledge Graph.
1. Core Architecture: RAG for SPARQL Generation
The system uses a Retrieval-Augmented Generation (RAG) pattern. When a user asks a question, the system retrieves relevant schema information (classes, properties) and few-shot examples from a vector store. This context is passed to the LLM along with the question, guiding it to generate a valid and accurate SPARQL query.
2. Implementation with GraphSparqlQAChain
The GraphSparqlQAChain is a standard LangChain component for this task. The
example below connects to the public DBpedia endpoint and manually provides schema
context in the prompt to improve accuracy.
from langchain_openai import ChatOpenAI
from langchain.chains import GraphSparqlQAChain
from langchain_community.graphs import RdfGraph
# 1. Instantiate an LLM and the graph connection
llm = ChatOpenAI(model="gpt-4o", temperature=0)
graph = RdfGraph(query_endpoint="https://dbpedia.org/sparql")
# 2. Create the QA chain
chain = GraphSparqlQAChain.from_llm(llm, graph=graph, verbose=True)
# 3. Formulate a query with inline schema hints (manual RAG)
query = """
Relevant DBpedia Knowledge Graph relationship types (relations):
?movie rdf:type dbo:Film .
?movie dbo:director ?name .
List movies directed by Spike Lee
"""
# 4. Invoke the chain and print the result
result = chain.invoke({"query": query})
print(result["result"])
JavaScript
rdflib.js
This guide demonstrates how to use rdflib.js in a Node.js environment to create, read, and update RDF data stored in Virtuoso via its SPARQL endpoint.
1. Writing Data via SPARQL UPDATE
Use the fetch API to send a POST request with a SPARQL UPDATE query to
Virtuoso's /sparql endpoint.
const endpoint = 'http://localhost:8890/sparql';
const updateQuery = `
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
INSERT DATA {
GRAPH <https://example.com/graph/people> {
<https://example.com/person/bob> foaf:name "Bob" .
}
}`;
const response = await fetch(endpoint, {
method: 'POST',
headers: { 'Content-Type': 'application/sparql-update' },
body: updateQuery
});
if (response.ok) {
console.log('Data inserted successfully.');
}
2. Reading Data via SPARQL SELECT
Send a GET request with a URL-encoded SPARQL SELECT query. Set the Accept
header to application/sparql-results+json to receive results as JSON.
const selectQuery = `
SELECT ?name
FROM <https://example.com/graph/people>
WHERE { ?person foaf:name ?name }`;
const url = `http://localhost:8890/sparql?query=${encodeURIComponent(selectQuery)}`;
const response = await fetch(url, {
headers: { 'Accept': 'application/sparql-results+json' }
});
const json = await response.json();
console.log(JSON.stringify(json, null, 2));
3. Loading Data into an rdflib.js Store
Execute a SPARQL CONSTRUCT query to fetch a graph from Virtuoso and then parse the Turtle
response into a local rdflib.js graph object for client-side manipulation.
import * as $rdf from 'rdflib';
const store = $rdf.graph();
const graphUri = 'https://example.com/graph/people';
const constructQuery = `CONSTRUCT { ?s ?p ?o } FROM <${graphUri}> WHERE { ?s ?p ?o }`;
const url = `http://localhost:8890/sparql?query=${encodeURIComponent(constructQuery)}`;
const response = await fetch(url, { headers: { 'Accept': 'text/turtle' } });
const ttlData = await response.text();
$rdf.parse(ttlData, store, graphUri, 'text/turtle');
console.log(`Loaded ${store.length} triples into the local store.`);