Virtuoso Knowledge Graph Developer Guide

Building RDF Knowledge Graphs with Virtuoso

This guide provides a comprehensive, hands-on resource for developers looking to build and interact with high-performance RDF Knowledge Graphs using OpenLink Virtuoso. It covers programmatic access through various popular languages and libraries, offering practical code examples for connecting to Virtuoso, performing data operations, and leveraging advanced features.

Virtuoso can be accessed directly over HTTP, ODBC, or JDBC thanks to its native protocol support. The following guides demonstrate how to connect to and interact with Virtuoso's RDF graph store using popular programming languages and libraries.

REST Interactions (cURL)

Conceptual Overview: At its core, Virtuoso supports the standardized SPARQL 1.1 Protocol over HTTP. This means you do not strictly need a language-specific driver; any tool capable of sending HTTP requests (like cURL, Postman, or `fetch`) can interact with the Knowledge Graph. This section demonstrates the raw HTTP interactions that underpin the language libraries below.

Basic SPARQL Protocol

1. Executing a Query (HTTP GET)
BASH
# Select 5 distinct concepts from the graph
curl -G "http://localhost:8890/sparql" \
  --data-urlencode "query=SELECT DISTINCT ?Concept WHERE {[] a ?Concept} LIMIT 5" \
  --data-urlencode "format=application/json"
2. Inserting Data (HTTP POST with Auth)

Virtuoso supports multiple authentication mechanisms. Below are examples for standard Digest Authentication and OAuth 2.0. Depending on how your instance is configured, digest-authenticated SPARQL updates may need to target Virtuoso's /sparql-auth endpoint rather than /sparql.

BASH
# Option A: Digest Authentication (Default dba/dba)
curl --digest -u dba:dba -X POST "http://localhost:8890/sparql" \
  --data-urlencode "query=INSERT DATA { GRAPH <http://example.org/curl-test> { <http://example.org/s> <http://example.org/p> 'Digest Auth' } }"

# Option B: OAuth 2.0 Bearer Token
curl -H "Authorization: Bearer {YOUR_ACCESS_TOKEN}" -X POST "http://localhost:8890/sparql" \
  --data-urlencode "query=INSERT DATA { GRAPH <http://example.org/curl-test> { <http://example.org/s> <http://example.org/p> 'OAuth Token' } }"

Public Endpoint Example (DBpedia)

Querying the public DBpedia endpoint to list movies directed by Spike Lee.

BASH
curl -G "https://dbpedia.org/sparql" \
  --data-urlencode "default-graph-uri=http://dbpedia.org" \
  --data-urlencode "query=SELECT ?movie ?name WHERE { ?movie dbo:director dbr:Spike_Lee ; rdfs:label ?name . FILTER (LANG(?name) = 'en') } LIMIT 10" \
  --data-urlencode "format=application/json"

Virtuoso Sponger Middleware

The Sponger extracts metadata from non-RDF sources and converts it to RDF entities on the fly. The URL pattern is https://{CNAME}/about/html/{TargetURL}.

BASH
# Generate a Knowledge Graph from the Virtuoso product page using URIBurner
curl -L "https://linkeddata.uriburner.com/about/html/http/virtuoso.openlinksw.com"
Sponging with Aggregation (Interactive Auth)

This example uses the DEFINE get:soft "soft" pragma to dynamically fetch (sponge) data from the source URL during query execution if it is not already present in the quad store. Because the password is omitted from the -u flag, cURL pauses and prompts for it, letting you handle the authentication challenge interactively.

BASH
# Interactive Authentication: cURL will prompt for the 'dba' password
# The query aggregates entities by type found on the target page
curl -G --digest -u dba "http://localhost:8890/sparql" \
  --data-urlencode "query=DEFINE get:soft 'soft' SELECT ?type (COUNT(*) AS ?entityCount) (SAMPLE(?entity) AS ?sampleEntity) FROM <https://virtuoso.openlinksw.com/> WHERE { ?entity a ?type } GROUP BY ?type ORDER BY DESC(?entityCount)" \
  --data-urlencode "format=application/json"

Java

Jena

Conceptual Overview: Apache Jena is a comprehensive Java framework for building RDF-based Knowledge Graphs. Conceptually, the Virtuoso Jena Provider acts as a translation layer, allowing you to use standard Jena classes (like Model and Graph) while the data resides in high-performance Virtuoso storage. This means you write standard Jena code, but gain the scalability of Virtuoso's DBMS engine instead of relying on in-memory or file-based storage.

This guide provides a focused walkthrough for integrating Virtuoso with Apache Jena. It covers project setup, connecting to the database, and performing essential data operations.

1. Dependency Management (Maven)

Add the official Virtuoso Jena Provider and JDBC driver artifacts from Maven Central to your pom.xml.

XML
<!-- For Jena 4.3.x -->
<dependency>
    <groupId>com.openlinksw</groupId>
    <artifactId>virt_jena_v4_3</artifactId>
    <version>1.35</version>
</dependency>
<dependency>
    <groupId>com.openlinksw</groupId>
    <artifactId>virtjdbc4_3</artifactId>
    <version>3.123</version>
</dependency>
2. Establishing a Connection

Use a standard JDBC connection string to instantiate a VirtGraph, which represents a connection to a named graph in Virtuoso. The log_enable=2 parameter is recommended for optimal write performance.

Java
import virtuoso.jena.driver.*;
import org.apache.jena.rdf.model.Model;

String url = "jdbc:virtuoso://localhost:1111?charset=UTF-8&log_enable=2";
String graphName = "http://example.org/my-graph";

// Create a VirtGraph instance connected to a specific named graph
VirtGraph set = new VirtGraph(graphName, url, "dba", "dba");

// Wrap the graph with the Jena Model API for easy interaction
Model model = new VirtModel(set);

// The 'model' is now ready for Jena API operations
// Remember to close the graph when finished: set.close();
3. Executing a SPARQL SELECT Query

Use the VirtuosoQueryExecutionFactory to run SPARQL queries directly against the Virtuoso engine.

Java
import org.apache.jena.query.*;

String sparqlQuery = "SELECT ?s ?p ?o FROM <http://example.org/my-graph> WHERE { ?s ?p ?o } LIMIT 10";

// Use try-with-resources for automatic cleanup
try (VirtGraph set = new VirtGraph(url, "dba", "dba");
     QueryExecution vqe = VirtuosoQueryExecutionFactory.create(sparqlQuery, set)) {
    
    ResultSet results = vqe.execSelect();
    ResultSetFormatter.out(System.out, results);
}
4. Performing Transactional Updates

For write operations, wrap your logic in a transaction and include a retry mechanism to handle potential deadlocks.

Java
// Assume 'vm' is an existing VirtModel and 'm' is a Model with new triples
while(true) {
    try {
        vm.begin().add(m).commit();
        break; // Success
    } catch (Exception e) {
        if (e.getCause() instanceof java.sql.SQLException &&
           ((java.sql.SQLException)e.getCause()).getSQLState().equals("40001")) {
            System.out.println("Deadlock detected, retrying...");
            vm.abort();
            continue; // Retry transaction
        }
        throw e; // Re-throw other exceptions
    }
}
Complete Runnable Example: Data Selection (Read)

This class connects to Virtuoso and executes a SPARQL Select query.

Java
import virtuoso.jena.driver.*;
import org.apache.jena.query.*;

public class VirtuosoJenaSelect {
    public static void main(String[] args) {
        String url = "jdbc:virtuoso://localhost:1111?charset=UTF-8&log_enable=2";
        String user = "dba";
        String pwd = "dba";
        String graphName = "http://example.org/jena-graph";

        VirtGraph set = new VirtGraph(graphName, url, user, pwd);

        String query = "SELECT * FROM <" + graphName + "> WHERE { ?s ?p ?o } LIMIT 10";
        
        try (QueryExecution vqe = VirtuosoQueryExecutionFactory.create(query, set)) {
            System.out.println("Executing query: " + query);
            ResultSet results = vqe.execSelect();
            ResultSetFormatter.out(System.out, results);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            set.close();
        }
    }
}
Complete Runnable Example: Data Insertion (Write)

This class connects to Virtuoso and inserts data using the Bulk Update Handler.

Java
import virtuoso.jena.driver.*;
import org.apache.jena.rdf.model.*;

public class VirtuosoJenaInsert {
    public static void main(String[] args) {
        String url = "jdbc:virtuoso://localhost:1111?charset=UTF-8&log_enable=2";
        String user = "dba";
        String pwd = "dba";
        String graphName = "http://example.org/jena-graph";

        System.out.println("Connecting to " + url + " ...");
        VirtGraph set = new VirtGraph(graphName, url, user, pwd);

        // Prepare Data
        System.out.println("Creating sample data...");
        Model m = ModelFactory.createDefaultModel();
        Resource subject = ResourceFactory.createResource("http://example.org/alice");
        Property predicate = ResourceFactory.createProperty("http://xmlns.com/foaf/0.1/name");
        Literal object = ResourceFactory.createPlainLiteral("Alice");
        m.add(subject, predicate, object);

        // Perform Insert
        System.out.println("Inserting triples...");
        set.getBulkUpdateHandler().add(m.getGraph());
        
        set.close();
        System.out.println("Data inserted successfully.");
    }
}

RDF4j

Conceptual Overview: Eclipse RDF4j offers a powerful, modular architecture for RDF storage and querying. The Virtuoso RDF4j Provider implements the Repository interface, enabling your Java applications to treat Virtuoso as a standard RDF4j repository. This abstraction allows you to leverage the full suite of RDF4j tools (parsers, query builders) while delegating the heavy lifting of query execution and data persistence to Virtuoso.

This guide demonstrates how to connect a Java application to Virtuoso using the Eclipse RDF4J framework, the successor to OpenRDF Sesame.

1. Dependency Management (Maven)

Add the Virtuoso RDF4J provider to your pom.xml. You will also need the Virtuoso JDBC driver on your classpath.

XML
<dependency>
    <groupId>com.openlinksw</groupId>
    <artifactId>virt_rdf4j_v3_7</artifactId>
    <version>1.16</version>
</dependency>
2. Establishing a Repository Connection

Instantiate and initialize a VirtuosoRepository, then obtain a connection using a try-with-resources block for automatic cleanup.

Java
import virtuoso.rdf4j.driver.VirtuosoRepository;
import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;

String jdbcUrl = "jdbc:virtuoso://localhost:1111";
Repository virtuosoRepo = new VirtuosoRepository(jdbcUrl, "dba", "dba");
virtuosoRepo.initialize();

try (RepositoryConnection conn = virtuosoRepo.getConnection()) {
    // Perform data operations using the 'conn' object
}

virtuosoRepo.shutDown();
3. Executing a SPARQL Tuple Query

Use the standard RDF4J RepositoryConnection to prepare and evaluate SPARQL queries.

Java
import org.eclipse.rdf4j.query.*;

try (RepositoryConnection conn = virtuosoRepo.getConnection()) {
    String queryString = "SELECT ?s ?p ?o WHERE { GRAPH <http://example.org/graph> { ?s ?p ?o } } LIMIT 10";
    TupleQuery tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString);
    
    try (TupleQueryResult result = tupleQuery.evaluate()) {
        while (result.hasNext()) {
            BindingSet bindingSet = result.next();
            // ... process result bindings
        }
    }
}
Complete Runnable Example: Data Selection (Read)

This class executes a SPARQL Tuple Query and prints the bindings.

Java
import virtuoso.rdf4j.driver.VirtuosoRepository;
import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.query.*;

public class VirtuosoRDF4JSelect {
    public static void main(String[] args) {
        String jdbcUrl = "jdbc:virtuoso://localhost:1111";
        String user = "dba";
        String password = "dba";
        
        Repository virtuosoRepo = new VirtuosoRepository(jdbcUrl, user, password);
        virtuosoRepo.initialize();
        
        try (RepositoryConnection conn = virtuosoRepo.getConnection()) {
            String queryString = "SELECT * FROM <http://example.org/rdf4j-graph> WHERE { ?s ?p ?o } LIMIT 10";
            TupleQuery tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString);
            
            System.out.println("Executing query...");
            try (TupleQueryResult result = tupleQuery.evaluate()) {
                while (result.hasNext()) {
                    BindingSet bindingSet = result.next();
                    System.out.println("Found: " + bindingSet);
                }
            }
        } finally {
            virtuosoRepo.shutDown();
        }
    }
}
Complete Runnable Example: Data Insertion (Write)

This class initializes a repository and adds a triple to a specific named graph.

Java
import virtuoso.rdf4j.driver.VirtuosoRepository;
import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.model.ValueFactory;
import org.eclipse.rdf4j.model.IRI;

public class VirtuosoRDF4JInsert {
    public static void main(String[] args) {
        String jdbcUrl = "jdbc:virtuoso://localhost:1111";
        String user = "dba";
        String password = "dba";
        
        Repository virtuosoRepo = new VirtuosoRepository(jdbcUrl, user, password);
        virtuosoRepo.initialize();
        
        try (RepositoryConnection conn = virtuosoRepo.getConnection()) {
            ValueFactory vf = conn.getValueFactory();
            IRI subject = vf.createIRI("http://example.org/bob");
            IRI predicate = vf.createIRI("http://xmlns.com/foaf/0.1/name");
            org.eclipse.rdf4j.model.Literal object = vf.createLiteral("Bob");
            IRI context = vf.createIRI("http://example.org/rdf4j-graph");
            
            System.out.println("Inserting triple...");
            conn.add(subject, predicate, object, context);
            System.out.println("Done.");
        } finally {
            virtuosoRepo.shutDown();
        }
    }
}

Python

RDFLib

Conceptual Overview: RDFLib is the de facto standard library for working with RDF in Python. The integration with Virtuoso is achieved through a specific Store plugin that routes operations over an ODBC connection. Conceptually, this allows your Python scripts to manipulate a ConjunctiveGraph object as if it were local, while transparently translating read/write operations into SQL/SPARQL commands sent to the Virtuoso database server.

This guide covers integrating Virtuoso with RDFLib, Python's primary library for working with RDF.

Option 1: ODBC Connection (Native/High-Performance)

The connection relies on Virtuoso's ODBC driver. This is the preferred method for high-performance applications.

1. Environment and DSN Setup

Ensure you have the Virtuoso ODBC driver installed. The connection from RDFLib requires a Data Source Name (DSN) string. The WideAsUTF16=Y parameter is critical for correct Unicode handling.

TEXT
DSN=Local Virtuoso;UID=dba;PWD=dba;WideAsUTF16=Y
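
On Linux or macOS, the "Local Virtuoso" DSN itself is declared in odbc.ini. A minimal sketch is shown below; the driver path is an assumption and depends on where the Virtuoso ODBC driver (virtodbc.so, or virtodbcu.so for Unicode) is installed on your system.

TEXT
[Local Virtuoso]
Description = Local Virtuoso instance
Driver      = /usr/local/virtuoso-opensource/lib/virtodbcu.so
Address     = localhost:1111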
2. Establishing a Connection

Use the RDFLib plugin system to get the "Virtuoso" store and initialize a ConjunctiveGraph with your DSN.

Python
from rdflib.graph import ConjunctiveGraph
from rdflib.store import Store
from rdflib.plugin import get as plugin

# Load the Virtuoso store plugin
virtuoso_store = plugin("Virtuoso", Store)
store = virtuoso_store("DSN=Local Virtuoso;UID=dba;PWD=dba;WideAsUTF16=Y")

# Create a graph object backed by the Virtuoso store
graph = ConjunctiveGraph(store)
3. Querying Data

Execute SPARQL queries using the standard graph.query() method. It is essential to call graph.commit() even for read queries to release the underlying database cursor.

Python
results = graph.query("SELECT * WHERE { ?s ?p ?o } LIMIT 10")
for row in results:
    print(row)

# IMPORTANT: Release the database cursor
graph.commit()
4. Writing Data via Bulk Loading

For loading any significant amount of data, using the graph.add() method is inefficient. The recommended approach is to use Virtuoso's native bulk loader, which can be orchestrated from Python.

Python
import subprocess

# This example assumes 'isql' is in the system path
# 1. Register files in a whitelisted directory
subprocess.run(["isql", "Local Virtuoso", "dba", "dba", "exec=ld_dir('/path/to/data', '*.ttl', 'http://example.org/graph');"], check=True)

# 2. Run the loader
subprocess.run(["isql", "Local Virtuoso", "dba", "dba", "exec=rdf_loader_run();"], check=True)

# 3. Checkpoint the database to make the load permanent (MANDATORY)
subprocess.run(["isql", "Local Virtuoso", "dba", "dba", "exec=checkpoint;"], check=True)
Complete Runnable Example: ODBC (Select)

A Python script that connects to Virtuoso via ODBC, queries data, and prints the results. Best for high-performance direct connections.

Python
import sys
from rdflib import ConjunctiveGraph, Namespace, Literal
from rdflib.store import Store
from rdflib.plugin import get as plugin

def run_odbc_example():
    # Configuration - Ensure DSN is configured in odbc.ini
    dsn_string = "DSN=Local Virtuoso;UID=dba;PWD=dba;WideAsUTF16=Y"
    
    try:
        # 1. Initialize Connection
        print(f"Connecting to Virtuoso via ODBC with: {dsn_string}")
        Virtuoso = plugin("Virtuoso", Store)
        store = Virtuoso(dsn_string)
        graph = ConjunctiveGraph(store)
        
        # 2. Query Data
        query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"
        print(f"Executing query: {query}")
        
        results = graph.query(query)
        
        for row in results:
            print(f"Result: {row}")
            
        # 3. Cleanup
        # Committing read transactions releases the cursor
        graph.commit() 
        print("Success.")
        
    except ImportError:
        print("Error: 'virtuoso' plugin not found. Is 'pyodbc' and the virtuoso driver installed?")
    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    run_odbc_example()

Option 2: HTTP Connection (SPARQL Protocol)

While ODBC offers high-performance bulk operations, RDFLib can also connect over standard HTTP using its SPARQLStore and SPARQLUpdateStore plugins. This removes the requirement for system-level ODBC drivers, making it easier to run in containerized or restricted environments.

Python
from rdflib import Graph, URIRef, Literal
from rdflib.plugins.stores.sparqlstore import SPARQLUpdateStore

# Configuration
query_endpoint = "http://localhost:8890/sparql"
update_endpoint = "http://localhost:8890/sparql"
auth_credentials = ("dba", "dba") 

# Initialize Store with HTTP Basic/Digest Auth support
store = SPARQLUpdateStore(
    query_endpoint=query_endpoint,
    update_endpoint=update_endpoint,
    auth=auth_credentials
)

# Bind to Graph
# Note: Providing an identifier is mandatory for Virtuoso INSERTs
graph = Graph(store, identifier="http://example.org/http-graph")

# Operations (Standard RDFLib API)
graph.add((URIRef("http://ex.org/s"), URIRef("http://ex.org/p"), Literal("HTTP Val")))
print("Triple added via HTTP.")
Complete Runnable Example: HTTP (Select)

A Python script that performs a SPARQL Select query via HTTP.

Python
import argparse
from rdflib import Graph
from rdflib.plugins.stores.sparqlstore import SPARQLStore

def run_http_select():
    parser = argparse.ArgumentParser(description="RDFLib HTTP Select Example")
    parser.add_argument(
        "--query-url", 
        default="http://localhost:8890/sparql", 
        help="SPARQL Query Endpoint"
    )
    args = parser.parse_args()

    print(f"Connecting to Query Endpoint: {args.query_url}...")

    try:
        # Initialize Read-Only Store with JSON return format for robustness
        store = SPARQLStore(
            query_endpoint=args.query_url,
            returnFormat="json"
        )
        graph = Graph(store)
        
        query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"
        print(f"Executing query: {query}")
        
        for row in graph.query(query):
            print(row)
            
        print("Query complete.")
        
    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    run_http_select()
Complete Runnable Example: HTTP (Insert)

A Python script that performs a SPARQL Update via HTTP. For simplicity, it uses the same endpoint for both query and update, which is also a useful configuration when troubleshooting.

Python
import argparse
from rdflib import Graph, URIRef, Literal
from rdflib.plugins.stores.sparqlstore import SPARQLUpdateStore

def run_http_insert():
    parser = argparse.ArgumentParser(description="RDFLib HTTP Insert Example")
    # Note: Using the same endpoint for query and update is a common troubleshooting pattern
    parser.add_argument("--endpoint", default="http://localhost:8890/sparql", help="SPARQL Endpoint")
    parser.add_argument("--user", default="dba", help="Database Username")
    parser.add_argument("--password", default="dba", help="Database Password")
    args = parser.parse_args()

    print(f"Connecting to Endpoint: {args.endpoint}...")

    try:
        # Initialize Store with Authentication
        store = SPARQLUpdateStore(
            query_endpoint=args.endpoint,
            update_endpoint=args.endpoint,
            auth=(args.user, args.password),
            returnFormat="json"
        )

        # Bind to specific Named Graph
        # This is critical: Virtuoso requires a named graph context for inserts
        g = Graph(store, identifier=URIRef("urn:graphs:example"))
        
        print("Inserting triple via HTTP...")
        g.add((
            URIRef("#this"),
            URIRef("#label"),
            Literal("hello named graph")
        ))
        
        print("Insert succeeded.")
        
    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    run_http_insert()
Complete Runnable Example: HTTP (Schema.org & Namespaces)

A derivative example showcasing how to bind namespaces and use schema.org terms for structured data insertion.

Python
import argparse
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import RDF
from rdflib.plugins.stores.sparqlstore import SPARQLUpdateStore

def run_schema_insert():
    parser = argparse.ArgumentParser(description="RDFLib Schema.org Insert Example")
    parser.add_argument("--endpoint", default="http://localhost:8890/sparql", help="SPARQL Endpoint")
    parser.add_argument("--user", default="dba", help="Database Username")
    parser.add_argument("--password", default="dba", help="Database Password")
    args = parser.parse_args()

    try:
        # 1. Initialize Store
        store = SPARQLUpdateStore(
            query_endpoint=args.endpoint,
            update_endpoint=args.endpoint,
            auth=(args.user, args.password),
            returnFormat="json"
        )

        # 2. Initialize Graph with Named Graph identifier
        g = Graph(store, identifier=URIRef("urn:graphs:schema-example"))

        # 3. Define and Bind Namespaces
        SCHEMA = Namespace("http://schema.org/")
        g.bind("schema", SCHEMA)

        # 4. Create Data (Person Entity)
        person_uri = URIRef("http://example.org/person/jane")
        
        print(f"Adding Schema.org data for: {person_uri}")
        
        g.add((person_uri, RDF.type, SCHEMA.Person))
        g.add((person_uri, SCHEMA.name, Literal("Jane Doe")))
        g.add((person_uri, SCHEMA.jobTitle, Literal("Knowledge Engineer")))
        g.add((person_uri, SCHEMA.url, URIRef("http://example.org/jane")))

        print("Insert succeeded.")

    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    run_schema_insert()

LlamaIndex

Conceptual Overview: LlamaIndex bridges the gap between Large Language Models (LLMs) and your external data. When integrating with Virtuoso, LlamaIndex uses the SparqlGraphStore class to offload knowledge storage. Conceptually, this turns Virtuoso into the 'long-term memory' for your AI application, allowing the LLM to retrieve precise, structured facts via SPARQL queries rather than relying solely on vector similarity or limited context windows.

This guide demonstrates how to build a Knowledge Graph-powered Retrieval-Augmented Generation (RAG) system using LlamaIndex with Virtuoso as the backend graph store.

1. The Value of KGs for RAG Accuracy

Grounding a Large Language Model in a Knowledge Graph has been shown to substantially improve question-answering accuracy over complex data compared with having it query raw SQL databases directly. The KG provides a high-fidelity semantic map that simplifies query generation for the LLM.

2. Building a KG RAG Pipeline

The process involves using LlamaIndex to extract RDF triples from source documents and storing them in Virtuoso via the SparqlGraphStore. This populated graph can then be queried using natural language.

Python
from llama_index.graph_stores.sparql import SparqlGraphStore
from llama_index.core import KnowledgeGraphIndex, StorageContext
from llama_index.readers.wikipedia import WikipediaReader

# 1. Connect to the Virtuoso Graph Store
# Note: Use the 'sparql' endpoint with auth for write access
graph_store = SparqlGraphStore(
    endpoint_url="http://localhost:8890/sparql",
    user="dba",
    password="dba"
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

# 2. Load data and build the Knowledge Graph Index
documents = WikipediaReader().load_data(pages=['Barbie (film)'])
kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=2
)

# 3. Create a query engine and ask a question
query_engine = kg_index.as_query_engine()
response = query_engine.query("What is the plot of the Barbie movie?")
print(str(response))
Complete Runnable Example

A script to initialize a SparqlGraphStore and ask a question. Note: You must set your OPENAI_API_KEY in your environment variables for this to work.

Python
import os
from llama_index.graph_stores.sparql import SparqlGraphStore
from llama_index.core import KnowledgeGraphIndex, StorageContext, Document

def run_rag_example():
    if not os.environ.get("OPENAI_API_KEY"):
        print("Please set OPENAI_API_KEY environment variable.")
        return

    # 1. Configure Virtuoso Connection
    endpoint = "http://localhost:8890/sparql"
    graph_store = SparqlGraphStore(
        endpoint_url=endpoint,
        user="dba",
        password="dba"
    )
    
    # 2. Prepare Data
    text = "OpenLink Virtuoso is a high-performance graph database. It supports SPARQL and SQL."
    documents = [Document(text=text)]
    
    # 3. Create Index (Extracts triples and stores in Virtuoso)
    print("Building Index...")
    storage_context = StorageContext.from_defaults(graph_store=graph_store)
    index = KnowledgeGraphIndex.from_documents(
        documents,
        storage_context=storage_context,
        max_triplets_per_chunk=2
    )
    
    # 4. Query
    print("Querying...")
    query_engine = index.as_query_engine()
    response = query_engine.query("What does Virtuoso support?")
    print(f"Answer: {response}")

if __name__ == "__main__":
    run_rag_example()

LangChain

Conceptual Overview: LangChain is a framework for orchestrating complex LLM workflows. Its integration with Virtuoso leverages the GraphSparqlQAChain (or similar graph chains) to convert natural language into SPARQL. Conceptually, this setup treats the Knowledge Graph as a read-only tool that the LLM can query to verify facts or gather structured data, effectively grounding the model's responses in the authoritative data stored within Virtuoso.

This guide demonstrates how to use LangChain to build a system that translates natural language questions into SPARQL queries to be executed against a Knowledge Graph.

1. Core Architecture: RAG for SPARQL Generation

The system uses a Retrieval-Augmented Generation (RAG) pattern. When a user asks a question, the system retrieves relevant schema information (classes, properties) and few-shot examples from a vector store. This context is passed to the LLM along with the question, guiding it to generate a valid and accurate SPARQL query.
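
The chain example in the next section supplies schema hints manually; the sketch below illustrates the vector-store retrieval half of the pattern. It indexes a few hand-written question/SPARQL pairs and pulls the most similar one into a few-shot prompt at query time. The example pairs, the FAISS backend, and the OpenAI embeddings are illustrative assumptions (an OPENAI_API_KEY and the faiss-cpu package are required); any embedding model or vector store supported by LangChain can be substituted.

Python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

# Hand-written few-shot examples pairing questions with known-good SPARQL (illustrative)
examples = [
    {"question": "List films directed by Spike Lee",
     "sparql": "SELECT ?film WHERE { ?film dbo:director dbr:Spike_Lee }"},
    {"question": "Who directed Inception?",
     "sparql": "SELECT ?director WHERE { dbr:Inception dbo:director ?director }"},
]

# Embed the examples into a vector store; the most similar pair is retrieved per question
example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples, OpenAIEmbeddings(), FAISS, k=1
)

# Assemble the few-shot prompt that a SPARQL-generating LLM call would consume
prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=PromptTemplate.from_template("Question: {question}\nSPARQL: {sparql}"),
    prefix="Translate the question into a SPARQL query for the DBpedia Knowledge Graph.",
    suffix="Question: {input}\nSPARQL:",
    input_variables=["input"],
)

print(prompt.format(input="List movies directed by Christopher Nolan"))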

2. Implementation with GraphSparqlQAChain

The GraphSparqlQAChain is a standard LangChain component for this task. The example below connects to the public DBpedia endpoint and manually provides schema context in the prompt to improve accuracy.

Python
from langchain_openai import ChatOpenAI
from langchain.chains import GraphSparqlQAChain
from langchain_community.graphs import RdfGraph

# 1. Instantiate an LLM and the graph connection
llm = ChatOpenAI(model="gpt-4o", temperature=0)
graph = RdfGraph(query_endpoint="https://dbpedia.org/sparql")

# 2. Create the QA chain
chain = GraphSparqlQAChain.from_llm(llm, graph=graph, verbose=True)

# 3. Formulate a query with inline schema hints (manual RAG)
query = """
Relevant DBpedia Knowledge Graph relationship types (relations):
?movie rdf:type dbo:Film .
?movie dbo:director ?name .

List movies directed by Spike Lee
"""

# 4. Invoke the chain and print the result
result = chain.invoke({"query": query})
print(result["result"])
Complete Runnable Example

A complete script using the `GraphSparqlQAChain`. Requires `OPENAI_API_KEY`.

Python
import os
from langchain_openai import ChatOpenAI
from langchain.chains import GraphSparqlQAChain
from langchain_community.graphs import RdfGraph

def run_langchain_qa():
    if not os.environ.get("OPENAI_API_KEY"):
        print("Please set OPENAI_API_KEY.")
        return

    # 1. Connect to Graph (Using DBpedia for public demo, or local Virtuoso)
    # For local: url="http://localhost:8890/sparql"
    url = "https://dbpedia.org/sparql" 
    print(f"Connecting to graph at {url}...")
    graph = RdfGraph(query_endpoint=url)

    # 2. Initialize LLM
    llm = ChatOpenAI(model="gpt-4o", temperature=0)

    # 3. Build Chain
    chain = GraphSparqlQAChain.from_llm(llm, graph=graph, verbose=True)

    # 4. Run Query
    question = "Who directed the movie 'Inception'?"
    print(f"Question: {question}")
    
    try:
        result = chain.invoke({"query": question})
        print(f"Answer: {result['result']}")
    except Exception as e:
        print(f"Error: {e}")

if __name__ == "__main__":
    run_langchain_qa()

JavaScript

rdflib.js

Conceptual Overview: rdflib.js brings RDF-based Knowledge Graph capabilities to the JavaScript ecosystem (Node.js and the browser). Unlike the ODBC/JDBC drivers used in other languages, `rdflib.js` interacts with Virtuoso primarily through the standardized HTTP SPARQL Protocol. Conceptually, this decouples your application from the database driver, treating Virtuoso as a web-accessible endpoint for performing CRUD operations via SPARQL Update and Select queries.

This guide demonstrates how to use rdflib.js in a Node.js environment to create, read, and update RDF data stored in Virtuoso via its SPARQL endpoint.

1. Writing Data via SPARQL UPDATE

Use the fetch API to send a POST request with a SPARQL UPDATE query to Virtuoso's /sparql endpoint.

JavaScript
const endpoint = 'http://localhost:8890/sparql';

const updateQuery = `
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
INSERT DATA {
  GRAPH <https://example.com/graph/people> {
    <https://example.com/person/bob> foaf:name "Bob" .
  }
}`;

const response = await fetch(endpoint, {
  method: 'POST',
  headers: { 'Content-Type': 'application/sparql-update' },
  body: updateQuery
});

if (response.ok) {
    console.log('Data inserted successfully.');
}
2. Reading Data via SPARQL SELECT

Send a GET request with a URL-encoded SPARQL SELECT query. Set the Accept header to application/sparql-results+json to receive results as JSON.

JavaScript
const selectQuery = `
SELECT ?name
FROM <https://example.com/graph/people>
WHERE { ?person foaf:name ?name }`;

const url = `http://localhost:8890/sparql?query=${encodeURIComponent(selectQuery)}`;

const response = await fetch(url, {
  headers: { 'Accept': 'application/sparql-results+json' }
});

const json = await response.json();
console.log(JSON.stringify(json, null, 2));
3. Loading Data into an rdflib.js Store

Execute a SPARQL CONSTRUCT query to fetch a graph from Virtuoso and then parse the Turtle response into a local rdflib.js graph object for client-side manipulation.

JavaScript
import * as $rdf from 'rdflib';

const store = $rdf.graph();
const graphUri = 'https://example.com/graph/people';

const constructQuery = `CONSTRUCT { ?s ?p ?o } FROM <${graphUri}> WHERE { ?s ?p ?o }`;
const url = `http://localhost:8890/sparql?query=${encodeURIComponent(constructQuery)}`;

const response = await fetch(url, { headers: { 'Accept': 'text/turtle' } });
const ttlData = await response.text();

$rdf.parse(ttlData, store, graphUri, 'text/turtle');

console.log(`Loaded ${store.length} triples into the local store.`);
Complete Runnable Example

A Node.js script to perform a SPARQL Select. Ensure node-fetch is available if using older Node versions.

JavaScript
async function runExample() {
    const sparqlEndpoint = 'http://localhost:8890/sparql';
    const query = `
        SELECT DISTINCT ?Concept WHERE {[] a ?Concept} LIMIT 5
    `;

    const url = `${sparqlEndpoint}?query=${encodeURIComponent(query)}`;
    console.log(`Fetching from ${sparqlEndpoint}...`);

    try {
        const response = await fetch(url, {
            method: 'GET',
            headers: {
                'Accept': 'application/sparql-results+json'
            }
        });

        if (!response.ok) {
            throw new Error(`HTTP error! status: ${response.status}`);
        }

        const data = await response.json();
        console.log("Query Results:");
        data.results.bindings.forEach(binding => {
            console.log(JSON.stringify(binding));
        });
    } catch (error) {
        console.error("Error executing query:", error);
    }
}

runExample();

Frequently Asked Questions

What is the difference between SQL and SPARQL?

SQL is designed for Relational Database Management Systems (RDBMS) where data is stored in tables with fixed schemas. SPARQL is the standard query language for Graph databases like Virtuoso, allowing you to query relationships across flexible, schema-less data structures represented as RDF triples.
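
For illustration only (the table, column, and person IRI are hypothetical), the same lookup expressed in both languages:

SQL
-- Relational: a fixed table with fixed columns
SELECT name FROM people WHERE id = 1;

SPARQL
# Graph: pattern matching over subject-predicate-object triples
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE { <http://example.org/person/1> foaf:name ?name }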

Do I need to install Virtuoso locally to use these examples?

No. While local installation is common for development, these drivers can connect to any accessible Virtuoso instance (remote or local) provided you have the correct host, port, and network access.

How do I handle authentication securely?

For basic connections, avoid hardcoding credentials by using environment variables. However, for robust security, the Virtuoso Commercial Edition offers a comprehensive, multi-protocol authentication layer. It supports modern standards including Digest Authentication, WebAuthn and Passkeys, OpenID Connect (OIDC) with OAuth 2.0, standard OAuth 2.0, as well as NetID+TLS and WebID+TLS for certificate-based identity.
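
A minimal sketch of the environment-variable approach, reusing the RDFLib HTTP store from earlier in this guide (the variable names are illustrative):

Python
import os
from rdflib import Graph
from rdflib.plugins.stores.sparqlstore import SPARQLUpdateStore

# Read credentials from the environment instead of hardcoding them
endpoint = os.environ.get("VIRTUOSO_ENDPOINT", "http://localhost:8890/sparql")
user = os.environ["VIRTUOSO_USER"]          # raises KeyError if unset -- fail fast
password = os.environ["VIRTUOSO_PASSWORD"]

store = SPARQLUpdateStore(
    query_endpoint=endpoint,
    update_endpoint=endpoint,
    auth=(user, password),
)
graph = Graph(store, identifier="http://example.org/secure-graph")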

How are Knowledge Graphs protected in Virtuoso?

Virtuoso protects Knowledge Graphs with fine-grained Attribute-Based Access Control (ABAC): sophisticated access policies described directly in RDF, allowing dynamic, context-aware security rules at the data level.

Why should I use the Bulk Loader instead of SPARQL INSERT?

SPARQL INSERT is transactional and logged, making it slower for massive datasets. The Bulk Loader bypasses some of this overhead for high-performance data ingestion.
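
For reference, the native bulk load sequence run through the isql client looks like the sketch below (the directory is a placeholder and must be listed in the DirsAllowed setting of virtuoso.ini):

SQL
-- Register the files, run the loader, then checkpoint to persist the load
ld_dir('/path/to/data', '*.ttl', 'http://example.org/graph');
rdf_loader_run();
checkpoint;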

Can I use these libraries with the Open Source edition of Virtuoso?

Yes, the client libraries and APIs described in this guide are compatible with both the Open Source and Commercial editions of Virtuoso.

Glossary

RDF (Resource Description Framework)

The standard data model for data interchange on the Web, representing data as subject-predicate-object triples.

SPARQL

The standard query language and protocol for Linked Open Data and RDF databases.

Triple

The atomic data entity in RDF, consisting of a Subject, Predicate, and Object.
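
For example, a single triple in N-Triples syntax (the IRIs are illustrative):

TEXT
<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .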

IRI (Internationalized Resource Identifier)

A unique identifier for a resource, similar to a URL but capable of handling international characters.

Graph

A collection of triples. In Virtuoso, a "Named Graph" is a way to compartmentalize data.

Endpoint

A network service (usually HTTP) that accepts SPARQL queries and returns results.

Linked Data

A method of publishing structured data so that it can be interlinked and become more useful.
