Deploy your own queryable knowledge graph using Virtuoso PAGO on Microsoft Azure
A cloud-hosted, Pay-As-You-Go (PAGO) edition of a preconfigured Virtuoso instance that includes a pre-loaded and optimized DBpedia Knowledge Graph.
Access to the public DBpedia endpoint is subject to a "Fair Use" policy that limits query complexity and frequency, which can hinder the development of large-scale or high-performance applications.
The DBpedia Snapshot on Azure provides a dedicated, high-performance Virtuoso instance with a pre-loaded DBpedia knowledge graph, removing public endpoint limitations.
Complete DBpedia dataset ready for SPARQL queries, with no data loading or configuration required.
Virtuoso is pre-configured for optimal RDF data querying, with clear guides for performance tuning.
Pay only for what you use: deploy Virtual Machines on demand and stop or de-allocate them when finished.
Core components powering the DBpedia Snapshot on Azure
The largest collaborative and freely available structured data source extracted from Wikipedia, containing semantic information about millions of entities.
A high-performance, multi-model database providing native RDF storage, SPARQL query processing, and linked data publishing.
A global cloud platform providing Virtual Machines, scalable storage, and a marketplace for pre-configured application offerings.
Quick start guide to instantiate your DBpedia instance from the Azure Marketplace
Ensure you have an active Microsoft Azure subscription.
In the Azure Marketplace, search for "DBpedia" to locate the PAGO offer and click "GET IT NOW".
Click "Create" and configure the basic settings for your Virtual Machine, including Resource Group, VM name, Region, and your SSH public key.
Accept the defaults for Disks and Networking (or customize as needed), then click "Review + create". Once validation passes, click "Create" to start the deployment.
When deployment is complete, go to the resource page to find the Public IP address, which you will use to connect via SSH and access web interfaces.
Initial setup and ongoing server management via SSH
Access your VM instance via SSH using your key and the public IP address:
ssh -i {your-pem-file} azureuser@{Public IP address}
It is strongly recommended to run system updates after connecting:
sudo apt-get update && sudo apt-get upgrade
Retrieve the randomly generated initial password for the dba user:
sudo cat /opt/virtuoso/database/.initial-password
Connect to the Virtuoso SQL command-line interface on port 1111:
/opt/virtuoso/bin/isql 1111
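For a quick non-interactive check, isql can also run a single statement and exit. The sketch below assumes the usual isql argument order (port, user, password, then exec=...); the helper name run_sql and the ISQL override variable are illustrative and not part of the offering.

```shell
# Path to isql as given above; override ISQL if your install differs.
ISQL="${ISQL:-/opt/virtuoso/bin/isql}"

# run_sql PASSWORD STATEMENT
# Runs one SQL statement as the dba user on port 1111, then exits.
# Assumption: isql accepts "port user password exec=<statement>".
run_sql() {
  "$ISQL" 1111 dba "$1" "exec=$2"
}
```

A possible call, reading the initial password from the file mentioned above: run_sql "$(sudo cat /opt/virtuoso/database/.initial-password)" 'status();'. Note that a password passed on the command line may be visible to other users in the process list.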
Control the Virtuoso service using these commands:
# Start the service
sudo service virtuoso start
# Stop the service
sudo service virtuoso stop
# Restart the service
sudo service virtuoso restart
# Check the current status
sudo service virtuoso status
Access your DBpedia instance through multiple web interfaces
http://{ip-address}/resource/DBpedia
Browse DBpedia entities as Linked Data.
http://{ip-address}/fct
Navigate DBpedia using interactive faceted search.
http://{ip-address}/conductor
Web-based administration UI for Virtuoso.
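To confirm that these interfaces respond after deployment, a small curl loop can report the HTTP status of each one. This is a minimal sketch: the function name check_interfaces is hypothetical, the /sparql path is the query endpoint of the same instance, and the port 80 access it relies on is assumed to be allowed by your VM's network rules.

```shell
# check_interfaces HOST
# Prints the final HTTP status code (after redirects) for each web interface.
# curl reports 000 when the connection itself fails.
check_interfaces() {
  for path in /resource/DBpedia /fct /conductor /sparql; do
    code="$(curl -s -o /dev/null -L -w '%{http_code}' "http://$1$path")"
    printf '%s %s\n' "$code" "http://$1$path"
  done
}
```

Run it as check_interfaces {ip-address}; each output line shows a status code followed by the URL that was tested.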
Tuning Virtuoso for optimal query performance on your Azure VM
Edit /opt/virtuoso/database/virtuoso.ini and raise the NumberOfBuffers and MaxDirtyBuffers
parameters (in the [Parameters] section) to match your VM's available memory. For example:
# For B2ms (8 GB RAM)
NumberOfBuffers = 680000
MaxDirtyBuffers = 500000
# For B4ms (16 GB RAM)
NumberOfBuffers = 1360000
MaxDirtyBuffers = 1000000
After saving your changes, restart the Virtuoso service for them to take effect:
sudo service virtuoso restart
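Both examples above work out to 85,000 buffers and 62,500 dirty buffers per GB of RAM. For other VM sizes, one rough approach is to scale those ratios linearly, as sketched below. The linear extrapolation and the function name suggest_buffers are assumptions made here, not an official sizing formula, so treat the output as a starting point.

```shell
# suggest_buffers RAM_GB
# Scales the per-GB ratios taken from the 8 GB and 16 GB examples:
#   680000 / 8 = 85000 buffers per GB
#   500000 / 8 = 62500 dirty buffers per GB
# Assumption: the ratios stay constant for other memory sizes.
suggest_buffers() {
  ram_gb="$1"
  echo "NumberOfBuffers = $(( ram_gb * 85000 ))"
  echo "MaxDirtyBuffers = $(( ram_gb * 62500 ))"
}
```

For example, suggest_buffers 8 prints the same two lines as the 8 GB example. Pass the nominal RAM of the VM size as a whole number of GB.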
Steps to follow if the Virtuoso server fails to start
Run sudo service virtuoso status to see details from the service manager.
Check the log file at /opt/virtuoso/database/virtuoso.log for specific error
messages.
Check for a stale lock file. If one exists and no Virtuoso process is running, remove it:
sudo rm /opt/virtuoso/database/virtuoso.lck
Try starting the server again: sudo service virtuoso start
Check the status again. If it is running, attempt to connect via the web or SQL interfaces.
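The inspection steps above can be wrapped in two small read-only helpers, sketched here. The function names and the VIRTUOSO_DB_DIR variable are illustrative; the paths are the ones used in this guide. Neither helper deletes anything, so removing the lock file remains a deliberate manual step.

```shell
# Database directory used in this guide; override if yours differs.
VIRTUOSO_DB_DIR="${VIRTUOSO_DB_DIR:-/opt/virtuoso/database}"

# show_recent_log [N]
# Prints the last N lines of the server log (default 40).
show_recent_log() {
  tail -n "${1:-40}" "$VIRTUOSO_DB_DIR/virtuoso.log"
}

# lock_file_state
# Reports whether the lock file exists; it never removes it.
lock_file_state() {
  if [ -e "$VIRTUOSO_DB_DIR/virtuoso.lck" ]; then
    echo "present"
  else
    echo "absent"
  fi
}
```

A typical sequence would be sudo service virtuoso status, then show_recent_log, then lock_file_state. Reading the log may require root privileges depending on file permissions.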
Common questions about the DBpedia Snapshot on Azure
It's a Pay-As-You-Go (PAGO) offering on the Azure Marketplace that provides a pre-configured Virtuoso instance with the DBpedia 2025-06 Snapshot Knowledge Graph already loaded and optimized.
It's designed for architects, system integrators, and developers who need a high-performance, scalable, and private instance of the DBpedia Knowledge Graph for their applications or services.
The only prerequisite is an active Microsoft Azure cloud subscription.
Once your VM is running, navigate to http://{Your-Public-IP-Address}/sparql in a
web browser. You can write and execute SPARQL queries directly or send HTTP requests to this
URL from your application.
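As an illustration of the HTTP route, the sketch below sends a query with curl and asks for JSON results. The function name sparql_select is hypothetical; the query parameter and the application/sparql-results+json media type follow the standard SPARQL protocol, which the endpoint is assumed to honor.

```shell
# sparql_select HOST QUERY
# Sends QUERY to http://HOST/sparql as a URL-encoded GET parameter
# and requests results in the SPARQL JSON results format.
sparql_select() {
  curl -sS -G "http://$1/sparql" \
    --data-urlencode "query=$2" \
    -H "Accept: application/sparql-results+json"
}
```

For example: sparql_select {Your-Public-IP-Address} 'SELECT ?s WHERE { ?s ?p ?o } LIMIT 5'. The same request shape can be reproduced from any HTTP client library in your application.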
After connecting to your VM via SSH, run the following command to view the randomly generated
password: sudo cat /opt/virtuoso/database/.initial-password
For best performance, you should edit the virtuoso.ini configuration file and
increase the values for NumberOfBuffers and MaxDirtyBuffers to
match the RAM available in your chosen Azure VM size. See the Performance Tuning section for examples.
Please follow the steps outlined in the Troubleshooting section. This involves checking the service status and log files, ensuring no lock file is present, and attempting a restart.
Key concepts and terminology
A cloud computing platform by Microsoft for building, testing, deploying, and managing applications and services through Microsoft-managed data centers.
A pricing model where you are charged only for the resources you consume, allowing you to scale costs with usage without upfront commitments.
A large-scale, community-driven knowledge base extracted from Wikipedia. It contains structured data about millions of entities in RDF format.
An enterprise-grade, multi-model database that provides native RDF storage, SPARQL query support, and Linked Data publishing capabilities.
A web service that accepts SPARQL queries and returns structured results, enabling programmatic access to RDF knowledge graphs.
A network of real-world entities (like people, places, and events) and the relationships between them, typically stored as an RDF graph.
Official documentation, guides, and related links