Virtuoso Open-Source Wiki
Virtuoso Open-Source, OpenLink Data Spaces, and OpenLink Ajax Toolkit

Bulk Loading RDF Source Files into one or more Graph IRIs

This document describes how large RDF data sets can be bulk loaded into Virtuoso. A data set can consist of multiple files, loaded into a single graph or into multiple graphs. Before loading large data sets, configure the Virtuoso Server to use sufficient memory and other system resources, as detailed in the Virtuoso RDF Performance Tuning Guide; otherwise the load may take an unacceptably long time.

  1. If your Virtuoso release is earlier than the commercial 06.02.3129 or open source 6.1.3 release, the Virtuoso Bulk Loader functions need to be loaded manually.
  2. Register the file(s) to be loaded by running the ld_dir function (loads from the specified directory) or the ld_dir_all function (loads from the specified directory and all its sub-directories) from isql:

    SQL> ld_dir ('<source-filename-or-directory>', '<file name pattern>', '<graph iri>');

    — or —

    SQL> ld_dir_all ('<source-filename-or-directory>', '<file name pattern>', '<graph iri>');

    — e.g., —

    SQL> ld_dir ('/path/to/files', '*.n3', 'http://dbpedia.org');

  3. The name of the RDF graph into which a data file is loaded can also be specified through a text file placed in the same source directory as the source data files; this overrides the graph name specified in the ld_dir() or ld_dir_all() function call. For a given data file, the content of a file with the same name plus the .graph filename extension is used. The content of a file named global.graph is used for all other data files in that directory. Note: if the third parameter (graph_iri) of ld_dir() or ld_dir_all() is null, any data files that do not have a .graph file will not be loaded.

    <source-file>.<ext>
    <source-file>.<ext>.graph

    — e.g., —

    myfile.n3          ;; RDF data
    myfile.n3.graph    ;; Contains Graph IRI name into which RDF data from myfile.n3 will be loaded
    global.graph       ;; Contains Graph IRI name into which RDF data from any files that do not have a specific graph name file will be loaded

  4. Place the graph IRI in the .graph file, e.g., http://dbpedia.org.
  5. Finally, perform the bulk data load by executing:

    SQL> rdf_loader_run ();

  6. The table DB.DBA.load_list can be used to check the list of datasets loaded and the graph IRIs into which they have been loaded:

    SQL> select * from DB.DBA.load_list;
    ll_file             ll_graph      ll_state  ll_started            ll_done               ll_host  ll_work_time  ll_error
    VARCHAR NOT NULL    VARCHAR       INTEGER   TIMESTAMP             TIMESTAMP             INTEGER  INTEGER       VARCHAR
    _______________________________________________________________________________________________________________________
    ./dump/d1/file1.n3  http://file1  2         2010.10.20 9:21.18 0  2010.10.20 9:21.18 0  0        NULL          NULL
    ./dump/d2/file2.n3  http://file2  2         2010.10.20 9:21.18 0  2010.10.20 9:21.18 0  0        NULL          NULL
    ./dump/file.n3      http://file   2         2010.10.20 9:21.18 0  2010.10.20 9:21.18 0  0        NULL          NULL

    3 Rows. -- 1 msec.
    SQL>
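
The graph-selection rules in step 3 and the statements in steps 2, 5, and 6 can be sketched in Python. This is an illustrative sketch, not part of Virtuoso. The helper names write_graph_file, resolve_graph, and build_isql_script are hypothetical. The precedence order is one reading of the description in step 3, and escaping single quotes by doubling them is an assumption. The script only writes .graph files and produces statement text; it does not connect to a server.

```python
from pathlib import Path
from typing import Optional


def write_graph_file(data_file: Path, graph_iri: str) -> Path:
    """Create <data_file>.graph next to the data file, holding the graph IRI."""
    sidecar = data_file.with_name(data_file.name + ".graph")
    sidecar.write_text(graph_iri, encoding="utf-8")
    return sidecar


def resolve_graph(data_file: Path, default_graph: Optional[str]) -> Optional[str]:
    """Model which graph a data file would be loaded into.

    Assumed order: <file>.graph, then global.graph in the same directory,
    then the graph passed to ld_dir(). None means the file would be skipped.
    """
    specific = data_file.with_name(data_file.name + ".graph")
    fallback = data_file.with_name("global.graph")
    for candidate in (specific, fallback):
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8").strip()
    return default_graph


def sql_literal(value: Optional[str]) -> str:
    """Quote a value as a SQL string literal (assumption: '' escapes a quote)."""
    if value is None:
        return "NULL"
    return "'" + value.replace("'", "''") + "'"


def build_isql_script(directory: str, pattern: str,
                      default_graph: Optional[str]) -> str:
    """Return the statements from steps 2, 5, and 6 as text for isql."""
    return "\n".join([
        f"ld_dir ({sql_literal(directory)}, {sql_literal(pattern)}, "
        f"{sql_literal(default_graph)});",
        "rdf_loader_run ();",
        "select * from DB.DBA.load_list;",
    ])


if __name__ == "__main__":
    # Prints the statements for the example directory used in step 2.
    print(build_isql_script("/path/to/files", "*.n3", "http://dbpedia.org"))
```

Running resolve_graph over every data file before registering a directory is a cheap way to spot files that would be skipped because no graph applies to them.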
