Bulk Loading RDF Source Files into one or more Graph IRIs
This document details how large RDF data set files can be bulk loaded into Virtuoso. A data set can consist of multiple files, loaded into a single graph or into multiple graphs. Note that before loading large data sets, the Virtuoso server should be configured to use sufficient memory and other system resources, as detailed in the Virtuoso RDF Performance Tuning Guide; otherwise the load may take an unacceptably long time.
- If your Virtuoso release is older than commercial release 06.02.3129 or open source release 6.1.3, the Virtuoso Bulk Loader functions need to be loaded manually.
- Register the file(s) to be loaded by running the ld_dir() function (registers matching files in the specified directory only) or the ld_dir_all() function (registers matching files in the specified directory and all of its sub-directories) from isql:
SQL> ld_dir ('<source-filename-or-directory>', '<file name pattern>', 'graph iri');
— or —
SQL> ld_dir_all ('<source-filename-or-directory>', '<file name pattern>', 'graph iri');
— e.g., —
SQL> ld_dir ('/path/to/files', '*.n3', 'http://dbpedia.org');
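To preview which files a given pattern would register before calling ld_dir() or ld_dir_all(), the selection can be modelled in Python. This is a hedged sketch, not Virtuoso's implementation: it assumes the file name pattern behaves like a shell-style wildcard (approximated here with fnmatch.fnmatchcase), and the helper name preview_registration is hypothetical.

```python
import fnmatch
import os


def preview_registration(directory, pattern, recursive=False):
    """Return the sorted paths that a file name pattern would match.

    recursive=False models ld_dir() (top-level directory only);
    recursive=True models ld_dir_all() (directory plus all sub-directories).
    Shell-style matching via fnmatch is an assumption, not Virtuoso's own matcher.
    """
    matches = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            # Case-sensitive match on the bare file name.
            if fnmatch.fnmatchcase(name, pattern):
                matches.append(os.path.join(root, name))
        if not recursive:
            # ld_dir()-style: stop after the top-level directory.
            break
    return sorted(matches)
```

For example, preview_registration('/path/to/files', '*.n3') models the ld_dir() call shown above, and passing recursive=True models ld_dir_all(). In this model a pattern such as '*.n3' does not match companion files like myfile.n3.graph.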
- The graph IRI for a data file can also be specified in a text file placed in the same directory as the data file; this overrides the graph IRI given in the ld_dir() or ld_dir_all() call. A file named after a data file with .graph appended (for example, myfile.n3.graph for myfile.n3) supplies the graph IRI for that data file. A file named global.graph supplies the graph IRI for all other data files in that directory. Note: if the third parameter (graph_iri) of ld_dir() or ld_dir_all() is null, any data files that do not have a .graph file will not be loaded.
<source-file>.<ext>
<source-file>.<ext>.graph
— e.g., —
myfile.n3        ;; RDF data
myfile.n3.graph  ;; Contains the graph IRI into which RDF data from myfile.n3 will be loaded
global.graph     ;; Contains the graph IRI into which RDF data from any files that do not have a specific graph file will be loaded
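The graph-selection rules described above can be expressed as a small lookup function. This is a sketch under stated assumptions: the precedence (per-file .graph, then global.graph, then the ld_dir()/ld_dir_all() argument) follows the text, while stripping surrounding whitespace from the file content and treating global.graph as satisfying the null-graph rule are assumptions. resolve_graph_iri is a hypothetical name, not part of Virtuoso.

```python
import os


def resolve_graph_iri(data_file, default_graph):
    """Return the graph IRI a data file would be loaded into, or None to skip it.

    Precedence, following the rules in the text:
      1. <data_file>.graph in the same directory,
      2. global.graph in the same directory,
      3. the graph IRI passed to ld_dir()/ld_dir_all() (default_graph).
    None is returned when default_graph is None and no .graph file exists.
    Assumptions: surrounding whitespace in the file is ignored, and
    global.graph counts as a .graph file for the null-graph rule.
    """
    candidates = [
        data_file + ".graph",
        os.path.join(os.path.dirname(data_file), "global.graph"),
    ]
    for graph_file in candidates:
        if os.path.isfile(graph_file):
            with open(graph_file, encoding="utf-8") as fh:
                return fh.read().strip()
    return default_graph
```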
- Place the graph IRI in each .graph file, e.g., http://dbpedia.org.
- Finally, perform the bulk data load by executing:
SQL> rdf_loader_run ();
- The table DB.DBA.load_list can be used to check the list of data files loaded and the graph IRIs into which they have been loaded:
SQL> select * from DB.DBA.load_list;
ll_file             ll_graph      ll_state  ll_started            ll_done               ll_host  ll_work_time  ll_error
VARCHAR NOT NULL    VARCHAR       INTEGER   TIMESTAMP             TIMESTAMP             INTEGER  INTEGER       VARCHAR
_____________________________________________________________________________________________________________________

./dump/d1/file1.n3  http://file1  2         2010.10.20 9:21.18 0  2010.10.20 9:21.18 0  0        NULL          NULL
./dump/d2/file2.n3  http://file2  2         2010.10.20 9:21.18 0  2010.10.20 9:21.18 0  0        NULL          NULL
./dump/file.n3      http://file   2         2010.10.20 9:21.18 0  2010.10.20 9:21.18 0  0        NULL          NULL

3 Rows. -- 1 msec.
SQL>
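Once the rows of DB.DBA.load_list have been fetched (for example over an ODBC connection, not shown), a short script can flag files that are still pending or that failed. This sketch assumes that ll_state 0 means registered but not yet loaded, 1 means loading, and 2 means finished, which is our reading of Virtuoso's documentation, and it treats any row with a non-NULL ll_error as failed. summarize_load_list is a hypothetical helper.

```python
from collections import defaultdict

# Assumed meaning of ll_state values (our reading of the Virtuoso docs):
# 0 = registered, not yet loaded; 1 = load in progress; 2 = load finished.
STATE_NAMES = {0: "pending", 1: "loading", 2: "done"}


def summarize_load_list(rows):
    """Group load_list rows by status.

    rows: iterable of (ll_file, ll_graph, ll_state, ll_error) tuples, as
    returned by: select ll_file, ll_graph, ll_state, ll_error from DB.DBA.load_list
    Any row whose ll_error is not NULL (None) is reported as "failed".
    """
    summary = defaultdict(list)
    for ll_file, ll_graph, ll_state, ll_error in rows:
        if ll_error is not None:
            status = "failed"
        else:
            status = STATE_NAMES.get(ll_state, "unknown")
        summary[status].append((ll_file, ll_graph))
    return dict(summary)
```

Fed the three rows from the sample output above, it returns a single "done" group containing all three files.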