
Virtuoso RDF Bulk Loader "with_delete" option

What

The Virtuoso RDF Bulk Loader provides a special option called "with_delete" that can be used in place of the graph name (i.e. "ll_graph" in the "load_list" table) to force each data graph named in the given NQUAD file/dataset to be reloaded, i.e. updated with any new triples, if the graph already exists in the database.

Why

This option is required to provide high-performance loading of bulk graph updates, on a par with the bulk insertion of the same or similar data.

Prerequisites

  • A Virtuoso commercial release 6.5.xxxx or greater is required
  • The "with_delete" option only works with NQUAD datasets in which the graph name is specified in the dataset itself. It is currently a cluster-only feature, although it is intended to work on single-server instances as well.
  • Ensure the Virtuoso server is running with a default transaction isolation level of 2 (read committed) by adding the following setting to the "[Parameters]" section of the Virtuoso configuration file and restarting the Virtuoso server:

    DefaultIsolation = 2

    The following lock mode settings should be set before using the "with_delete" option:

    cl_exec ('__dbf_set (''lock_escalation_pct'', 200)');
    cl_exec ('__dbf_set (''enable_distinct_key_dup_no_lock'', 1)');

How

Using the ld_dir() or ld_dir_all() commands, set the "ll_graph" column of the "load_list" table to "with_delete" for each dataset file in "ll_file" that is known to require an update/reload. Once all files are registered, run the "rdf_loader_run()" or "cl_exec('rdf_ld_srv()')" commands to start the update/reload. For fast parallel loading, as many "rdf_loader_run()" or "cl_exec('rdf_ld_srv()')" commands can be invoked as there are threads/cores available across the machines running the Virtuoso cluster, as would typically be done for the initial bulk load of the datasets.
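
The steps above might look roughly as follows in an isql session. This is a minimal sketch rather than a definitive procedure: the directory "/data/updates" and the file mask "*.nq" are hypothetical placeholders, and it assumes the directory is already readable by the server (listed in "DirsAllowed" in the configuration file).

```sql
-- Register every matching NQUAD file with the special graph name
-- 'with_delete' instead of a real graph IRI.
-- '/data/updates' and '*.nq' are hypothetical; substitute your own.
ld_dir ('/data/updates', '*.nq', 'with_delete');

-- Optional check: each registered file should show ll_graph = 'with_delete'.
SELECT ll_file, ll_graph FROM DB.DBA.load_list;

-- Start one loader from this session. Run the same command from
-- additional sessions to load files in parallel.
rdf_loader_run ();

-- Alternative from the text above: start a loader on the cluster nodes.
-- cl_exec ('rdf_ld_srv()');
```

Use ld_dir_all() with the same three arguments if the files are spread across subdirectories.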

Note that all RDF loader threads can be stopped using the following command, after which all currently running threads are allowed to complete and then exit:

rdf_load_stop()
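
To see how far the reload has progressed, before or after stopping the loaders, the "load_list" table can be queried. The sketch below assumes the standard bulk loader layout of that table; the "ll_state" and "ll_error" columns are not described on this page, so verify them against your installation.

```sql
-- Assumed bulk loader convention for ll_state:
--   0 = waiting to be loaded, 1 = load in progress, 2 = load finished.
-- ll_error is assumed to hold the error text for a file that failed.
SELECT ll_file, ll_graph, ll_state, ll_error
  FROM DB.DBA.load_list
 WHERE ll_graph = 'with_delete';
```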

Limitation on use

The following points should be noted:

  • The dataset files to be loaded must not contain the same graph name in more than one file with different triples in each. The resulting triple counts would be unpredictable, because they depend on which dataset file is being loaded on a given thread, and that order is non-deterministic.
  • The following command can be used to create a diagnostic log of the "with_delete" activity, written to a log file called "g_log.txt" on each cluster instance, for analysis:

    cl_exec ('__dbf_set (''enable_g_replace_log'',1)')
