Virtuoso RDF Bulk Loader "with_delete" option
What
The Virtuoso RDF Bulk Loader provides a special option called "with_delete" that can be used in place of the graph name (i.e.
ll
Why
This option is required to provide high-performance loading of bulk updates to graphs, on a par with the bulk insertion of the same or similar data.
Prerequisites
- A Virtuoso commercial release 6.5.xxxx or greater is required
- The "with_delete" option only works with N-Quads datasets, where the graph name is specified in the dataset itself. It is currently a cluster-only feature, although it is intended to be made to work on single-server instances as well.
- Ensure the Virtuoso server is running with a default transaction isolation level of 2 (read committed) by adding the following setting to the "[Parameters]" section of the Virtuoso configuration file and restarting the Virtuoso server:
DefaultIsolation = 2
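The isolation setting above can be verified before the server is restarted. The following is a minimal sketch, assuming the configuration file can be read as a standard INI file; the helper name `default_isolation` and the sample file content are illustrative assumptions and not part of Virtuoso.

```python
import configparser


def default_isolation(ini_text):
    """Return the DefaultIsolation value from [Parameters], or None if unset."""
    # interpolation=None keeps any '%' characters in the file literal;
    # strict=False tolerates repeated keys or sections.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    return parser.get("Parameters", "DefaultIsolation", fallback=None)


# Illustrative configuration fragment (assumed content, not a full file).
sample = """
[Parameters]
ServerPort = 1111
DefaultIsolation = 2
"""

if default_isolation(sample) == "2":
    print("DefaultIsolation is set to 2 (read committed)")
else:
    print("DefaultIsolation is missing or not 2")
```

To check a real installation, the text of the configuration file would be read from disk and passed to the helper in place of `sample`.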
The following lock mode settings should be applied before using the "with_delete" option:
cl_exec ('__dbf_set (''lock_escalation_pct'', 200)');
cl_exec ('__dbf_set (''enable_distinct_key_dup_no_lock'', 1)');
How
Using the ld
Note that all RDF loader threads can be stopped using the following command. Once it is issued, all currently running threads are allowed to complete their work and then exit:
rdf_load_stop()
Limitation on use
The following points should be noted:
- The dataset files to be loaded must not contain multiple graphs with the same name but different triples. Otherwise the resulting triple counts are unpredictable, because they depend on which dataset file happens to be loaded on a given thread, which is non-deterministic.
- The following command can be used to create a diagnostic log of the "with_delete" activity, written to a log file called "g_log.txt" on each cluster instance, for analysis:
cl_exec ('__dbf_set (''enable_g_replace_log'',1)')
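The first limitation above can be checked before loading. The following is a minimal sketch, assuming the datasets are plain (uncompressed) N-Quads text; the function names, file names and sample data are hypothetical. The check is deliberately conservative: it reports every graph name that appears in more than one file, without comparing the triples themselves.

```python
import re
from collections import defaultdict

# Simplified N-Quads term pattern: IRI, blank node, or literal with an
# optional datatype or language tag. It does not validate the full grammar.
TERM = re.compile(
    r'<[^>]*>'
    r'|_:\S+'
    r'|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?'
)


def graph_of(line):
    """Return the graph term of an N-Quads line, or None if there is none."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    terms = TERM.findall(line)
    # A quad has four terms; a line with three terms names no graph.
    return terms[3] if len(terms) == 4 else None


def shared_graphs(datasets):
    """Map each graph seen in more than one dataset to the dataset names.

    `datasets` maps a dataset name to an iterable of N-Quads lines.
    """
    seen = defaultdict(set)
    for name, lines in datasets.items():
        for line in lines:
            graph = graph_of(line)
            if graph is not None:
                seen[graph].add(name)
    return {g: sorted(names) for g, names in seen.items() if len(names) > 1}


# Hypothetical sample data: graph g1 appears in both files.
example = {
    "a.nq": ['<http://example.org/s1> <http://example.org/p> "a" <http://example.org/g1> .'],
    "b.nq": [
        '<http://example.org/s2> <http://example.org/p> "b" <http://example.org/g1> .',
        '<http://example.org/s3> <http://example.org/p> <http://example.org/o> <http://example.org/g2> .',
    ],
}

for graph, names in shared_graphs(example).items():
    print(graph, "appears in", ", ".join(names))
```

For real datasets, each value in the mapping can be an open file object, since the function only iterates over lines.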