Virtuoso RDF Bulk Loader "with_delete" option
What
The Virtuoso RDF Bulk Loader provides a special option called "with_delete" that can be used in place of the graph name (i.e.
ll
Why
This option is required for high-performance bulk loading of data that applies updates, inserts, and deletions to existing graphs, while remaining on a par with bulk insertion of the same or similar data.
Note that the "with_delete" option only works with NQUAD datasets where the graph name is specified in the dataset, and that it is a cluster-only feature.
Prerequisites
- A Virtuoso commercial release 06.04.3134 or greater is required
- The "with_delete" option only works with NQUAD datasets where the graph name is specified in the dataset.
- Ensure the Virtuoso server is running with a default transaction isolation level of 2 (read committed) by adding the following setting to the "[Parameters]" section of the Virtuoso configuration file and restarting the Virtuoso server:
DefaultIsolation = 2
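As a quick sanity check before restarting, the setting above can be verified from a script. The following is a minimal sketch in Python, assuming the configuration file is plain "key = value" INI text that the standard configparser module can read; the function name isolation_is_read_committed is hypothetical and not part of Virtuoso.

```python
import configparser


def isolation_is_read_committed(ini_text):
    """Return True if [Parameters] sets DefaultIsolation to 2 in the given INI text.

    Hypothetical helper for illustration; assumes the text parses as standard INI.
    """
    # strict=False tolerates repeated keys; interpolation=None leaves '%' untouched.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    # configparser lower-cases option names by default, so the key lookup
    # is case-insensitive; the section name must match exactly.
    value = parser.get("Parameters", "DefaultIsolation", fallback=None)
    return value is not None and value.strip() == "2"
```

To check a real file, read its contents into a string first and pass that string to the function.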
The following lock mode settings should be set before using the "with_delete" option:
cl_exec ('__dbf_set (''lock_escalation_pct'', 200)');
cl_exec ('__dbf_set (''enable_distinct_key_dup_no_lock'', 1)');
How
Using the ld
Note that all RDF loader threads can be stopped using the following command, at which point all currently running threads are allowed to complete and then exit:
rdf_load_stop()
Limitations on use
The following points should be noted:
- When using the "with_delete" option, enough memory must be allocated to Virtuoso, calculated at 200 bytes per quad. This is particularly relevant when loading larger graphs, where the memory requirement can be significant.
- The dataset files cannot contain multiple graphs of the same name but with different triples, as this results in unpredictable triple counts depending on which dataset file is being loaded on a given thread, which is non-deterministic.
- The following command can be used to create a diagnostic log of the "with_delete" activity, written to a log file called "g_log.txt" on each cluster instance, for analysis:
cl_exec ('__dbf_set (''enable_g_replace_log'',1)')
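The first two limitations can be checked roughly before a load is started. The following is a minimal sketch in Python, not part of Virtuoso: it counts quads, multiplies by the 200 bytes per quad quoted above, and lists graph names that occur in more than one dataset file. It assumes one statement per line, uses a simplified term pattern rather than a full N-Quads parser, and reads the second limitation as "the same graph name appearing in more than one file", which is an interpretation. The names preflight, TERM, and BYTES_PER_QUAD are hypothetical.

```python
import re
from collections import defaultdict

BYTES_PER_QUAD = 200  # figure quoted in the limitations above

# Simplified N-Quads term pattern (an assumption, not a full parser):
# an IRI, a blank node, or a literal with optional datatype or language tag.
TERM = re.compile(
    r'<[^>]*>'
    r'|_:\S+'
    r'|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?'
)


def preflight(files):
    """Summarise dataset files given as a mapping: file name -> iterable of lines."""
    quads = 0
    without_graph = 0
    graph_files = defaultdict(set)  # graph term -> names of files containing it
    for name, lines in files.items():
        for line in lines:
            line = line.strip()
            if not line or line.startswith('#'):
                continue  # skip blank lines and comments
            terms = TERM.findall(line)
            if len(terms) == 4:
                quads += 1
                graph_files[terms[3]].add(name)
            else:
                # No fourth term, so no graph name is given on this line.
                without_graph += 1
    shared = {g: sorted(f) for g, f in graph_files.items() if len(f) > 1}
    return {
        "quads": quads,
        "estimated_bytes": quads * BYTES_PER_QUAD,
        "statements_without_graph": without_graph,
        "graphs_in_multiple_files": shared,
    }
```

For files on disk, the mapping can be built from open file handles, for example {path: open(path, encoding="utf-8") for path in paths}. A non-empty "graphs_in_multiple_files" entry only flags candidates for review; it does not by itself show that the triples differ.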