Attributes | Values |
---|
id
| - f3e652f9fa1120501e5749b9909652ac
|
content
| - %META:TOPICPARENT{name="VirtBulkRDFLoader"}%
---++Virtuoso RDF Bulk Loader "with_delete" option
---+++What
The Virtuoso RDF Bulk Loader provides a special option called "with_delete" that can be used in place of the graph name (i.e., in the ll<nop>_graph column of the load<nop>_list table) to force each data graph named in the given NQUAD file/dataset to be reloaded, i.e., updated with any new triples if the graph already exists in the database.
---+++Why
This option is required to provide high-performance bulk loading of data that includes updates, inserts, and deletions on existing graphs, while remaining on a par with the bulk insertion of the same or similar data.
Note that the "with_delete" option only works with NQUAD datasets, where the graph name is specified in the dataset itself.
---+++Prerequisites
* A Virtuoso commercial release 06.04.3134 or greater is required.
* The "with_delete" option is available only in cluster mode in release 6.x, and in both cluster and single-server mode in release 7.x.
* The "with_delete" option only works with NQUAD datasets, where the graph name is specified in the dataset itself.
* Ensure the Virtuoso server is running with a [[http://docs.openlinksw.com/virtuoso/databaseadmsrv.html#configsrvstupfiles][default transaction isolation level]] of 2 (read committed) by adding the following setting to the "[Parameters]" section of the Virtuoso configuration file and restarting the Virtuoso server:
<verbatim>
DefaultIsolation = 2
</verbatim>
The following lock mode settings should be set before using the "with_delete" option:
<verbatim>
cl_exec ('__dbf_set (''lock_escalation_pct'', 200)');
cl_exec ('__dbf_set (''enable_distinct_key_dup_no_lock'', 1)');
</verbatim>
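On a single-server instance (supported in release 7.x, per the prerequisites above), the same two settings can presumably be applied without the cluster wrapper. The following is only a sketch: it assumes that <nop>__dbf_set() can be called directly from isql on a non-cluster server, which should be verified against your release before relying on it.
<verbatim>
-- Assumption: single-server mode, so the cl_exec() wrapper is not used.
__dbf_set ('lock_escalation_pct', 200);
__dbf_set ('enable_distinct_key_dup_no_lock', 1);
</verbatim>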
---+++How
Using the ld<nop>_dir() or ld<nop>_dir<nop>_all() commands, set the "ll<nop>_graph" column of the "load<nop>_list" table to "with<nop>_delete" for each dataset file listed in "ll<nop>_file" that is known to require an update/reload. Once all files are registered, run the "rdf<nop>_loader<nop>_run()" or "cl<nop>_exec('rdf<nop>_ld<nop>_srv()')" command to start the update/reload. For fast parallel loading, as many "rdf<nop>_loader<nop>_run()" or "cl<nop>_exec('rdf<nop>_ld<nop>_srv()')" commands can be invoked as there are threads/cores available across the machines running the Virtuoso cluster, as would typically be done for the initial bulk load of the datasets.
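The steps above can be sketched as the following isql session. The directory "/data/updates" and the file mask "*.nq" are hypothetical placeholders, and the three-argument form of ld<nop>_dir() (directory, file mask, graph) is assumed from the general bulk loader documentation; the directory is also assumed to be readable by the server.
<verbatim>
-- Register every N-Quads file in a (hypothetical) directory for reload,
-- passing 'with_delete' where a graph name would normally go.
ld_dir ('/data/updates', '*.nq', 'with_delete');

-- Confirm which files were registered with the option.
SELECT ll_file, ll_graph, ll_state
  FROM DB.DBA.load_list
 WHERE ll_graph = 'with_delete';

-- Single server: start one loader; repeat in further isql sessions
-- for parallel loading.
rdf_loader_run ();

-- Cluster: start a loader on every node instead.
cl_exec ('rdf_ld_srv()');
</verbatim>
Only one of the last two commands is needed, depending on whether the server runs in single-server or cluster mode.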
Note that all RDF loader threads can be stopped using the following command. Threads that are currently running are allowed to complete their work and then exit:
<verbatim>
rdf_load_stop();
</verbatim>
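Whether the loaders were stopped or ran to completion, the state of each registered file can be inspected in the "load<nop>_list" table. The query below is a sketch that assumes the standard bulk loader columns "ll<nop>_state" (0 = not yet loaded, 1 = in progress, 2 = done) and "ll<nop>_error"; check the column meanings against your release.
<verbatim>
-- List the 'with_delete' files with their load state and any error text.
SELECT ll_file, ll_state, ll_error
  FROM DB.DBA.load_list
 WHERE ll_graph = 'with_delete'
 ORDER BY ll_state, ll_file;
</verbatim>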
---+++Limitation on use
The following points should be noted:
* When using the "with_delete" option, enough memory must be allocated to Virtuoso to cover an estimated 200 bytes per quad. This is particularly relevant when loading larger graphs, where the memory requirement can become significant.
* The dataset files cannot contain multiple occurrences of the same graph with different triples. This would result in unpredictable triple counts, because the outcome depends on which dataset file happens to be loaded on a given thread, which is non-deterministic.
* The following command can be used to create a diagnostic log of the "with_delete" activity, writing it to a log file called "g_log.txt" on each cluster instance for analysis:
<verbatim>
cl_exec ('__dbf_set (''enable_g_replace_log'', 1)');
</verbatim>
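As a rough worked example of the 200 bytes per quad estimate given in the first point above, assume a reload of 100 million quads (an illustrative figure, not taken from any measured dataset):
<verbatim>
100,000,000 quads x 200 bytes/quad = 20,000,000,000 bytes
                                   = approx. 20 GB (about 18.6 GiB)
</verbatim>
This is only the memory attributed to the "with_delete" processing under that estimate; it assumes the figure applies to the quads being reloaded and does not include the server's normal working memory.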
---+++Related
* [[VirtBulkRDFLoader][Virtuoso RDF Bulk Loader]]
|
Title
| - VirtRDFBulkLoaderWithDelete