14.15. RDF Performance Tuning

This section addresses the most common questions about RDF query and load performance in Virtuoso.

14.15.1. General

When running with large data sets, one should configure the Virtuoso process to use between 3/5 and 2/3 of system RAM and to stripe storage on all available disks. See the NumberOfBuffers, MaxDirtyBuffers, and Striping INI file parameters.

; default installation
NumberOfBuffers          = 2000
MaxDirtyBuffers          = 1200	

Typical sizes for the NumberOfBuffers and MaxDirtyBuffers (3/4 of NumberOfBuffers) parameters in the Virtuoso configuration file (virtuoso.ini) for various memory sizes are as follows, with each buffer consisting of 8K bytes:

Table: 14.15.1.1. Recommended NumberOfBuffers and MaxDirtyBuffers

System RAM    NumberOfBuffers    MaxDirtyBuffers
2 GB          170000             130000
4 GB          340000             250000
8 GB          680000             500000
16 GB         1360000            1000000
32 GB         2720000            2000000
48 GB         4000000            3000000
64 GB         5450000            4000000
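
For example, on a machine with 16 GB of RAM and several disks, the relevant virtuoso.ini entries might look like the sketch below. The buffer counts follow the table above; the [Striping] segment sizes and file paths are purely hypothetical and must be adapted to the actual disks, and striping also has to be enabled in the [Database] section.

[Database]
Striping                 = 1

[Parameters]
NumberOfBuffers          = 1360000
MaxDirtyBuffers          = 1000000

[Striping]
Segment1                 = 1024M, /disk1/virtuoso/db-seg1-1.db, /disk2/virtuoso/db-seg1-2.db
Segment2                 = 1024M, /disk3/virtuoso/db-seg2-1.db, /disk4/virtuoso/db-seg2-2.db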


14.15.2. RDF Index Scheme

Starting with version 6.00.3126, the default RDF index scheme consists of 2 full indices over RDF quads plus 3 partial indices. This index scheme is well suited to all kinds of workloads, regardless of whether queries generally specify a graph.

Alternate indexing schemes are possible but will not be generally needed. For upgrading old databases with a different index scheme see the corresponding documentation.

The index scheme consists of the following indices:

  PSOG - primary key, the first full index over the quad
  POGS - bitmap index, the second full index over the quad
  SP   - distinct partial index of subject-predicate pairs
  OP   - distinct partial index of object-predicate pairs
  GS   - distinct partial index of graph-subject pairs

This index scheme is created by the following statements:

create table DB.DBA.RDF_QUAD (
  G IRI_ID_8,
  S IRI_ID_8,
  P IRI_ID_8,
  O any,
  primary key (P, S, O, G)
  )
alter index RDF_QUAD on DB.DBA.RDF_QUAD partition (S int (0hexffff00));

create distinct no primary key ref bitmap index RDF_QUAD_SP on RDF_QUAD (S, P) partition (S int (0hexffff00));
create bitmap index RDF_QUAD_POGS on RDF_QUAD (P, O, G, S) partition (O varchar (-1, 0hexffff));
create distinct no primary key ref bitmap index RDF_QUAD_GS on RDF_QUAD (G, S) partition (S int (0hexffff00));
create distinct no primary key ref index RDF_QUAD_OP on RDF_QUAD (O, P) partition (O varchar (-1, 0hexffff));
;

The idea is to favor queries where the predicate is specified in triple patterns. The entire quad can be efficiently accessed when P and either or both S and O are known. This has the advantage of clustering data by the predicate which improves working set: A page read from disk will only have entries pertaining to the same predicate, chances of accessing other entries of the page are thus higher than if the page held values for arbitrary predicates. For less frequent cases where only S is known, as in describe, the distinct P's of the S are found in the SP index. These SP pairs are then used for accessing the PSOG index to get the O and G. For cases where only the G is known, as in dropping a graph, the distinct S's of the G are found in the GS index. The P's of the S are then found in the SP index. After this the whole quad is found in the PSOG index.
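
For illustration, here are two hypothetical lookups against this scheme (the graph and IRIs are examples only):

-- P and S given: the quad is found directly in the PSOG primary key
SPARQL SELECT ?o FROM <http://example.com/g>
WHERE { <http://example.com/s> <http://example.com/p> ?o };

-- only S given: the distinct P's come from the SP index, then PSOG yields O and G
SPARQL SELECT ?p ?o FROM <http://example.com/g>
WHERE { <http://example.com/s> ?p ?o };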

The SP, OP and GS indices do not store duplicates. If an S has many values of a P, there is only one entry. Entries are not deleted from SP, OP or GS. This does not lead to erroneous results, since a full index, that is, either POGS or PSOG, is always consulted in order to know whether a quad actually exists. For cases of updating data, most often a graph is entirely dropped and a substantially similar graph is inserted in its place. The SP, OP and GS indices thus stay relatively unaffected.

Still, over time, especially if there are frequent updates and values do not repeat between consecutive states, the SP, OP and GS indices will get polluted, which may affect performance. Dropping and recreating the index will then remedy the situation.
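
For example, the SP index could be rebuilt with the same definition used in the default scheme above; the OP and GS indices can be rebuilt analogously (a sketch, best run at a quiet time since the index is rebuilt from scratch):

drop index RDF_QUAD_SP;
create distinct no primary key ref bitmap index RDF_QUAD_SP on RDF_QUAD (S, P) partition (S int (0hexffff00));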

For cases where this is not practical the index scheme should only have full indices, i.e. each key holds all columns of the primary key of the quad. This will be the case if the distinct no primary key ref options are not specified in the create index statement. In such cases all indices remain in strict sync across deletes.

Many RDF workloads have bulk load and read intensive access patterns with few deletes. The default index scheme is optimized for these. In such situations this scheme offers significant space savings, resulting in a better working set. Typically this layout takes 60-70% of the space of a layout with 4 full indices.


14.15.3. Index Scheme Selection

The indexes in place on the RDF_QUAD table can greatly affect the performance of SPARQL queries. The indexes currently in place can be listed by running the STATISTICS command on the table as follows:

SQL> STATISTICS DB.DBA.RDF_QUAD;
Showing SQLStatistics of table(s) 'DB.DBA.RDF_QUAD'
TABLE_QUALIFIER  TABLE_OWNER      TABLE_NAME       NON_UNIQUE  INDEX_QUALIFIER  INDEX_NAME       TYPE        SEQ_IN_INDEX  COLUMN_NAME      COLLATION  CARDINALITY  PAGES       FILTER_CONDITION
VARCHAR          VARCHAR          VARCHAR          SMALLINT    VARCHAR          VARCHAR          SMALLINT    SMALLINT    VARCHAR          VARCHAR  INTEGER     INTEGER     VARCHAR
_______________________________________________________________________________

DB               DBA              RDF_QUAD         NULL        NULL             NULL             0           NULL        NULL             NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         0           DB               RDF_QUAD         3           1           P                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         0           DB               RDF_QUAD         3           2           S                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         0           DB               RDF_QUAD         3           3           O                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         0           DB               RDF_QUAD         3           4           G                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_GS      3           1           G                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_GS      3           2           S                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_OP      3           1           O                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_OP      3           2           P                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_POGS    3           1           P                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_POGS    3           2           O                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_POGS    3           3           G                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_POGS    3           4           S                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_SP      3           1           S                NULL     NULL        NULL        NULL
DB               DBA              RDF_QUAD         1           DB               RDF_QUAD_SP      3           2           P                NULL     NULL        NULL        NULL

15 Rows. -- 24 msec.
SQL> 	

If the graph is always given, as with one or more FROM or FROM NAMED clauses, and there are no patterns where only graph and predicate are given, then the default indices should be sufficient. If predicate and graph are given but subject is not, then it is sometimes useful to add:

CREATE BITMAP INDEX RDF_QUAD_PGOS
  ON DB.DBA.RDF_QUAD (P, G, O, S) 
  PARTITION (O VARCHAR (-1, 0hexffff));
Note: If the server version is pre-5.0.7, leave out the partitioning clause.

Making the PGOS index can help in some cases even if it is not readily apparent from the queries that one is needed. This is so, for example, if the predicate by itself is selective, i.e. there is a predicate that occurs in only a few triples.

If the graph itself is not given in the queries, then the default index scheme will be unworkable. For this, the appropriate scheme is:

CREATE TABLE RDF_QUAD
  ( 
    G IRI_ID_8, 
    S IRI_ID_8, 
    P IRI_ID_8, 
    O ANY, 
    PRIMARY KEY (S, P, O, G)
  )
ALTER INDEX RDF_QUAD 
  ON RDF_QUAD 
  PARTITION (S INT (0hexffff00));

CREATE BITMAP INDEX RDF_QUAD_OPGS
  ON DB.DBA.RDF_QUAD (O, P, G, S) 
  PARTITION (O VARCHAR (-1, 0hexffff));

CREATE BITMAP INDEX RDF_QUAD_POGS
  ON DB.DBA.RDF_QUAD (P, O, G, S)
  PARTITION (O VARCHAR (-1, 0hexffff));

CREATE BITMAP INDEX RDF_QUAD_GPOS 
  ON DB.DBA.RDF_QUAD (G, P, O, S)
  PARTITION (O VARCHAR (-1, 0hexffff));
Note: If the server version is pre-5.0.7, leave out the PARTITION clauses and the ALTER INDEX statement.

Public web service endpoints have proven to be sources of especially bad queries. While local application developers can obtain instructions from the database administrator and use ISQL access to the database in order to tune execution plans, "external" clients do not know the details of the configuration and/or lack the appropriate skills. The most common problem is that public endpoints usually get requests that do not mention the required graph, because those queries were initially written for use with triple stores. If the web service provides access to a single graph (or to a short list of graphs), then it is strongly recommended to configure it by adding a row to DB.DBA.SYS_SPARQL_HOST:

CREATE TABLE "DB"."DBA"."SYS_SPARQL_HOST"
(
  "SH_HOST" VARCHAR,         -- host pattern
  "SH_GRAPH_URI" VARCHAR,    -- default graph
  "SH_USER_URI" VARCHAR,     -- reserved for any use in applications
  "SH_DEFINES" LONG VARCHAR, -- list of 'define xx:yy "zz" ' defines as a string 
  PRIMARY KEY ("SH_HOST")
);

Detailed descriptions of the columns of this table can be found in the SYS_SPARQL_HOST documentation.

The idea is that if the client specifies a default graph in the request or uses named graphs and group graph patterns, then the client is probably smarter than average and will provide meaningful queries. If no graph names are specified, then the query will benefit from a preset default graph, because this gives the compiler more indexes to choose from -- indexes that begin with G.

Sometimes a web service endpoint is used to access the data of only one application, not all data in the system. In that case one may wish to declare a separate storage that consists only of the RDF Views made by that application and define input:storage in the appropriate row of DB.DBA.SYS_SPARQL_HOST.
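
For illustration, a hypothetical row that presets the default graph for an endpoint served on host www.example.com might be added as follows (the host and graph IRI are examples only):

INSERT INTO DB.DBA.SYS_SPARQL_HOST (SH_HOST, SH_GRAPH_URI, SH_USER_URI, SH_DEFINES)
  VALUES ('www.example.com', 'http://www.example.com/data', NULL, NULL);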


14.15.4. Erroneous Cost Estimates and Explicit Join Order

The selectivity of triple patterns is determined at query compile time by sampling the data. It is possible that the sampling produces misleading estimates. To see whether the cardinality guesses are generally valid, look at the query plan with explain ().

Below is a sample from the LUBM qualification data set in the Virtuoso distribution. After running make test in binsrc/test/lubm, there is a loaded database with the data. Start a server in the same directory to see the data.

SQL> EXPLAIN 
  ('SPARQL 
  PREFIX  ub:  <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
  SELECT *
  FROM <lubm>
  WHERE { ?x  rdf:type  ub:GraduateStudent }
  ');

REPORT
VARCHAR
_______________________________________________________________________________

{

Precode:
      0: $25 "callret" := Call __BOX_FLAGS_TWEAK (<constant (lubm)>, <constant (1)>)
      5: $26 "lubm" := Call DB.DBA.RDF_MAKE_IID_OF_QNAME_SAFE ($25 "callret")
      12: $27 "callret" := Call __BOX_FLAGS_TWEAK (<constant (http://www.w3.org/1999/02/22-rdf-syntax-ns#type)>, <constant (1)>)
      17: $28 "-ns#type" := Call DB.DBA.RDF_MAKE_IID_OF_QNAME_SAFE ($27 "callret")
      24: $29 "callret" := Call __BOX_FLAGS_TWEAK (<constant (http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#GraduateStudent)>, <constant (1)>)
      29: $30 "owl#GraduateStudent" := Call DB.DBA.RDF_MAKE_IID_OF_QNAME_SAFE ($29 "callret")
      36: BReturn 0
from DB.DBA.RDF_QUAD by RDF_QUAD_OGPS    1.9e+03 rows
Key RDF_QUAD_OGPS  ASC ($32 "s-3-1-t0.S")
<col=415 O = $30 "owl#GraduateStudent"> , <col=412 G = $26 "lubm"> , <col=414 P = $28 "-ns#type">
row specs: <col=415 O LIKE <constant (T)>>

Current of: <$34 "<DB.DBA.RDF_QUAD s-3-1-t0>" spec 5>

After code:
      0: $35 "x" := Call ID_TO_IRI ($32 "s-3-1-t0.S")
      5: BReturn 0
Select ($35 "x", <$34 "<DB.DBA.RDF_QUAD s-3-1-t0>" spec 5>)
}

22 Rows. -- 1 msec.

This finds the graduate student instances in the LUBM graph. First the query converts the IRI literals to IDs. Then, using a match of O and G on OGPS, it finds the IRIs of the graduate students. Then it converts the IRI IDs back to string form for the result.

The cardinality estimate of 1.9e+03 rows is on the FROM line.

Doing an EXPLAIN() on the queries will show the cardinality estimates. To drill down further, one can split the query into smaller chunks and see the estimates for these, up to doing it at the triple pattern level. To indicate a variable that is bound but whose value is not a literal known at compile time, one can use the parameter marker ??.

SQL> EXPLAIN 
  ('
      SPARQL 
      DEFINE  sql:table-option "order"  
      PREFIX  ub:  <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
      SELECT *
      FROM <lubm>
      WHERE { ?x  rdf:type  ?? }
  ');

This will not know the type but will know that a type will be provided. So instead of guessing 1900 matches, this will guess a smaller number, which is obviously less precise. Thus literals are generally better.

In some cases, generally to work around an optimization error, one can specify an explicit JOIN order. This is done with the sql:select-option "order" clause in the SPARQL query prefix.

SQL> SELECT SPARQL_to_sql_text 
  ('  
      DEFINE sql:select-option "order" 
      PREFIX  ub:  <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
      SELECT *
      FROM <lubm>
      WHERE
        {
          ?x  rdf:type        ub:GraduateStudent                                       .
          ?x  ub:takesCourse  <http://www.Department0.University0.edu/GraduateCourse0>
        }
  ');

shows the SQL text with the order option at the end.

If an estimate is radically wrong then this should be reported as a bug.

If there is a FROM with a KEY on the next line and no column specs, then this is a full table scan. The more columns are specified, the fewer rows will be passed to the next operation in the chain. In the example above, there are three columns whose values are known before reading the table, and these columns are leading columns of the index in use, so the column specs are:

<col=415 O = $30 "owl#GraduateStudent"> , 
<col=412 G = $26 "lubm"> , 
<col=414 P = $28 "-ns#type">
Note: A KEY with only a row spec is a full table scan with the row spec applied as a filter. This is usually not good unless this is specifically intended.

If queries are compiled to make full table scans when this is not specifically intended, this should be reported as a bug. The explain () output and the query text should be included in the report.

Consider:

SQL> EXPLAIN 
  ('
      SPARQL 
      DEFINE sql:select-option "order, loop" 
      PREFIX  ub:  <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
      SELECT *
      FROM <lubm>
      WHERE
        {
          ?x  ub:takesCourse  ?c                  .
          ?x  rdf:type        ub:GraduateStudent
        }
  ');

One will see in the output that the first table access retrieves everything in the LUBM graph that takes some course, and only then checks whether each of these is a graduate student. This is obviously not the preferred order, but the sql:select-option "order" forces the optimizer to join from left to right.

It is very easy to end up with completely unworkable query plans in this manner but if the optimizer really is in error, then this is the only way of overriding its preferences. The effect of sql:select-option is pervasive, extending inside unions, optionals, subqueries etc within the statement.

We note that if, in the above query, both the course taken by the student and the type of the student are given, the query compilation will be, at least for all non-cluster cases, an index intersection. This is not overridden by the sql:select-option clause since an index intersection is always a safe guess, regardless of the correctness of the cardinality guesses of the patterns involved.


14.15.5. Get All Graphs

In order to get all graph URIs, one might use the Virtuoso DB.DBA.SPARQL_SELECT_KNOWN_GRAPHS() built-in function.
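
For example, from isql (the direct procedure call is a minimal sketch; the pure SPARQL alternative is shown for comparison and can be slow on large stores):

SQL> DB.DBA.SPARQL_SELECT_KNOWN_GRAPHS ();

SQL> SPARQL SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } };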


14.15.6. Dump and Reload Graphs

In order to dump all graphs (the procedures below skip the internal virtrdf: graph and can be extended to exclude other graphs of a certain type, for example by means of a SPARQL ASK query), you may use the following script sequence:

CREATE PROCEDURE dump_graphs 
  ( IN  dir               VARCHAR  :=  'dumps'   , 
    IN  file_length_limit INTEGER  :=  1000000000
  )
{
    DECLARE inx INT;
  inx := 1;
    SET ISOLATION = 'uncommitted';
    FOR ( SELECT * 
            FROM ( SPARQL DEFINE input:storage "" 
                   SELECT DISTINCT ?g { GRAPH ?g { ?s ?p ?o } . 
                                        FILTER ( ?g != virtrdf: ) 
                                      } 
                 ) AS sub OPTION ( LOOP )) DO
      {
        dump_one_graph ( "g", 
                         sprintf ('%s/graph%06d_', dir, inx), 
                         file_length_limit
                       );
      inx := inx + 1;
    }
}
;


CREATE PROCEDURE dump_one_graph 
  ( IN  srcgraph           VARCHAR  , 
    IN  out_file           VARCHAR  , 
    IN  file_length_limit  INTEGER  := 1000000000
  )
{
    DECLARE  file_name  varchar;
    DECLARE  env, ses      any;
    DECLARE  ses_len, 
             max_ses_len, 
             file_len, 
             file_idx      integer;
    SET ISOLATION = 'uncommitted';
  max_ses_len := 10000000;
  file_len := 0;
  file_idx := 1;
    file_name := sprintf ('%s%06d.ttl', out_file, file_idx);
    string_to_file ( file_name || '.graph', 
                     srcgraph, 
                     -2
                   );
    string_to_file ( file_name, 
                     sprintf ( '# Dump of graph <%s>, as of %s\n', 
                               srcgraph, 
                               CAST (NOW() AS VARCHAR)
                             ), 
                     -2
                   );
  env := vector (dict_new (16000), 0, '', '', '', 0, 0, 0, 0);
  ses := string_output ();
    FOR (SELECT * FROM ( SPARQL DEFINE input:storage "" 
                         SELECT ?s ?p ?o { GRAPH `iri(?:srcgraph)` { ?s ?p ?o } } 
                       ) AS sub OPTION (LOOP)) DO
    {
      http_ttl_triple (env, "s", "p", "o", ses);
      ses_len := length (ses);
        IF (ses_len > max_ses_len)
        {
          file_len := file_len + ses_len;
            IF (file_len > file_length_limit)
            {
              http (' .\n', ses);
              string_to_file (file_name, ses, -1);
              file_len := 0;
              file_idx := file_idx + 1;
                file_name := sprintf ('%s%06d.ttl', out_file, file_idx);
                string_to_file ( file_name, 
                                 sprintf ( '# Dump of graph <%s>, as of %s (part %d)\n', 
                                           srcgraph, 
                                           CAST (NOW() AS VARCHAR), 
                                           file_idx), 
                                 -2
                               );
	      env := vector (dict_new (16000), 0, '', '', '', 0, 0, 0, 0);
            }
            ELSE
            string_to_file (file_name, ses, -1);
          ses := string_output ();
        }
    }
    IF (LENGTH (ses))
    {
      http (' .\n', ses);
      string_to_file (file_name, ses, -1);
    }
}
;

create procedure load_graphs (in dir varchar := 'dumps/')
{
  declare arr any;
  declare g varchar;

  arr := sys_dirlist (dir, 1);
  log_enable (2, 1);
  foreach (varchar f in arr) do
    {
      if (f like '*.ttl')
	{
	  declare continue handler for sqlstate '*'
	    {
	      log_message (sprintf ('Error in %s', f));
	    };
  	  g := file_to_string (dir || '/' || f || '.graph');
	  DB.DBA.TTLP_MT (file_open (dir || '/' || f), g, g, 255);
	}
    }
  exec ('checkpoint');
}
;

14.15.6.1. Examples for Dump and Reload Graphs

SQL> dump_one_graph ('http://daas.openlinksw.com/data#', './bbc_data_', 1000000000);

SQL> dump_one_graph ('http://daas.openlinksw.com/data#', './bbc_data_');
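
Graphs dumped with dump_graphs into the default dumps directory can then be reloaded with the load_graphs procedure defined above, provided the directory is listed in the DirsAllowed parameter:

SQL> load_graphs ('dumps/');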

14.15.7. Dump RDF View Graph to n3

The RDF_QM_TREE_DUMP procedure and its associated procedures below are used for dumping one or more RDF View graphs in a Virtuoso server to a set of Turtle (.ttl) dataset files in a specified dump directory. The dump generation is made as fast as possible by grouping quad maps by their underlying tables, so many properties from neighboring database columns can be extracted in one table scan. The size of each generated file is limited to 5 MB. The dump process creates internal stored procedures; their texts are saved in the file .dump_procedures.sql in the dump directory for debugging purposes.

Note that the dump directory must be included in the DirsAllowed parameter of the Virtuoso configuration file (e.g., virtuoso.ini); otherwise the server will not be allowed to create or access the dataset files.

The Virtuoso RDF bulk loader scripts can then be used to load the dumped datasets for the RDF View graphs directly into the Virtuoso RDF quad store.
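
For example, mirroring the test invocations at the end of the listing below, a dump of the hypothetical RDF View graph http://localhost:8600/tpch into the directory dump/tpch, followed by a bulk load of the resulting files into the quad store, might look like:

SQL> DB.DBA.RDF_QM_TREE_DUMP ('dump/tpch', 'http://localhost:8600/tpch', null, null);
SQL> ld_dir ('dump/tpch', '%.ttl', 'http://localhost:8600/tpch');
SQL> rdf_loader_run ();
SQL> checkpoint;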

14.15.7.1. Parameters

dest_dir  - directory in which the dump files are created
graph_iri - IRI of the graph to dump; NULL dumps all graphs served by the storage
storage   - IRI of the quad storage; NULL means the default virtrdf:DefaultQuadStorage
root      - IRI of the root quad map to dump; NULL means the whole quad map tree of the storage

14.15.7.2. Procedure Code

CREATE PROCEDURE DB.DBA.RDF_QM_TREE_DUMP 
  ( in  dest_dir  VARCHAR, 
    in  graph_iri VARCHAR := NULL, 
    in  storage   VARCHAR := NULL, 
    in  root      VARCHAR := NULL
  )
{
 DECLARE all_qms, 
         grouped_qmvs, 
         launcher_text  ANY;
 DECLARE grp_ctr, 
         qm_ctr, 
         qm_count       INTEGER;
 DECLARE sql_file, 
         launcher_name  VARCHAR;
 IF (NOT (dest_dir LIKE '%/'))
   dest_dir := dest_dir || '/';
 sql_file := dest_dir || '.dump_procedures.sql';
 IF (storage IS NULL)
   storage := 'http://www.openlinksw.com/schemas/virtrdf#DefaultQuadStorage';
 string_to_file (
   sql_file, 
   '-- This file contains procedure created by DB.DBA.RDF_QM_TREE_DUMP() for storage ' 
      || COALESCE (storage, 'NULL') 
      || ' and root quad map ' 
      || COALESCE (root, 'NULL') 
      || '\n\n', 
   -2);
 all_qms := dict_list_keys (DB.DBA.RDF_QM_CONTENT_OF_QM_TREE (graph_iri, storage, root), 2);
 grouped_qmvs := DB.DBA.RDF_QM_GROUP_BY_SOURCE_TABLES (all_qms);
 launcher_name := 'RDF_QM_TREE_DUMP_BATCH_' || md5 (serialize (graph_iri) || storage || serialize (root));
 launcher_text := string_output ();
 http ('CREATE PROCEDURE DB.DBA."' || launcher_name || '" (in dest_dir VARCHAR)\n{\n', launcher_text);
 FOR (grp_ctr := length (grouped_qmvs); grp_ctr > 0; grp_ctr := grp_ctr-2)
   {
     DECLARE tables, qms, proc_text ANY;
     DECLARE group_key, proc_name, dump_prefix, cmt VARCHAR;
     tables := grouped_qmvs [grp_ctr-2];
     qms := grouped_qmvs [grp_ctr-1];
     qm_count := length (qms);
     group_key := md5 (serialize (graph_iri) || storage || serialize (root) || serialize (tables));
     proc_name := 'RDF_QM_TREE_DUMP_GRP_' || group_key;
     proc_text := string_output ();
     cmt := sprintf ('%d quad maps on join of', qm_count);
     FOREACH (VARCHAR t IN tables) DO cmt := cmt || ' ' || t;
     http ('  --  ' || cmt || '\n', launcher_text);
     http ('  DB.DBA."' || proc_name || '" (dest_dir);\n', launcher_text);
     http ('CREATE PROCEDURE DB.DBA."' || proc_name || '" (in dest_dir VARCHAR)\n', proc_text);
     http ('{\n', proc_text);
     http ('  -- ' || cmt || '\n', proc_text);
     http ('  DECLARE ses, env ANY;\n', proc_text);
     http ('  DECLARE file_ctr, cmt_len INTEGER;\n', proc_text);
     http ('  file_ctr := 0;\n', proc_text);
     http ('  dbg_obj_princ (' || WS.WS.STR_SQL_APOS (cmt) || ', '', file '', file_ctr);\n', proc_text);
     http ('  ses := string_output ();\n', proc_text);
     http ('  http (' || WS.WS.STR_SQL_APOS ('#' || cmt || '\n') || ', ses);\n', proc_text);
     http ('  env := VECTOR (dict_new (16000), 0, '''', '''', '''', 0, 0, 0, 0);\n', proc_text);
     http ('  cmt_len := LENGTH (ses);\n', proc_text);
     http ('  FOR (SPARQL DEFINE input:storage <' || storage || '>\n', proc_text);
     http ('    SELECT ?s1, ?p1, ?o1\n', proc_text);
     IF (graph_iri IS NOT NULL)
       {
         http ('    WHERE { GRAPH <', proc_text); http_escape (graph_iri, 12, proc_text, 1, 1); http ('> {\n', proc_text);
       }
     ELSE
       http ('    WHERE { GRAPH ?g1 {\n', proc_text);
     FOR (qm_ctr := 0; qm_ctr < qm_count; qm_ctr := qm_ctr + 1)
       {
         IF (qm_ctr > 0) http ('            UNION\n', proc_text);
         http ('            { quad map <' || qms[qm_ctr] || '> { ?s1 ?p1 ?o1 } }\n', proc_text);
       }
     http ('          } } ) DO {\n', proc_text);
     http ('      http_ttl_triple (env, "s1", "p1", "o1", ses);\n', proc_text);
     http ('      IF (LENGTH (ses) > 5000000)\n', proc_text);
     http ('        {\n', proc_text);
     http ('          http ('' .\\n'', ses);\n', proc_text);
     http ('          string_to_file (sprintf (''%s' || group_key || '_%05d.ttl'', dest_dir, file_ctr), ses, -2);\n', proc_text);
     http ('          file_ctr := file_ctr + 1;\n', proc_text);
     http ('          dbg_obj_princ (' || WS.WS.STR_SQL_APOS (cmt) || ', '', file '', file_ctr);\n', proc_text);
     http ('          ses := string_output ();\n', proc_text);
     http ('          http (' || WS.WS.STR_SQL_APOS ('#' || cmt || '\n') || ', ses);\n', proc_text);
     http ('          env := VECTOR (dict_new (16000), 0, '''', '''', '''', 0, 0, 0, 0);\n', proc_text);
     http ('        }\n', proc_text);
     http ('    }\n', proc_text);
     http ('  IF (LENGTH (ses) > cmt_len)\n', proc_text);
     http ('    {\n', proc_text);
     http ('      http ('' .\\n'', ses);\n', proc_text);
     http ('      string_to_file (sprintf (''%s' || group_key || '_%05d.ttl'', dest_dir, file_ctr), ses, -2);\n', proc_text);
     http ('    }\n', proc_text);
     http ('}\n', proc_text);
     proc_text := string_output_string (proc_text);
     string_to_file (sql_file, proc_text || ';\n\n' , -1);
     EXEC (proc_text);
   }
 http ('}\n', launcher_text);
 launcher_text := string_output_string (launcher_text);
 string_to_file (sql_file, launcher_text || ';\n\n' , -1);
 EXEC (launcher_text);
 CALL ('DB.DBA.' || launcher_name)(dest_dir);
}
;

CREATE FUNCTION DB.DBA.RDF_QM_CONTENT_OF_QM_TREE 
  ( in  graph_iri  VARCHAR := NULL,
    in  storage    VARCHAR := NULL, 
    in  root       VARCHAR := NULL, 
    in  dict       ANY := NULL
  ) returns ANY
{
 DECLARE res, subqms any;
 DECLARE graphiri varchar;
 graphiri := DB.DBA.JSO_SYS_GRAPH();
 IF (storage IS NULL)
   storage := 'http://www.openlinksw.com/schemas/virtrdf#DefaultQuadStorage';
 DB.DBA.RDF_QM_ASSERT_STORAGE_FLAG (storage, 0);
 IF (dict IS NULL)
   dict := dict_new ();
 IF (root IS NULL)
   {
     subqms := ((SELECT DB.DBA.VECTOR_AGG (sub."qmiri")
         FROM (
           SPARQL DEFINE input:storage ""
           SELECT DISTINCT (str(?qm)) AS ?qmiri
           WHERE { GRAPH `iri(?:graphiri)` {
                     { `iri(?:storage)` virtrdf:qsUserMaps ?lst .
                       ?lst ?p ?qm .
                       FILTER (0 = bif:strstr (str(?p), str(rdf:_)))
                     } UNION {
                       `iri(?:storage)` virtrdf:qsDefaultMap ?qm .
                     } } } ) AS sub ) );
     FOREACH (varchar qmid IN subqms) DO
       DB.DBA.RDF_QM_CONTENT_OF_QM_TREE (graph_iri, storage, qmid, dict);
     RETURN dict;
   }
 DB.DBA.RDF_QM_ASSERT_JSO_TYPE (root, 'http://www.openlinksw.com/schemas/virtrdf#QuadMap');
 IF (graph_iri IS NOT NULL AND
   EXISTS ((SPARQL DEFINE input:storage ""
       SELECT (1) WHERE {
           GRAPH `iri(?:graphiri)` {
               `iri(?:root)` virtrdf:qmGraphRange-rvrFixedValue ?g .
               FILTER (str (?g) != str(?:graph_iri))
             } } ) ) )
   RETURN dict;
 IF (NOT EXISTS ((SPARQL DEFINE input:storage ""
       SELECT (1) WHERE {
           GRAPH `iri(?:graphiri)` {
               `iri(?:root)` virtrdf:qmMatchingFlags virtrdf:SPART_QM_EMPTY .
             } } ) ) )
   dict_put (dict, root, 1);
 subqms := ((SELECT DB.DBA.VECTOR_AGG (sub."qmiri")
     FROM (
       SPARQL DEFINE input:storage ""
       SELECT DISTINCT (str(?qm)) as ?qmiri
       WHERE { GRAPH `iri(?:graphiri)` {
   		`iri(?:root)` virtrdf:qmUserSubMaps ?lst .
               ?lst ?p ?qm .
               FILTER (0 = bif:strstr (str(?p), str(rdf:_)))
             } } ) AS sub ) );
 FOREACH (VARCHAR qmid IN subqms) DO
   DB.DBA.RDF_QM_CONTENT_OF_QM_TREE (graph_iri, storage, qmid, dict);
 RETURN dict;
}
;

CREATE FUNCTION DB.DBA.RDF_QM_GROUP_BY_SOURCE_TABLES (in qms ANY) returns ANY
{
 DECLARE res ANY;
 DECLARE ctr INTEGER;
 DECLARE graphiri VARCHAR;
 graphiri := DB.DBA.JSO_SYS_GRAPH();
 res := dict_new (LENGTH (qms) / 20);
 FOREACH (VARCHAR qmiri IN qms) DO
   {
     DECLARE tbls, acc ANY;
     tbls := ((SELECT DB.DBA.VECTOR_AGG (sub."tbl")
         FROM (SELECT subsub."tbl"
           FROM (
             SPARQL DEFINE input:storage ""
             SELECT DISTINCT ?tbl
             WHERE { GRAPH `iri(?:graphiri)` {
                       { `iri(?:qmiri)` virtrdf:qmTableName ?tbl .
                       } UNION {
                         `iri(?:qmiri)` virtrdf:qmATables ?atbls .
                         ?atbls ?p ?atbl .
                         ?atbl virtrdf:qmvaTableName ?tbl
                       } UNION {
                         `iri(?:qmiri)` ?fldmap ?qmv .
                         ?qmv virtrdf:qmvATables ?atbls .
                         ?atbls ?p ?atbl .
                         ?atbl virtrdf:qmvaTableName ?tbl .
                       } } } ) subsub
           ORDER BY 1 ) AS sub ) );
     acc := dict_get (res, tbls);
     IF (acc IS NULL)
       vectorbld_init (acc);
     vectorbld_acc (acc, qmiri);
     dict_put (res, tbls, acc);
   }
 res := dict_to_vector (res, 2);
 FOR (ctr := LENGTH (res); ctr > 0; ctr := ctr-2)
   {
     DECLARE acc ANY;
     acc := aref_set_0 (res, ctr-1);
     vectorbld_final (acc);
     aset_zap_arg (res, ctr-1, acc);
   }
 RETURN res;
}
;

--test dbg_obj_princ (DB.DBA.RDF_QM_GROUP_BY_SOURCE_TABLES (dict_list_keys (DB.DBA.RDF_QM_CONTENT_OF_QM_TREE (null), 2)));
--test dbg_obj_princ (dict_list_keys (DB.DBA.RDF_QM_CONTENT_OF_QM_TREE (null), 2));
--test DB.DBA.RDF_QM_TREE_DUMP ('dump/demo', null, null, null);
--test DB.DBA.RDF_QM_TREE_DUMP ('dump/tpch', 'http://localhost:8600/tpch', null, null);	


14.15.8. Loading RDF

There are many functions for loading RDF text, in RDF/XML and Turtle.

For loading RDF/XML, the best way is to split the data to be loaded into multiple streams and load these in parallel using RDF_LOAD_RDFXML (). To avoid running out of rollback space for large files and in order to have multiple concurrent loads not interfere with each other, the row autocommit mode should be enabled.

For example,

log_enable (2);
-- switch row-by-row autocommit on and logging off for this session
DB.DBA.RDF_LOAD_RDFXML (file_to_string_output ('file.xml'), 'base_uri', 'target_graph');
-- more files here ...
checkpoint;

Executing a script like the above with isql will load the data. Since transaction logging is off, make a manual checkpoint at the end to ensure that the data persists across a server restart, as there is no roll forward log.

If large amounts of data are to be loaded, run multiple such streams in parallel. One may have for example 6 streams for 4 cores. This means that if up to two threads wait for disk, there is still work for all cores.
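
A minimal shell sketch of such parallel streams, assuming hypothetical script files load1.sql through load6.sql that each contain statements like the above for a disjoint set of files:

#!/bin/sh
# start 6 concurrent load streams, one isql process per script
for i in 1 2 3 4 5 6
do
    isql 1111 dba dba load$i.sql &
done
wait
# single manual checkpoint once all streams have finished
isql 1111 dba dba exec="checkpoint;"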

Having substantially more threads than processors or disks is not particularly useful.

There exist multithreaded load functions which will load one file on multiple threads: the DB.DBA.TTLP_MT() function and the DB.DBA.RDF_LOAD_RDFXML_MT() function. Experience shows that loading multiple files on one thread per file is better.

For loading Turtle, some platforms may have a non-reentrant Turtle parser. This means that only one load may run at a time. One can try this by calling ttlp () from two sessions at the same time. If these do not execute concurrently, then the best way may be to try ttlp_mt and see if this runs faster than a single threaded ttlp call.

14.15.8.1. RDF Bulk Load Utility

The RDF loader utility facilitates parallel bulk loading of multiple RDF files. The utility maintains a database table containing the list of files to load and the status of each file: not loaded, loading, or loaded, possibly with an error. The table also records load start and end times.

One must be logged in with a dba group account to use this, and the file access control list (DirsAllowed) in virtuoso.ini must be set up so that the Virtuoso server can open the files to load.

Files are added to the load list with the function ld_dir:

ld_dir (in dir_path varchar, in file_mask varchar, in target_graph varchar);

The file mask is a SQL LIKE pattern to match against the files in the directory. For example:

ld_dir ('/data8/2848260', '%.gz', 'http://bsbm.org');

would register for loading all files ending in .gz from the directory given as the first parameter. The RDF will be loaded into the http://bsbm.org graph.

If NULL is given for the graph, each file may go to a different graph specified in a separate file with the name of the RDF source file plus the extension .graph.

A .graph file contains the target graph URI without any other content or whitespace.
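
For example, with NULL given as the graph, a hypothetical source file /data8/2848260/part1.nt would be accompanied by a file /data8/2848260/part1.nt.graph whose entire content is:

http://bsbm.org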

The layout of the load_list table is as follows:

create table DB.DBA.LOAD_LIST (
  ll_file varchar,
  ll_graph varchar,
  ll_state int default 0, -- 0 not started, 1 going, 2 done
  ll_started datetime,
  ll_done datetime,
  ll_host int,
  ll_work_time integer,
  ll_error varchar,
  primary key (ll_file))
alter index LOAD_LIST on DB.DBA.LOAD_LIST partition (ll_file varchar)
create index LL_STATE on DB.DBA.LOAD_LIST (ll_state, ll_file, ll_graph) partition (ll_state int)
;

This table may be checked at any time during bulk load for the progress of the load. ll_state is 1 for files being loaded and 2 for files whose loading has finished. ll_error is NULL if the load finished without error, else it is the error message.
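
For example, a simple progress check over this table might be:

SELECT ll_file, ll_state, ll_started, ll_done, ll_error
  FROM DB.DBA.LOAD_LIST
 WHERE ll_state <> 2 OR ll_error IS NOT NULL;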

In order to load data from the files in load_list, run as dba:

DB.DBA.rdf_loader_run (); 	

One may run several of these commands on parallel sessions for better throughput.

On a cluster one can do:

cl_exec ('rdf_ld_srv ()'); 

This will start one rdf_loader_run on each node of the cluster. Note that in such a setting all the server processes must see the same files at the same path.

In an isql session one may execute rdf_loader_run () & several times; isql forks a new process for each such command, similarly to what a Unix shell does.

Because this load is non-transactional and non-logged, one must do an explicit checkpoint after the load to guarantee a persistent state.

On a single server do:

checkpoint; 

On a cluster do:

  
cl_exec ('checkpoint');

The server(s) are online and can process queries and transactions while a bulk load is in progress. Periodic checkpoints may occur during the load but the state is guaranteed to be consistent only after running a checkpoint after all the bulk load threads have finished.

A bulk load should not be forcibly stopped. To make a controlled stop, run:

rdf_load_stop ();

This will cause the files being loaded at the time to finish load but no new loads will start until explicitly started with rdf_loader_run ().

Note especially that on a cluster the database will be inconsistent if one server process does a checkpoint and another does not. Thus a checkpoint must be guaranteed on all nodes. This is easily done with an isql script with the following content:

ld_dir ('/data8/2848260', '%.gz', 'http://bsbm.org');

-- Record CPU time 
select getrusage ()[0] + getrusage ()[1];

rdf_loader_run () &
rdf_loader_run () &
rdf_loader_run () &
rdf_loader_run () &
rdf_loader_run () &
rdf_loader_run () &
rdf_loader_run () &
rdf_loader_run () &

wait_for_children;
checkpoint;

-- Record CPU time
select getrusage ()[0] + getrusage ()[1];

For a cluster, the equivalent is:

ld_dir ('/data8/2848260', '%.gz', 'http://bsbm.org');

cl_exec ('DB.DBA.RDF_LD_SRV (2)');

cl_exec ('checkpoint');

rdf_loader_run () recognizes several file types, including .ttl, .nt, .xml, .rdf, .owl, .nq, .n4, and others. Internally the function uses DB.DBA.ttlp() or DB.DBA.rdf_load_rdfxml, as appropriate.

See the next section for detailed description of the rdf_loader_run () function.


14.15.8.2. Loading LOD RDF data

To load RDF data into a LOD instance, perform the following steps:


14.15.8.3. Loading UniProt RDF data

To load the UniProt data, create a loader function, for example:

create function DB.DBA.UNIPROT_LOAD (in log_mode integer := 1)
{
  DB.DBA.RDF_LOAD_RDFXML_MT (file_to_string_output('filename1'),'http://base_uri_1', 'destination_graph_1', log_mode, 3);
  DB.DBA.RDF_LOAD_RDFXML_MT (file_to_string_output('filename2'),'http://base_uri_2', 'destination_graph_2', log_mode, 3);
  ...
  DB.DBA.RDF_LOAD_RDFXML_MT (file_to_string_output('filename9'),'http://base_uri_9', 'destination_graph_9', log_mode, 3);
}

If you are starting from a blank database and can drop it and re-create it in case an error is signaled, use it this way:

checkpoint;
checkpoint_interval(6000);
DB.DBA.UNIPROT_LOAD (0);
checkpoint;
checkpoint_interval(60);

If the database already contains important data and there is no way to stop it and back it up before the load, then use:

checkpoint;
checkpoint_interval(6000);
DB.DBA.UNIPROT_LOAD ();
checkpoint;
checkpoint_interval(60);

Note that the 'number of threads' parameter of DB.DBA.RDF_LOAD_RDFXML_MT() refers to the threads used to process data from the file; an extra thread reads the text and parses it, so for 4 CPU cores there is no need for a parameter value greater than 3. Three processing threads per parsing thread is usually a good ratio, because parsing is usually about three times faster than the rest of the loading, so CPU load is well balanced. If, for example, you are using 2 x Quad Xeon, then you can choose between 8 single-threaded parsers or 2 parsers with 3 processing threads each. With 4 cores you may simply load file after file with 3 processing threads. The most important performance tuning is to set the [Parameters] section of the Virtuoso configuration file:

NumberOfBuffers = 1000000
MaxDirtyBuffers = 800000
MaxCheckpointRemap = 1000000
DefaultIsolation = 2

Note: these numbers are reasonable for a 16 GB RAM Linux box. Usually, when there are no such massive operations as loading a huge database, you can set the values as follows:

NumberOfBuffers = 1500000
MaxDirtyBuffers = 1200000
MaxCheckpointRemap = 1500000
DefaultIsolation = 2
See Also:

Virtuoso Configuration Options

Tip:

After loading all data you may wish to shut down the server, adjust these parameters, and start it again. If you have an ext2 or ext3 filesystem, it is better to keep the disk no more than about 80% full. When the disk is almost full, the database file may be allocated badly, resulting in a measurable loss of disk access speed. This is not Virtuoso-specific, but a common consideration for all database-like applications doing random access to big files.

Here is an example awk script for splitting a big RDF/XML file into smaller ones:

BEGIN {
	file_part=1000
	e_line = "</rdf:RDF>"
        cur=0
        cur_o=0
	file=0
	part=file_part
      }
	{
	    res_file_i="res/"FILENAME
	    line=$0
	    s=$1
	    res_file=res_file_i"_"file".rdf"

	    if (index (s, "</rdf:Description>") == 1)
	    {
		cur=cur+1
		part=part-1
	    }

	    if (part > 0)
	    {
	    	print line >> res_file
	    }

	    if (part == 0)
	    {
#		print "===================== " cur
	    	print line >> res_file
		print e_line >> res_file
		close (res_file)
		file=file+1
		part=file_part
	    	res_file=res_file_i"_"file".rdf"
		system ("cp beg.txt " res_file)
	    }
        }
END { }

14.15.8.4. Loading DBPedia RDF data

You can use the following script as an example for loading DBPedia RDF data in Virtuoso:

#!/bin/sh

PORT=$1
USER=$2
PASS=$3
file=$4
g=$5
LOGF=`basename $0`.log

if [ -z "$PORT" -o -z "$USER" -o -z "$PASS" -o -z "$file" -o -z "$g" ]
then
  echo "Usage: `basename $0` [DSN] [user] [password] [ttl-file] [graph-iri]"
  exit
fi

if [ ! -f "$file" -a ! -d "$file" ]
then
    echo "$file does not exists"
    exit 1
fi

mkdir READY 2>/dev/null
rm -f $LOGF $LOGF.*

echo "Starting..."
echo "Logging into: $LOGF"

DOSQL ()
{
    isql $PORT $USER $PASS verbose=on banner=off prompt=off echo=ON errors=stdout exec="$1" > $LOGF
}

LOAD_FILE ()
{
    f=$1
    g=$2
    echo "Loading $f (`cat $f | wc -l` lines) `date \"+%H:%M:%S\"`" | tee -a $LOG

    DOSQL "ttlp_mt (file_to_string_output ('$f'), '', '$g', 17); checkpoint;" > $LOGF

    if [ $? != 0 ]
    then
	echo "An error occurred, please check $LOGF"
	exit 1
    fi

    line_no=`grep Error $LOGF | awk '{ match ($0, /line [0-9]+/, x) ; match (x[0], /[0-9]+/, y); print y[0] }'`
    newf=$f.part
    inx=1
    while [ ! -z "$line_no" ]
    do
	cat $f |  awk "BEGIN { i = 1 } { if (i==$line_no) { print \$0; exit; } i = i + 1 }"  >> bad.nt
	line_no=`expr $line_no + 1`
	echo "Retrying from line $line_no"
	echo "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> ." > tmp.nt
	cat $f |  awk "BEGIN { i = 1 } { if (i>=$line_no) print \$0; i = i + 1 }"  >> tmp.nt
	mv tmp.nt $newf
	f=$newf
	mv $LOGF $LOGF.$inx
	DOSQL "ttlp_mt (file_to_string_output ('$f'), '', '$g', 17); checkpoint;" > $LOGF

	if [ $? != 0 ]
    then
	    echo "An error occurred, please check $LOGF"
	    exit 1
    fi
	line_no=`grep Error $LOGF | awk '{ match ($0, /line [0-9]+/, x) ; match (x[0], /[0-9]+/, y); print y[0] }'`
	inx=`expr $inx + 1`
    done
    rm -f $newf 2>/dev/null
    echo "Loaded.  "
}

echo "======================================="
echo "Loading started."
echo "======================================="

if [ -f "$file" ]
then
    LOAD_FILE $file $g
    mv $file READY 2>> /dev/null
elif [ -d "$file" ]
then
    for ff in `find $file -name '*.nt'`
    do
	LOAD_FILE $ff $g
	mv $ff READY 2>> /dev/null
    done
else
   echo "The input is not file or directory"
fi
echo "======================================="
echo "Final checkpoint."
DOSQL "checkpoint;" > temp.res
echo "======================================="
echo "Check bad.nt file for skipped triples."
echo "======================================="

exit 0


14.15.8.5. Loading Bio2RDF data

The shell script below was used to import files in n3 notation into OpenLink Virtuoso RDF storage.

When a syntax error is encountered, it cuts the file at the offending line (saving that line to bad.nt) and retries from the next line. This was used on Ubuntu Linux to import the Bio2RDF and Freebase dumps.

Note that it uses gawk, so gawk must be available on the system where the script is run. Also, for recovery, additional disk space is needed, at most the size of the original file.

#!/bin/bash

PASS=$1
f=$2
g=$3

# Usage
if [ -z "$PASS" -o -z "$f" -o -z "$g" ]
then
  echo "Usage: $0 [password] [ttl-file] [graph-iri]"
  exit
fi

if [ ! -f "$f" ]
then
    echo "$f does not exists"
    exit
fi

# Your port here
PORT=1111  #`inifile -f dbpedia.ini -s Parameters -k ServerPort`
if test -z "$PORT"
then
    echo "Cannot find INI and inifile command"
    exit
fi

# Initial run
isql $PORT dba $PASS verbose=on banner=off prompt=off echo=ON errors=stdout exec="ttlp_mt (file_to_string_output ('$f'), '', '$g'); checkpoint;" > $0.log

# If disconnect etc.
if [ $? != 0 ]
then
    echo "An error occurred, please check $0.log"
    exit
fi

# Check for error
line_no=`grep Error $0.log | awk '{ match ($0, /line [0-9]+/, x) ; match (x[0], /[0-9]+/, y); print y[0] }'`
newf=$f.part
inx=1

# Error recovery
while [ ! -z "$line_no" ]
do
    cat $f |  awk "BEGIN { i = 0 } { if (i==$line_no) { print \$0; exit; } i = i + 1 }"  >> bad.nt
    line_no=`expr $line_no + 1`
    echo "Retrying from line $line_no"
    cat $f |  awk "BEGIN { i = 0 } { if (i>=$line_no) print \$0; i = i + 1 }"  > tmp.nt
    mv tmp.nt $newf
    f=$newf
    mv $0.log $0.log.$inx
    # Run the recovered part
    isql $PORT dba $PASS verbose=on banner=off prompt=off echo=ON errors=stdout exec="ttlp_mt (file_to_string_output ('$f'), '', '$g'); checkpoint;" > $0.log

    if [ $? != 0 ]
    then
	echo "An error occurred, please check $0.log"
	exit
    fi
   line_no=`grep Error $0.log | awk '{ match ($0, /line [0-9]+/, x) ; match (x[0], /[0-9]+/, y); print y[0] }'`
   inx=`expr $inx + 1`
done


14.15.9. Using SPARUL

Since SPARUL updates are generally not meant to be transactional, it is best to run these in log_enable (2) mode, which commits every operation as it is done. This prevents one from running out of rollback space. Also for bulk updates, transaction logging can be turned off. If so, one should do a manual checkpoint after the operation to ensure persistence across server restart since there is no roll forward log.

To have a roll forward log and row by row autocommit, one may use log_enable (3). This will write constantly into the log which takes extra time. Having no logging and doing a checkpoint when the whole work is finished is faster.

Many SPARUL operations can be run in parallel in this way. If they are independent with respect to their input and output, row by row autocommit will ensure they do not end up waiting for each other's locks.
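
A minimal sketch of such a non-transactional SPARUL run from isql, using hypothetical graph names:

log_enable (2);
-- row-by-row autocommit, no transaction log for this session

SPARQL INSERT INTO GRAPH <http://example.com/target>
  { ?s ?p ?o }
WHERE { GRAPH <http://example.com/source> { ?s ?p ?o } };

-- persist the result, since nothing was logged
checkpoint;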


14.15.10. DBpedia Benchmark

We ran the DBpedia benchmark queries again with different configurations of Virtuoso. Comparing numbers given by different parties is a constant problem. In the case reported here, we loaded the full DBpedia 3, all languages, with about 198M triples, onto Virtuoso v5 and Virtuoso Cluster v6, all on the same 4 core 2GHz Xeon with 8G RAM. All databases were striped on 6 disks. The Cluster configuration was with 4 processes in the same box. We ran the queries in two variants: once with the graph specified in the queries, and once without.

The times below are for the sequence of 5 queries. One query in the set specifies no condition on S or O, only on P, and thus cannot be answered with the default indices on Virtuoso v5. With Virtuoso Cluster v6 it can, because v6 is more space efficient. So we added the index:

create bitmap index rdf_quad_pogs on rdf_quad (p, o, g, s);
        Virtuoso v5               Virtuoso Cluster v6     Virtuoso Cluster v6
        with gspo, ogps, pogs     with gspo, ogps         with gspo, ogps, pogs
cold    210 s                     136 s                   33.4 s
warm    0.600 s                   4.01 s                  0.628 s

Now let us do it without a graph being specified. Note that alter index is valid for v6 or higher. For all platforms, we drop any existing indices, and:

create table r2 (g iri_id_8, s iri_id_8, p iri_id_8, o any, primary key (s, p, o, g))
alter index R2 on R2 partition (s int (0hexffff00));

log_enable (2);
insert into r2 (g, s, p, o) SELECT g, s, p, o from rdf_quad;

drop table rdf_quad;
alter table r2 rename RDF_QUAD;
create bitmap index rdf_quad_opgs on rdf_quad (o, p, g, s) partition (o varchar (-1, 0hexffff));
create bitmap index rdf_quad_pogs on rdf_quad (p, o, g, s) partition (o varchar (-1, 0hexffff));
create bitmap index rdf_quad_gpos on rdf_quad (g, p, o, s) partition (o varchar (-1, 0hexffff));

The code is identical for v5 and v6, except that with v5 we use iri_id (32 bit) for the type, not iri_id_8 (64 bit). We note that we run out of IDs with v5 around a few billion triples, so with v6 we have double the ID length and still manage to be vastly more space efficient.

With the above 4 indices, we can query the data pretty much in any combination without hitting a full scan of any index. We note that all indices that do not begin with s end with s as a bitmap. This takes about 60% of the space of a non-bitmap index for data such as DBpedia.

If you intend to do completely arbitrary RDF queries in Virtuoso, then chances are you are best off with the above index scheme.

        Virtuoso v5 with gspo, ogps, pogs    Virtuoso Cluster v6 with gspo, ogps, pogs
warm    0.595 s                              0.617 s

The cold times were about the same as above, so not reproduced.

It is in the SPARQL spirit to specify a graph and for pretty much any application, there are entirely sensible ways of keeping the data in graphs and specifying which ones are concerned by queries. This is why Virtuoso is set up for this by default.

On the other hand, for the open web scenario, dealing with an unknown large number of graphs, enumerating graphs is not possible and questions like which graph of which source asserts x become relevant. We have two distinct use cases which warrant different setups of the database, simple as that.

The latter use case is not really within the SPARQL spec, so implementations may or may not support this.

Once the indices are right, there is no difference between specifying a graph and not specifying a graph with the queries considered. With more complex queries, specifying a graph or set of graphs does allow some optimizations that cannot be done with no graph specified. For example, bitmap intersections are possible only when all leading key parts are given.

The best warm cache time is with v5; the five queries run under 600 ms after the first go. This is noted to show that all-in-memory with a single thread of execution is hard to beat.

Cluster v6 performs the same queries in 623 ms. What is gained in parallelism is lost in latency if all operations complete in microseconds. On the other hand, Cluster v6 leaves v5 in the dust in any situation that has less than 100% hit rate. This is due to actual benefit from parallelism if operations take longer than a few microseconds, such as in the case of disk reads. Cluster v6 has substantially better data layout on disk, as well as fewer pages to load for the same content.

This makes it possible to run the queries without the pogs index on Cluster v6 even when v5 takes prohibitively long.

The purpose is to have a lot of RAM and space-efficient data representation.

For reference, the query texts specifying the graph are below. To run without specifying the graph, just drop the FROM <http://dbpedia.org> from each query. The returned row counts are indicated below each query's text.

SQL>SPARQL
SELECT ?p ?o
FROM <http://dbpedia.org>
WHERE
  {
    <http://dbpedia.org/resource/Metropolitan_Museum_of_Art> ?p ?o .
  };

p                                                                                 o
VARCHAR                                                                           VARCHAR
_______________________________________________________________________________

http://www.w3.org/1999/02/22-rdf-syntax-ns#type                                   http://umbel.org/umbel/ac/Artifact
http://www.w3.org/1999/02/22-rdf-syntax-ns#type                                   http://dbpedia.org/class/yago/MuseumsInNewYorkCity
http://www.w3.org/1999/02/22-rdf-syntax-ns#type                                   http://dbpedia.org/class/yago/ArtMuseumsAndGalleriesInTheUnitedStates
http://www.w3.org/1999/02/22-rdf-syntax-ns#type                                   http://dbpedia.org/class/yago/Museum103800563
..
-- 335 rows

SQL>SPARQL
PREFIX p: <http://dbpedia.org/property/>
SELECT ?film1 ?actor1 ?film2 ?actor2
FROM <http://dbpedia.org>
WHERE
  {
  ?film1 p:starring <http://dbpedia.org/resource/Kevin_Bacon> .
  ?film1 p:starring ?actor1 .
  ?film2 p:starring ?actor1 .
    ?film2 p:starring ?actor2 .
};

film1                                       actor1                                    film2                                         actor2
VARCHAR                                     VARCHAR                                   VARCHAR                                       VARCHAR
http://dbpedia.org/resource/The_River_Wild  http://dbpedia.org/resource/Kevin_Bacon   http://dbpedia.org/resource/The_River_Wild   http://dbpedia.org/resource/Kevin_Bacon
http://dbpedia.org/resource/The_River_Wild  http://dbpedia.org/resource/Kevin_Bacon   http://dbpedia.org/resource/The_River_Wild   http://dbpedia.org/resource/Meryl_Streep
http://dbpedia.org/resource/The_River_Wild  http://dbpedia.org/resource/Kevin_Bacon   http://dbpedia.org/resource/The_River_Wild   http://dbpedia.org/resource/Joseph_Mazzello
http://dbpedia.org/resource/The_River_Wild  http://dbpedia.org/resource/Kevin_Bacon   http://dbpedia.org/resource/The_River_Wild   http://dbpedia.org/resource/David_Strathairn
http://dbpedia.org/resource/The_River_Wild  http://dbpedia.org/resource/Kevin_Bacon   http://dbpedia.org/resource/The_River_Wild   http://dbpedia.org/resource/John_C._Reilly
...
--  23910 rows

SQL>SPARQL
PREFIX p: <http://dbpedia.org/property/>
SELECT ?artist ?artwork ?museum ?director
FROM <http://dbpedia.org>
WHERE
  {
  ?artwork p:artist ?artist .
  ?artwork p:museum ?museum .
    ?museum p:director ?director
  };

artist                                          artwork                                              museum                                                                            director
VARCHAR                                         VARCHAR                                              VARCHAR                                                                           VARCHAR
_______________________________________________

http://dbpedia.org/resource/Paul_C%C3%A9zanne   http://dbpedia.org/resource/The_Basket_of_Apples     http://dbpedia.org/resource/Art_Institute_of_Chicago                              James Cuno
http://dbpedia.org/resource/Paul_Signac         http://dbpedia.org/resource/Neo-impressionism        http://dbpedia.org/resource/Art_Institute_of_Chicago                              James Cuno
http://dbpedia.org/resource/Georges_Seurat      http://dbpedia.org/resource/Neo-impressionism        http://dbpedia.org/resource/Art_Institute_of_Chicago                              James Cuno
http://dbpedia.org/resource/Edward_Hopper       http://dbpedia.org/resource/Nighthawks               http://dbpedia.org/resource/Art_Institute_of_Chicago                              James Cuno
http://dbpedia.org/resource/Mary_Cassatt        http://dbpedia.org/resource/The_Child%27s_Bath       http://dbpedia.org/resource/Art_Institute_of_Chicago                              James Cuno
..
-- 303 rows

SQL>SPARQL
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?s ?homepage
FROM <http://dbpedia.org>
WHERE
  {
   <http://dbpedia.org/resource/Berlin> geo:lat ?berlinLat .
   <http://dbpedia.org/resource/Berlin> geo:long ?berlinLong .
   ?s geo:lat ?lat .
   ?s geo:long ?long .
   ?s foaf:homepage ?homepage .
   FILTER (
     ?lat        <=     ?berlinLat + 0.03190235436 &&
     ?long       >=     ?berlinLong - 0.08679199218 &&
     ?lat        >=     ?berlinLat - 0.03190235436 &&
     ?long       <=     ?berlinLong + 0.08679199218) };

s                                                                                 homepage
VARCHAR                                                                           VARCHAR
_______________________________________________________________________________

http://dbpedia.org/resource/Berlin_University_of_the_Arts                         http://www.udk-berlin.de/
http://dbpedia.org/resource/Berlin_University_of_the_Arts                         http://www.udk-berlin.de/
http://dbpedia.org/resource/Berlin_Zoological_Garden                              http://www.zoo-berlin.de/en.html
http://dbpedia.org/resource/Federal_Ministry_of_the_Interior_%28Germany%29        http://www.bmi.bund.de
http://dbpedia.org/resource/Neues_Schauspielhaus                                  http://www.goya-berlin.com/
http://dbpedia.org/resource/Bauhaus_Archive                                       http://www.bauhaus.de/english/index.htm
http://dbpedia.org/resource/Canisius-Kolleg_Berlin                                http://www.canisius-kolleg.de
http://dbpedia.org/resource/Franz%C3%B6sisches_Gymnasium_Berlin                   http://www.fg-berlin.cidsnet.de
..
-- 48 rows

SQL>SPARQL
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX p: <http://dbpedia.org/property/>
SELECT ?s ?a ?homepage
FROM <http://dbpedia.org>
WHERE
  {
   <http://dbpedia.org/resource/New_York_City> geo:lat ?nyLat .
   <http://dbpedia.org/resource/New_York_City> geo:long ?nyLong .
   ?s geo:lat ?lat .
   ?s geo:long ?long .
   ?s p:architect ?a .
   ?a foaf:homepage ?homepage .
   FILTER (
     ?lat        <=     ?nyLat + 0.3190235436 &&
     ?long       >=     ?nyLong - 0.8679199218 &&
     ?lat        >=     ?nyLat - 0.3190235436 &&
     ?long       <=     ?nyLong + 0.8679199218) };
s                                                                                 a               homepage
VARCHAR                                                                           VARCHAR              VARCHAR
_______________________________________________________________________________

http://dbpedia.org/resource/GE_Building                                           http://dbpedia.org/resource/Associated_Architects              http://www.associated-architects.co.uk
http://dbpedia.org/resource/Giants_Stadium                                        http://dbpedia.org/resource/HNTB              http://www.hntb.com/
http://dbpedia.org/resource/Fort_Tryon_Park_and_the_Cloisters                     http://dbpedia.org/resource/Frederick_Law_Olmsted              http://www.asla.org/land/061305/olmsted.html
http://dbpedia.org/resource/Central_Park                                          http://dbpedia.org/resource/Frederick_Law_Olmsted              http://www.asla.org/land/061305/olmsted.html
http://dbpedia.org/resource/Prospect_Park_%28Brooklyn%29                          http://dbpedia.org/resource/Frederick_Law_Olmsted              http://www.asla.org/land/061305/olmsted.html
http://dbpedia.org/resource/Meadowlands_Stadium                                   http://dbpedia.org/resource/360_Architecture              http://oakland.athletics.mlb.com/oak/ballpark/new/faq.jsp
http://dbpedia.org/resource/Citi_Field                                            http://dbpedia.org/resource/HOK_Sport_Venue_Event              http://www.hoksve.com/
http://dbpedia.org/resource/Citigroup_Center                                      http://dbpedia.org/resource/Hugh_Stubbins_Jr.              http://www.klingstubbins.com
http://dbpedia.org/resource/150_Greenwich_Street                                  http://dbpedia.org/resource/Fumihiko_Maki              http://www.pritzkerprize.com/maki2.htm
http://dbpedia.org/resource/Freedom_Tower                                         http://dbpedia.org/resource/David_Childs              http://www.som.com/content.cfm/www_david_m_childs
http://dbpedia.org/resource/7_World_Trade_Center                                  http://dbpedia.org/resource/David_Childs              http://www.som.com/content.cfm/www_david_m_childs
http://dbpedia.org/resource/The_New_York_Times_Building                           http://dbpedia.org/resource/Renzo_Piano              http://www.rpbw.com/
http://dbpedia.org/resource/Trump_World_Tower                                     http://dbpedia.org/resource/Costas_Kondylis              http://www.kondylis.com

13 Rows. -- 2183 msec.

14.15.11. RDF Store Benchmarks

14.15.11.1. Introduction

In any particular RDF store benchmark it makes a difference whether the queries are executed against one specified graph, against several specified graphs, or with no graph specified at all. As Virtuoso is a quad store, not a triple store with many tables, it runs queries inefficiently when the graph is left unspecified and the only indexes are the pre-set GSPO and OGPS. Proper use of the FROM clause, or adding indexes that include the graph column in another position, will give better results.
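For example, naming the working graph explicitly with a FROM clause keeps the graph known to the engine without changing the rest of the query. A minimal sketch, with a hypothetical graph IRI and predicate:

SQL>SPARQL
SELECT ?s ?o
FROM <http://example.com/mygraph>
WHERE { ?s <http://example.com/schema#name> ?o };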


14.15.11.2. Using bitmap indexes

If it is known in advance that some users of the benchmark will not indicate specific graphs, then one of the following should be done:

Neither method requires any changes to the query texts.

You can create other indexes as well. Bitmap indexes are preferable, but if O is the last column then the index cannot be bitmap, so, for example, it could be:

create index RDF_QUAD_PSGO on DB.DBA.RDF_QUAD (P, S, G, O);

but cannot be:

create bitmap index RDF_QUAD_PSGO on DB.DBA.RDF_QUAD (P, S, G, O);
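A bitmap index over the same columns is possible if an IRI column such as S is placed last; the index name below is only an illustration:

create bitmap index RDF_QUAD_PGOS on DB.DBA.RDF_QUAD (P, G, O, S);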


14.15.12. Fast Approximate RDF Graph Diff and Patch

The two algorithms described below resemble "unified diff" and "patch by unified diff", but they work on RDF graphs, not on plain text.

They work reasonably well for graphs composed of CBDs (concise bounded descriptions) of some subjects, provided these subjects are either "named" IRIs or can be identified by values of their inverse functional properties.

Many sorts of commonly used graphs match these restrictions, including all graphs without blank nodes, most FOAF files, graphs that can be "pretty-printed" in JSON, most dumps of relational databases, etc.

The basic idea is as simple as a zipper:

An IRI in the left graph (say, G1) matches the same IRI in the right graph (G2), like a pin fitting its box. The same is true for literals.

Functional and inverse functional properties are the teeth that form chains; the algorithm "moves sliders" along these chains, incrementally connecting more and more nodes.

If there is a match of this sort (O1 in G1 matches O2 in G2) and the matched nodes are values of the same inverse functional property P (there are { S1 P O1 } in G1 and { S2 P O2 } in G2), then we guess that S1 matches S2.

If S1 in G1 matches S2 in G2 and the matched nodes are subjects of the same functional property P (there are { S1 P N1 } in G1 and { S2 P N2 } in G2), then we guess that N1 matches N2. Now it is possible to try the same interaction on triples where N1 and N2 are in subject position; that is how the sliders move. A typical example of a long zipper is a closed list with matched heads.
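A small illustration of the inverse functional case (the mailbox value and blank node labels are invented): foaf:mbox is inverse functional, and both graphs contain the same mailbox, so the two blank-node subjects are guessed to match.

# in G1
_:p1 foaf:mbox <mailto:alice@example.com> .
# in G2
_:p2 foaf:mbox <mailto:alice@example.com> .
# the object IRIs match and foaf:mbox is inverse functional,
# so _:p1 in G1 is guessed to match _:p2 in G2.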

14.15.12.1. Make a Diff And Use It


14.15.12.2. Collect Functional And Inverse Functional Properties

Lists of functional properties can be retrieved from an ontology graph by a query like:

SPARQL define output:valmode "LONG"
SELECT (<LONG::sql:VECTOR_AGG(?s)>)
FROM <my-ontology-graph>
WHERE
  {
    ?s a owl:FunctionalProperty
  }

Inverse functional properties could be retrieved by a similar query. Unfortunately, the ontology may mention so-called NULL values, which can be property values for many subjects. The current implementation of diff and patch does not recognize NULL values, so they can cause the patch to report "false alarm" errors. The workaround is to retrieve only those properties that have no NULL values declared:

SPARQL define output:valmode "LONG"
SELECT (<LONG::sql:VECTOR_AGG(?s)>)
FROM <my-ontology-graph>
WHERE
  {
    ?s a owl:InverseFunctionalProperty .
    OPTIONAL { ?s owl:nullIFPValue ?v }
    FILTER (!bound(?v))
  }

If no ontology is available then appropriate predicates can be obtained from sample graphs using DB.DBA.RDF_GRAPH_COLLECT_FP_LIST.


14.15.12.3. Implementation-Specific Extensions of GUO Ontology

Note: This section contains implementation details that are needed only if you want to write your own patch or diff procedure. You do not have to worry about these internals if you only use the existing procedures.

The basic GUO ontology is not expressive enough to work with blank nodes, so some custom extensions are needed.

In the rest of the description:

@prefix guo: <http://webr3.org/owl/guo#>

is assumed.

The diff contains one node of rdf:type guo:diff.

For debugging purposes it has properties guo:graph1 and guo:graph2 that correspond to the gfrom and gto arguments of DB.DBA.RDF_SUO_DIFF_TTL.

The diff also contains zero or more nodes of rdf:type guo:UpdateInstruction. These nodes are as described in the basic GUO ontology, but guo:target_graph is now optional, guo:target_subject can be a blank node, and objects of predicates "inside" the values of guo:insert and guo:delete can also be blank nodes. These blank nodes are "placeholders" for values calculated according to the most important GUO extension: rule nodes.

There are eight sorts of rule nodes: four for the gfrom side of the diff and four similar ones for the gto side. Of the four sorts related to one side, two are for functional properties and two are for inverse functional properties. Thus the rdf:type values of these nodes are:

guo:from-rule-FP0, 
guo:from-rule-FP1, 
guo:from-rule-IFP0, 
guo:from-rule-IFP1

and

guo:to-rule-FP0, 
guo:to-rule-FP1, 
guo:to-rule-IFP0, 
guo:to-rule-IFP1 .

Each rule node has a guo:order property whose value is a non-negative integer.

These integers enumerate all guo:from-rule-... nodes, starting from zero.

When the patch procedure runs, these rules are applied in this order; the result of each rule is a blank node that either already exists in the graph or is created at that point.

All results are remembered for use in the rest of the patch procedure.

Similarly, another sequence of integers enumerates all guo:to-rule-... nodes, also starting from zero.

The description below considers only the sequence of guo:from-rule-... nodes, because guo:to-rule-... nodes have identical properties.

A rule node can have zero or more values of the guo:dep property; each value is a blank node referring to a rule node that should be calculated before the current one.

Every rule has exactly one guo:path predicate whose value is a blank node. Each property of this blank node describes one possible "move of the slider": the predicate to follow is in the predicate position and the node to start from is in the object position. An IRI or a literal in the object position is used as is; a blank node in the object position should be of type guo:from-rule-... and have a smaller guo:order, so that it refers to the already calculated result bnode of some preceding rule.

A rule of the form:

R a guo:from-rule-IFP1 ;
  guo:path [ P1 O1 ; P2 O2 ; ... ; Pn On ] .

searches for a unique blank node _:Rres that is a common subject of triples:

 _:Rres P1 O1
 _:Rres P2 O2
  . . .
 _:Rres Pn On

in the gfrom graph.

If the subjects differ in these triples, or some triples are not found, or the subject is not a blank node, then an appropriate error is logged and the rule fails; otherwise _:Rres is remembered as the result of the rule.
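For instance, a hypothetical rule of this sort (the property and mailbox IRI are invented for illustration) would resolve to the single blank node in gfrom that carries that mailbox value:

_:r0 a guo:from-rule-IFP1 ;
  guo:order 0 ;
  guo:path [ foaf:mbox <mailto:alice@example.com> ] .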

Similarly, a rule of the form:

R a guo:from-rule-FP1 ;
  guo:path [ P1 O1 ; P2 O2 ; ... ; Pn On ] .

searches for a unique blank node _:Rres that is a common object of triples:

 O1 P1 _:Rres
 O2 P2 _:Rres
  . . .
 On Pn _:Rres

in the gfrom graph.

A rule of the form:

R a guo:from-rule-IFP0 ;
  guo:path [ P1 O1 ; P2 O2 ; ... ; Pn On ] .

ensures that the gfrom graph does not contain any triple like:

 _:Rres P1 O1
 _:Rres P2 O2

or

_:Rres Pn On

It is an error if any such triple exists. If nothing is found, then the result of the rule is a newly created unique blank node. That is how the patch procedure creates new blank nodes when it inserts "totally new" data.

Similarly, a rule of the form:

R a guo:from-rule-FP0 ;
  guo:path [ P1 O1 ; P2 O2 ; ... ; Pn On ] .

ensures that the gfrom graph does not contain any triple like:

O1 P1 _:Rres
O2 P2 _:Rres

or

 On Pn _:Rres

The current version of the patch procedure does not use guo:to-rule-... rules; however, they can be used by custom procedures in a few ways. First, these rules can be used to produce a "reversed diff". Next, these rules can be used to validate the result of the patch: if the patch cannot be reverted then the result is "suspicious".



14.15.13. RDB2RDF Triggers

RDF Views have many advantages compared to static dumps of the database as RDF triples. However, they do not solve a few problems. First, inference is supported only for physically stored triples, so one has to choose between the convenience of inference and the convenience of RDF Views. Next, the algorithms that select triples with non-constant graphs and predicates tend to produce enormous SQL query texts if the RDF Views are complicated enough. Finally, there may be a need to export a big and fresh static RDF dump, but preparing such a dump may take too much time via both RDF Views and traditional methods.

The solution is a set of triggers on the source tables of an RDF View that edit parts of the physical dump on each change of the source data. Unlike RDF Views, which cost nothing while not queried, these triggers continuously add a significant overhead to any data manipulation on the sources. To compensate for this, the dump should be in intensive use and not replaceable by RDF Views; in other cases, do not add these triggers.

It is next to impossible to write such triggers by hand, so a small API is provided to generate their SQL texts from the metadata of RDF Views.

First of all, views in an RDF storage do not work in full isolation from each other. Some of them may partially disable others due to OPTION(EXCLUSIVE), and some may produce one triple in different ways. As a result, triggers are not made on a per-view basis. Instead, a special RDF storage is introduced, namely virtrdf:SyncToQuads; all required triples are added to it and triggers are created for the whole storage. Typically an RDF View is created in some other storage, e.g., virtrdf:DefaultQuadStorage, and then added to virtrdf:SyncToQuads via:

sparql alter quad storage virtrdf:SyncToQuads {
   create <my_rdf_view> using storage virtrdf:DefaultQuadStorage };
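To verify which quad maps ended up in the storage, a query along the lines of the copy procedure below can be used; the virtrdf:SyncToQuads-UserMaps name follows the same naming pattern as virtrdf:DefaultQuadStorage-UserMaps and is an assumption here:

sparql define input:storage ""
SELECT ?qm
FROM virtrdf:
WHERE { virtrdf:SyncToQuads-UserMaps ?idx ?qm . ?qm a virtrdf:QuadMap };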

The following example procedure copies all user-defined RDF Views from the default quad storage to virtrdf:SyncToQuads:

create procedure DB.DBA.RDB2RDF_COPY_ALL_RDF_VIEWS_TO_SYNC ()
{
  for (sparql define input:storage ""
    select (bif:aref(bif:sprintf_inverse (str(?idx), bif:concat (str(rdf:_), "%d"), 0), 0)) ?qm
    from virtrdf:
    where { virtrdf:DefaultQuadStorage-UserMaps ?idx ?qm . ?qm a virtrdf:QuadMap }
    order by asc (bif:sprintf_inverse (bif:concat (str(rdf:_), "%d"), str (?idx), 1)) ) do
    exec (sprintf ('sparql alter quad storage virtrdf:SyncToQuads { create <%s> using storage virtrdf:DefaultQuadStorage }', "qm"));
}
;

When the virtrdf:SyncToQuads storage is fully prepared, two API functions can be used: sparql_rdb2rdf_list_tables, which lists the source tables used by the views in the storage, and sparql_rdb2rdf_codegen, which generates the SQL texts of the triggers and related procedures for a given table. Both are used in the examples below.

In some cases, RDF Views are complicated enough that BEFORE UPDATE and AFTER DELETE triggers are required in addition to the minimal set. In this case, sparql_rdb2rdf_codegen calls will return a vector of two string sessions, not a single string session, and both sessions are SQL texts to inspect or execute. In this case, the BEFORE trigger will not delete obsolete quads from the RDF_QUAD table; instead it will create records in a special table, RDF_QUAD_DELETE_QUEUE, as guesses of what can be deleted. The AFTER trigger will re-check these guesses, delete the related quads if needed, and shorten the RDF_QUAD_DELETE_QUEUE.
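A hedged sketch of inspecting such a result before executing it; the procedure name is hypothetical, and it reuses only calls that also appear in the examples below:

create procedure DB.DBA.RDB2RDF_INSPECT_CODEGEN (in tbl varchar, in op integer)
{
  declare txt any;
  txt := sparql_rdb2rdf_codegen (tbl, op);
  if (__tag of vector = __tag (txt))
    {
      -- a pair of string sessions, e.g. a BEFORE trigger and its AFTER counterpart
      dbg_obj_princ (string_output_string (txt[0]));
      dbg_obj_princ (string_output_string (txt[1]));
      return;
    }
  if (__tag of varchar <> __tag (txt))
    txt := string_output_string (txt);
  dbg_obj_princ (txt);
}
;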

The extra activity of the triggers on RDF_QUAD, RDF_OBJ, RDF_QUAD_DELETE_QUEUE and other tables and indexes of the storage of "physical" triples may cause deadlocks, so the application should be carefully checked for proper handling of deadlocks if they were very seldom before RDB2RDF triggers were turned on. In some cases, the whole RDB2RDF processing can be moved to a separate server connected to the main workhorse server via replication.

The following example functions create the texts of all triggers, save them to files for further study, and try to load them. That is probably quite a bad scenario for a production database, because it is better to read procedures before loading them, especially if they are triggers, and especially if some of them may contain errors.

-- This creates one or two files with one or two triggers or other texts and tries to load the
-- generated SQL texts.
create procedure DB.DBA.RDB2RDF_EXEC_CODEGEN1_FOR_TABLE 
(  in dump_prefix varchar, 
   in tbl varchar, 
   in dump_id any, 
   in txt any )
{
  declare fname varchar;
  declare stat, msg varchar;
  if (isinteger (dump_id))
    dump_id := cast (dump_id as varchar);
  if (__tag of vector = __tag (txt))
    {
      DB.DBA.RDB2RDF_EXEC_CODEGEN1_FOR_TABLE (dump_prefix, tbl, dump_id, txt[0]);
      DB.DBA.RDB2RDF_EXEC_CODEGEN1_FOR_TABLE (dump_prefix, tbl, dump_id || 'p' , txt[1]);
      return;
    }
  if (__tag of varchar <> __tag (txt))
    txt := string_output_string (txt);
  fname := sprintf ('%s_Rdb2Rdf.%s.%s.sql', dump_prefix, tbl, dump_id);
  string_to_file (fname, txt || '\n;\n', -2);
  if ('0' = dump_id)
    return;
  stat := '00000';
  msg := '';
  exec (txt, stat, msg);
  if ('00000' <> stat)
    {
      string_to_file (fname, '\n\n- - - - - 8< - - - - -\n\nError ' || stat || ' ' || msg, -1);
      if (not (subseq (msg, 0, 5) in ('SQ091')))
        signal (stat, msg);
    }
}
;

-- This creates and loads all triggers, init procedure and debug dump related to one table.
create procedure DB.DBA.RDB2RDF_PREPARE_TABLE (in dump_prefix varchar, in tbl varchar)
{
  declare ctr integer;
  for (ctr := 0; ctr <= 4; ctr := ctr+1 )
    DB.DBA.RDB2RDF_EXEC_CODEGEN1_FOR_TABLE (dump_prefix, tbl, ctr, sparql_rdb2rdf_codegen (tbl, ctr));
}
;

-- This creates and loads all triggers, init procedure and debug dump related to all tables used by the RDF Views.
create procedure DB.DBA.RDB2RDF_PREPARE_ALL_TABLES (in dump_prefix varchar)
{
  declare tbl_list any;
  tbl_list := sparql_rdb2rdf_list_tables (0);
  foreach (varchar tbl in tbl_list) do
    {
      DB.DBA.RDB2RDF_PREPARE_TABLE (dump_prefix, tbl);
    }
}
;

The following combination of calls prepares all triggers for all RDF Views of the default storage:

DB.DBA.RDB2RDF_COPY_ALL_RDF_VIEWS_TO_SYNC ();
DB.DBA.RDB2RDF_PREPARE_ALL_TABLES (cast (now() as varchar));

This does not copy the initial state of the RDB2RDF graphs to the physical storage, because that can be dangerous for existing RDF data, and even if all procedures work as expected they may produce huge amounts of RDF data, exceed transaction log limits, and thus require application-specific precautions. It is also possible to do the initial loading with SPARUL statements like:

SPARQL 
INSERT IN <snapshot-graph> { ?s ?p ?o } 
FROM <snapshot-graph> 
WHERE 
  { quad map <id-of-rdf-view> 
      { ?s ?p ?o } 
  };