The topic of column-wise storage has not escaped us. We are not convinced that it is good for RDF. There is no doubt a point to it for business intelligence data warehouses, although one could argue that suitably selected covering indices give the same IO benefit, at the cost of more design work. Column storage fits in less space and is more versatile for unexpected workloads.

But let us look at the RDF case specifically. You have a quad of G, S, P, O. You have a one-part index on each column and a unique row number for each quad. Given the row number, you must get the G, S, P, and O, and given any one of these, you must get the row numbers where it occurs. If there were multi-part keys, then this would be a row store with covering indices, like Virtuoso's RDF store.
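To make the layout concrete, here is a minimal in-memory sketch of the column-wise arrangement just described. The class name `QuadColumnStore`, its methods, and the sample quads are all hypothetical; this is not how Virtuoso or any particular column store is implemented, only an illustration of the two lookups required.

```python
from collections import defaultdict


class QuadColumnStore:
    """Toy column-wise quad store: one value array and one one-part index per column."""

    COLUMNS = ("g", "s", "p", "o")

    def __init__(self):
        # Row number -> value, one list per column.
        self.columns = {c: [] for c in self.COLUMNS}
        # Value -> ascending list of row numbers, one index per column.
        self.index = {c: defaultdict(list) for c in self.COLUMNS}

    def add(self, g, s, p, o):
        """Store a quad and return its row number."""
        row_id = len(self.columns["g"])
        for name, value in zip(self.COLUMNS, (g, s, p, o)):
            self.columns[name].append(value)
            self.index[name][value].append(row_id)
        return row_id

    def quad(self, row_id):
        """Given the row number, get G, S, P and O (one lookup per column)."""
        return tuple(self.columns[c][row_id] for c in self.COLUMNS)

    def rows(self, column, value):
        """Given one column value, get the row numbers where it occurs."""
        return self.index[column].get(value, [])


store = QuadColumnStore()
first = store.add("g1", "alice", "knows", "bob")
second = store.add("g1", "alice", "name", "Alice")
alice_rows = store.rows("s", "alice")
second_quad = store.quad(second)
```

Note that reassembling one quad touches all four column arrays, which matters for the RDF workload discussed next.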

Each datum is stored 8 times. What is nice is that one can use any combination of selection criteria with equal ease and in the same working set. With the RDF workload, you typically end up referencing all parts of each quad. It is not like the business intelligence case, where the typical query accesses 4 columns of a 15-column history table. Of the 4 RDF quad keys, at least 2 are generally given. So this becomes a merge intersection of two or three indices, followed by random lookups for the unspecified columns. That is a complicated control path, even if the engine is meant to do this thing alone.
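The access pattern can be sketched as follows, assuming each one-part index yields row numbers in ascending order. The sample quads, the helper names `build_index` and `intersect_sorted`, and the query (S and P given, G and O wanted) are made up for illustration.

```python
def build_index(column):
    """Map each value in a column to the ascending list of rows holding it."""
    index = {}
    for row_id, value in enumerate(column):
        index.setdefault(value, []).append(row_id)
    return index


def intersect_sorted(a, b):
    """Merge intersection of two ascending row-number lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical sample data, stored column by column.
g_col = ["g1", "g1", "g1", "g2"]
s_col = ["alice", "alice", "bob", "alice"]
p_col = ["knows", "name", "knows", "knows"]
o_col = ["bob", "Alice", "carol", "dave"]

s_index = build_index(s_col)
p_index = build_index(p_col)

# Query: S = alice and P = knows are given; G and O are unspecified.
matching_rows = intersect_sorted(s_index["alice"], p_index["knows"])

# Random lookups into the remaining columns, one per column per matching row.
results = [(g_col[r], o_col[r]) for r in matching_rows]
```

Even in this toy form there are two distinct phases, the merge and the per-row dereferencing, and the second phase is random access into as many columns as the query leaves unspecified.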

We'll have to try this. We could set up Virtuoso with 4 bitmap indices, each column to row ID and then a table with the 4 columns. Then we'd get bitmap ANDs for multi-column criteria and would have to get the row by row ID. As long as we run in memory, this should perform like a column store, close enough. We get the row with all the columns once, so we compensate for the fact that a column store has a special means for dereferencing the row ID for any column.
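As a rough sketch of the experiment, with Python integers standing in for bitmaps: one bitmap per distinct value of each column, an AND for multi-column criteria, and a single fetch of the whole row by row ID. The table contents and the helpers `build_bitmaps` and `row_ids` are hypothetical; nothing here reflects Virtuoso's actual bitmap index format.

```python
def build_bitmaps(table, col):
    """Map each value of column `col` to a bitmap with bit r set when row r holds it."""
    bitmaps = {}
    for row_id, row in enumerate(table):
        bitmaps[row[col]] = bitmaps.get(row[col], 0) | (1 << row_id)
    return bitmaps


def row_ids(bitmap):
    """List the row IDs whose bits are set, in ascending order."""
    ids = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            ids.append(row_id)
        bitmap >>= 1
        row_id += 1
    return ids


# The table with the 4 columns (G, S, P, O); the list position is the row ID.
table = [
    ("g1", "alice", "knows", "bob"),
    ("g1", "alice", "name", "Alice"),
    ("g1", "bob", "knows", "carol"),
    ("g2", "alice", "knows", "dave"),
]

# Four bitmap indices, one per column.
G, S, P, O = (build_bitmaps(table, col) for col in range(4))

# Multi-column criteria become a bitmap AND.
hits = S["alice"] & P["knows"]

# Each matching row is fetched once, with all its columns.
quads = [table[r] for r in row_ids(hits)]
```

The fetch by row ID returns all four columns together, which is the compensation mentioned above for not having a column store's per-column dereferencing.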

If we optimized this specially, which would not be so terribly hard, we'd have a column store. The main new thing would be making a special index by row ID that would have the ID just once per index leaf and a bitmap for dense allocation of row IDs. The rest is not too different.
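One possible reading of such a leaf, sketched under our own assumptions: the leaf stores its first row ID once, a bitmap records which of the following IDs are present, and the values are packed in row-ID order. The class `RowIdLeaf` and its layout are hypothetical and only illustrate the idea.

```python
class RowIdLeaf:
    """Toy index leaf keyed by row ID: base ID stored once, presence bitmap, packed values."""

    def __init__(self, base_row_id):
        self.base = base_row_id
        self.present = 0   # bit k set means row base + k is stored in this leaf
        self.values = []   # values of the present rows, in ascending row-ID order

    def append(self, row_id, value):
        offset = row_id - self.base
        # Assumption: rows arrive in strictly increasing row-ID order.
        assert offset >= 0 and (self.present >> offset) == 0
        self.present |= 1 << offset
        self.values.append(value)

    def get(self, row_id):
        offset = row_id - self.base
        if offset < 0 or not (self.present >> offset) & 1:
            return None
        # The number of present rows below this one gives the slot in `values`.
        position = bin(self.present & ((1 << offset) - 1)).count("1")
        return self.values[position]


leaf = RowIdLeaf(1000)
leaf.append(1000, "a")
leaf.append(1001, "b")
leaf.append(1003, "c")   # 1002 was never allocated or has been deleted

found = leaf.get(1003)
missing = leaf.get(1002)
```

With densely allocated row IDs the bitmap is almost all ones, so the per-row overhead is about one bit instead of a full key.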

For now, we will watch. If this is the next big thing, we can get there in little time.