Details

Orri Erling

Subscribe

Post Categories

Recent Articles

Display Settings

articles per page.
order.
Showing posts in all categories RefreshRefresh
Some Interesting VLDB 2009 Papers (2 of 5)

Intel on Hash Join

Intel and Oracle had measured hash and sort merge joins on Intel Core i7. The result was that hash join with both tables partitioned to match CPU cache was still the best but that sort/merge would catch up with more SIMD instructions in the future.

We should probably experiment with this but the most important partitioning of hash joins is still between cluster nodes. Within the process, we will see. The tradeoff of doing all in cache-sized partitions is larger intermediate results which in turn will impact the working set of disk pages in RAM. For one-off queries this is OK; for online use this has an effect.

1000 TABLE Queries

SAP presented a paper about federating relational databases. Queries would be expressed against VIEWs defined over remote TABLEs, UNIONed together and so forth. Traditional methods of optimization would run out of memory; a single 1000 TABLE plan is already a big thing. Enumerating multiple variations of such is not possible in practice. So the solution was to plan in two stages — first arrange the subqueries and derived TABLEs, and then do the JOIN orders locally. Further, local JOIN orders could even be adjusted at run time based on the actual data. Nice.

Oracle Subqueries and New Implementation of LOBs

Oracle presented some new SQL optimizations, combining and inlining subqueries and derived TABLEs. We do fairly similar things and might extend the repertoire of tricks in the direction outlined by Oracle as and when the need presents itself. This further confirms that SQL and other query optimization is really an incremental collection of specially recognized patterns. We still have not found any other way of doing it.

Another interesting piece by Oracle was about their re-implementation of large object support, where they compared LOB loading to file system and raw device speeds.

Amadeus CRS booking system, steady query time for arbitrary single table queries

There was a paper about a memory-resident database that could give steady time for any kind of single-table scan query. The innovation was to not use indices, but to have one partition of the table per processor core, all in memory. Then each core would have exactly two cursors — one reading, the other writing. The write cursor should keep ahead of the read cursor. Like this, there would be no read/write contention on pages, no locking, no multiple threads splitting a tree at different points, none of the complexity of a multithreaded database engine. Then, when the cursor would hit a row, it would look at the set of queries or updates and add the result to the output if there was a result. The data indexes the queries, not the other way around. We have done something similar for detecting changes in a full text corpus but never thought of doing queries this way.

Well, we are all about JOINs so this is not for us, but it deserves a mention for being original and clever. And indeed, anything one can ask about a table will likely be served with great predictability.

Greenplum

Google's chief economist said that the winning career choice would be to pick a scarce skill that made value from something that was plentiful. For the 2010s this career is that of the statistician/data analyst. We've said it before — the next web is analytics for all. The Greenplum talk was divided between the Fox use case, with 200TB of data about ads, web site traffic, and other things, growing 5TB a day. The message was that cubes and drill down are passé, that it is about complex statistical methods that have to run in the database, that the new kind of geek is the data geek, whose vocation it is to consume and spit out data, discover things in it, and so forth.

The technical part was about Greenplum, a SQL database running on a cluster with a PostgreSQL back-end. The interesting points were embedding MapReduce into SQL, and using relational tables for arrays and complex data types — pretty much what we also do. Greenplum emphasized scale-out and found column orientation more like a nice-to-have.

MonetDB, optimizing database for CPU cache

The MonetDB people from CWI in Amsterdam gave a 10 year best paper award talk about optimizing database for CPU cache. The key point was that if data is stored as columns, it ought also to be transferred as columns inside the execution engine. Materialize big chunks of state to cut down on interpretation overhead and use cache to best effect. They vector for CPU cache; we vector for scale-out, since the only way to ship operations is to ship many at a time. So we might as well vector also in single servers. This could be worth an experiment. Also we regularly visit the topic of column storage. But we are not yet convinced that it would be better than row-style covering indices for RDF quads. But something could certainly be tried, given time.

# PermaLink Comments [0]
09/01/2009 11:46 GMT Modified: 09/01/2009 17:32 GMT
Comparison of Open Source Databases with TPC D Queries

Last time we talked about database engine and transactions. Now we have come to the realm of query processing in our revisiting of the DBMS side of Virtuoso.

Now the well established, respectable standard benchmark for the basics of query processing is TPC D with its derivatives H and R. So we have, for testing how different SQL optimizers manage the 22 queries, run a mini version of the D queries with a 1% scale database, some 30M of data, all in memory. This basically catches whether SQL implementations miss some of the expected tricks and how efficient in memory loop and hash joins and aggregation are.

When we get to our next stop, high volume I/O, we will run the same with D databases in the 10G ballpark.

The databases were tested on the same machine, with warm cache, taking the best run of 3. All had full statistics and were running with read committed isolation, where applicable. The data was generated using the procedures from the Virtuoso test suite. The Virtuoso version tested was 5.0, to be released shortly. The MySQL was 5.0.27, the PostgreSQL was 8.1.6.

Query Query Times in Milliseconds
Virtuoso PostgreSQL MySQL MySQL with InnoDB
Q1 206 763 312 198
Q2 4 6 3 3
Q3 13 51 254 64
Q4 4 16 24 60
Q5 15 22 64 68
Q6 9 70 189 65
Q7 52 143 211 84
Q8 29 31 13 11
Q9 36 114 97 61
Q10 32 51 117 57
Q11 16 9 12 10
Q12 8 21 18 130
Q13 18 74 - -
Q14 7 21 418 1425
Q15 14 43 389 122
Q16 16 22 18 25
Q17 1 54 26 10
Q18 82 120 - -
Q19 19 8 2 17
Q20 7 15 66 52
Q21 34 86 524 278
Q22 4 323 3311 805
Total (msec) 626 2063 6068 3545

We lead by a fair margin but MySQL is hampered by obviously getting some execution plans wrong and not doing Q13 and Q18 at all, at least not under several tens of seconds; so we left these out of the table in the interest of having comparable totals.

As usual, we also ran the workload on Oracle 10g R2. Since Oracle does not like their numbers being published without explicit approval, we will just say that we are even with them within the parameters described above. Oracle has a more efficient decimal type so it wins where that is central, as on Q1. Also it seems to notice that the GROUP BYs of Q18 are produced in order of grouping columns, so it needs no intermediate table for storing the aggregates. If we addressed these matters, we'd lead by some 15% whereas now we are even. A faster decimal arithmetic implementation may be in the release after next.

In the next posts, we will look at IO and disk allocation, and also return to RDF and LUBM.

# PermaLink Comments [0]
02/05/2007 11:45 GMT Modified: 04/17/2008 21:04 GMT
         
Powered by OpenLink Virtuoso Universal Server
Running on Linux platform