Raghu Ramakrishnan of Yahoo! gave a keynote about PNUTS, Yahoo!'s solution for managing massive user data, from front-page preferences to mail to social networks.
The driving issues are dynamic scale, wide-area replication, and high availability. Multi-record transactions, complex queries, and absolute consistency at all times are traded away; the programming interfaces are also lower-level than SQL. Replication and consistency rules are choices left to the application developer, with the platform offering some basic alternatives. Implementation-wise, there is a MySQL back-end, and all the partitioning, query routing, replication, and load balancing take place in a layer of front-ends.
Now what do we say to this?
In the Yahoo! case, even if complex queries were possible, which they are not, one would probably keep them off the online system, since latency and availability are everything there. A latency of a few tens of milliseconds is, however, acceptable, and that is not a terrible budget for single-record operations: there is time for a couple of messages on the data-center network, and maybe even for a disk read.
PNUTS is probably the fastest way of reaching the desired beachhead: simple access to data at infinite scale across multiple geographies. In the same situation, I might have done something similar.
But we are in a different situation, concerned with complex queries and with a highly normalized, schema-last setting, i.e., index on everything, large objects normalized away, as is done in RDF. And we are also in the relational situation. Infinite scale, fault tolerance, and wide-area replication come up regularly in user needs. The applications for which people would like RDF are not only complex reasoning tasks but also very big metadata stores for user-generated content, social networks, and the like.
Which of the PNUTS principles could we apply?
- Division into tablets: when a partition of the data grows too big, it should split (see the sketch after this list).
- Migration of partitions: as capacity and demand change, partitions should migrate so as to equalize load (also sketched below).
- High availability: this divides in two, on one hand inside the data center, on the other between data centers. Inside the data center, storing partitions in duplicate and running them synchronously is possible. That is manifestly impossible in a wide-area setting, though; there we need a log-shipping style of asynchronous replication. But how does one deal with split networks and with transferring replication mastery?
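To make the first two items concrete, here is a minimal sketch of tablet splitting and load-driven migration over a hash-partitioned key space. Everything in it — the `Tablet` and `PartitionMap` names, the split threshold, the rebalancing policy — is an illustrative assumption of mine, not the actual PNUTS mechanism.

```python
import bisect

MAX_TABLET_ROWS = 1_000_000  # illustrative split threshold, not a PNUTS value

class Tablet:
    def __init__(self, low, high, server):
        self.low, self.high = low, high  # key-hash range [low, high)
        self.server = server             # current home of this tablet
        self.rows = 0                    # row count, drives splitting

class PartitionMap:
    def __init__(self, key_space, servers):
        self.servers = servers
        self.tablets = [Tablet(0, key_space, servers[0])]  # kept sorted by low

    def locate(self, key_hash):
        """Route a request to the tablet owning key_hash."""
        lows = [t.low for t in self.tablets]
        return self.tablets[bisect.bisect_right(lows, key_hash) - 1]

    def maybe_split(self, t):
        """Division into tablets: split a tablet that has grown too big.
        Assumes hash keys are roughly uniform, so halving the range
        roughly halves the rows."""
        if t.rows > MAX_TABLET_ROWS and t.high - t.low > 1:
            mid = (t.low + t.high) // 2
            right = Tablet(mid, t.high, t.server)
            right.rows = t.rows - t.rows // 2
            t.high, t.rows = mid, t.rows // 2
            self.tablets.insert(self.tablets.index(t) + 1, right)

    def rebalance(self):
        """Migration: move one tablet from the most loaded server to the
        least loaded one, as long as that narrows the gap."""
        load = {s: 0 for s in self.servers}
        for t in self.tablets:
            load[t.server] += t.rows
        hot = max(self.servers, key=load.get)
        cold = min(self.servers, key=load.get)
        candidates = [t for t in self.tablets if t.server == hot]
        victim = max(candidates, key=lambda t: t.rows, default=None)
        if victim and load[hot] - load[cold] > victim.rows:
            victim.server = cold  # in reality: copy the data, then flip ownership
```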
PNUTS determines the master copy record by record. This makes sense when the record corresponds, for example, to a user. For RDF, doing this per triple would be prohibitive; doing it per graph, or per subject of a set of triples across all graphs, would be better. We would agree with PNUTS that transferring mastery per storage chunk is not desirable, since a chunk will contain arbitrary, unrelated data.
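A sketch of what per-graph mastership might look like: one data center is master for each graph, serializes its updates, and ships the log asynchronously to the other replicas. The `GraphMastership` class, its handoff policy, and the ten-write window are my assumptions for illustration, not anything PNUTS or an existing RDF store does.

```python
class GraphMastership:
    """Per-graph write mastership, in the spirit of PNUTS's per-record
    mastership (all names and policies here are illustrative)."""

    def __init__(self, default_master):
        self.master = {}    # graph IRI -> data center holding mastery
        self.origins = {}   # graph IRI -> origin DCs of recent writes
        self.default = default_master

    def master_of(self, graph):
        return self.master.get(graph, self.default)

    def record_write(self, graph, origin_dc):
        """Every write is forwarded to the graph's master; remember where
        it came from so that mastery can follow the write traffic."""
        self.origins.setdefault(graph, []).append(origin_dc)
        return self.master_of(graph)

    def maybe_transfer(self, graph, window=10):
        """If the last `window` writes all came from one remote data center,
        hand mastery there so the common case becomes a local write. A
        handoff during a network split must be deferred until rejoin."""
        recent = self.origins.get(graph, [])[-window:]
        if len(recent) == window and len(set(recent)) == 1 \
                and recent[0] != self.master_of(graph):
            self.master[graph] = recent[0]
```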
The eventual-consistency mechanisms generalize to RDF readily enough. In a social RDF application, the graph is the most likely unit of data ownership and update authorization, so the graph would also be the unit of eventual consistency. Keeping a separate data structure that lists recent inserts and deletes to a graph, with timestamps, would serve for establishing consistency; its size would be a small fraction of the graph's.
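As a sketch of that structure, assuming loosely synchronized clocks and a last-writer-wins rule per triple (both my assumptions, not claims about any existing implementation): each replica keeps a timestamped log of recent deltas, and two replicas reconcile by replaying what the other has not yet seen.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Delta:
    ts: float       # timestamp; assumes loosely synchronized clocks
    op: str         # "+" for insert, "-" for delete
    triple: tuple   # (subject, predicate, object)

@dataclass
class GraphReplica:
    triples: set = field(default_factory=set)
    log: list = field(default_factory=list)  # recent deltas, oldest first

    def apply(self, d):
        (self.triples.add if d.op == "+" else self.triples.discard)(d.triple)
        self.log.append(d)

    def deltas_since(self, ts):
        return [d for d in self.log if d.ts > ts]

def reconcile(a, b, since):
    """Bring two replicas of one graph to the same state by replaying each
    side's recent deltas on the other, in timestamp order (last writer wins
    per triple). The log covers only a recent window, so it stays a small
    fraction of the graph's size."""
    merged = sorted(set(a.deltas_since(since)) | set(b.deltas_since(since)),
                    key=lambda d: d.ts)
    for d in merged:
        for replica in (a, b):
            if d not in replica.log:
                replica.apply(d)
```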
RDF can do nothing without joining between partitions, whereas for PNUTS a join across partitions is an application matter. On the other hand, PNUTS pays an extra RPC hop between the PNUTS infrastructure and the back-end; doing the query routing in the back-end itself eliminates that hop. RDF does remain more dependent on uniform performance across nodes and short interconnect latencies, and it likely takes more space. But the essential consistency and availability features can be generalized to it, yielding the merger of semi-structured data at infinite scale and availability with complex query.
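To show what query routing inside the back-end means for an RDF join, here is a sketch of a two-pattern join under subject-hash partitioning. The partitioning scheme and all names are my assumptions for illustration; the point is that binding the subject fixes the target partition, so the probe side of the join needs no separate routing tier.

```python
def partition_of(subject, n_parts):
    return hash(subject) % n_parts  # subject-hash partitioning (assumed)

def scan(partitions, pattern):
    """Evaluate one triple pattern (None = variable). A bound subject maps
    to exactly one partition; an unbound one fans out to all of them."""
    s = pattern[0]
    targets = ([partitions[partition_of(s, len(partitions))]]
               if s is not None else partitions)
    for part in targets:
        for t in part:
            if all(q is None or q == v for q, v in zip(pattern, t)):
                yield t

def join_on_subject(partitions, pat1, pat2):
    """?s p1 ?o1 . ?s p2 ?o2 — bind ?s from the first pattern, then probe
    the second with the subject bound, which routes each probe to a single
    partition inside the storage layer."""
    for s, _, o1 in scan(partitions, pat1):
        for _, _, o2 in scan(partitions, (s, pat2[1], pat2[2])):
            yield (s, o1, o2)

# Example: two partitions, one person.
parts = [set(), set()]
for t in [("a", "type", "Person"), ("a", "name", "Ann")]:
    parts[partition_of(t[0], 2)].add(t)
print(list(join_on_subject(parts,
                           (None, "type", "Person"),
                           (None, "name", None))))
# -> [('a', 'Person', 'Ann')]
```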
At any rate, repartitioning on demand and partition migration remain the key agenda items for us, confirmed over and over at VLDB.