The Wild World of SIGMOD

I’m on my way home from SIGMOD 2009, my first experience attending a conference on databases. Actually, it was my first experience attending two conferences on databases, since SIGMOD was held in Providence concurrently with PODS.

Ed Chi, Jeff Heer, and I were invited to SIGMOD for a session in which we shared our perspectives with the database community on Human-Computer Interaction with Information. Yes, database people care about HCIR too! As the SIGMOD organizers correctly pointed out, people interested in HCI don’t often show up at database conferences, and I am both grateful and impressed that they took the initiative to remedy that. In a similar spirit, they invited Martin Wattenberg and Fernanda Viégas to deliver a joint keynote about visualization. Even for those of us who were already familiar with their Many Eyes work, it was a delightful presentation.

Of course, it was a great opportunity for me to learn what database people normally worry about. The conference opened with a keynote by Hasso Plattner, co-founder of software giant SAP. The main take-away of his presentation was that column stores and multi-core computation have improved the efficiency of databases by at least two orders of magnitude, opening a new world of possibilities in information access.

Column stores are pretty hot in this community. I didn’t make it to the research session devoted to them (which included the paper that received the best-paper award), but I did get to attend the presentation that has attracted the most attention outside SIGMOD, “A Comparison of Approaches to Large-Scale Data Analysis”, a paper by seven authors that compares Hadoop (the open-source implementation of Google’s MapReduce approach), an unspecified commercial row-storage (i.e., conventional) relational database, and the Vertica column-store database. MIT Professor Sam Madden gave the presentation, but the author most identified with this work is probably Michael Stonebraker. Indeed, Madden had a number of slides where he asked WWMSS? (“What would Mike Stonebraker say?”), with pithy quotes like “Hadoop is ‘go slow’ for OLAP.” Madden delivered an excellent presentation, but his analysis, which was less than favorable to Hadoop, did rile up some of the audience. Specifically, Berkeley professor Joe Hellerstein suggested that the comparisons were “using the wrong y-axis” by comparing the approaches based on processing time. It would indeed be interesting to compare the development time that was required to use each of the tools the authors compared.

Some other talks I attended and enjoyed:

An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets
Similarity Caching
Indexing Uncertain Data
Top-k Queries on Uncertain Data: On Score Distribution and Typical Answers
Incremental Maintenance of Length Normalized Indexes for Approximate String Matching
Why Not? (work on helping a user understand why an expected record does *not* appear in query results)
Query by Output (a database approach reminiscent of query-by-example in information retrieval)

I also saw two really nice demos:

All in all, I enjoyed three fun and intellectually stimulating days, complete with great food and a harbor cruise in Newport. I’m grateful to the SIGMOD organizers for the invitation to spend a few days in their world, and look forward to integrating what I learned here into my own work.

I think that the column store thing is partly hype. We have had column-oriented indexes forever (bitmap indexes are not new! projection indexes are not exactly rocket science).

That is not to say that Vertica can’t kill Oracle. I’m sure it can. But I don’t think that the conceptual gap is what they make it out to be.

Nevertheless, thinking about the data in a column-oriented way is much more interesting (to me, as a researcher).

See for example (yes, it is a plug):

Daniel Lemire, Owen Kaser, Kamel Aouiche, Sorting improves word-aligned bitmap indexes. Data & Knowledge Engineering (to appear).

http://arxiv.org/abs/0901.3751

http://www.slideshare.net/lemire/all-about-bitmap-indexes-and-sorting-them
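For readers who have not met bitmap indexes, here is a minimal Python sketch of the idea mentioned above: one bit vector per distinct column value, with a bit set for each row holding that value. It also counts runs of identical bits before and after sorting the column, as a rough proxy for how well the bitmaps would compress. This is an illustration only, not the method from the paper cited above: the toy `cities` column and the helper names are hypothetical, and plain run counting stands in for real word-aligned compression schemes.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to a list of 0/1 bits, one bit per row."""
    index = defaultdict(lambda: [0] * len(column))
    for row, value in enumerate(column):
        index[value][row] = 1
    return dict(index)


def count_runs(bits):
    """Count maximal runs of identical bits (a crude proxy for compressed size)."""
    if not bits:
        return 0
    return 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)


def total_runs(column):
    """Sum the run counts over every bitmap in the column's index."""
    return sum(count_runs(bits) for bits in build_bitmap_index(column).values())


# Hypothetical toy column, invented for illustration.
cities = ["NYC", "SF", "NYC", "LA", "SF", "NYC", "LA", "SF"]

index = build_bitmap_index(cities)
# Answering "city == 'NYC'" is just reading off the set bits of one bitmap.
nyc_rows = [row for row, bit in enumerate(index["NYC"]) if bit]

unsorted_runs = total_runs(cities)
sorted_runs = total_runs(sorted(cities))
print(nyc_rows, unsorted_runs, sorted_runs)
```

On this toy column, sorting groups equal values together, so each bitmap collapses into a few long runs. That is the intuition for why row order matters for run-length-style bitmap compression.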


By Daniel Tunkelang

6 replies on “The Wild World of SIGMOD”