Rob, Matt, and I just wrapped up our trip to Copenhagen for the EADS Summer School on Hashing at the University of Copenhagen and it was a blast! The lineup of speakers was, simply put, unbeatable: Rasmus Pagh, Graham Cormode, Michael Mitzenmacher, Mikkel Thorup, Alex Andoni, Haim Kaplan, John Langford, and Suresh Venkatasubramanian. There’s a good chance that any paper you’ve read on hashing, sketching, or streaming has one of them as a co-author or is heavily influenced by their work. The format was three hour-and-a-half lectures for four days, with exercises presented at the end of each lecture. (Slides can be found here. They might also post videos.)
Despite the depth of the material, almost all of it was approachable with some experience in undergraduate math and computer science. We would strongly recommend both of Michael Mitzenmacher’s talks (1, 2) for an excellent overview of Bloom Filters and Cuckoo hashing that are, in my opinion, significantly better and more in depth than any other out there. Specifically, the Bloom Filter talk presents very elegantly the continuum of Bloom Filter to Counting Bloom Filter to Count-Min Sketch (with “conservative update”) to the Stragglers Problem and Invertible Bloom Filters to, finally, extremely recent work called Odd Sketches.
Similarly, Mikkel Thorup’s two talks on hashing (1, 2) do a very thorough job of examining the hows and whys of integer hashing, all the way from the simplest multiply-mod-prime schemes all the way to modern work on tabulation hashing. And if you haven’t heard of tabulation hashing, and specifically twisted tabulation hashing, get on that because (1) it’s amazing that it doesn’t produce trash given how simple it is, (2) it’s unbelievably fast, and (3) it has been proven to provide the guarantees required for almost all of the interesting topics we’ve discussed on the blog in the past: Bloom Filters, Count-Min sketch, HyperLogLog, chaining/linear-probing/cuckoo hash tables, and so on. We really appreciated how much attention Mikkel devoted to practicality of implementation and to empirical performance when discussing hashing algorithms. It warms our heart to hear a leading academic in this field tout the number of nanoseconds it takes to hash an item as vocally as the elegance of the proof behind it!
We love this “Summer School” format because it delivers the accumulated didactic insight of the field’s top researchers and educators to both old techniques and brand new ones. (And we hope by now that everyone reading our blog appreciates how obsessed we are with teaching and clarifying interesting algorithms and data structures!) Usually most of this insight (into origins, creative process, stumbling blocks, intermediate results, inspiration, etc.) only comes out in conversation or lectures, and even worse is withheld or elided at publishing time for the sake of “clarity” or “elegance”, which is a baffling rationale given how useful these “notes in the margin” have been to us. The longer format of the lectures really allowed for useful “digressions” into the history or inspiration for topics or proofs, which is a huge contrast to the 10-minute presentations given at a conference like SODA. (Yes, obviously the objective of SODA is to show a much greater breadth of work, but it really makes it hard to explain or digest the context of new work.)
In much the same way, the length of the program really gave us the opportunity to have great conversations with the speakers and attendees between sessions and over dinner. We can’t emphasize this enough: if your ambition to is implement and understand cutting edge algorithms and data structures then the best bang for your buck is to get out there and meet the researchers in person. We’re incredibly lucky to call most of the speakers our friends and to regularly trade notes and get pointers to new research. They have helped us time and again when we’ve been baffled by inconsistent terminology or had a hunch that two pieces of research were “basically saying the same thing”. Unsurprisingly, they are also the best group of people to talk to when it comes to understanding how to foster a culture of successful research. For instance, Mikkel has a great article on how to systematically encourage and reward research article that appears in the March 2013 issue of CACM (pay-wall’d). Also worthwhile is his guest post on Bertrand Meyer’s blog.
If Mikkel decides to host another one of these, we cannot recommend attending enough. (Did we mention it was free?!) Thanks again Mikkel, Rasmus, Graham, Alex, Michael, Haim, and John for organizing such a great program and lecturing so eloquently!