Foundation Capital and Aggregate Knowledge Sponsor Streaming/Sketching Conference

We, along with our friends at Foundation Capital, are pleased to announce a 1 day mini-conference on streaming and sketching algorithms in Big Data.  We have gathered an amazing group of speakers from academia and industry to give talks.  If you are a reader of this blog we would love to have you come!  The conference will be on 6/20 (Thursday) from 10 AM to 5:30 PM at the 111 Minna Gallery in San Francisco and attendance is limited. Breakfast and lunch included!

There will also be a happy hour afterwards if you cannot make the conference or just want a beer.  

The speaker list includes:

Muthu Muthukrishnan

The Count-Min Sketch, 10 Years Later

The Count-Min Sketch is a data structure for indexing data streams in very small space. In a decade since its introduction, it has found many uses in theory and practice, with data streaming systems and beyond. This talk will survey the developments.

Muthu Muthukrishnan is a Professor at Rutgers University and a Research Scientist at Microsoft, India. His research interest is in development of data stream theory and systems, as well as online advertising systems.

David P. Woodruff

Sketching as a Tool for Numerical Linear Algebra

The talk will focus on how sketching techniques from the data stream literature can be used to speed up well-studied algorithms for problems occurring in numerical linear algebra, such as least squares regression and approximate singular value decomposition. It will also discuss how they can be used to achieve very efficient algorithms for variants of these problems, such as robust regression.

David Woodruff joined the algorithms and complexity group at IBM Almaden in 2007 after completing his Ph.D. at MIT in theoretical computer science. His interests are in compressed sensing, communication, numerical linear algebra, sketching, and streaming.

Sam Ritchie

Summingbird: Streaming Map/Reduce at Twitter

Summingbird is a platform for streaming map/reduce used at Twitter to build aggregations in real-time or on hadoop. When the programmer describes her job, that job can be run without change on Storm or Hadoop. Additionally, summingbird can manage merging realtime/online computations with offline batches so that small errors in real-time do not accumulate. Put another way, summingbird gives eventual consistency in a manner that is easy for the programmer to reason about.

Sam Ritchie works on data analysis and infrastructure problems in Twitter’s Revenue engineering team. He is co-author of a number of open-source Scala and Clojure libraries, including Bijection, Algebird, Cascalog 2 and ElephantDB. He holds a bachelor’s degree in mechanical and aerospace engineering.

Alexandr Andoni

Similarity Search Algorithms

Nearest Neighbor Search is an ubiquitous problem in analyzing massive datasets: its goal is to process a set of objects (such as images), so that later, one can find the object most similar to a given query object. I will survey the state-of-the-art for this problem, starting from the (Kanellakis-award winning) technique of Locality Sensitive Hashing, to its more modern relatives, and touch upon connection to the theory of sketching.

Alexandr Andoni is a researcher in the Microsoft Research at Silicon Valley since 2010, after finishing his PhD in MIT’s theory group and year-long postdoctoral position at Princeton University. His research interests revolve around algorithms for massive datasets, including similarity search and streaming/sublinear algorithms, as well as theoretical machine learning.

There will be a panel discussion on the topic of harboring research in startups. Speakers include:
Pete Skomoroch from the LinkedIn data science team.
Rob Grzywinski of Aggregate Knowledge.
Joseph Turian of Metaoptimize

Lightning talks
  • Armon Dadgar (Kiip) – Sketching Data Structures at Kiip
  • Blake Mizerany (Heroku) – An Engineer Reads a Data Sketch
  • Timon Karnezos (AK) – TBD
  • Jérémie Lumbroso (INRIA) – Philippe Flajolet’s Contribution to Streaming Algorithms

Update! We posted the videos and slides for those of you that couldn’t make it (or those that want to relive the fun). Enjoy!


  1. Sounds like a great event. Are you planning to publish some video recordings of the talks?

  2. +1 for vids as well


  1. […] finalized the agenda for the “Data Science Summit” at 111 Minna. See here for details. The first talks will start at 10am and we will try to […]

  2. […] Also, don’t forget to register now for AK’s “Data Science Summit“! […]

  3. […] don’t think I’m going out on a limb saying that our conference last week was a success. Thanks to everyone who attended and thanks again to all the speakers. […]

  4. […] look forward to hosting many more “Data Science Summits” in the future. And, thanks again to everyone for making it such a […]

  5. […] happy to announce that all the talks (slides and video) from our Data Science Summit are available for free, right here on the blog! You can find a full list of the conference’s […]

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: