No BS Data Salon #3

On Saturday, Aggregate Knowledge hosted the third No BS Data Salon, this time on databases and data infrastructure. A handful of people showed up to hear Scott Andreas of Boundary talk about distributed, streaming service architecture, and I gave a talk on AK’s use of probabilistic data structures.

The smaller group made for some fantastic, honest conversation about the different approaches to streaming architectures, the perils of distributing analytics workloads in a streaming setting, and the challenges of pushing scientific and engineering breakthroughs all the way through to product innovation.

We’re all looking forward to the next event, which will be in San Francisco in a month or two. If you have topics you’d like to see covered, we’d love to hear from you in the comments below!

As promised, I’ve assembled something of a “References” section for my talk, which you can find below.

(Hyper)LogLog

Random

  • Sean Gourley’s talk on human-scale analytics and decision-making
  • Muthu Muthukrishnan’s home page, where research on streaming in general abounds
  • A collection of C and Java implementations of different probabilistic sketches

Attendees of the third No BS Data Salon

No BS Data Salon #2

On Saturday, our illustrious Chief Scientist Matt Curcio sat on the “Frameworks, Tools, and Techniques for Scaling up Machine Learning” panel at the second No BS Data Salon, hosted by MetaMarkets. The discussion ranged from scaling the human aspect of ML and analytics to brass tacks about the difficulties of actually performing ML on web-scale data sets.

The theme for this Salon was analytics, and just like last time, the focus was on real use cases and a no-nonsense open discussion. We heard presentations from Sean Gourley of Quid, Ian Wong of Square, and Metamarkets’ own Nelson Ray. Sean gave a great talk about the future of human-scale vs. machine-scale decision-making and analytics. Ian gave an informative overview of the challenges of growing a risk analysis team at Square. Nelson threw down with an awesome technical presentation about how to “A/B test anything”.

This second event in the series was certainly a worthwhile follow-up to the first! Thanks again to Metamarkets for the food, drink, and welcome. You won’t find better hosts than Mike and Nisha!

No BS Data Salon #2 attendees during Sean Gourley's presentation

No BS Data Salon #2 attendees caffeinating in between presentations

No BS Data Salon

After being quite disenchanted with the state of the Big Data conferences, I thought that I would reach out to some folks who do work similar to ours and plan a mini conference of our own. The first guy I reached out to was Mike Driscoll, the CTO of MetaMarkets. I had hit the jackpot on the first pull. Mike had been toying with the idea of having a “No BS Data Salon” where he’d bring together folks who have challenging problems to present how they’ve solved them in a use-case style format. He wanted to hit at least three levels of the stack: visualization, analytics, and data infrastructure. Timon and I encouraged him to take his ideas and make them real, since it was exactly what we were thinking.

Today we had the first in the series, which covered data visualization. Mike put together a fantastic group of presenters, from Bret Victor to Nick Bilton. All told, there were five presenters, a panel discussion on JS visualization tools, and around 20 attendees. It was an awesome opportunity to just talk shop about data and visualizations.

A big thanks to Mike, Nisha, and all of the MetaMarkets folks for all of their work and hospitality. And another big thanks to all of the presenters. I certainly look forward to attending and presenting at future get-togethers.

Image provided by Xavier Leaute