Neustar at re:Invent 2014

On Thursday, Nov. 13th I was given the opportunity to speak at AWS re:Invent 2014 about the ad analytics products and platforms we’ve built on Redshift. My slides are available here along with what I said in the speaker’s notes.

I covered how to implement four classical ad tech queries:

  • Frequency – how many ads should I show each user?
  • Attribution – how much should I pay for each ad?
  • Overlap – where can I find these people for cheaper?
  • Ad-hoc – how can I empower my users to run their own custom queries?

I also discussed how to take maximum advantage of Redshift’s simple orchestration and provisioning to scale to as many workloads and platforms as your heart desires and save money in the process!

It was a pleasure, as usual, to present to a packed room and to discuss all the great things the team has been up to.

AK at re:Invent 2013

I was given the opportunity to speak at AWS re:Invent 2013, on Wednesday, Nov. 13th, about how we extract maximum performance, across our organization, from Redshift. You can find the slides here, and my voice-over, though not identical, is mostly captured by the notes in those slides.

The talk, in brief, was about the technical features of Redshift that allow us to write, run, and maintain many thousands of lines of SQL that run over hundreds of billions of rows at a time. I stayed away from nitty-gritty details in the talk, since I only had 15 minutes, and tried to focus on high-level take-aways:

  • SQL is code, so treat it like the rest of your code. It has to be clean, factored, and documented. Use CTEs to break queries into logical chunks. Use window functions to express complex ideas succinctly.
  • Redshift is an MPP system with fast IO, fast sorting, and lots of storage. Use this to your advantage by storing multiple different sort orders of your data if you have different access patterns. Materialize shared intermediates so many queries can take advantage of them.
  • Redshift has excellent integration with S3. Use the fat pipes to cheaply materialize query intermediates for debugging. Use the one-click snapshot feature to open up experimentation with schema, data layout, and column compression. If it doesn’t work out, you revert to your old snapshot.
  • Use the operational simplicity of Redshift to be frugal. Turn over the responsibility of managing cluster lifecycles to the people that use them. For instance, devs and QA rarely need their clusters when the workday is done. The dashboards are such a no-brainer that it’s barely a burden to have them turn off and start up their own clusters. Allow users to take responsibility for their cluster, and they will become more responsible about using their cluster.
  • Use the operational simplicity of Redshift to be more aware. With just a few clicks, you can launch differently sized clusters and evaluate your reports and queries against all of them. Quantify the cost of your queries in time and money.
  • It’s a managed service: stop worrying about nodes and rows and partitions and compression! Get back to business value:
    • How long does the customer have to wait?
    • How much does this report cost?
    • How do I make my staff more productive?
    • How do I minimize my technical debt?