AK at re:Invent 2013

I was given the opportunity to speak at AWS re:Invent 2013, on Wednesday, Nov. 13th, about how we extract maximum performance, across our organization, from Redshift. You can find the slides here, and my voice-over, though not identical, is mostly captured by the notes in those slides.

The talk, in brief, was about the technical features of Redshift that allow us to write, run, and maintain many thousands of lines of SQL that run over hundreds of billions of rows at a time. I stayed away from nitty-gritty details in the talk, since I only had 15 minutes, and tried to focus on high-level take-aways:

  • SQL is code, so treat it like the rest of your code. It has to be clean, factored, and documented. Use CTEs to break queries into logical chunks. Use window functions to express complex ideas succinctly.
  • Redshift is an MPP system with fast IO, fast sorting, and lots of storage. Use this to your advantage by storing multiple different sort orders of your data if you have different access patterns. Materialize shared intermediates so many queries can take advantage of them.
  • Redshift has excellent integration with S3. Use the fat pipes to cheaply materialize query intermediates for debugging. Use the one-click snapshot feature to open up experimentation with schema, data layout, and column compression. If it doesn’t work out, you revert to your old snapshot.
  • Use the operational simplicity of Redshift to be frugal. Turn over the responsibility of managing cluster lifecycles to the people that use them. For instance, devs and QA rarely need their clusters when the workday is done. The dashboards are such a no-brainer that it’s barely a burden to have them turn off and start up their own clusters. Allow users to take responsibility for their cluster, and they will become more responsible about using their cluster.
  • Use the operational simplicity of Redshift to be more aware. With just a few clicks, you can launch differently sized clusters and evaluate your reports and queries against all of them. Quantify the cost of your queries in time and money.
  • It’s a managed service: stop worrying about nodes and rows and partitions and compression! Get back to business value:
    • How long does the customer have to wait?
    • How much does this report cost?
    • How do I make my staff more productive?
    • How do I minimize my technical debt?

Trackbacks

  1. […] at AWS re:Invent in Las Vegas this past Wednesday. (To learn more about my talk, you can visit my post on the AK Tech Blog.) Even though I was only around for a day, I had the opportunity to talk with some inspiring, […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: