Author’s Note: Hello readers! I’m Sonya Berg, and this is my first post on Neustar’s research blog. I am a data scientist at Neustar Research, where my focus is on solving big data problems in advertising. Before joining Neustar, I completed a PhD in mathematics from UC Davis, where I developed algorithms for quantum computing.
Matt Curcio, Rafael Solari and I had the pleasure of starting our new year off by attending SIAM’s Symposium on Discrete Algorithms, or SODA, conference in sunny San Diego. SODA is a computer science theory conference, with most talks given by academics. Below I highlight two examples of high-level topics we were excited as practitioners to see covered by theorists at SODA.
- Dataset privacy. In a variety of industries, publicly releasing datasets, or even just statistics about datasets, has proved to be useful for R&D. However, there is general concern that publishing data about individuals can compromise personal privacy. For example, last summer we highlighted breaches of privacy in a published dataset. At the SODA conference we learned about advancements in privacy research, including how to privately conduct supervised learning, as well as the application of privacy mechanisms to discourage lying.
- Distributed computing. As a big data shop, we use various distributed computing systems, such as Hadoop, Hive, and Amazon’s Redshift. We’ve seen the ugly sides of parallelism, such as clusters bottlenecking on intra-cluster bandwidth during repartitioning. We were glad to learn at SODA that academics are responding to this reality by exploring theoretical approaches to minimizing bandwidth complexity.
Learning directly from the top computer science researchers in the world can certainly be a challenge. But, we continue to attend conferences such as SODA because we return home inspired to apply new theoretical concepts to our most interesting R&D problems.