AI Systems

Workshop on AI Systems at SOSP 2019

October 27, 2019

Invited Talks

What are the Unique Challenges and Opportunities in Systems for ML?, Matei Zaharia

A View of Programming Languages & Software Engineering for ML Software, Caroline Lemieux

The rise of deep learning platforms such as PyTorch and TensorFlow has greatly simplified the task of developing deep learning software. However, significant challenges still arise in developing machine learning software and integrating it with existing programming systems. Many of these challenges are fundamentally software engineering problems, so it is valuable to examine them through the research lens of programming languages and software engineering, i.e., programming systems research. In this talk, I will present a few key problems in developing machine learning software that pose exciting programming systems research challenges. In particular, I will highlight the ways in which machine learning software development differs from conventional software development, and the research opportunities this presents.

Caroline Lemieux is a PhD candidate at UC Berkeley, advised by Koushik Sen. Her research interests center around improving the correctness and reliability of software systems by developing automated methods for engineering tasks such as testing, debugging, and comprehension. Her current projects focus on fuzz testing and program synthesis. Her work on fuzz testing has been awarded an ACM SIGSOFT Distinguished Paper Award, Distinguished Artifact Award, Tool Demonstration Award, and Best Industry Paper Award. Before Berkeley, she received her B.Sc. in Computer Science and Mathematics at the University of British Columbia. She is the recipient of a Berkeley Fellowship for Graduate Study, and a Google PhD Fellowship in Programming Technologies and Software Engineering.

Asynchrony and Quantization for Efficient and Scalable Learning, Christopher De Sa

Much of the recent advancement in machine learning has been driven by the ability of machine learning systems to process and learn from very large data sets using very complicated models. Continuing to scale up in this way presents a computational challenge, as power, memory, and time all limit performance. Two popular approaches to address these issues are low-precision arithmetic and asynchronous parallelism. Low-precision arithmetic improves systems metrics by using fewer bits to represent numbers. Asynchronous parallelism allows scaling without the bottleneck of synchronization between parallel workers. However, both methods can introduce error or noise into a training algorithm. In this talk, I will discuss recent results that advance both the theoretical understanding and the practical application of these methods.
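To make the "error or noise" of low-precision arithmetic concrete, here is a minimal Python/NumPy sketch of stochastic rounding, one common way to quantize values onto a low-bit fixed-point grid. It is an illustration under stated assumptions, not the method presented in the talk: the function name `quantize_stochastic`, the symmetric grid over `[-scale, scale)`, and the `bits`/`scale` parameters are hypothetical choices. Each value is rounded up with probability equal to its fractional distance past the lower grid point, so the quantized value is unbiased in expectation but noisy on any single draw.

```python
import numpy as np


def quantize_stochastic(x, bits=8, scale=1.0, rng=None):
    """Hypothetical helper: map x onto a signed fixed-point grid with
    2**bits levels covering [-scale, scale), using stochastic rounding.

    For values inside the representable range, the result equals x in
    expectation; values outside the range are clipped to the end points.
    """
    rng = np.random.default_rng() if rng is None else rng
    delta = 2.0 * scale / (2 ** bits)                 # spacing between grid points
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1  # range of integer codes
    y = np.asarray(x, dtype=np.float64) / delta       # position in units of delta
    floor = np.floor(y)
    frac = y - floor                                  # distance past the lower grid point
    # Round up with probability `frac`, otherwise round down.
    codes = floor + (rng.random(y.shape) < frac)
    codes = np.clip(codes, lo, hi)                    # saturate out-of-range values
    return codes * delta


q = quantize_stochastic(np.full(100_000, 0.3), bits=4, rng=np.random.default_rng(0))
print(np.unique(q), q.mean())
```

With `bits=4` the grid spacing is 0.125, so 0.3 is stored as either 0.25 or 0.375; averaged over many draws the result is close to 0.3, whereas round-to-nearest would always return 0.25. This zero-mean per-sample noise is the kind of error that analyses of low-precision SGD must account for.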

Christopher De Sa is an Assistant Professor in the Cornell Department of Computer Science. His research covers algorithmic, software, and hardware techniques for high-performance machine learning, with a focus on relaxed-consistency variants of stochastic algorithms such as asynchronous and low-precision stochastic gradient descent (SGD) and Markov chain Monte Carlo.

Learning Based Coded-Computation: A Novel Approach for Resilient Computation in ML Inference Systems, Rashmi K. Vinayak

Building Scalable Systems for Reinforcement Learning and Using Reinforcement Learning for Better Systems, Yuandong Tian

In this talk, I will introduce our work on building scalable systems for training large-scale reinforcement learning (RL) agents more efficiently, and show how RL can help improve systems that are traditionally built from hand-coded rules and domain-specific human experience. First, we introduce our efforts to build and open-source scalable systems for reinforcement learning that can be used to train agents to play Atari, Go, and real-time strategy games. Second, we show that neural networks, coupled with reinforcement learning and search methods, can learn heuristics for complicated optimization problems and achieve better performance than heuristics based on human experience. Applications include online job scheduling, vehicle routing, architecture search, and code decompilation, showing the potential of machine learning to build better systems.

Challenges and Progress in Scaling ML Fairness, Alex Beutel

As we learn more about how to measure and mitigate fairness issues, how do we scale our approaches across the many applications of ML? Building systems to support fairer machine learning is important, but comes with many challenges. Most of the academic literature focuses on removing biases for a well-defined group in a particular model, but unfortunately this is rarely how ML is applied. How can we measure or mitigate issues when most systems rarely have demographic data? And how do we improve systems that are composed of numerous models designed by different groups of people? In this talk, I’ll present a brief background on machine learning fairness and discuss recent progress we’ve made in addressing these problems as we work to put fairness principles into practice.

Alex Beutel is a Staff Research Scientist in Google Brain SIR, leading a team working on ML Fairness as well as researching neural recommendation and ML for Systems. He received his Ph.D. in 2016 from Carnegie Mellon University’s Computer Science Department, and previously received his B.S. from Duke University in computer science and physics. His Ph.D. thesis on large-scale user behavior modeling, covering recommender systems, fraud detection, and scalable machine learning, was given the SIGKDD 2017 Doctoral Dissertation Award Runner-Up. He also received the Best Paper Award at KDD 2016 and ACM GIS 2010, was a finalist for best paper in KDD 2014 and ASONAM 2012, and was awarded the Facebook Fellowship in 2013 and the NSF Graduate Research Fellowship in 2011. More details can be found at