Challenges in Deploying and Monitoring Machine Learning Systems
Alessandra Tosi · Nathan Korda · Michael A Osborne · Stephen Roberts · Andrei Paleyes · Fariba Yousefi

Fri Jul 23 02:00 AM -- 11:30 AM (PDT)
Event URL: https://sites.google.com/view/deploymonitormlsystems2021/home

Until recently, many industrial Machine Learning applications have been the remit of consulting academics, data scientists within larger companies, and a number of dedicated Machine Learning research labs within a few of the world’s most innovative tech companies. Over the last few years we have seen the dramatic rise of companies dedicated to providing Machine Learning software-as-a-service tools, with the aim of democratizing access to the benefits of Machine Learning. All these efforts have revealed major hurdles to ensuring the continual delivery of good performance from deployed Machine Learning systems. These hurdles range from challenges in MLOps, to fundamental problems with deploying certain algorithms, to solving the legal issues surrounding the ethics involved in letting algorithms make decisions for your business.

This workshop will invite papers related to the challenges in deploying and monitoring ML systems. It will encourage submissions on subjects related to: MLOps for deployed ML systems; the ethics around deploying ML systems; useful tools and programming languages for deploying ML systems; specific challenges relating to deploying reinforcement learning in ML systems, performing continual learning, and providing continual delivery in ML systems; and finally data challenges for deployed ML systems.

We will also invite the submission of open problems and encourage discussion, through two live panels, of topics in two areas: "Deploying machine learning applications in the legal system" and "Deploying machine learning on devices or constrained hardware".

These subjects represent a wealth of topical and high-impact issues for the community to work on.

Fri 2:00 a.m. - 2:10 a.m.
Opening remarks (Introduction)
Alessandra Tosi, Nathan Korda, Fariba Yousefi, Andrei Paleyes, Stephen Roberts
Fri 2:10 a.m. - 2:50 a.m.

Speaker: Engineer Bainomugisha

Bio: I am an Associate Professor of Computer Science and the Chair of the Department of Computer Science at Makerere University. My research focuses on Computer Science-driven solutions to the prevailing world challenges. I am also passionate about contributing to quality Computer Science education that is of sufficient breadth and depth, practical and fast enough. Currently, I lead several innovation and research initiatives that aim to create and apply computational methods and tools that can improve the quality of life, especially in the developing world setting.


Engineer Bainomugisha
Fri 2:50 a.m. - 3:30 a.m.

Can machine learning help to reduce costs and facilitate access to justice within the legal system? We first consider the constitutional constraints to which deployment of ML in the legal system is subject; in particular, the protection of fundamental rights and the need to give reasons in legal decisions. We then turn to the technical state of the art in ML analysis of case law decisions. It is possible to predict outcomes of cases, given a set of facts, with a high degree of accuracy, but the explainability of these methods is limited. The research frontier therefore explores ways to provide legal reasons for case predictions.

Speaker: John Armour is Professor of Law and Finance at Oxford University and a Fellow of the British Academy and the European Corporate Governance Institute.


John Armour
Fri 3:30 a.m. - 4:30 a.m.

Chair: John Armour, University of Oxford

Panelists:
- Jessica Montgomery, University of Cambridge. "ML for policy: deploying machine learning to tackle public policy challenges"
- Teresa Scantamburlo, European Centre for Living Technology. "The laborious exercise of human oversight in workforce surveillance"
- Charles Brecque, Legislate.tech. "Can machine learning ever take the lawyer out of the loop completely? Can machine learning make the legal system fairer, make contracts less confusing and give legal advice?"

Jess Montgomery, Charles Brecque, Teresa Scantamburlo
Fri 4:30 a.m. - 4:40 a.m.
Short Break (Break)
Fri 4:40 a.m. - 5:30 a.m.

The automated design of chips is facing growing challenges due to the high volume of smartphones, their increasing functionality, and the corresponding heterogeneity of the chips. In this talk, I will survey how machine learning has recently emerged as a core technique that promises to rescue the diminishing gains in power, performance, and area in this field. In particular, I will focus on the challenges in deploying learning algorithms in electronic design automation and outline the solution that we take at Qualcomm, which combines machine learning with combinatorial optimization solvers.

Speaker: Roberto Bondesan, Qualcomm https://scholar.google.com/citations?user=l2z7p3oAAAAJ&hl=en

Roberto Bondesan
Fri 5:30 a.m. - 6:15 a.m.

Speaker's bio: Professor Richard Susskind OBE is an author, speaker, and independent adviser to major professional firms and to national governments. His main area of expertise is the future of professional service and, in particular, the way in which IT and the Internet are changing the work of lawyers. He has worked on legal technology for over 30 years. He lectures internationally, has written many books, and has advised on numerous government inquiries.


Richard Susskind
Fri 6:15 a.m. - 6:30 a.m.
Short Break (Break)
Fri 6:30 a.m. - 6:39 a.m.

We have seen a surge in research aimed at adversarial attacks and defenses in AI/ML systems. While it is crucial to formulate new attack methods and devise novel defense strategies for robustness, it is also imperative to recognize who is responsible for implementing, validating, and justifying the necessity of these defenses. In particular, we ask which components of the system are vulnerable to which types of adversarial attacks, and what expertise is needed to assess the severity of those attacks. We also ask how to evaluate and address adversarial challenges in order to recommend defense strategies for different applications. This paper opens a discussion on who should examine and implement adversarial defenses and the reasons behind such efforts.

Authors: Kishor Datta Gupta (University of Memphis), Dipankar Dasgupta (University of Memphis)

Kishor Datta Gupta
Fri 6:39 a.m. - 6:50 a.m.

Recent laws such as the GDPR require machine learning applications to "unlearn" parts of their training data if a user withdraws consent for their data. Current unlearning approaches accelerate the retraining of models, but come with hidden costs due to the need to reaccess training data and redeploy the resulting models.

We propose to look at machine unlearning as an "incremental view maintenance" problem, leveraging existing research from the data management community on efficiently maintaining the results of a query in response to changes in its inputs. Our core idea is to consider ML models as views over their training data, and express the training procedure as a differential dataflow computation, whose outputs can be automatically updated. As a consequence, the resulting models can be continuously trained over streams of updates and deletions. We discuss important limitations of this approach, and provide preliminary experimental results for maintaining a state-of-the-art sequential recommendation model.
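The "model as a view over its training data" idea can be illustrated with a deliberately small sketch. This is not the authors' differential dataflow implementation: the `CooccurrenceView` class, its methods, and the item co-occurrence "model" are hypothetical stand-ins chosen because their updates are easy to invert. The point shown is only that deleting a training record incrementally can yield the same state as retraining without it.

```python
from collections import Counter
from itertools import combinations


class CooccurrenceView:
    """Toy 'model': item co-occurrence counts maintained as a view over sessions."""

    def __init__(self):
        self.counts = Counter()

    @staticmethod
    def _pairs(session):
        # Unordered pairs of distinct items seen together in one session.
        return combinations(sorted(set(session)), 2)

    def insert(self, session):
        for pair in self._pairs(session):
            self.counts[pair] += 1

    def delete(self, session):
        # "Unlearn" a session by retracting exactly what insert() contributed.
        for pair in self._pairs(session):
            self.counts[pair] -= 1
            if self.counts[pair] == 0:
                del self.counts[pair]


def retrain(sessions):
    """Baseline: rebuild the view from scratch over the remaining data."""
    view = CooccurrenceView()
    for session in sessions:
        view.insert(session)
    return view


sessions = [["a", "b", "c"], ["a", "b"], ["b", "c"]]
view = retrain(sessions)
view.delete(sessions[1])  # a user withdraws consent for the second session
```

For models whose updates are not simple counts, retraction is much harder, which is where the limitations mentioned in the abstract come in.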

Authors: Sebastian Schelter (University of Amsterdam)

Sebastian Schelter
Fri 6:50 a.m. - 7:00 a.m.

Data quality validation plays an important role in ensuring the proper behaviour of production machine learning (ML) applications and services. Observing a lack of existing solutions for quality control in medium-sized production systems, we developed DuckDQ, a lightweight and efficient Python library for data quality validation that seamlessly integrates with existing scikit-learn ML pipelines and does not require a distributed computing environment or ML platform infrastructure, while outperforming existing solutions by a factor of 3 to 40 in terms of runtime. We introduce the notion of data quality assertions, which can stop a pipeline when quality constraints on the input data or the model's output are not met. Furthermore, we employ stateful metric computations, which greatly enhance the possibilities for post-hoc failure analysis and drift detection, even when the serving data is no longer available.
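As a rough illustration of a data quality assertion, the following plain-Python sketch stops a pipeline step when a constraint fails and keeps the computed metrics for later analysis. It does not use or reproduce the DuckDQ API: `DataQualityError`, `completeness`, and `check_quality` are hypothetical names introduced here, and the 0.9 threshold is an arbitrary assumption.

```python
class DataQualityError(Exception):
    """Raised when at least one data quality constraint is violated."""


def completeness(column):
    """Build a metric: the fraction of rows where `column` is not None."""
    def metric(rows):
        if not rows:
            return 0.0
        return sum(1 for row in rows if row.get(column) is not None) / len(rows)
    return metric


def check_quality(rows, constraints, history):
    """Compute every metric, store the values, then fail if any constraint is violated.

    constraints: list of (name, metric_fn, predicate) triples.
    history: list that accumulates metric values (the "stateful" part), so
             failures can be analysed after the raw rows are gone.
    """
    metrics = {name: metric(rows) for name, metric, _ in constraints}
    history.append(metrics)
    failed = [name for name, _, ok in constraints if not ok(metrics[name])]
    if failed:
        raise DataQualityError(f"violated constraints: {failed}")
    return metrics


history = []
constraints = [("age_completeness", completeness("age"), lambda v: v >= 0.9)]
good_batch = [{"age": 30}, {"age": 41}]
bad_batch = [{"age": 30}, {"age": None}]
```

Calling `check_quality(bad_batch, constraints, history)` raises before any downstream model code would run, while `history` still records the metric value that triggered the failure.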

Authors: Till Döhmen (Fraunhofer FIT), Mark Raasveldt (CWI), Hannes Mühleisen (Centrum Wiskunde & Informatica), Sebastian Schelter (University of Amsterdam)

Till Döhmen
Fri 7:00 a.m. - 7:10 a.m.

Post-deployment monitoring of the performance of ML systems is critical for ensuring reliability, especially as new user inputs can differ from the training distribution. Here we propose a novel approach, MLDemon, for ML DEployment MONitoring. MLDemon integrates both unlabeled features and a small amount of labeled examples which arrive over time to produce a real-time estimate of the ML model's current performance. Subject to budget constraints, MLDemon decides when to acquire additional, potentially costly, labels to verify the model. On temporal datasets with diverse distribution drifts and models, MLDemon substantially outperforms existing monitoring approaches. Moreover, we provide theoretical analysis to show that MLDemon is minimax rate optimal up to logarithmic factors and is provably robust against broad distribution drifts whereas prior approaches are not.
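The monitoring setting described above (unlabeled inputs arriving over time, a limited budget for buying labels) can be made concrete with a naive baseline. This is explicitly not the MLDemon algorithm: the `BudgetedMonitor` class, the externally supplied drift score, and the fixed threshold rule are all assumptions made for illustration.

```python
from collections import deque


class BudgetedMonitor:
    """Naive baseline: buy a label only when drift looks high and budget remains."""

    def __init__(self, label_budget, drift_threshold, window=100):
        self.label_budget = label_budget
        self.drift_threshold = drift_threshold
        # Sliding window of verified outcomes: 1 if the model was right, else 0.
        self.outcomes = deque(maxlen=window)

    def should_query(self, drift_score):
        # drift_score is assumed to come from some unlabeled-feature statistic.
        return self.label_budget > 0 and drift_score > self.drift_threshold

    def record_label(self, prediction, label):
        if self.label_budget <= 0:
            raise RuntimeError("label budget exhausted")
        self.label_budget -= 1
        self.outcomes.append(1 if prediction == label else 0)

    def estimated_accuracy(self):
        # None until at least one label has been acquired.
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)


monitor = BudgetedMonitor(label_budget=2, drift_threshold=0.5)
stream = [(0.1, "cat", "cat"), (0.9, "cat", "dog"), (0.8, "dog", "dog"), (0.95, "cat", "cat")]
for drift_score, prediction, true_label in stream:
    if monitor.should_query(drift_score):
        monitor.record_label(prediction, true_label)
```

A fixed threshold like this has none of the guarantees claimed for MLDemon; it only shows the interface a label-acquisition policy has to fill.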

Authors: Tony Ginart (Stanford University), Martin Zhang (Harvard School of Public Health), James Zou (Stanford University)

Tony Ginart
Fri 7:10 a.m. - 7:20 a.m.

The fast-growing machine learning as a service industry has incubated many APIs for multi-label classification tasks such as OCR and multi-object recognition. The heterogeneity in those APIs' price and performance, however, often forces users to choose between accuracy and expense. In this work, we propose FrugalMCT, a principled framework that jointly maximizes accuracy while minimizing expense by adaptively selecting the APIs to use for different data. FrugalMCT combines different APIs' predictions to improve accuracy and selects which combination to use to respect expense constraints. Preliminary experiments using ML APIs from Google, Microsoft, and other providers for multi-label image classification show that FrugalMCT often achieves more than 50% cost reduction while matching the accuracy of the best single API.
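To make the accuracy-versus-expense trade-off concrete, here is a simple greedy heuristic for deciding which data items get an extra paid API call under a total budget. It is a sketch under stated assumptions, not the FrugalMCT method: the `choose_addons` function, the candidate tuples, and the predicted accuracy gains are hypothetical, and how such gains would be estimated is left out.

```python
def choose_addons(candidates, budget):
    """Greedily pick at most one add-on API per item, best predicted gain per unit cost first.

    candidates: list of (item_id, api_name, extra_cost, predicted_gain),
                with extra_cost > 0.
    Returns (chosen, remaining_budget), where chosen maps item_id -> api_name.
    """
    chosen = {}
    remaining = budget
    ranked = sorted(candidates, key=lambda c: c[3] / c[2], reverse=True)
    for item_id, api_name, extra_cost, predicted_gain in ranked:
        if item_id in chosen or predicted_gain <= 0 or extra_cost > remaining:
            continue
        chosen[item_id] = api_name
        remaining -= extra_cost
    return chosen, remaining


# Hypothetical per-item estimates of how much each add-on API would help.
candidates = [
    ("img1", "api_a", 1.0, 0.10),
    ("img1", "api_b", 2.0, 0.30),
    ("img2", "api_a", 1.0, 0.02),
    ("img3", "api_b", 2.0, 0.25),
]
chosen, remaining = choose_addons(candidates, budget=3.0)
```

With these numbers the budget goes to `api_b` for `img1` and `api_a` for `img2`; `img3` is skipped because its add-on no longer fits. A greedy ratio rule is not optimal in general, which is one reason a principled framework is needed.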

Authors: Lingjiao Chen (Stanford University), James Zou (Stanford University), Matei Zaharia (Stanford and Databricks)

Lingjiao Chen
Fri 7:20 a.m. - 7:30 a.m.

As machine learning (ML) is deployed by many competing service providers, the underlying ML predictors also compete against each other, and it is increasingly important to understand the impacts and biases from such competition. In this paper, we study what happens when the competing predictors can acquire additional labeled data to improve their prediction quality. We introduce a new environment that allows ML predictors to use active learning algorithms to purchase labeled data within their budgets while competing against each other to attract users. Our environment models a critical aspect of data acquisition in competing systems which has not been well-studied before. When predictors can purchase additional labeled data, their overall performance improves. Surprisingly, however, the quality that users experience (i.e., the accuracy of the predictor selected by each user) can decrease even as the individual predictors get better. We show that this phenomenon naturally arises due to a trade-off whereby competition pushes each predictor to specialize in a subset of the population while data purchase has the effect of making predictors more uniform.

Authors: Yongchan Kwon (Stanford University), Tony Ginart (Stanford University), James Zou (Stanford University)

Yongchan Kwon
Fri 7:30 a.m. - 8:00 a.m.
Kishor Datta Gupta, Sebastian Schelter, Till Döhmen, Tony Ginart, Lingjiao Chen, Yongchan Kwon
Fri 8:00 a.m. - 9:00 a.m.

Chair: Stephen Roberts, University of Oxford

Panelists:
- Partha Maji, ARM. "How can we extract true model uncertainty with little to no additional computational costs?", "Are popular Bayesian techniques equally expressible after they have gone through several stages of model compression?"
- Cecilia Mascolo, University of Cambridge. "On-device Uncertainty Estimation"
- Ivan Kiskin, University of Oxford. "Data Drift", "Updating ML lifecycle"
- Yunpeng Li, University of Surrey. "Server-side vs client-side implementation"
- Maria Nyamukuru, Dartmouth College. "Pathways to unified embedded machine learning algorithms"

Cecilia Mascolo, Maria Nyamukuru, Ivan Kiskin, Partha Maji, Yunpeng Li, Stephen Roberts
Fri 9:00 a.m. - 9:10 a.m.
Short Break (Break)
Fri 9:10 a.m. - 9:50 a.m.

Speaker: Shalmali Joshi, Postdoctoral Fellow at the Center for Research on Computation and Society, Harvard University (SEAS)


Shalmali Joshi
Fri 9:50 a.m. - 10:30 a.m.

The number of applications relying on inference from Machine Learning (ML) models is already large and expected to keep growing. Facebook, for instance, serves tens of trillions of inference queries per day. Distributed inference dominates ML production costs: on AWS, it accounts for over 90% of ML infrastructure cost. Despite existing work in machine learning inference serving, ease-of-use and cost efficiency remain challenges at large scales. Developers must manually search through thousands of model-variants (versions of already-trained models that differ in hardware, resource footprints, latencies, costs, and accuracies) to meet the diverse application requirements. Since requirements, query load, and applications themselves evolve over time, these decisions need to be made dynamically for each inference query to avoid excessive costs through naive autoscaling. To avoid navigating through the large and complex trade-off space of model-variants, developers often fix a variant across queries, and replicate it when load increases. However, given the diversity across variants and hardware platforms in the cloud, a lack of understanding of the trade-off space can incur significant costs to developers.

In this talk, I will primarily focus on INFaaS, an automated model-less system for distributed inference serving, where developers simply specify the performance and accuracy requirements for their applications without having to choose a specific model-variant for each query. INFaaS generates model-variants from already trained models, and efficiently navigates the large trade-off space of model-variants on behalf of developers to meet application-specific objectives: (a) for each query, it selects a model, hardware architecture, and model optimizations, and (b) it combines VM-level horizontal autoscaling with model-level autoscaling, where multiple, different model-variants are used to serve queries within each machine. By leveraging diverse variants and sharing hardware resources across models, INFaaS achieves significant improvement in performance (throughput and latency of model serving) while saving costs compared to existing inference serving systems. I will conclude the talk with a brief discussion on future directions.
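The "model-less" interface, where a developer states requirements and the system picks a variant, can be sketched as a per-query selection rule. This is a minimal illustration and not the INFaaS implementation: the `Variant` fields, the `select_variant` function, the cheapest-feasible rule, and all numbers below are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Variant:
    """One hypothetical model-variant: a trained model on specific hardware."""
    name: str
    hardware: str
    accuracy: float        # e.g. top-1 accuracy in [0, 1]
    latency_ms: float      # expected per-query latency
    cost_per_query: float  # arbitrary cost units


def select_variant(
    variants: Sequence[Variant], min_accuracy: float, max_latency_ms: float
) -> Optional[Variant]:
    """Return the cheapest variant meeting both requirements, or None if none does."""
    feasible = [
        v for v in variants
        if v.accuracy >= min_accuracy and v.latency_ms <= max_latency_ms
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda v: v.cost_per_query)


# Illustrative registry; the figures are made up for the example.
registry = [
    Variant("small-int8", "cpu", accuracy=0.70, latency_ms=40.0, cost_per_query=1.0),
    Variant("medium-fp16", "gpu", accuracy=0.76, latency_ms=15.0, cost_per_query=3.0),
    Variant("large-fp32", "gpu", accuracy=0.80, latency_ms=60.0, cost_per_query=8.0),
]
picked = select_variant(registry, min_accuracy=0.75, max_latency_ms=50.0)
```

Here the medium variant is selected because the small one misses the accuracy requirement and the large one misses the latency requirement. A real system would also account for current load and autoscaling state, which this sketch ignores.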

Neeraja J Yadwadkar
Fri 10:30 a.m. - 11:30 a.m.

Chair: Stephen Roberts, University of Oxford

Neeraja J Yadwadkar, Shalmali Joshi, Roberto Bondesan, Engineer Bainomugisha, Stephen Roberts

Author Information

Alessandra Tosi (Mind Foundry)
Nathan Korda (Mind Foundry)
Michael A Osborne (University of Oxford)
Stephen Roberts (University of Oxford)
Andrei Paleyes (University of Cambridge)
Fariba Yousefi (University of Sheffield)
