Workshop

Workshop on Distribution-Free Uncertainty Quantification

Anastasios Angelopoulos, Stephen Bates, Sharon Li, Aaditya Ramdas, Ryan Tibshirani

Abstract:

Visit https://sites.google.com/berkeley.edu/dfuq21/ for details!

While improving prediction accuracy has been the focus of machine learning in recent years, this alone does not suffice for reliable decision-making. Deploying learning systems in consequential settings also requires calibrating and communicating the uncertainty of predictions. A recent line of work we call distribution-free predictive inference (i.e., conformal prediction and related methods) has developed a set of methods that give finite-sample statistical guarantees for any (possibly incorrectly specified) predictive model and any (unknown) underlying distribution of the data, ensuring reliable uncertainty quantification (UQ) for many prediction tasks. This line of work represents a promising new approach to UQ with complex prediction systems but is relatively unknown in the applied machine learning community. Moreover, much remains to be done to integrate distribution-free methods with existing approaches to UQ via calibration (e.g., temperature scaling); little work has been done to bridge these two worlds. To foster work on these emerging topics, the proposed workshop has two goals. First, to bring together researchers in distribution-free methods with researchers specializing in calibration techniques to catalyze work at this interface. Second, to introduce distribution-free methods to a wider ML audience. Given the important recent emphasis on the reliable real-world performance of ML models, we believe a large fraction of ICML attendees will find this workshop highly relevant.

Schedule

Sat 7:25 a.m. - 7:30 a.m.
Introduction to Conformal Prediction (Introduction)   
Anastasios Angelopoulos
Sat 7:30 a.m. - 8:15 a.m.
Talk by Rina Barber (Live Talk)   
Sat 8:15 a.m. - 9:15 a.m.
Panel with Michael I. Jordan, Vladimir Vovk, and Larry Wasserman, moderated by Aaditya Ramdas (Discussion Panel)   
Sat 9:15 a.m. - 9:24 a.m.
  

We develop a novel approach to conformal prediction when the target task has limited data available for training. Conformal prediction identifies a small set of promising output candidates in place of a single prediction, with guarantees that the set contains the correct answer with high probability. When training data is limited, however, the predicted set can easily become unusably large. In this work, we obtain substantially tighter prediction sets while maintaining desirable marginal guarantees by casting conformal prediction as a meta-learning paradigm over exchangeable collections of auxiliary tasks. Our conformalization algorithm is simple, fast, and agnostic to the choice of underlying model, learning algorithm, or dataset. We demonstrate the effectiveness of this approach across a number of few-shot classification and regression tasks in natural language processing, computer vision, and computational chemistry for drug discovery.

Adam Fisch
Sat 9:24 a.m. - 9:33 a.m.
  
We present a general, efficient technique for providing contextual predictions that are "multivalid" in various senses, against an online sequence of adversarially chosen examples $(x,y)$. This means that the resulting estimates correctly predict various statistics of the labels $y$ not just marginally (as averaged over the sequence of examples) but also conditionally on $x \in G$ for any $G$ belonging to an arbitrary intersecting collection of groups $\mathcal{G}$. We provide three instantiations of this framework. The first is mean prediction, which corresponds to an online algorithm satisfying the notion of multicalibration from \cite{multicalibration}. The second is variance and higher moment prediction, which corresponds to an online algorithm satisfying the notion of mean-conditioned moment multicalibration from \cite{momentmulti}. Finally, we define a new notion of prediction interval multivalidity, and give an algorithm for finding prediction intervals which satisfy it. Because our algorithms handle adversarially chosen examples, they can equally well be used to predict statistics of the residuals of arbitrary point prediction methods, giving rise to very general techniques for quantifying the uncertainty of predictions of black box algorithms, even in an online adversarial setting. When instantiated for prediction intervals, this solves a problem similar to that of conformal prediction, but in an adversarial environment and with multivalidity guarantees stronger than simple marginal coverage guarantees.
Christopher Jung
Sat 9:33 a.m. - 9:42 a.m.
  

Risk assessments to help inform criminal justice decisions have been used in the United States since the 1920s. Over the past several years, statistical learning risk algorithms have been introduced amid much controversy about fairness, transparency and accuracy. In this paper, we focus on accuracy for a large department of probation and parole that is considering a major revision of its current, statistical learning risk methods. Because the content of each offender’s supervision is substantially shaped by a forecast of subsequent conduct, forecasts have real consequences. Here we consider the probability that risk forecasts are correct. We augment standard statistical learning estimates of forecasting uncertainty (i.e., confusion tables) with uncertainty estimates from nested conformal prediction sets. In a demonstration of concept using data from the department of probation and parole, we show that the standard uncertainty measures and uncertainty measures from nested conformal prediction sets can differ dramatically in concept and output. We also provide a modification of nested conformal called the localized conformal method to match confusion tables more closely when possible. A strong case can be made favoring the nested and localized conformal approach. As best we can tell, our formulation of such comparisons and consequent recommendations is novel.

Richard Berk
Sat 9:42 a.m. - 9:51 a.m.
  

The notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology. In line with the statistical tradition, uncertainty has long been perceived as almost synonymous with standard probability and probabilistic predictions. Yet, due to the steadily increasing relevance of machine learning for practical applications and related issues such as safety requirements, new problems and challenges have recently been identified by machine learning scholars, and these problems may call for new methodological developments. In particular, this includes the importance of distinguishing between (at least) two different types of uncertainty, often referred to as aleatoric and epistemic. In this paper, we provide an introduction to the topic of uncertainty in machine learning as well as an overview of attempts so far at handling uncertainty in general and formalizing this distinction in particular.

Eyke Hüllermeier
Sat 9:51 a.m. - 10:00 a.m.
  

We illustrate how indirect or prior information can be optimally used to construct a prediction region that maintains a target frequentist coverage rate. If the indirect information is accurate, the volume of the prediction region is lower on average than that of other regions with the same coverage rate. Even if the indirect information is inaccurate, the resulting region still maintains the target coverage rate. Such a prediction region can be constructed for models that have a complete sufficient statistic, which includes many widely-used parametric and nonparametric models. Particular examples include a Bayes-optimal conformal prediction procedure that maintains a constant coverage rate across distributions in a nonparametric model, as well as a prediction procedure for the normal linear regression model that can utilize a regularizing prior distribution, yet maintain a frequentist coverage rate that is constant as a function of the model parameters and explanatory variables. No results rely on asymptotic approximations.

Peter Hoff
Sat 10:00 a.m. - 10:15 a.m.
Water break with gather.town

Sat 10:15 a.m. - 11:00 a.m.
Talk by Leying Guan (Live Talk)   
Sat 11:00 a.m. - 12:00 p.m.

Poster session in two gather.town rooms:

https://eventhosts.gather.town/FiEv6wmKUf7jwHlu/dist-free-uq-poster-1

https://eventhosts.gather.town/Zcqel0fMxeVDnfuu/dist-free-uq-poster-1b

Sat 12:00 p.m. - 4:15 p.m.
Break
Sat 4:13 p.m. - 4:15 p.m.
Welcome back (Live introduction by moderator)
Sat 4:15 p.m. - 4:24 p.m.
  

Uncertainty quantification for complex deep learning models is increasingly important as these techniques see growing use in high-stakes, real-world settings. Currently, the quality of a model's uncertainty is evaluated using point-prediction metrics such as the negative log-likelihood (NLL), expected calibration error (ECE) or the Brier score on held-out data. Marginal coverage of prediction intervals or sets, a well-known concept in the statistical literature, is an intuitive alternative to these metrics but has yet to be systematically studied for this class of models. With marginal coverage, and the complementary notion of the width of a prediction interval, downstream users of deployed machine learning models can better understand uncertainty quantification both on a global dataset level and on a per-sample basis. In this study, we provide the first large-scale evaluation of the empirical frequentist coverage properties of well-known uncertainty quantification techniques on a suite of regression and classification tasks. We find that, in general, some methods do achieve desirable coverage properties on in-distribution samples, but that coverage is not maintained on out-of-distribution data. Our results demonstrate the failings of current uncertainty quantification techniques as dataset shift increases and reinforce coverage as an important metric in developing models for real-world applications.

Ben Kompa
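
The two evaluation quantities named in the abstract above, marginal coverage and interval width, can be computed as in the following minimal sketch. It assumes NumPy and per-sample interval endpoints; the function name `coverage_and_width` and the toy numbers are illustrative and do not come from the paper.

```python
import numpy as np

def coverage_and_width(y_true, lower, upper):
    """Empirical marginal coverage and mean width of prediction intervals."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    # A sample is covered when its label falls inside its interval.
    covered = (lower <= y_true) & (y_true <= upper)
    return covered.mean(), (upper - lower).mean()

# Toy data (hypothetical): four labels with their predicted intervals.
y = np.array([1.0, 2.0, 3.0, 4.0])
lo = np.array([0.5, 2.5, 2.0, 3.0])
hi = np.array([1.5, 3.5, 4.0, 5.0])
cov, width = coverage_and_width(y, lo, hi)
print(cov, width)  # 0.75 1.5
```

Comparing `cov` against the nominal level (say 0.9) on in-distribution and shifted test sets is one simple way to run the kind of check the abstract describes.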
Sat 4:24 p.m. - 4:33 p.m.
  

We study the problem of post-hoc calibration for multiclass classification, with an emphasis on histogram binning. Multiple works have focused on calibration with respect to the confidence of just the predicted class (or 'top-label'). We find that the popular notion of confidence calibration [Guo et al., 2017] is not sufficiently strong -- there exist predictors that are not calibrated in any meaningful way but are perfectly confidence calibrated. We propose a closely related (but subtly different) notion, top-label calibration, that accurately captures the intuition and simplicity of confidence calibration, but addresses its drawbacks. We formalize a histogram binning (HB) algorithm that reduces top-label multiclass calibration to the binary case, prove that it has clean theoretical guarantees without distributional assumptions, and perform a methodical study of its practical performance. Some prediction tasks require stricter notions of multiclass calibration such as class-wise or canonical calibration. We formalize appropriate HB algorithms corresponding to each of these goals. In experiments with deep neural nets, we find that our principled versions of HB are often better than temperature scaling, for both top-label and class-wise calibration. Code for this work will be made publicly available at https://github.com/aigen/df-posthoc-calibration.

Chirag Gupta
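
The reduction of top-label calibration to binary histogram binning described in the abstract above might look roughly like the sketch below. It assumes equal-mass bins and NumPy; the names `fit_top_label_hb` and `apply_top_label_hb` are hypothetical, and this is not the authors' released code.

```python
import numpy as np

def fit_top_label_hb(pred_class, confidence, true_class, n_bins=5):
    """Fit one binary histogram-binning calibrator per predicted class."""
    calibrators = {}
    for c in np.unique(pred_class):
        mask = pred_class == c
        conf = confidence[mask]
        # Binary target: was the predicted (top) label actually correct?
        correct = (true_class[mask] == c).astype(float)
        # Equal-mass bins: interior edges at empirical quantiles of the scores.
        edges = np.quantile(conf, np.linspace(0, 1, n_bins + 1)[1:-1])
        bins = np.searchsorted(edges, conf, side="right")
        # Calibrated value per bin = empirical accuracy in that bin
        # (fall back to the class-wide accuracy if a bin is empty).
        means = np.array([
            correct[bins == b].mean() if np.any(bins == b) else correct.mean()
            for b in range(n_bins)
        ])
        calibrators[c] = (edges, means)
    return calibrators

def apply_top_label_hb(calibrators, pred_class, confidence):
    """Replace each confidence by the accuracy of its (class, bin) cell."""
    out = np.empty(len(confidence), dtype=float)
    for i, (c, s) in enumerate(zip(pred_class, confidence)):
        # Raises KeyError for a class never predicted on the calibration set.
        edges, means = calibrators[c]
        out[i] = means[np.searchsorted(edges, s, side="right")]
    return out
```

Fitting on a held-out calibration split and applying to test predictions keeps the two stages separate, which is what distribution-free guarantees of this kind generally rely on.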
Sat 4:33 p.m. - 4:42 p.m.
  
Estimating the data uncertainty in regression tasks is often done by learning a quantile function or a prediction interval of the true label conditioned on the input. It is frequently observed that quantile regression, a vanilla algorithm for learning quantiles with asymptotic guarantees, tends to under-cover relative to the desired coverage level in practice. While various fixes have been proposed, a more fundamental understanding of why this under-coverage bias happens in the first place remains elusive. In this paper, we present a rigorous theoretical study on the coverage of uncertainty estimation algorithms in learning quantiles. We prove that quantile regression suffers from an inherent under-coverage bias, in a vanilla setting where we learn a realizable linear quantile function and there is more data than parameters. More quantitatively, for $\alpha>0.5$ and small $d/n$, the $\alpha$-quantile learned by quantile regression roughly achieves coverage $\alpha - (\alpha-1/2)\cdot d/n$ regardless of the noise distribution, where $d$ is the input dimension and $n$ is the number of training data. Our theory reveals that this under-coverage bias stems from a certain high-dimensional parameter estimation error that is not implied by existing theories on quantile regression. Experiments on simulated and real data verify our theory and further illustrate the effect of various factors such as sample size and model capacity on the under-coverage bias in more practical setups.
Yu Bai
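
As a worked instance of the coverage formula in the abstract above, take the illustrative values $\alpha = 0.9$ and $d/n = 0.1$ (chosen here for arithmetic convenience, not taken from the paper):

```latex
\[
\alpha - \left(\alpha - \tfrac{1}{2}\right)\cdot\frac{d}{n}
  = 0.9 - (0.9 - 0.5)\cdot 0.1
  = 0.9 - 0.04
  = 0.86 .
\]
```

Under the formula, a nominal 90% quantile would therefore cover roughly 86% of the time in this setting, and the gap shrinks linearly as $d/n$ decreases.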
Sat 4:42 p.m. - 4:51 p.m.
  

We propose a new nonparametric approach in the field of conformal prediction, based on a new family of nonconformity measures. The induced conformal predictors yield closed-form, finite-sample valid or exact prediction sets for functional data under minimal distributional assumptions. Our proposal ensures that the prediction sets obtained are bands, an essential feature in the functional setting that allows the visualization and interpretation of such sets. The procedure is also fast and scalable, does not rely on functional dimension reduction techniques, and allows the user to select different nonconformity measures depending on the problem at hand while always obtaining valid bands. Within this family of measures, we also propose a specific measure leading to prediction bands asymptotically no less efficient than those with constant width.

Matteo Fontana
Sat 4:51 p.m. - 5:00 p.m.
  

Split conformal prediction is a computationally efficient method for performing distribution-free predictive inference in regression. It involves, however, a one-time random split of the data, and the result can strongly depend on the particular split. To address this problem, we propose multi split conformal prediction, a simple method based on Markov's inequality to aggregate split conformal prediction intervals across multiple splits.

Aldo Solari, Vera Djordjilovic
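
To make the abstract above concrete, here is a minimal sketch of split conformal prediction, followed by one way a Markov-type aggregation over several splits can work. The least-squares line fit, the function names, and the voting rule with threshold `tau` are assumptions made for illustration and are not necessarily the authors' exact procedure. The idea in the sketch: build each interval at the stricter level $1-\beta$ with $\beta = \alpha(1-\tau)$, and keep a candidate value if more than a fraction $\tau$ of the intervals contain it; Markov's inequality then bounds the miscoverage by $\beta/(1-\tau) = \alpha$.

```python
import numpy as np

def split_conformal_interval(x_train, y_train, x_cal, y_cal, x_new, alpha):
    """Split conformal interval around a least-squares line (illustrative model)."""
    # Fit the point predictor on the training part only.
    coef = np.polyfit(x_train, y_train, deg=1)
    # Absolute residuals on the calibration part serve as conformity scores.
    scores = np.sort(np.abs(y_cal - np.polyval(coef, x_cal)))
    n = len(scores)
    # Finite-sample quantile: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = scores[k - 1] if k <= n else np.inf
    center = np.polyval(coef, x_new)
    return center - q, center + q

def multi_split_covers(x, y, x_new, y_new, alpha, n_splits=10, tau=0.5, seed=0):
    """Check whether y_new lies in the voted set built from several random splits."""
    rng = np.random.default_rng(seed)
    beta = alpha * (1 - tau)  # stricter per-split level (assumption, see lead-in)
    votes = 0
    for _ in range(n_splits):
        perm = rng.permutation(len(x))
        half = len(x) // 2
        tr, cal = perm[:half], perm[half:]
        lo, hi = split_conformal_interval(x[tr], y[tr], x[cal], y[cal], x_new, beta)
        votes += int(lo <= y_new <= hi)
    # Keep y_new if more than a fraction tau of the intervals contain it.
    return votes / n_splits > tau
```

Because the voted set depends on all splits rather than a single one, it is less sensitive to any particular random split, at the price of using wider per-split intervals.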
Sat 5:00 p.m. - 5:45 p.m.
Talk by Jing Lei (Live Talk)   
Sat 5:45 p.m. - 5:54 p.m.
  

While the traditional viewpoint in machine learning and statistics assumes training and testing samples come from the same population, practice belies this fiction. One strategy, coming from robust statistics and optimization, is thus to build a model robust to distributional perturbations. In this paper, we take a different approach to describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions. We present a method that produces prediction sets (almost exactly) giving the right coverage level for any test distribution in an f-divergence ball around the training population. The method, based on conformal inference, achieves (nearly) valid coverage in finite samples, under only the condition that the training data be exchangeable. An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it; we develop estimators and prove their consistency for protection and validity of uncertainty estimates under shifts. By experimenting on several large-scale benchmark datasets, including Recht et al.'s CIFAR-v4 and ImageNet-V2 datasets, we provide complementary empirical results that highlight the importance of robust predictive validity.

Suyash Gupta
Sat 5:54 p.m. - 6:03 p.m.
  

Estimating the uncertainties associated with the predictions of machine learning models is crucial for assessing their robustness and predictive power. Recently, new distribution-free methods have emerged that make it possible to compute uncertainties with strong theoretical guarantees, without any assumptions on the model or on the underlying data distribution. In this paper, we introduce MAPIE (Model Agnostic Prediction Interval Estimator), an open-source Python package that follows the standard scikit-learn API and implements recent resampling methods such as the jackknife+. The package is available at https://github.com/simai-ml/MAPIE.

Vianney Taquet, Gregoire Martinon, Nicolas J-B Brunel
Sat 6:03 p.m. - 6:12 p.m.
  

Conformal Predictors (CP) are wrappers around ML models, providing error guarantees under weak assumptions on the data distribution. They are suitable for a wide range of problems, from classification and regression to anomaly detection. Unfortunately, their very high computational complexity limits their applicability to large datasets. In this work, we show that it is possible to speed up a CP classifier considerably, by studying it in conjunction with the underlying ML method, and by exploiting incremental and decremental learning. For methods such as k-NN, KDE, and kernel LS-SVM, our approach reduces the running time by one order of magnitude, whilst producing exact solutions. With similar ideas, we also achieve a linear speed up for the harder case of bootstrapping. Finally, we extend these techniques to improve upon an optimization of k-NN CP for regression. We evaluate our findings empirically, and discuss when methods are suitable for CP optimization.

Giovanni Cherubin, Konstantinos Chatzikokolakis, Martin Jaggi
Sat 6:12 p.m. - 6:21 p.m.
  

Detecting out-of-distribution (OOD) inputs is a central challenge for safely deploying machine learning models in the real world. Existing solutions are mainly driven by small-scale natural image datasets and are far from readily usable for safety-critical domains such as medical imaging diagnosis. In this paper, we bridge this critical gap by proposing a localization-based OOD detection framework LOOD, which demonstrates substantial improvement over previous methods. Our key idea is to estimate the OOD score from a localized feature region that is highly indicative of the disease label, as opposed to averaging signals from all spatial locations. We achieve this by devising a specialized pooling mechanism termed selective pooling, which yields OOD scores that better distinguish between the in-distribution and OOD data. We evaluate the model trained on a large-scale clinical chest X-ray dataset against five diverse OOD datasets. LOOD establishes superior performance on this challenging task, reducing the average FPR95 by up to 57.83%.

Yiyou Sun, Sharon Li
Sat 6:21 p.m. - 6:30 p.m.
  

Decision makers often need to rely on imperfect probabilistic forecasts. While average performance metrics are typically available, it is difficult to assess the quality of individual forecasts and the corresponding utilities without strong assumptions on the data distribution. To convey confidence about individual predictions to decision-makers, we propose a compensation mechanism ensuring that the forecasted utility matches the actually accrued utility. While a naive scheme to compensate decision-makers for prediction errors can be exploited and might not be sustainable in the long run, we propose a mechanism based on fair bets and online learning that provably cannot be exploited. We demonstrate an application showing how passengers could confidently optimize individual travel plans based on flight delay probabilities estimated by an airline.

Shengjia Zhao
Sat 6:30 p.m. - 6:45 p.m.
Water Break with Gather (Gather)
Sat 6:45 p.m. - 7:03 p.m.
Talk by Kilian Weinberger (Introduction by moderator)   
Kilian Q Weinberger
Sat 7:03 p.m. - 8:05 p.m.
Poster session on gather.town:

https://eventhosts.gather.town/gqcpQdCtrcBumwvo/dist-free-uq-poster-2

Sat 8:05 p.m. - 8:50 p.m.
Talk by Emmanuel Candes (Pre-Recorded Talk)   
Emmanuel J Candes
-
DFUQ poster 1 -- Robust validation: Confident predictions even when distributions shift (poster presentation) [ Visit Poster at Spot A6 in Virtual World ]
-
DFUQ poster 2 -- An Automatic Finite Sample Robustness Metric (poster presentation) [ Visit Poster at Spot A1 in Virtual World ]
-
DFUQ poster 1 -- Deep Quantile Aggregation (poster presentation) [ Visit Poster at Spot B2 in Virtual World ]
-
DFUQ poster 1 -- Sequential Regression Using Metamodels (poster presentation) [ Visit Poster at Spot B0 in Virtual World ]
-
DFUQ poster 2 -- An Approximate Parallel Tempering for Uncertainty Quantification in Deep Learning (poster presentation) [ Visit Poster at Spot A0 in Virtual World ]
-
DFUQ poster 1 -- Testing for Outliers with Conformal p-values (poster presentation) [ Visit Poster at Spot B2 in Virtual World ]
-
DFUQ poster 1 -- Probabilistic Forecasting: A Level Set Approach (poster presentation) [ Visit Poster at Spot A5 in Virtual World ]
-
DFUQ poster 1 -- Distribution Free Uncertainty for the Minimum Norm Solution of Over-parameterized Linear Regression (poster presentation) [ Visit Poster at Spot B3 in Virtual World ]
-
DFUQ poster 1 -- Conformal Histogram Regression (poster presentation) [ Visit Poster at Spot A5 in Virtual World ]
-
DFUQ poster 1 -- Learning Quantile Function without Quantile Crossing for Distribution-free Time Series Forecasting (poster presentation) [ Visit Poster at Spot C6 in Virtual World ]
-
DFUQ poster 1 -- Adaptive Conformal Inference Under Distribution Shift (poster presentation) [ Visit Poster at Spot A1 in Virtual World ]
-
DFUQ poster 1 -- Conformal Uncertainty Sets for Robust Optimization (poster presentation) [ Visit Poster at Spot B0 in Virtual World ]
-
DFUQ poster 2 -- Reliable Decisions With Threshold Calibration (poster presentation) [ Visit Poster at Spot B3 in Virtual World ]
-
DFUQ poster 1 -- Learning Prediction Intervals for Model Performance (poster presentation) [ Visit Poster at Spot C5 in Virtual World ]
-
DFUQ poster 1 -- CD Split and HPD Split (poster presentation) [ Visit Poster at Spot A4 in Virtual World ]
-
DFUQ poster 1 -- Bayesian Triplet Loss (poster presentation) [ Visit Poster at Spot A3 in Virtual World ]
-
DFUQ poster 2 -- Conformal Prediction for Simulation Models (poster presentation) [ Visit Poster at Spot A4 in Virtual World ]
-
DFUQ poster 1 -- MD-split+: Practical Local Conformal Inference in High Dimensions (poster presentation) [ Visit Poster at Spot A0 in Virtual World ]
-
DFUQ poster 1 -- How Nonconformity Functions and Difficulty of Datasets Impact the Efficiency of Conformal Classifiers (poster presentation) [ Visit Poster at Spot C3 in Virtual World ]
-
DFUQ poster 1 -- Distribution-Independent Confidence Intervals for the Eigendecomposition of Covariance Matrices via the Eigenvalue-Eigenvector Identity (poster presentation) [ Visit Poster at Spot B6 in Virtual World ]
-
DFUQ poster 1 -- Bayesian Crowd Counting (poster presentation) [ Visit Poster at Spot A2 in Virtual World ]
-
DFUQ poster 1 -- NUQ: Nonparametric Uncertainty Quantification for Deterministic Neural Networks (poster presentation) [ Visit Poster at Spot A1 in Virtual World ]
-
DFUQ poster 1 -- Efficient Conformal Prediction via Cascaded Inference with Expanded Admission (poster presentation) [ Visit Poster at Spot C1 in Virtual World ]
-
DFUQ poster 1 -- Few Shot Conformal Prediction with Auxiliary Tasks (poster presentation) [ Visit Poster at Spot C2 in Virtual World ]
-
DFUQ poster 2 -- Using Conformalized Prediction of Performance to Make Learning more Transparent (poster presentation) [ Visit Poster at Spot C0 in Virtual World ]
-
DFUQ poster 1 -- Prediction Intervals for Active Learning (poster presentation) [ Visit Poster at Spot A4 in Virtual World ]
-
DFUQ poster 2 -- Consistent Accelerated Inference via Confident Adaptive Transformers (poster presentation) [ Visit Poster at Spot A5 in Virtual World ]
-
DFUQ poster 1 -- Understanding The Under-Coverage Bias in Uncertainty Estimation (poster presentation) [ Visit Poster at Spot B5 in Virtual World ]
-
DFUQ poster 1 -- T-SCI: A Two-Stage Conformal Inference Algorithm with Guaranteed Coverage for Cox-MLP (poster presentation) [ Visit Poster at Spot B1 in Virtual World ]
-
DFUQ poster 1 -- Online Multivalid Learning (poster presentation) [ Visit Poster at Spot A2 in Virtual World ]
-
DFUQ poster 2 -- Finite-sample Efficient Conformal Prediction (poster presentation) [ Visit Poster at Spot A6 in Virtual World ]
-
DFUQ poster 1 -- PAC Prediction Sets Under Covariate Shift (poster presentation) [ Visit Poster at Spot A3 in Virtual World ]
-
DFUQ poster 2 -- Uncertainty Quantification For Amniotic Fluid Segmentation And Volume Prediction (poster presentation) [ Visit Poster at Spot B6 in Virtual World ]
-
DFUQ poster 2 -- Conformal Anomaly Detection on Spatio-Temporal Observations with Missing Data (poster presentation) [ Visit Poster at Spot A3 in Virtual World ]
-
DFUQ poster 2 -- Nested Conformal (poster presentation) [ Visit Poster at Spot B2 in Virtual World ]
-
DFUQ poster 1 -- Top Label Calibration (poster presentation) [ Visit Poster at Spot B3 in Virtual World ]
-
DFUQ poster 1 -- Distribution Free UQ for Classification Under Label Shift (poster presentation) [ Visit Poster at Spot B4 in Virtual World ]
-
DFUQ poster 2 -- Calibrating Predictions to Decisions (poster presentation) [ Visit Poster at Spot A2 in Virtual World ]
-
DFUQ poster 2 -- Root-finding Approaches for Computing Conformal Prediction Sets (poster presentation) [ Visit Poster at Spot B5 in Virtual World ]
-
DFUQ poster 1 -- Cross-validation Confidence Intervals For Test Error (poster presentation) [ Visit Poster at Spot B1 in Virtual World ]
-
DFUQ poster 1 -- Improving Conditional Coverage via Orthogonal Quantile Regression (poster presentation) [ Visit Poster at Spot C4 in Virtual World ]
-
DFUQ poster 1 -- Estimation and Inference on Nonlinear Heterogeneous Effects (poster presentation) [ Visit Poster at Spot C0 in Virtual World ]
-
DFUQ poster 2 -- Locally Valid and Discriminative Confidence Intervals for Deep Learning Models (poster presentation) [ Visit Poster at Spot B1 in Virtual World ]
-
DFUQ poster 2 -- Interval Deep Learning (poster presentation) [ Visit Poster at Spot B0 in Virtual World ]
-
DFUQ poster 1 -- Training Models For Uncertainty Quantification (poster presentation) [ Visit Poster at Spot B4 in Virtual World ]
-
DFUQ poster 1 -- Distribution-free inference for regression: discrete, continuous, and in between (poster presentation) [ Visit Poster at Spot B5 in Virtual World ]
-
DFUQ poster 2 -- Right Decisions from Wrong Predictions (poster presentation) [ Visit Poster at Spot B4 in Virtual World ]
-
DFUQ poster 1 -- Conformal Prediction with Localized Decorrelation (poster presentation) [ Visit Poster at Spot A6 in Virtual World ]
-
DFUQ poster 1 -- Distribution-free Conditional Median Inference (poster presentation) [ Visit Poster at Spot C0 in Virtual World ]
-
DFUQ poster 1 -- Copula-based Conformal Prediction for Multi Target Regression (poster presentation) [ Visit Poster at Spot B6 in Virtual World ]
-
DFUQ poster 2 -- Weakly Conformalized Predictive Sets with Partial Supervision (poster presentation) [ Visit Poster at Spot C5 in Virtual World ]
-
DFUQ poster 1 -- Conformalized Survival Analysis (poster presentation) [ Visit Poster at Spot A0 in Virtual World ]