## Workshop on Distribution-Free Uncertainty Quantification

### Anastasios Angelopoulos, Stephen Bates, Sharon Li, Aaditya Ramdas, Ryan Tibshirani

Sat 7:25 a.m. - 7:30 a.m. Introduction to Conformal Prediction (Introduction) Anastasios Angelopoulos

Sat 7:30 a.m. - 8:15 a.m. Talk by Rina Barber (Live Talk)

Sat 8:15 a.m. - 9:15 a.m. Panel with Michael I. Jordan, Vladimir Vovk, and Larry Wasserman, moderated by Aaditya Ramdas (Discussion Panel)

Sat 9:15 a.m. - 9:24 a.m. Few-Shot Conformal Prediction with Auxiliary Tasks (Spotlight #1) (Pre-Recorded Talk)

We develop a novel approach to conformal prediction when the target task has limited data available for training. Conformal prediction identifies a small set of promising output candidates in place of a single prediction, with guarantees that the set contains the correct answer with high probability. When training data is limited, however, the predicted set can easily become unusably large. In this work, we obtain substantially tighter prediction sets while maintaining desirable marginal guarantees by casting conformal prediction as a meta-learning paradigm over exchangeable collections of auxiliary tasks. Our conformalization algorithm is simple, fast, and agnostic to the choice of underlying model, learning algorithm, or dataset. We demonstrate the effectiveness of this approach across a number of few-shot classification and regression tasks in natural language processing, computer vision, and computational chemistry for drug discovery.

Adam Fisch

Sat 9:24 a.m. - 9:33 a.m. Online Multivalid Learning: Means, Moments, and Prediction Intervals (Spotlight #2) (Pre-Recorded Talk)

We present a general, efficient technique for providing contextual predictions that are "multivalid" in various senses, against an online sequence of adversarially chosen examples $(x,y)$. This means that the resulting estimates correctly predict various statistics of the labels $y$ not just *marginally* (as averaged over the sequence of examples) but also conditionally on $x \in G$ for any $G$ belonging to an arbitrary intersecting collection of groups $\mathcal{G}$.
We provide three instantiations of this framework. The first is mean prediction, which corresponds to an online algorithm satisfying the notion of multicalibration from \cite{multicalibration}. The second is variance and higher moment prediction, which corresponds to an online algorithm satisfying the notion of mean-conditioned moment multicalibration from \cite{momentmulti}. Finally, we define a new notion of prediction interval multivalidity, and give an algorithm for finding prediction intervals which satisfy it. Because our algorithms handle adversarially chosen examples, they can equally well be used to predict statistics of the residuals of arbitrary point prediction methods, giving rise to very general techniques for quantifying the uncertainty of predictions of black box algorithms, even in an online adversarial setting. When instantiated for prediction intervals, this solves a similar problem as conformal prediction, but in an adversarial environment and with multivalidity guarantees stronger than simple marginal coverage guarantees.

Christopher Jung

Sat 9:33 a.m. - 9:42 a.m. Nested Conformal Prediction Sets for Classification with Applications to Probation Data (Spotlight #3) (Pre-Recorded Talk)

Risk assessments to help inform criminal justice decisions have been used in the United States since the 1920s. Over the past several years, statistical learning risk algorithms have been introduced amid much controversy about fairness, transparency and accuracy. In this paper, we focus on accuracy for a large department of probation and parole that is considering a major revision of its current, statistical learning risk methods. Because the content of each offender’s supervision is substantially shaped by a forecast of subsequent conduct, forecasts have real consequences. Here we consider the probability that risk forecasts are correct.
We augment standard statistical learning estimates of forecasting uncertainty (i.e., confusion tables) with uncertainty estimates from nested conformal prediction sets. In a demonstration of concept using data from the department of probation and parole, we show that the standard uncertainty measures and uncertainty measures from nested conformal prediction sets can differ dramatically in concept and output. We also provide a modification of nested conformal called the localized conformal method to match confusion tables more closely when possible. A strong case can be made favoring the nested and localized conformal approach. As best we can tell, our formulation of such comparisons and consequent recommendations is novel.

Richard Berk

Sat 9:42 a.m. - 9:51 a.m. Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods (Spotlight #4) (Pre-Recorded Talk)

The notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology. In line with the statistical tradition, uncertainty has long been perceived as almost synonymous with standard probability and probabilistic predictions. Yet, due to the steadily increasing relevance of machine learning for practical applications and related issues such as safety requirements, new problems and challenges have recently been identified by machine learning scholars, and these problems may call for new methodological developments. In particular, this includes the importance of distinguishing between (at least) two different types of uncertainty, often referred to as aleatoric and epistemic. In this paper, we provide an introduction to the topic of uncertainty in machine learning as well as an overview of attempts so far at handling uncertainty in general and formalizing this distinction in particular.

Eyke Hüllermeier

Sat 9:51 a.m. - 10:00 a.m.
Bayes-optimal prediction with frequentist coverage control (Spotlight #5) (Pre-Recorded Talk)

We illustrate how indirect or prior information can be optimally used to construct a prediction region that maintains a target frequentist coverage rate. If the indirect information is accurate, the volume of the prediction region is lower on average than that of other regions with the same coverage rate. Even if the indirect information is inaccurate, the resulting region still maintains the target coverage rate. Such a prediction region can be constructed for models that have a complete sufficient statistic, which includes many widely-used parametric and nonparametric models. Particular examples include a Bayes-optimal conformal prediction procedure that maintains a constant coverage rate across distributions in a nonparametric model, as well as a prediction procedure for the normal linear regression model that can utilize a regularizing prior distribution, yet maintain a frequentist coverage rate that is constant as a function of the model parameters and explanatory variables. No results rely on asymptotic approximations.

Peter Hoff

Sat 10:00 a.m. - 10:15 a.m. Water Break with Gather (Gather)

Water break with gather.town

Sat 10:15 a.m. - 11:00 a.m. Talk by Leying Guan (Live Talk)

Sat 11:00 a.m. - 12:00 p.m. Poster Session #1 (Poster Session)

Poster session in two gather.town rooms: https://eventhosts.gather.town/FiEv6wmKUf7jwHlu/dist-free-uq-poster-1 https://eventhosts.gather.town/Zcqel0fMxeVDnfuu/dist-free-uq-poster-1b

Sat 12:00 p.m. - 4:15 p.m. Break

Sat 4:13 p.m. - 4:15 p.m. Welcome back (Live introduction by moderator)

Sat 4:15 p.m. - 4:24 p.m. Empirical Frequentist Coverage of Deep Learning Uncertainty Quantification Procedures (Spotlight #6) (Pre-Recorded Talk)

Uncertainty quantification for complex deep learning models is increasingly important as these techniques see growing use in high-stakes, real-world settings.
Currently, the quality of a model's uncertainty is evaluated using point-prediction metrics such as the negative log-likelihood (NLL), expected calibration error (ECE) or the Brier score on held-out data. Marginal coverage of prediction intervals or sets, a well-known concept in the statistical literature, is an intuitive alternative to these metrics but has yet to be systematically studied for this class of models. With marginal coverage, and the complementary notion of the width of a prediction interval, downstream users of deployed machine learning models can better understand uncertainty quantification both on a global dataset level and on a per-sample basis. In this study, we provide the first large-scale evaluation of the empirical frequentist coverage properties of well-known uncertainty quantification techniques on a suite of regression and classification tasks. We find that, in general, some methods do achieve desirable coverage properties on in-distribution samples, but that coverage is not maintained on out-of-distribution data. Our results demonstrate the failings of current uncertainty quantification techniques as dataset shift increases and reinforce coverage as an important metric in developing models for real-world applications.

Ben Kompa

Sat 4:24 p.m. - 4:33 p.m. Top-label calibration (Spotlight #7) (Pre-Recorded Talk)

We study the problem of post-hoc calibration for multiclass classification, with an emphasis on histogram binning. Multiple works have focused on calibration with respect to the confidence of just the predicted class (or 'top-label'). We find that the popular notion of confidence calibration [Guo et al., 2017] is not sufficiently strong: there exist predictors that are not calibrated in any meaningful way but are perfectly confidence calibrated.
We propose a closely related (but subtly different) notion, top-label calibration, that accurately captures the intuition and simplicity of confidence calibration, but addresses its drawbacks. We formalize a histogram binning (HB) algorithm that reduces top-label multiclass calibration to the binary case, prove that it has clean theoretical guarantees without distributional assumptions, and perform a methodical study of its practical performance. Some prediction tasks require stricter notions of multiclass calibration such as class-wise or canonical calibration. We formalize appropriate HB algorithms corresponding to each of these goals. In experiments with deep neural nets, we find that our principled versions of HB are often better than temperature scaling, for both top-label and class-wise calibration. Code for this work will be made publicly available at https://github.com/aigen/df-posthoc-calibration.

Chirag Gupta

Sat 4:33 p.m. - 4:42 p.m. Understanding the Under-Coverage Bias in Uncertainty Estimation (Spotlight #8) (Pre-Recorded Talk)

Estimating the data uncertainty in regression tasks is often done by learning a quantile function or a prediction interval of the true label conditioned on the input. It is frequently observed that quantile regression, a vanilla algorithm for learning quantiles with asymptotic guarantees, tends to *under-cover* relative to the desired coverage level in practice. While various fixes have been proposed, a more fundamental understanding of why this under-coverage bias happens in the first place remains elusive. In this paper, we present a rigorous theoretical study on the coverage of uncertainty estimation algorithms in learning quantiles. We prove that quantile regression suffers from an inherent under-coverage bias, in a vanilla setting where we learn a realizable linear quantile function and there is more data than parameters.
More quantitatively, for $\alpha>0.5$ and small $d/n$, the $\alpha$-quantile learned by quantile regression roughly achieves coverage $\alpha - (\alpha-1/2)\cdot d/n$ regardless of the noise distribution, where $d$ is the input dimension and $n$ is the number of training examples. Our theory reveals that this under-coverage bias stems from a certain high-dimensional parameter estimation error that is not implied by existing theories on quantile regression. Experiments on simulated and real data verify our theory and further illustrate the effect of various factors such as sample size and model capacity on the under-coverage bias in more practical setups.

Yu Bai

Sat 4:42 p.m. - 4:51 p.m. A Conformal Approach for Functional Prediction Bands (Spotlight #9) (Pre-Recorded Talk)

We propose a new nonparametric approach in the field of Conformal Prediction based on a new family of nonconformity measures inducing conformal predictors able to create closed-form finite-sample valid or exact prediction sets for functional data under very minimal distributional assumptions. Our proposal ensures that the prediction sets obtained are bands, an essential feature in the functional setting that allows the visualization and interpretation of such sets. The procedure is also fast, scalable, does not rely on functional dimension reduction techniques and allows the user to select different nonconformity measures depending on the problem at hand, always obtaining valid bands. Within this family of measures, we also propose a specific measure leading to prediction bands asymptotically no less efficient than those with constant width.

Matteo Fontana

Sat 4:51 p.m. - 5:00 p.m. Multi Split Conformal Prediction (Spotlight #10) (Pre-Recorded Talk)

Split conformal prediction is a computationally efficient method for performing distribution-free predictive inference in regression.
It involves, however, a one-time random split of the data, and the result can strongly depend on the particular split. To address this problem, we propose multi split conformal prediction, a simple method based on Markov's inequality to aggregate split conformal prediction intervals across multiple splits.

Aldo Solari, Vera Djordjilovic

Sat 5:00 p.m. - 5:45 p.m. Talk by Jing Lei (Live Talk)

Sat 5:45 p.m. - 5:54 p.m. Robust Validation: Confident Predictions Even When Distributions Shift (Spotlight #11) (Pre-Recorded Talk)

While the traditional viewpoint in machine learning and statistics assumes training and testing samples come from the same population, practice belies this fiction. One strategy, coming from robust statistics and optimization, is thus to build a model robust to distributional perturbations. In this paper, we take a different approach to describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions. We present a method that produces prediction sets (almost exactly) giving the right coverage level for any test distribution in an f-divergence ball around the training population. The method, based on conformal inference, achieves (nearly) valid coverage in finite samples, under only the condition that the training data be exchangeable. An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it; we develop estimators and prove their consistency for protection and validity of uncertainty estimates under shifts. By experimenting on several large-scale benchmark datasets, including Recht et al.'s CIFAR-v4 and ImageNet-V2 datasets, we provide complementary empirical results that highlight the importance of robust predictive validity.

Suyash Gupta

Sat 5:54 p.m. - 6:03 p.m.
MAPIE: Model Agnostic Prediction Interval Estimator (Spotlight #12) (Pre-Recorded Talk)

Estimating uncertainties associated with the predictions of machine learning models is of crucial importance to assess their robustness and predictive power. Recently, new distribution-free methods have emerged that allow one to compute uncertainties with strong theoretical guarantees without making any assumption about the model or the underlying data distribution. In this paper, we introduce MAPIE (Model Agnostic Prediction Interval Estimator), an open-source Python package following the standard scikit-learn API and implementing recent resampling methods such as the jackknife+. The package is available at this url: https://github.com/simai-ml/MAPIE.

Vianney Taquet, Gregoire Martinon, Nicolas J-B Brunel

Sat 6:03 p.m. - 6:12 p.m. Exact Optimization of Conformal Predictors via Incremental and Decremental Learning (Spotlight #13) (Pre-Recorded Talk)

Conformal Predictors (CP) are wrappers around ML models, providing error guarantees under weak assumptions on the data distribution. They are suitable for a wide range of problems, from classification and regression to anomaly detection. Unfortunately, their very high computational complexity limits their applicability to large datasets. In this work, we show that it is possible to speed up a CP classifier considerably, by studying it in conjunction with the underlying ML method, and by exploiting incremental and decremental learning. For methods such as k-NN, KDE, and kernel LS-SVM, our approach reduces the running time by one order of magnitude, whilst producing exact solutions. With similar ideas, we also achieve a linear speed-up for the harder case of bootstrapping. Finally, we extend these techniques to improve upon an optimization of k-NN CP for regression. We evaluate our findings empirically, and discuss when methods are suitable for CP optimization.

Giovanni Cherubin, Konstantinos Chatzikokolakis, Martin Jaggi

Sat 6:12 p.m. - 6:21 p.m. LOOD: Localization-based Uncertainty Estimation for Medical Imaging (Spotlight #14) (Pre-Recorded Talk)

Detecting out-of-distribution (OOD) inputs is a central challenge for safely deploying machine learning models in the real world. Existing solutions are mainly driven by small-scale natural image datasets and are far from readily usable for safety-critical domains such as medical imaging diagnosis. In this paper, we bridge this critical gap by proposing a localization-based OOD detection framework LOOD, which demonstrates substantial improvement over previous methods. Our key idea is to estimate the OOD score from a localized feature region that is highly indicative of the disease label, as opposed to averaging signals from all spatial locations. We achieve this by devising a specialized pooling mechanism termed selective pooling, which yields OOD scores that better distinguish between the in-distribution and OOD data. We evaluate the model trained on a large-scale clinical chest X-ray dataset against five diverse OOD datasets. LOOD establishes superior performance on this challenging task, reducing the average FPR95 by up to 57.83%.

Yiyou Sun, Sharon Li

Sat 6:21 p.m. - 6:30 p.m. Calibrating Predictions to Decisions: A Novel Approach to Multi-Class Calibration (Spotlight #15) (Pre-Recorded Talk)

Decision makers often need to rely on imperfect probabilistic forecasts. While average performance metrics are typically available, it is difficult to assess the quality of individual forecasts and the corresponding utilities without strong assumptions on the data distribution. To convey confidence about individual predictions to decision-makers, we propose a compensation mechanism ensuring that the forecasted utility matches the actually accrued utility.
While a naive scheme to compensate decision-makers for prediction errors can be exploited and might not be sustainable in the long run, we propose a mechanism based on fair bets and online learning that provably cannot be exploited. We demonstrate an application showing how passengers could confidently optimize individual travel plans based on flight delay probabilities estimated by an airline.

Shengjia Zhao

Sat 6:30 p.m. - 6:45 p.m. Water Break with Gather (Gather)

Sat 6:45 p.m. - 7:03 p.m. Talk by Kilian Weinberger (Introduction by moderator) Kilian Q Weinberger

Sat 7:03 p.m. - 8:05 p.m. Poster Session #2 (Poster Session)

Poster session on gather.town: https://eventhosts.gather.town/gqcpQdCtrcBumwvo/dist-free-uq-poster-2

Sat 8:05 p.m. - 8:50 p.m. Talk by Emmanuel Candes (Pre-Recorded Talk) Emmanuel J Candes

- DFUQ poster 1 -- Robust validation: Confident predictions even when distributions shift (poster presentation)
- DFUQ poster 2 -- An Automatic Finite Sample Robustness Metric (poster presentation)
- DFUQ poster 1 -- Deep Quantile Aggregation (poster presentation)
- DFUQ poster 1 -- Sequential Regression Using Metamodels (poster presentation)
- DFUQ poster 2 -- An Approximate Parallel Tempering for Uncertainty Quantification in Deep Learning (poster presentation)
- DFUQ poster 1 -- Testing for Outliers with Conformal p-values (poster presentation)
- DFUQ poster 1 -- Probabilistic Forecasting: A Level Set Approach (poster presentation)
- DFUQ poster 1 -- Distribution Free Uncertainty for the Minimum Norm Solution of Over-parameterized Linear Regression (poster presentation)
- DFUQ poster 1 -- Conformal Histogram Regression (poster presentation)
- DFUQ poster 1 -- Learning Quantile Function without Quantile Crossing for Distribution-free Time Series Forecasting (poster presentation)
- DFUQ poster 1 -- Adaptive Conformal Inference Under Distribution Shift (poster presentation)
- DFUQ poster 1 -- Conformal Uncertainty Sets for Robust
Optimization (poster presentation)
- DFUQ poster 2 -- Reliable Decisions With Threshold Calibration (poster presentation)
- DFUQ poster 1 -- Learning Prediction Intervals for Model Performance (poster presentation)
- DFUQ poster 1 -- CD Split and HPD Split (poster presentation)
- DFUQ poster 1 -- Bayesian Triplet Loss (poster presentation)
- DFUQ poster 2 -- Conformal Prediction for Simulation Models (poster presentation)
- DFUQ poster 1 -- MD-split+: Practical Local Conformal Inference in High Dimensions (poster presentation)
- DFUQ poster 1 -- How Nonconformity Functions and Difficulty of Datasets Impact the Efficiency of Conformal Classifiers (poster presentation)
- DFUQ poster 1 -- Distribution-Independent Confidence Intervals for the Eigendecomposition of Covariance Matrices via the Eigenvalue-Eigenvector Identity (poster presentation)
- DFUQ poster 1 -- Bayesian Crowd Counting (poster presentation)
- DFUQ poster 1 -- NUQ: Nonparametric Uncertainty Quantification for Deterministic Neural Networks (poster presentation)
- DFUQ poster 1 -- Efficient Conformal Prediction via Cascaded Inference with Expanded Admission (poster presentation)
- DFUQ poster 1 -- Few Shot Conformal Prediction with Auxiliary Tasks (poster presentation)
- DFUQ poster 2 -- Using Conformalized Prediction of Performance to Make Learning more Transparent (poster presentation)
- DFUQ poster 1 -- Prediction Intervals for Active Learning (poster presentation)
- DFUQ poster 2 -- Consistent Accelerated Inference via Confident Adaptive Transformers (poster presentation)
- DFUQ poster 1 -- Understanding The Under-Coverage Bias in Uncertainty Estimation (poster presentation)
- DFUQ poster 1 -- T-SCI: A Two-Stage Conformal Inference Algorithm with Guaranteed Coverage for Cox-MLP (poster presentation)
- DFUQ poster 1 -- Online Multivalid Learning (poster presentation)
- DFUQ poster 2 -- Finite-sample Efficient Conformal Prediction (poster presentation)
- DFUQ poster 1 -- PAC Prediction Sets Under
Covariate Shift (poster presentation)
- DFUQ poster 2 -- Uncertainty Quantification For Amniotic Fluid Segmentation And Volume Prediction (poster presentation)
- DFUQ poster 2 -- Conformal Anomaly Detection on Spatio-Temporal Observations with Missing Data (poster presentation)
- DFUQ poster 2 -- Nested Conformal (poster presentation)
- DFUQ poster 1 -- Top Label Calibration (poster presentation)
- DFUQ poster 1 -- Distribution Free UQ for Classification Under Label Shift (poster presentation)
- DFUQ poster 2 -- Calibrating Predictions to Decisions (poster presentation)
- DFUQ poster 2 -- Root-finding Approaches for Computing Conformal Prediction Sets (poster presentation)
- DFUQ poster 1 -- Cross-validation Confidence Intervals For Test Error (poster presentation)
- DFUQ poster 1 -- Improving Conditional Coverage via Orthogonal Quantile Regression (poster presentation)
- DFUQ poster 1 -- Estimation and Inference on Nonlinear Heterogeneous Effects (poster presentation)
- DFUQ poster 2 -- Locally Valid and Discriminative Confidence Intervals for Deep Learning Models (poster presentation)
- DFUQ poster 2 -- Interval Deep Learning (poster presentation)
- DFUQ poster 1 -- Training Models For Uncertainty Quantification (poster presentation)
- DFUQ poster 1 -- Distribution-free inference for regression: discrete, continuous, and in between (poster presentation)
- DFUQ poster 2 -- Right Decisions from Wrong Predictions (poster presentation)
- DFUQ poster 1 -- Conformal Prediction with Localized Decorrelation (poster presentation)
- DFUQ poster 1 -- Distribution-free Conditional Median Inference (poster presentation)
- DFUQ poster 1 -- Copula-based Conformal Prediction for Multi Target Regression (poster presentation)
- DFUQ poster 2 -- Weakly Conformalized Predictive Sets with Partial Supervision (poster presentation)
- DFUQ poster 1 -- Conformalized Survival Analysis (poster presentation)
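
Several of the talks and posters listed above build on split conformal prediction, which the Multi Split Conformal Prediction abstract describes as a computationally efficient, distribution-free method for regression that relies on a one-time random split of the data. As a rough orientation for readers new to the topic, the sketch below shows one common form of the procedure. It is not taken from any of the listed papers: the function name `split_conformal_interval`, the least-squares line used as the underlying model, the absolute-residual score, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def split_conformal_interval(x_train, y_train, x_cal, y_cal, x_new, alpha=0.1):
    """Return (lower, upper) prediction bounds for x_new at miscoverage level alpha.

    Illustrative helper (hypothetical name): fits a line on the training split,
    then calibrates the interval half-width on the held-out calibration split.
    """
    # Underlying model: a least-squares line (assumption; any regressor could be used).
    slope, intercept = np.polyfit(x_train, y_train, deg=1)

    def predict(x):
        return slope * np.asarray(x, dtype=float) + intercept

    # Nonconformity scores: absolute residuals on the calibration split.
    scores = np.sort(np.abs(np.asarray(y_cal, dtype=float) - predict(x_cal)))
    n = len(scores)

    # Finite-sample corrected quantile: the ceil((n + 1) * (1 - alpha))-th smallest score.
    # If that rank exceeds n, the interval must be infinite to keep the guarantee.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = scores[k - 1] if k <= n else np.inf

    center = predict(x_new)
    return center - q, center + q


# Synthetic example (assumed data): linear signal plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=1000)
y = 2.0 * x + rng.normal(scale=0.5, size=1000)

# One random split: 500 training points, 400 calibration points, 100 test points.
lower, upper = split_conformal_interval(
    x[:500], y[:500], x[500:900], y[500:900], x[900:], alpha=0.1
)
coverage = np.mean((y[900:] >= lower) & (y[900:] <= upper))
print(f"empirical coverage on 100 test points: {coverage:.2f}")
```

When the calibration and test points are exchangeable, intervals built this way cover a new response with probability at least `1 - alpha` marginally, so the printed empirical coverage would be expected to land near 0.9, with some variation from the random draw. The dependence on a single split is visible here: a different partition of the same data gives a different half-width `q`, which is the sensitivity that multi-split aggregation is meant to address.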