Workshop on Distribution-Free Uncertainty Quantification
Anastasios Angelopoulos · Stephen Bates · Yixuan Li · Aaditya Ramdas · Ryan Tibshirani
Visit https://sites.google.com/berkeley.edu/dfuq21/ for details!
While improving prediction accuracy has been the focus of machine learning in recent years, this alone does not suffice for reliable decision-making. Deploying learning systems in consequential settings also requires calibrating and communicating the uncertainty of predictions. A recent line of work we call distribution-free predictive inference (i.e., conformal prediction and related methods) has developed a set of methods that give finite-sample statistical guarantees for any (possibly incorrectly specified) predictive model and any (unknown) underlying distribution of the data, ensuring reliable uncertainty quantification (UQ) for many prediction tasks. This line of work represents a promising new approach to UQ with complex prediction systems but is relatively unknown in the applied machine learning community. Moreover, much remains to be done in integrating distribution-free methods with existing approaches to UQ via calibration (e.g., temperature scaling); little work has been done to bridge these two worlds. To foster emerging work on distribution-free methods, the proposed workshop has two goals: first, to bring together researchers in distribution-free methods with researchers specializing in calibration techniques to catalyze work at this interface; second, to introduce distribution-free methods to a wider ML audience. Given the important recent emphasis on the reliable real-world performance of ML models, we believe a large fraction of ICML attendees will find this workshop highly relevant.
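To make the idea of a finite-sample guarantee concrete, here is a minimal sketch of split conformal prediction for regression, one of the simplest distribution-free methods. It assumes a point predictor already fit on separate training data and uses absolute residuals on a held-out calibration set as conformity scores; the function name `split_conformal_interval` and the toy residuals are illustrative and not taken from any workshop paper.

```python
import math


def split_conformal_interval(cal_residuals, y_hat, alpha):
    """Split conformal interval around a point prediction y_hat.

    cal_residuals: absolute errors |y_i - f(x_i)| of an already-trained model f
    on a calibration set that was NOT used for training.
    Returns (lower, upper) with marginal coverage >= 1 - alpha when the
    calibration points and the test point are exchangeable.
    """
    n = len(cal_residuals)
    # Rank of the finite-sample-corrected empirical quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only the trivial
        # (infinite) interval is guaranteed to cover.
        return (-math.inf, math.inf)
    q = sorted(cal_residuals)[k - 1]
    return (y_hat - q, y_hat + q)


# Toy usage: residuals of some fitted model on 9 calibration points.
residuals = [0.3, 1.2, 0.7, 0.1, 2.5, 0.9, 0.4, 1.8, 0.6]
print(split_conformal_interval(residuals, y_hat=10.0, alpha=0.2))
```

Using the ceil((n + 1)(1 - alpha))-th smallest residual, rather than a plain empirical quantile, is what yields coverage of at least 1 - alpha in finite samples under exchangeability; with too few calibration points, the only valid interval is the whole real line.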
Schedule
Sat 7:25 a.m. - 7:30 a.m.

Introduction to Conformal Prediction
(Introduction)
Anastasios Angelopoulos
Sat 7:30 a.m. - 8:15 a.m.

Talk by Rina Barber
(Live Talk)
Sat 8:15 a.m. - 9:15 a.m.

Panel with Michael I. Jordan, Vladimir Vovk, and Larry Wasserman, moderated by Aaditya Ramdas
(Discussion Panel)
Sat 9:15 a.m. - 9:24 a.m.

Few-Shot Conformal Prediction with Auxiliary Tasks (Spotlight #1)
(Pre-Recorded Talk)
We develop a novel approach to conformal prediction when the target task has limited data available for training. Conformal prediction identifies a small set of promising output candidates in place of a single prediction, with guarantees that the set contains the correct answer with high probability. When training data is limited, however, the predicted set can easily become unusably large. In this work, we obtain substantially tighter prediction sets while maintaining desirable marginal guarantees by casting conformal prediction as a meta-learning paradigm over exchangeable collections of auxiliary tasks. Our conformalization algorithm is simple, fast, and agnostic to the choice of underlying model, learning algorithm, or dataset. We demonstrate the effectiveness of this approach across a number of few-shot classification and regression tasks in natural language processing, computer vision, and computational chemistry for drug discovery.
Adam Fisch
Sat 9:24 a.m. - 9:33 a.m.

Online Multivalid Learning: Means, Moments, and Prediction Intervals (Spotlight #2)
(Pre-Recorded Talk)
We present a general, efficient technique for providing contextual predictions that are "multivalid" in various senses, against an online sequence of adversarially chosen examples $(x,y)$. This means that the resulting estimates correctly predict various statistics of the labels $y$ not just marginally (as averaged over the sequence of examples) but also conditionally on $x \in G$ for any $G$ belonging to an arbitrary intersecting collection of groups $\mathcal{G}$.
We provide three instantiations of this framework. The first is mean prediction, which corresponds to an online algorithm satisfying the notion of multicalibration from Hébert-Johnson et al. The second is variance and higher-moment prediction, which corresponds to an online algorithm satisfying the notion of mean-conditioned moment multicalibration from Jung et al. Finally, we define a new notion of prediction interval multivalidity and give an algorithm for finding prediction intervals that satisfy it. Because our algorithms handle adversarially chosen examples, they can equally well be used to predict statistics of the residuals of arbitrary point prediction methods, giving rise to very general techniques for quantifying the uncertainty of predictions of black-box algorithms, even in an online adversarial setting. When instantiated for prediction intervals, this solves a problem similar to that of conformal prediction, but in an adversarial environment and with multivalidity guarantees stronger than simple marginal coverage guarantees.

Christopher Jung
Sat 9:33 a.m. - 9:42 a.m.

Nested Conformal Prediction Sets for Classification with Applications to Probation Data (Spotlight #3)
(Pre-Recorded Talk)
Risk assessments to help inform criminal justice decisions have been used in the United States since the 1920s. Over the past several years, statistical learning risk algorithms have been introduced amid much controversy about fairness, transparency and accuracy. In this paper, we focus on accuracy for a large department of probation and parole that is considering a major revision of its current, statistical learning risk methods. Because the content of each offender’s supervision is substantially shaped by a forecast of subsequent conduct, forecasts have real consequences. Here we consider the probability that risk forecasts are correct. We augment standard statistical learning estimates of forecasting uncertainty (i.e., confusion tables) with uncertainty estimates from nested conformal prediction sets. In a demonstration of concept using data from the department of probation and parole, we show that the standard uncertainty measures and uncertainty measures from nested conformal prediction sets can differ dramatically in concept and output. We also provide a modification of nested conformal called the localized conformal method to match confusion tables more closely when possible. A strong case can be made favoring the nested and localized conformal approach. As best we can tell, our formulation of such comparisons and consequent recommendations is novel.
Richard Berk
Sat 9:42 a.m. - 9:51 a.m.

Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods (Spotlight #4)
(Pre-Recorded Talk)
The notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology. In line with the statistical tradition, uncertainty has long been perceived as almost synonymous with standard probability and probabilistic predictions. Yet, due to the steadily increasing relevance of machine learning for practical applications and related issues such as safety requirements, new problems and challenges have recently been identified by machine learning scholars, and these problems may call for new methodological developments. In particular, this includes the importance of distinguishing between (at least) two different types of uncertainty, often referred to as aleatoric and epistemic. In this paper, we provide an introduction to the topic of uncertainty in machine learning as well as an overview of attempts so far at handling uncertainty in general and formalizing this distinction in particular.
Eyke Hüllermeier
Sat 9:51 a.m. - 10:00 a.m.

Bayes-optimal prediction with frequentist coverage control (Spotlight #5)
(Pre-Recorded Talk)
We illustrate how indirect or prior information can be optimally used to construct a prediction region that maintains a target frequentist coverage rate. If the indirect information is accurate, the volume of the prediction region is lower on average than that of other regions with the same coverage rate. Even if the indirect information is inaccurate, the resulting region still maintains the target coverage rate. Such a prediction region can be constructed for models that have a complete sufficient statistic, which includes many widely used parametric and nonparametric models. Particular examples include a Bayes-optimal conformal prediction procedure that maintains a constant coverage rate across distributions in a nonparametric model, as well as a prediction procedure for the normal linear regression model that can utilize a regularizing prior distribution, yet maintain a frequentist coverage rate that is constant as a function of the model parameters and explanatory variables. No results rely on asymptotic approximations.
Peter Hoff
Sat 10:00 a.m. - 10:15 a.m.

Water Break with Gather
(Gather)
Water break with gather.town
Sat 10:15 a.m. - 11:00 a.m.

Talk by Leying Guan
(Live Talk)
Sat 11:00 a.m. - 12:00 p.m.

Poster Session #1
(Poster Session)
Poster session in two gather.town rooms: [ protected link dropped ] [ protected link dropped ]
Sat 12:00 p.m. - 4:15 p.m.

Break
(Break)
Sat 4:13 p.m. - 4:15 p.m.

Welcome back
(Live introduction by moderator)
Sat 4:15 p.m. - 4:24 p.m.

Empirical Frequentist Coverage of Deep Learning Uncertainty Quantification Procedures (Spotlight #6)
(Pre-Recorded Talk)
Uncertainty quantification for complex deep learning models is increasingly important as these techniques see growing use in high-stakes, real-world settings. Currently, the quality of a model's uncertainty is evaluated using point-prediction metrics such as the negative log-likelihood (NLL), expected calibration error (ECE) or the Brier score on held-out data. Marginal coverage of prediction intervals or sets, a well-known concept in the statistical literature, is an intuitive alternative to these metrics but has yet to be systematically studied for this class of models. With marginal coverage, and the complementary notion of the width of a prediction interval, downstream users of deployed machine learning models can better understand uncertainty quantification both on a global dataset level and on a per-sample basis. In this study, we provide the first large-scale evaluation of the empirical frequentist coverage properties of well-known uncertainty quantification techniques on a suite of regression and classification tasks. We find that, in general, some methods do achieve desirable coverage properties on in-distribution samples, but that coverage is not maintained on out-of-distribution data. Our results demonstrate the failings of current uncertainty quantification techniques as dataset shift increases and reinforce coverage as an important metric in developing models for real-world applications.
Benjamin Kompa
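The two quantities this abstract proposes to report, marginal coverage and interval width, are straightforward to compute on a held-out test set. The sketch below assumes intervals are given as `(lower, upper)` pairs and, for classification, prediction sets as Python sets; the function names are hypothetical and not taken from the study's code.

```python
def empirical_coverage_and_width(intervals, y_true):
    """Fraction of test labels inside their interval, and mean interval width.

    intervals: list of (lower, upper) pairs, one per test example.
    y_true: observed labels in the same order.
    """
    if not intervals or len(intervals) != len(y_true):
        raise ValueError("intervals and y_true must be non-empty and of equal length")
    covered = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, y_true))
    width = sum(hi - lo for lo, hi in intervals) / len(intervals)
    return covered / len(intervals), width


def empirical_set_coverage_and_size(pred_sets, y_true):
    """Classification analogue: fraction of labels inside their set, and mean set size."""
    if not pred_sets or len(pred_sets) != len(y_true):
        raise ValueError("pred_sets and y_true must be non-empty and of equal length")
    covered = sum(y in s for s, y in zip(pred_sets, y_true))
    size = sum(len(s) for s in pred_sets) / len(pred_sets)
    return covered / len(pred_sets), size


# Toy usage: the second label (3) falls outside its interval (0, 2).
print(empirical_coverage_and_width([(0, 1), (0, 2), (5, 6)], [0.5, 3, 5.5]))
print(empirical_set_coverage_and_size([{0}, {0, 1}, {2}], [0, 1, 1]))
```

Reporting width (or set size) alongside coverage matters because trivially wide intervals always cover.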
Sat 4:24 p.m. - 4:33 p.m.

Top-label calibration (Spotlight #7)
(Pre-Recorded Talk)
We study the problem of post-hoc calibration for multiclass classification, with an emphasis on histogram binning. Multiple works have focused on calibration with respect to the confidence of just the predicted class (or 'top-label'). We find that the popular notion of confidence calibration [Guo et al., 2017] is not sufficiently strong: there exist predictors that are not calibrated in any meaningful way but are perfectly confidence calibrated. We propose a closely related (but subtly different) notion, top-label calibration, that accurately captures the intuition and simplicity of confidence calibration, but addresses its drawbacks. We formalize a histogram binning (HB) algorithm that reduces top-label multiclass calibration to the binary case, prove that it has clean theoretical guarantees without distributional assumptions, and perform a methodical study of its practical performance. Some prediction tasks require stricter notions of multiclass calibration such as class-wise or canonical calibration. We formalize appropriate HB algorithms corresponding to each of these goals. In experiments with deep neural nets, we find that our principled versions of HB are often better than temperature scaling, for both top-label and class-wise calibration. Code for this work will be made publicly available at https://github.com/aigen/df-posthoc-calibration.
Chirag Gupta
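A simplified sketch of the reduction described in this abstract: for each class, take the calibration points on which that class is the top label and run binary histogram binning on the top-label confidence, with target 1 when the true label equals the top label. It assumes uniform-mass bins (equal counts per bin) and falls back to the raw confidence for classes never predicted on the calibration set; the function names are hypothetical, and the authors' formal algorithm may differ in details such as bin construction.

```python
from bisect import bisect_left
from collections import defaultdict


def fit_binary_hb(scores, outcomes, n_bins):
    """Uniform-mass histogram binning for a binary target.

    Sorts calibration scores, splits them into n_bins groups of (nearly) equal
    size, and records each group's upper score edge and mean outcome.
    """
    pairs = sorted(zip(scores, outcomes))
    n = len(pairs)
    n_bins = max(1, min(n_bins, n))  # never more bins than points
    uppers, means = [], []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        uppers.append(chunk[-1][0])
        means.append(sum(o for _, o in chunk) / len(chunk))
    uppers[-1] = float("inf")  # the last bin catches all larger test scores
    return uppers, means


def predict_binary_hb(model, score):
    uppers, means = model
    # First bin whose upper edge is >= score.
    return means[bisect_left(uppers, score)]


def top_label(p):
    return max(range(len(p)), key=p.__getitem__)


def fit_top_label_hb(probs, labels, n_bins):
    """One binary HB model per class, fit on points where that class is top-label."""
    per_class = defaultdict(lambda: ([], []))
    for p, y in zip(probs, labels):
        c = top_label(p)
        per_class[c][0].append(p[c])
        per_class[c][1].append(1.0 if y == c else 0.0)
    return {c: fit_binary_hb(s, o, n_bins) for c, (s, o) in per_class.items()}


def predict_top_label_hb(models, p):
    """Return (predicted class, recalibrated top-label confidence)."""
    c = top_label(p)
    if c not in models:
        return c, p[c]  # simplification: no calibration data for this class
    return c, predict_binary_hb(models[c], p[c])


probs = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.55, 0.45], [0.2, 0.8], [0.3, 0.7]]
labels = [0, 0, 1, 0, 1, 0]
models = fit_top_label_hb(probs, labels, n_bins=2)
print(predict_top_label_hb(models, [0.58, 0.42]))
```

Uniform-mass bins are used here because they keep every bin populated; other binning schemes are possible.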
Sat 4:33 p.m. - 4:42 p.m.

Understanding the Under-Coverage Bias in Uncertainty Estimation (Spotlight #8)
(Pre-Recorded Talk)
Estimating the data uncertainty in regression tasks is often done by learning a quantile function or a prediction interval of the true label conditioned on the input. It is frequently observed that quantile regression (a vanilla algorithm for learning quantiles with asymptotic guarantees) tends to undercover, that is, to achieve lower coverage than the desired level in practice. While various fixes have been proposed, a more fundamental understanding of why this under-coverage bias happens in the first place remains elusive.
In this paper, we present a rigorous theoretical study on the coverage of uncertainty estimation algorithms in learning quantiles. We prove that quantile regression suffers from an inherent under-coverage bias, in a vanilla setting where we learn a realizable linear quantile function and there is more data than parameters. More quantitatively, for $\alpha>0.5$ and small $d/n$, the $\alpha$-quantile learned by quantile regression roughly achieves coverage $\alpha - (\alpha-1/2)\cdot d/n$ regardless of the noise distribution, where $d$ is the input dimension and $n$ is the number of training examples. Our theory reveals that this under-coverage bias stems from a certain high-dimensional parameter estimation error that is not implied by existing theories on quantile regression. Experiments on simulated and real data verify our theory and further illustrate the effect of various factors such as sample size and model capacity on the under-coverage bias in more practical setups.

Yu Bai
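For a sense of scale, plugging illustrative values (chosen here for the example, not taken from the paper) into the approximate coverage formula $\alpha - (\alpha-1/2)\cdot d/n$ from this abstract gives:

```latex
\alpha = 0.9,\quad d = 10,\quad n = 100:\qquad
\alpha - \left(\alpha - \tfrac{1}{2}\right)\cdot\frac{d}{n}
  = 0.9 - 0.4 \times 0.1 = 0.86
```

Under this approximation, a nominal 90% quantile fitted with ten input dimensions and one hundred training points would cover only about 86% of new labels.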
Sat 4:42 p.m. - 4:51 p.m.

A Conformal Approach for Functional Prediction Bands (Spotlight #9)
(Pre-Recorded Talk)
We propose a new nonparametric approach in the field of conformal prediction based on a new family of nonconformity measures, inducing conformal predictors able to create closed-form, finite-sample valid or exact prediction sets for functional data under very minimal distributional assumptions. Our proposal ensures that the prediction sets obtained are bands, an essential feature in the functional setting that allows the visualization and interpretation of such sets. The procedure is also fast and scalable, does not rely on functional dimension reduction techniques, and allows the user to select different nonconformity measures depending on the problem at hand, always obtaining valid bands. Within this family of measures, we also propose a specific measure leading to prediction bands asymptotically no less efficient than those with constant width.
Matteo Fontana
Sat 4:51 p.m. - 5:00 p.m.

Multi Split Conformal Prediction (Spotlight #10)
(Pre-Recorded Talk)
Split conformal prediction is a computationally efficient method for performing distribution-free predictive inference in regression. It involves, however, a one-time random split of the data, and the result can strongly depend on the particular split. To address this problem, we propose multi split conformal prediction, a simple method based on Markov's inequality to aggregate split conformal prediction intervals across multiple splits.
Aldo Solari · Vera Djordjilovic
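The aggregation step in this abstract can be sketched as follows. This is a minimal sketch under our own reading of the Markov argument, not a transcription of the paper: each of the B split conformal intervals is assumed to be built at level alpha * (1 - tau), and a point is kept when more than a fraction tau of the intervals contain it. If a label is excluded, at least a fraction 1 - tau of the intervals miss it, so by Markov's inequality this happens with probability at most alpha * (1 - tau) / (1 - tau) = alpha. The names `per_split_alpha` and `multi_split_aggregate` are hypothetical.

```python
import math


def per_split_alpha(alpha, tau):
    """Assumed level for each split interval so the Markov bound gives alpha overall."""
    return alpha * (1 - tau)


def multi_split_aggregate(intervals, tau):
    """Keep points covered by more than a fraction tau of the B intervals.

    intervals: list of closed (lower, upper) split conformal intervals.
    Returns the aggregated set as a sorted list of disjoint closed intervals.
    """
    B = len(intervals)
    need = math.floor(tau * B) + 1  # smallest count strictly greater than tau * B
    events = []
    for lo, hi in intervals:
        events.append((lo, 0))  # 0 = opens; sorts before a close at the same x
        events.append((hi, 1))  # 1 = closes
    events.sort()
    result, count, start = [], 0, None
    for x, kind in events:
        if kind == 0:
            count += 1
            if count == need:  # coverage count just reached the threshold
                start = x
        else:
            if count == need:  # coverage count is about to drop below it
                result.append((start, x))
            count -= 1
    return result


# Three splits, majority-style threshold tau = 0.5: keep points in >= 2 intervals.
print(multi_split_aggregate([(0, 2), (1, 3), (2, 4)], tau=0.5))
```

With tau = 0 the rule reduces to the union of the intervals; larger tau gives a smaller aggregate at the price of building each split interval at a smaller level.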
Sat 5:00 p.m. - 5:45 p.m.

Talk by Jing Lei
(Live Talk)
Sat 5:45 p.m. - 5:54 p.m.

Robust Validation: Confident Predictions Even When Distributions Shift (Spotlight #11)
(Pre-Recorded Talk)
While the traditional viewpoint in machine learning and statistics assumes training and testing samples come from the same population, practice belies this fiction. One strategy, coming from robust statistics and optimization, is thus to build a model robust to distributional perturbations. In this paper, we take a different approach to describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions. We present a method that produces prediction sets (almost exactly) giving the right coverage level for any test distribution in an f-divergence ball around the training population. The method, based on conformal inference, achieves (nearly) valid coverage in finite samples, under only the condition that the training data be exchangeable. An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it; we develop estimators and prove their consistency for protection and validity of uncertainty estimates under shifts. By experimenting on several large-scale benchmark datasets, including Recht et al.'s CIFAR-v4 and ImageNet-V2 datasets, we provide complementary empirical results that highlight the importance of robust predictive validity.
Suyash Gupta
Sat 5:54 p.m. - 6:03 p.m.

MAPIE: Model Agnostic Prediction Interval Estimator (Spotlight #12)
(Pre-Recorded Talk)
Estimating uncertainties associated with the predictions of machine learning models is of crucial importance to assess their robustness and predictive power. Recently, new distribution-free methods have emerged that compute uncertainties with strong theoretical guarantees without making any assumptions about the model or the underlying data distribution. In this paper, we introduce MAPIE (Model Agnostic Prediction Interval Estimator), an open-source Python package following the standard scikit-learn API and implementing recent resampling methods such as the jackknife+. The package is available at https://github.com/simai-ml/MAPIE.
Vianney Taquet · Gregoire Martinon · Nicolas JB Brunel
Sat 6:03 p.m. - 6:12 p.m.

Exact Optimization of Conformal Predictors via Incremental and Decremental Learning (Spotlight #13)
(Pre-Recorded Talk)
Conformal Predictors (CP) are wrappers around ML models, providing error guarantees under weak assumptions on the data distribution. They are suitable for a wide range of problems, from classification and regression to anomaly detection. Unfortunately, their very high computational complexity limits their applicability to large datasets. In this work, we show that it is possible to speed up a CP classifier considerably, by studying it in conjunction with the underlying ML method, and by exploiting incremental and decremental learning. For methods such as k-NN, KDE, and kernel LS-SVM, our approach reduces the running time by one order of magnitude, whilst producing exact solutions. With similar ideas, we also achieve a linear speedup for the harder case of bootstrapping. Finally, we extend these techniques to improve upon an optimization of k-NN CP for regression. We evaluate our findings empirically, and discuss when methods are suitable for CP optimization.
Giovanni Cherubin · Konstantinos Chatzikokolakis · Martin Jaggi
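To see where the computational cost mentioned in this abstract comes from, here is a naive full conformal classifier: the baseline being accelerated, not the authors' optimized method. For every test point and every candidate label, the test point is appended to the data and all n + 1 nonconformity scores are recomputed from scratch, which costs roughly O(n^2 log n) per label in this sketch. The 1-D inputs, the k-NN score (minus the number of same-label points among the k nearest neighbors), and the function names are illustrative assumptions.

```python
def knn_nonconformity(i, xs, ys, k):
    """Hypothetical k-NN score for point i: minus the number of its k nearest
    other points (1-D absolute distance, ties broken by index) sharing its label."""
    others = sorted((abs(xs[i] - xs[j]), j) for j in range(len(xs)) if j != i)
    return -sum(1 for _, j in others[:k] if ys[j] == ys[i])


def full_conformal_set(xs, ys, x_test, labels, k, alpha):
    """Naive full conformal prediction set for a single test point."""
    prediction_set = []
    for y in labels:
        aug_x = xs + [x_test]
        aug_y = ys + [y]
        # Recompute every score with the candidate (x_test, y) included:
        # this repeated work is what incremental/decremental learning avoids.
        scores = [knn_nonconformity(i, aug_x, aug_y, k) for i in range(len(aug_x))]
        test_score = scores[-1]
        # Conformal p-value: share of scores at least as nonconforming as the test point.
        p_value = sum(s >= test_score for s in scores) / len(scores)
        if p_value > alpha:
            prediction_set.append(y)
    return prediction_set


xs = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print(full_conformal_set(xs, ys, x_test=0.15, labels=[0, 1], k=3, alpha=0.2))
```

Lowering alpha admits more labels: with only eight training points, the smallest attainable p-value is 1/9, so any alpha below that returns every label.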
Sat 6:12 p.m. - 6:21 p.m.

LOOD: Localization-based Uncertainty Estimation for Medical Imaging (Spotlight #14)
(Pre-Recorded Talk)
Detecting out-of-distribution (OOD) inputs is a central challenge for safely deploying machine learning models in the real world. Existing solutions are mainly driven by small-scale natural image datasets and are far from readily usable for safety-critical domains such as medical imaging diagnosis. In this paper, we bridge this critical gap by proposing a localization-based OOD detection framework LOOD, which demonstrates substantial improvement over previous methods. Our key idea is to estimate the OOD score from a localized feature region that is highly indicative of the disease label, as opposed to averaging signals from all spatial locations. We achieve this by devising a specialized pooling mechanism termed selective pooling, which yields OOD scores that better distinguish between the in-distribution and OOD data. We evaluate the model trained on a large-scale clinical chest X-ray dataset against five diverse OOD datasets. LOOD establishes superior performance on this challenging task, reducing the average FPR95 by up to 57.83%.
Yiyou Sun · Sharon Li
Sat 6:21 p.m. - 6:30 p.m.

Calibrating Predictions to Decisions: A Novel Approach to Multi-Class Calibration (Spotlight #15)
(Pre-Recorded Talk)
Decision-makers often need to rely on imperfect probabilistic forecasts. While average performance metrics are typically available, it is difficult to assess the quality of individual forecasts and the corresponding utilities without strong assumptions on the data distribution. To convey confidence about individual predictions to decision-makers, we propose a compensation mechanism ensuring that the forecasted utility matches the actually accrued utility. While a naive scheme to compensate decision-makers for prediction errors can be exploited and might not be sustainable in the long run, we propose a mechanism based on fair bets and online learning that provably cannot be exploited. We demonstrate an application showing how passengers could confidently optimize individual travel plans based on flight delay probabilities estimated by an airline.
Shengjia Zhao
Sat 6:30 p.m. - 6:45 p.m.

Water Break with Gather
(Gather)
Sat 6:45 p.m. - 7:03 p.m.

Talk by Kilian Weinberger
(Introduction by moderator)
Kilian Q Weinberger
Sat 7:03 p.m. - 8:05 p.m.

Poster Session #2
(Poster Session)
Sat 8:05 p.m. - 8:50 p.m.

Talk by Emmanuel Candes
(Pre-Recorded Talk)
Emmanuel J Candes


DFUQ poster 1 - Robust validation: Confident predictions even when distributions shift
(poster presentation)


DFUQ poster 2 - An Automatic Finite Sample Robustness Metric
(poster presentation)


DFUQ poster 1 - Deep Quantile Aggregation
(poster presentation)


DFUQ poster 1 - Sequential Regression Using Metamodels
(poster presentation)


DFUQ poster 2 - An Approximate Parallel Tempering for Uncertainty Quantification in Deep Learning
(poster presentation)


DFUQ poster 1 - Testing for Outliers with Conformal p-values
(poster presentation)


DFUQ poster 1 - Probabilistic Forecasting: A Level Set Approach
(poster presentation)


DFUQ poster 1 - Distribution-Free Uncertainty for the Minimum Norm Solution of Overparameterized Linear Regression
(poster presentation)


DFUQ poster 1 - Conformal Histogram Regression
(poster presentation)


DFUQ poster 1 - Learning Quantile Function without Quantile Crossing for Distribution-free Time Series Forecasting
(poster presentation)


DFUQ poster 1 - Adaptive Conformal Inference Under Distribution Shift
(poster presentation)


DFUQ poster 1 - Conformal Uncertainty Sets for Robust Optimization
(poster presentation)


DFUQ poster 2 - Reliable Decisions With Threshold Calibration
(poster presentation)


DFUQ poster 1 - Learning Prediction Intervals for Model Performance
(poster presentation)


DFUQ poster 1 - CD Split and HPD Split
(poster presentation)


DFUQ poster 1 - Bayesian Triplet Loss
(poster presentation)


DFUQ poster 2 - Conformal Prediction for Simulation Models
(poster presentation)


DFUQ poster 1 - MD-split+: Practical Local Conformal Inference in High Dimensions
(poster presentation)


DFUQ poster 1 - How Nonconformity Functions and Difficulty of Datasets Impact the Efficiency of Conformal Classifiers
(poster presentation)


DFUQ poster 1 - Distribution-Independent Confidence Intervals for the Eigendecomposition of Covariance Matrices via the Eigenvalue-Eigenvector Identity
(poster presentation)


DFUQ poster 1 - Bayesian Crowd Counting
(poster presentation)


DFUQ poster 1 - NUQ: Nonparametric Uncertainty Quantification for Deterministic Neural Networks
(poster presentation)


DFUQ poster 1 - Efficient Conformal Prediction via Cascaded Inference with Expanded Admission
(poster presentation)


DFUQ poster 1 - Few Shot Conformal Prediction with Auxiliary Tasks
(poster presentation)


DFUQ poster 2 - Using Conformalized Prediction of Performance to Make Learning more Transparent
(poster presentation)


DFUQ poster 1 - Prediction Intervals for Active Learning
(poster presentation)


DFUQ poster 2 - Consistent Accelerated Inference via Confident Adaptive Transformers
(poster presentation)


DFUQ poster 1 - Understanding The Under-Coverage Bias in Uncertainty Estimation
(poster presentation)


DFUQ poster 1 - TSCI: A Two-Stage Conformal Inference Algorithm with Guaranteed Coverage for Cox-MLP
(poster presentation)


DFUQ poster 1 - Online Multivalid Learning
(poster presentation)


DFUQ poster 2 - Finite-sample Efficient Conformal Prediction
(poster presentation)


DFUQ poster 1 - PAC Prediction Sets Under Covariate Shift
(poster presentation)


DFUQ poster 2 - Uncertainty Quantification For Amniotic Fluid Segmentation And Volume Prediction
(poster presentation)


DFUQ poster 2 - Conformal Anomaly Detection on Spatio-Temporal Observations with Missing Data
(poster presentation)


DFUQ poster 2 - Nested Conformal
(poster presentation)


DFUQ poster 1 - Top Label Calibration
(poster presentation)


DFUQ poster 1 - Distribution-Free UQ for Classification Under Label Shift
(poster presentation)


DFUQ poster 2 - Calibrating Predictions to Decisions
(poster presentation)


DFUQ poster 2 - Root-finding Approaches for Computing Conformal Prediction Sets
(poster presentation)


DFUQ poster 1 - Cross-validation Confidence Intervals For Test Error
(poster presentation)


DFUQ poster 1 - Improving Conditional Coverage via Orthogonal Quantile Regression
(poster presentation)


DFUQ poster 1 - Estimation and Inference on Nonlinear Heterogeneous Effects
(poster presentation)


DFUQ poster 2 - Locally Valid and Discriminative Confidence Intervals for Deep Learning Models
(poster presentation)


DFUQ poster 2 - Interval Deep Learning
(poster presentation)


DFUQ poster 1 - Training Models For Uncertainty Quantification
(poster presentation)


DFUQ poster 1 - Distribution-free inference for regression: discrete, continuous, and in between
(poster presentation)


DFUQ poster 2 - Right Decisions from Wrong Predictions
(poster presentation)


DFUQ poster 1 - Conformal Prediction with Localized Decorrelation
(poster presentation)


DFUQ poster 1 - Distribution-free Conditional Median Inference
(poster presentation)


DFUQ poster 1 - Copula-based Conformal Prediction for Multi Target Regression
(poster presentation)


DFUQ poster 2 - Weakly Conformalized Predictive Sets with Partial Supervision
(poster presentation)


DFUQ poster 1 - Conformalized Survival Analysis
(poster presentation)