I'll talk about one specific problem I have with the field: scale. Many papers fix an architecture and try to improve log-likelihood, comparing against the original base architecture regardless of how much additional compute is used to outperform it. Yet if we adjust for scale (for example, comparing an ensemble of size 10 to a single model scaled up 10x), the improvements diminish significantly or vanish altogether. Ultimately, we should be examining the frontier of uncertainty and robustness performance as a function of compute. I'll substantiate this perspective with several works done with colleagues: these works advance the frontier with efficient ensembles alongside priors and inductive biases, and they examine the uncertainty properties of existing giant models.