Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning

About

High-quality estimates of uncertainty and robustness are crucial for numerous real-world applications, especially for deep learning, which underlies many deployed ML systems. The ability to compare techniques for improving these estimates is therefore important for research and practice alike. Yet competitive comparisons of methods are often lacking for a range of reasons, including limited compute for extensive tuning, the difficulty of incorporating sufficiently many baselines, and a lack of concrete documentation for reproducibility. In this paper we introduce Uncertainty Baselines: high-quality implementations of standard and state-of-the-art deep learning methods on a variety of tasks. As of this writing, the collection spans 19 methods across 9 tasks, each with at least 5 metrics. Each baseline is a self-contained experiment pipeline with easily reusable and extendable components. Our goal is to provide immediate starting points for experimentation with new methods or applications. Additionally, we provide model checkpoints, experiment outputs as Python notebooks, and leaderboards for comparing results. Code is available at https://github.com/google/uncertainty-baselines.

Zachary Nado, Neil Band, Mark Collier, Josip Djolonga, Michael W. Dusenberry, Sebastian Farquhar, Qixuan Feng, Angelos Filos, Marton Havasi, Rodolphe Jenatton, Ghassen Jerfel, Jeremiah Liu, Zelda Mariet, Jeremy Nixon, Shreyas Padhy, Jie Ren, Tim G. J. Rudner, Faris Sbahi, Yeming Wen, Florian Wenzel, Kevin Murphy, D. Sculley, Balaji Lakshminarayanan, Jasper Snoek, Yarin Gal, Dustin Tran • 2021

Related benchmarks

Task                     Dataset                     Metric                    Result   Rank
Regression               UCI ENERGY (test)           Negative Log Likelihood     0.93     42
Regression               UCI CONCRETE (test)         Negative Log Likelihood     3.04     37
Regression               UCI YACHT (test)            Negative Log Likelihood     1.64     33
Regression               UCI POWER (test)            Negative Log Likelihood     2.78     29
Regression               UCI KIN8NM (test)           Negative Log Likelihood    -1.28     25
Regression               UCI WINE (test)             Negative Log Likelihood     0.97     24
Regression               UCI NAVAL (test)            Negative Log Likelihood    -6.12     21
Regression               UCI PROTEIN (test)          Negative Log Likelihood     2.77      8
Regression               Boston Housing UCI (test)   Negative Log Likelihood     2.54      5
Sequential optimization  Multi Optima Dim 1          AUC                         0.64      5
Showing 10 of 29 rows
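
Several of the regression results above are negative, which can look surprising for a quantity called a negative log likelihood. The page does not say how these values were computed. As a rough illustration only, the sketch below assumes a Gaussian predictive distribution per test example and averages the negative log density over examples. The function name gaussian_nll and the toy inputs are hypothetical and do not come from the paper or its code.

```python
import math


def gaussian_nll(y_true, means, stddevs):
    """Average negative log likelihood of targets under per-example Gaussians.

    Assumption: each prediction is a Gaussian with the given mean and
    standard deviation. This is an illustrative choice, not necessarily
    how the leaderboard values above were produced.
    """
    if not (len(y_true) == len(means) == len(stddevs)):
        raise ValueError("inputs must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    total = 0.0
    for y, mu, sigma in zip(y_true, means, stddevs):
        if sigma <= 0:
            raise ValueError("standard deviations must be positive")
        # -log N(y | mu, sigma^2) = 0.5*log(2*pi*sigma^2) + (y - mu)^2 / (2*sigma^2)
        total += 0.5 * math.log(2 * math.pi * sigma**2) + (y - mu) ** 2 / (2 * sigma**2)
    return total / len(y_true)


# Hypothetical toy data: perfect mean predictions with two different spreads.
wide = gaussian_nll([1.0, 2.0], [1.0, 2.0], [1.0, 1.0])
narrow = gaussian_nll([1.0, 2.0], [1.0, 2.0], [0.01, 0.01])
print(wide, narrow)
```

With these toy inputs the narrow predictions give a negative value, because a continuous density can exceed 1 when targets are tightly concentrated and predicted well. Under this assumption, lower values indicate a better fit.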
