
Confidence Calibration on MultiNLI Mismatch (test)

ECE: 0.0071

MIR

[Chart: ECE by date; Jun 7, 2023]
Updated 1mo ago

Evaluation Results

Method  Links  Date     ECE
               2023.06  0.0071  0.0107  0.0026  0.0258
               2023.06  0.0088  0.0107  0.0036  0.0246
               2023.06  0.0103  0.0136  0.004   0.0235
               2023.06  0.0105  0.015   0.0025  0.0251
               2023.06  0.0111  0.0117  0.0036  0.0376
               2023.06  0.0122  0.0135  0.004   0.0278
               2023.06  0.0124  0.0126  0.0039  0.0277
               2023.06  0.0143  0.0123  0.0034  0.0261
               2023.06  0.0145  0.0146  0.0037  0.0278
               2023.06  0.0158  0.0156  0.0066  0.0304
               2023.06  0.017   0.0176  0.0041  0.0303
               2023.06  0.0242  0.0246  0.0093  0.0356
               2023.06  0.0247  0.0262  0.0085  0.0354
               2023.06  0.0426  0.044   0.0283  0.0515
               2023.06  0.0753  0.075   0.0421  0.0778
               2023.06  0.1014  0.1014  0.074   0.1033