Confidence Calibration on TriviaQA (test)
[Chart: Expected Calibration Error. Plotted entry: DINCO, 0.065, Sep 29, 2025.]
Updated 1mo ago
Evaluation Results
| Method | Backbone | Date | Expected Calibration Error | Brier Score | AUC |
|--------|----------|------|----------------------------|-------------|-----|
| DINCO | Qwen3-32B | 2025.09 | 0.065 | 0.131 | 0.863 |
| MSP | Qwen3-32B | 2025.09 | 0.113 | 0.164 | 0.792 |
| SC | Qwen3-32B | 2025.09 | 0.129 | 0.19 | 0.741 |
| VC | Qwen3-32B | 2025.09 | 0.153 | 0.158 | 0.862 |
| SC-VC | Qwen3-32B | 2025.09 | 0.166 | 0.213 | 0.707 |
| K-VC | Qwen3-32B | 2025.09 | 0.186 | 0.211 | 0.737 |
| NVC | Qwen3-32B | 2025.09 | 0.194 | 0.169 | 0.88 |
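
The page does not say how the three metrics were computed. As a rough illustration only, the sketch below shows one common way to obtain them from per-question confidences and 0/1 correctness labels. Everything in it is an assumption rather than the benchmark's actual protocol: the 10 equal-width bins for Expected Calibration Error, the reading of AUC as the area under the ROC curve of confidence against correctness, and the function names themselves, which are hypothetical. Lower is better for Expected Calibration Error and Brier Score; higher is better for AUC.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    conf: Sequence[float], correct: Sequence[int], n_bins: int = 10
) -> float:
    """ECE with equal-width bins (assumed binning, not taken from the page)."""
    n = len(conf)
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for c, y in zip(conf, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(y for _, y in items) / len(items)
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += len(items) / n * abs(accuracy - avg_conf)
    return ece


def brier_score(conf: Sequence[float], correct: Sequence[int]) -> float:
    """Mean squared difference between confidence and the 0/1 correctness label."""
    return sum((c - y) ** 2 for c, y in zip(conf, correct)) / len(conf)


def auc(conf: Sequence[float], correct: Sequence[int]) -> float:
    """Probability that a correct answer outranks an incorrect one (ties count 0.5)."""
    pos = [c for c, y in zip(conf, correct) if y == 1]
    neg = [c for c, y in zip(conf, correct) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one correct and one incorrect answer")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up toy data, not drawn from TriviaQA or any method in the table.
    confidences = [0.95, 0.85, 0.35, 0.65]
    labels = [1, 1, 0, 0]
    print(expected_calibration_error(confidences, labels))
    print(brier_score(confidences, labels))
    print(auc(confidences, labels))
```

The pairwise AUC loop is quadratic in the number of answers, which keeps the sketch dependency-free; for a full test set a rank-based implementation from a statistics library would be the more practical choice.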