Uncertainty Calibration on SciBench
Top result: 78.7 AUROC (PE), Jan 2, 2026. Updated 4d ago.
[Chart: AUROC by date]
Evaluation Results
Method    Backbone             Date      AUROC
PE        Qwen2.5-7B-In...     2026.01   78.7
ET-PE     Qwen2.5-7B-In...     2026.01   78.5
p(True)   Qwen2.5-7B-In...     2026.01   77.1
LS        Qwen2.5-32B-I...     2026.01   76.8
ET-PE     Qwen2.5-32B-I...     2026.01   76.7
LS        Qwen2.5-7B-In...     2026.01   76.5
PE        Qwen2.5-32B-I...     2026.01   76.5
LS        Qwen2.5-14B-I...     2026.01   75.3
ET-PE     Qwen2.5-14B-I...     2026.01   75.1
PE        Qwen2.5-14B-I...     2026.01   73.7
p(True)   Qwen2.5-32B-I...     2026.01   70.8
SE        Qwen2.5-14B-I...     2026.01   63
SE        Qwen2.5-32B-I...     2026.01   63
LN-PE     Qwen2.5-14B-I...     2026.01   62.2
p(True)   Qwen2.5-14B-I...     2026.01   62
LN-PE     Qwen2.5-7B-In...     2026.01   58.9
SE        Qwen2.5-7B-In...     2026.01   57.7
LN-PE     Qwen2.5-32B-I...     2026.01   55.9
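
For readers unfamiliar with the metric, here is a minimal sketch of how an AUROC score like those listed above could be computed. This page does not describe the evaluation protocol, so the setup is an assumption: each answer gets a confidence score (for example, a negated uncertainty estimate) and a binary correctness label, and AUROC measures how well the score separates correct from incorrect answers. The function name `auroc` and the sample data are hypothetical and are not taken from the benchmark.

```python
def auroc(confidence, correct):
    """AUROC via the Mann-Whitney rank statistic.

    Returns the probability that a randomly chosen correct answer has a
    higher confidence score than a randomly chosen incorrect answer.
    Ties count as half a win.
    """
    pos = [c for c, y in zip(confidence, correct) if y]      # correct answers
    neg = [c for c, y in zip(confidence, correct) if not y]  # incorrect answers
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect answer")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: five answers, three of them correct.
confidence = [0.9, 0.8, 0.4, 0.35, 0.1]
correct = [1, 0, 1, 1, 0]

score = auroc(confidence, correct)
print(f"AUROC = {100 * score:.1f}")  # reported on a 0-100 scale, as in the table
```

Under this reading, 50 corresponds to a score that ranks answers no better than chance and 100 to a perfect separation of correct from incorrect answers. The pairwise loop is quadratic and is meant only for illustration; a rank-based implementation would be preferable for large evaluation sets.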