Self-doubt detection on MuSR 90-trace
[Chart: AUROC (Self-doubt) by date. Top result: SELFDOUBT, 83.66, Apr 7, 2026.]
Evaluation Results
Method     Model              Links    AUROC (Self-doubt)  Delta
SELFDOUBT  gpt-oss-120b       2026.04  83.66               0.004
SELFDOUBT  Claude Sonnet 4.6  2026.04  83.58               0
SELFDOUBT  Grok 4.1 Fast      2026.04  80.89               0.001
SELFDOUBT  gpt-oss-20b        2026.04  78.67               0.004
SELFDOUBT  Qwen3-14B          2026.04  78.53               0.002
SELFDOUBT  Qwen3              2026.04  76.54               0.009
SELFDOUBT  Gemini 2.5 Flash   2026.04  64.75               0.04