Answerability Prediction on MATH n=50 (matched pairs)
[Chart: best AUC over time. Top entry: Geometry (own_dist), 84.1 AUC, May 4, 2026.]
Evaluation Results

Method | Configuration | Date | AUC | F1 Score
--- | --- | --- | --- | ---
Geometry (own_dist) | Model=Llama, Protocol=... | 2026.05 | 84.1 | 71.4
Geometry (own_dist) | Model=Mistral, Protoco... | 2026.05 | 82.6 | 71.4
Geometry (own_dist) | Model=Qwen, Protocol=p... | 2026.05 | 78.2 | 69.4
Refusal (Keyword Classifier) | Model=Mistral, Protoco... | 2026.05 | 73.0 | 63.0
Refusal (Keyword Classifier) | Model=Qwen, Protocol=p... | 2026.05 | 71.0 | 59.2
Refusal (Keyword Classifier) | Model=Llama, Protocol=... | 2026.05 | 63.0 | 41.3
SC (Self-Consistency) | Model=Llama, Protocol=... | 2026.05 | 62.4 | -
SC (Self-Consistency) | Model=Mistral, Protoco... | 2026.05 | 52.4 | -
SC (Self-Consistency) | Model=Qwen, Protocol=p... | 2026.05 | 29.6 | -

A "-" means no F1 Score is reported for that entry.
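The leaderboard reports AUC and F1 Score without stating how they are computed. As a minimal sketch, assuming the standard definitions, AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties count as half), and F1 as the harmonic mean of precision and recall on thresholded predictions. The function names `roc_auc` and `f1_score`, the 0/1 label convention (1 = whichever class is treated as positive, e.g. answerable), and the 0.5 threshold in the example are illustrative assumptions, not the benchmark's own evaluation code. The table's values appear to be these quantities scaled by 100.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney) AUC: P(score of a positive > score of a negative).

    Assumption: label 1 marks the positive class, label 0 the negative class.
    Ties between a positive and a negative score contribute 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(labels: Sequence[int], preds: Sequence[int]) -> float:
    """F1 on hard 0/1 predictions; returns 0.0 when there are no true positives."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy data: two positives, two negatives.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(round(100 * roc_auc(labels, scores), 1))  # 75.0
    print(round(100 * f1_score(labels, preds), 1))  # 50.0
```

In the toy example, three of the four positive/negative pairs are ranked correctly, giving an AUC of 0.75; thresholding at 0.5 yields one true positive, one false positive, and one false negative, so precision, recall, and F1 are all 0.5. Because AUC uses only the ranking of scores while F1 depends on a chosen threshold, a method can be scored on AUC alone, which may be why some entries above list no F1 Score.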