Answerability Prediction on CODE n=30 (matched pairs)
[Chart: AUC by date. Top entry: Refusal (Keyword Classifier), AUC 85, May 4, 2026. Available metrics: AUC, F1 Score.]
Updated 28d ago
Evaluation Results

Method                        Model    Date     AUC   F1 Score
Refusal (Keyword Classifier)  Qwen     2026.05  85    83
Geometry (own_dist)           Qwen     2026.05  81.8  73.3
Geometry (own_dist)           Mistral  2026.05  79.6  68.9
Geometry (own_dist)           Llama    2026.05  77.4  75.8
Refusal (Keyword Classifier)  Mistral  2026.05  73.3  63.6
Refusal (Keyword Classifier)  Llama    2026.05  63.3  42.1
SC (Self-Consistency)         Mistral  2026.05  49.7  -
SC (Self-Consistency)         Llama    2026.05  44.1  -
SC (Self-Consistency)         Qwen     2026.05  36.9  -