Correctness Prediction on Medals
[Figure: AUROC over time. Best result: 77 AUROC (Direction), Sep 12, 2025.]
Evaluation Results
| Method | Configuration | Date | AUROC |
|---|---|---|---|
| Direction | Model=Llama 3.3 70B In... | 2025.09 | 77 |
| Direction | Model=Llama 3.1 8B, Tr... | 2025.09 | 68 |
| Direction | Model=Ministral 8B Ins... | 2025.09 | 67 |
| Verb. conf. | Model=Llama 3.3 70B In... | 2025.09 | 66.5 |
| Direction | Model=Mistral 7B Instr... | 2025.09 | 64.5 |
| Direction | Model=DeepSeek R1 Dist... | 2025.09 | 63.8 |
| Assessor | Model=Mistral 7B Instr... | 2025.09 | 63.8 |
| Assessor | Model=Ministral 8B Ins... | 2025.09 | 62.6 |
| Assessor | Model=Llama 3.1 8B, As... | 2025.09 | 62.3 |
| Assessor | Model=Qwen 2.5 7B Inst... | 2025.09 | 62.2 |
| Assessor | Model=DeepSeek R1 Dist... | 2025.09 | 60.1 |
| Direction | Model=Qwen 2.5 7B Inst... | 2025.09 | 58.6 |
| Assessor | Model=Llama 3.3 70B In... | 2025.09 | 56.8 |
| Verb. conf. | Model=DeepSeek R1 Dist... | 2025.09 | 56.3 |
| Verb. conf. | Model=Mistral 7B Instr... | 2025.09 | 55.8 |
| Verb. conf. | Model=Qwen 2.5 7B Inst... | 2025.09 | 53.1 |
| Verb. conf. | Model=Ministral 8B Ins... | 2025.09 | 50.2 |
| Verb. conf. | Model=Llama 3.1 8B | 2025.09 | 50 |
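AUROC, the metric reported in the results above, can be read as the probability that a randomly chosen correct answer receives a higher confidence score than a randomly chosen incorrect one. A score of 50 therefore corresponds to chance-level separation and 100 to perfect separation. The following is a minimal sketch of that pairwise definition in Python. It assumes each method produces one scalar score per question and that a binary correctness label is available for each answer. The function name `auroc` and the example scores are hypothetical, and this is not the benchmark's own evaluation code.

```python
from typing import Sequence


def auroc(scores: Sequence[float], correct: Sequence[bool]) -> float:
    """AUROC via the pairwise (Mann-Whitney) definition.

    Returns the fraction of (correct, incorrect) pairs in which the correct
    answer has the higher score; tied pairs count as 0.5.
    Hypothetical helper for illustration, not the benchmark's evaluator.
    """
    if len(scores) != len(correct):
        raise ValueError("scores and correct must have the same length")
    pos = [s for s, c in zip(scores, correct) if c]      # scores of correct answers
    neg = [s for s, c in zip(scores, correct) if not c]  # scores of incorrect answers
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical confidence scores for six answers and whether each was correct.
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    correct = [True, True, False, True, False, False]
    # Reported on a 0-100 scale, matching the results above.
    print(round(100 * auroc(scores, correct), 1))  # prints 88.9
```

The double loop is O(n·m) in the number of correct and incorrect answers. For large evaluation sets, a rank-based implementation such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.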