Correctness Prediction on Notable People
Top result: 82.5 AUROC (Direction), Sep 12, 2025. Updated 1mo ago.
Evaluation Results
| Method | Configuration | Date | AUROC |
|---|---|---|---|
| Direction | Model=DeepSeek R1 Dist... | 2025.09 | 82.5 |
| Direction | Model=Qwen 2.5 7B Inst... | 2025.09 | 80 |
| Direction | Model=Mistral 7B Instr... | 2025.09 | 76 |
| Assessor | Model=Qwen 2.5 7B Inst... | 2025.09 | 72.3 |
| Direction | Model=Llama 3.1 8B, Tr... | 2025.09 | 72.2 |
| Assessor | Model=DeepSeek R1 Dist... | 2025.09 | 70.9 |
| Direction | Model=Llama 3.3 70B In... | 2025.09 | 70.8 |
| Direction | Model=Ministral 8B Ins... | 2025.09 | 68 |
| Assessor | Model=Mistral 7B Instr... | 2025.09 | 67.3 |
| Verb. conf. | Model=Qwen 2.5 7B Inst... | 2025.09 | 63.7 |
| Assessor | Model=Llama 3.1 8B, As... | 2025.09 | 63 |
| Verb. conf. | Model=Mistral 7B Instr... | 2025.09 | 62.5 |
| Assessor | Model=Ministral 8B Ins... | 2025.09 | 62.3 |
| Verb. conf. | Model=DeepSeek R1 Dist... | 2025.09 | 60.5 |
| Verb. conf. | Model=Llama 3.3 70B In... | 2025.09 | 59.4 |
| Assessor | Model=Llama 3.3 70B In... | 2025.09 | 58.3 |
| Verb. conf. | Model=Ministral 8B Ins... | 2025.09 | 50 |
| Verb. conf. | Model=Llama 3.1 8B | 2025.09 | 49.9 |
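For readers unfamiliar with the metric, AUROC here presumably measures how well a method's score separates correct answers from incorrect ones, with 50 corresponding to chance. The following is a minimal sketch of the standard pairwise definition, assuming scores are reported on a 0 to 100 scale; the function name `auroc` and the sample scores and labels are hypothetical and are not taken from the benchmark.

```python
def auroc(scores, labels):
    """Probability that a randomly chosen correct answer (label 1) gets a
    higher score than a randomly chosen incorrect one (label 0).
    Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predictor scores and correctness labels (illustration only).
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
labels = [1, 0, 1, 1, 0]

# Reported as a percentage, matching the scale used in the table.
print(round(100 * auroc(scores, labels), 1))
```

A predictor that assigns every answer the same score gets 0.5 under this definition, which is why entries near 50 in the table are effectively at chance.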