Policy Question Answering on PolicyBench
Chart: Accuracy over time. Best result: Deepseek-R1, 64.34 accuracy (Apr 14, 2026).
Evaluation Results
Method               Date      Accuracy
Deepseek-R1          2026.04   64.34
Claude-3.7-Sonnet    2026.04   64.13
Gemini-2.5-Flash     2026.04   63.82
Claude-3.5-Sonnet    2026.04   63.75
o4-mini              2026.04   62.98
QwQ-32B              2026.04   61.67
Gemini-2.0-Flash     2026.04   60.10
GPT-4o               2026.04   59.47
LLaMA-4              2026.04   59.17
Deepseek-V3          2026.04   59.10
Gemma 3-27B          2026.04   58.21