Policy Question Answering on PolicyBench Chinese
[Chart: Accuracy by date. Highlighted point: QwQ-32B, 65.33 Accuracy, Apr 14, 2026.]
Evaluation Results
Method              Date      Accuracy
QwQ-32B             2026.04   65.33
Gemini-2.5-Flash    2026.04   63.6
Claude-3.7-sonnet   2026.04   63.19
Deepseek-R1         2026.04   62.24
Claude-3.5-Sonnet   2026.04   62.11
o4-mini             2026.04   60.41
Gemini-2.0-Flash    2026.04   59.35
Deepseek-V3         2026.04   58.82
LLaMA-4             2026.04   58.3
GPT-4o              2026.04   57.53
Gemma 3-27B         2026.04   56.27