Policy Question Answering on PolicyBench Chinese

Top result: QwQ-32B (Accuracy: 65.33)

Evaluation Results

Method	Date	Accuracy
QwQ-32B	2026.04	65.33
Gemini-2.5-Flash	2026.04	63.60
Claude-3.7-sonnet	2026.04	63.19
Deepseek-R1	2026.04	62.24
Claude-3.5-Sonnet	2026.04	62.11
o4-mini	2026.04	60.41
Gemini-2.0-Flash	2026.04	59.35
Deepseek-V3	2026.04	58.82
LLaMA-4	2026.04	58.30
GPT-4o	2026.04	57.53
Gemma 3-27B	2026.04	56.27