Policy Evaluation on PolicyBench Level 2 (US)
[Chart: Accuracy. Highlighted result: Claude 3.5, 68.95 (Apr 14, 2026).]
Evaluation Results
| Method | Model Variant | Links | Accuracy |
|---|---|---|---|
| Claude 3.5 | Claude-3... | 2026.04 | 68.95 |
| Claude 3.7 | Claude-3... | 2026.04 | 68.23 |
| Deepseek R1 | | 2026.04 | 65.37 |
| Gemini 2.5 | Gemini-2... | 2026.04 | 64.91 |
| o4-mini | | 2026.04 | 64.71 |
| GPT-4o | | 2026.04 | 63.4 |
| Gemini 2.0 | Gemini-2... | 2026.04 | 62.25 |
| Gemma 3-27B | | 2026.04 | 62.17 |
| LLaMA 4 | | 2026.04 | 61.17 |
| Deepseek V3 | | 2026.04 | 58.62 |
| QwQ 32B | | 2026.04 | 57.71 |