Policy Evaluation on PolicyBench Level 1 (CN)

Accuracy: 62.02

Deepseek R1

[Chart; axis ticks: 40.9392, 46.4121, 51.885, 57.3579]
Apr 14, 2026
Updated 4d ago

Evaluation Results

Accuracy    Date
62.02       2026.04
55.87       2026.04
55.29       2026.04
54.06       2026.04
53.77       2026.04
49.81
48.61       2026.04
47.87       2026.04
46.01       2026.04
45.93
41.75