Policy Question Answering on PolicyBench US

Accuracy: 66.43

DeepSeek-R1

Chart: accuracy (date shown: Apr 14, 2026)
Updated 4d ago

Evaluation Results

Method
Links
66.43
2026.04
65.54
65.39
65.06
64.03
2026.04
61.41
60.84
60.15
2026.04
60.04
59.38
2026.04
58