Policy Question Answering on PolicyBench

Accuracy: 64.34

Deepseek-R1

[Chart: accuracy axis ticks 57.9648, 59.6199, 61.275, 62.9301; date label Apr 14, 2026]
Updated 4d ago

Evaluation Results

Method Links
64.34
64.13
63.82
63.75
2026.04
62.98
2026.04
61.67
60.1
2026.04
59.47
2026.04
59.17
59.1
58.21