
Reward Modeling on PPE-P

68.3 Accuracy

DeepSeek-BTRM-27B

Feb 13, 2026
Updated 1mo ago

Evaluation Results

Date      Accuracy
2026.02   68.3
2026.02   67.1
2026.02   67.1
2026.02   66.1
2026.02   65.8
2026.02   65.3
2026.02   65.3
2026.02   65.3
2026.02   65.1
2026.02   64.7
2026.02   64.6
2026.02   64.2
2026.02   64.1
2026.02   63.9
2026.02   63.4
2026.02   63
2026.02   62.8
2026.02   62.3
2026.02   61.1
2026.02   61
2026.02   60.6
2026.02   59.3
2026.02   56.6