Reinforcement Learning from Verifiable Rewards on HEAD-QA

100 AR

Always-Act

May 18, 2026
Updated 13d ago

Evaluation Results

Method    Links
2026.05   100    28.5   20
2026.05   100    28.5   20
2026.05   100    28.5   7
2026.05   99.9   28.4   0
2026.05   99.9   28.4   9
2026.05   99.8   28.4   1
2026.05   97     27.7   1
2026.05   93     26.1   20
2026.05   92.7   26     20
2026.05   90.9   27.2   0
2026.05   88.9   25.4   20
2026.05   86.6   24.5   20
2026.05   84.8   24     20
2026.05   84.6   24     20
2026.05   84.5   24     20
2026.05   83.9   24.1   20
2026.05   82.3   24     20
2026.05   82.3   24     15
2026.05   82.3   24     0
2026.05   48.6   14.2   1
2026.05   37.8   11     0
2026.05   16.5   4.8    0
2026.05   0      0      0
2026.05   0      0      0
2026.05   0      0      0
2026.05   0      0      0
2026.05   0      0      0
2026.05   0      0      0
2026.05   0      0      0
2026.05   0      0      0