
Head-QA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Reinforcement Learning from Verifiable Rewards | HEAD-QA | AR: 100 | 30 |
| Medical Multiple-Choice Question Answering | Head-QA (test) | Accuracy: 28.79 | 10 |
| Question Answering | HEAD-QA | AR (%): 100 | 7 |
| Question Answering | HEAD-QA (eval) | Pathwise Violations (PathV): 0 | 3 |