STEM Reasoning on TheoremQA

Top result: Bingo-A, 36.8 accuracy

Updated 3mo ago

Evaluation Results

Method	Date	Accuracy		
Bingo-A	2025.06	36.8	1,648	32.9
Bingo-E	2025.06	36.7	1,584	33.0
DAST	2025.06	35.2	2,325	29.8
Demystifying	2025.06	35.1	1,976	30.6
Effi. Reasoning	2025.06	34.8	3,560	26.2
kimi-k1.5	2025.06	34.4	2,136	29.6
Vanilla PPO	2025.06	32.3	4,146	22.7
O1-Pruner	2025.06	28.6	524	27.6