Mathematical Dialogue Evaluation on MathDial (test)

Accuracy: 21 (DPO)

Updated 4mo ago

Evaluation Results

Method	Links	Accuracy
DPO 2025.06		21	0	1.85	1.9	2.41	2	1.77	1.58
StepDPO 2025.06		20.67	0	1.86	1.87	2.43	2.03	1.78	1.55
STaR-GATE-D 2025.06		20	0	1.55	1.63	1.95	1.79	1.51	1.31
STaR-GATE 2025.06		10.67	0	2.29	2.25	2.94	2.58	2.3	1.56
Swift 2025.06		9.67	0	2.43	2.41	3.09	2.68	2.45	1.66
Refit 2025.06		6	0	2.43	2.5	3.15	2.68	2.41	1.63
Base 2025.06		0	0	1.9	2.2	2.54	1.81	2.01	1.2