Long-Context Reasoning on GPQA Diamond (out-of-distribution)

48.5 Accuracy

VTC-R1

Updated 4d ago

Evaluation Results

Method	Links	Accuracy	(unlabeled)	(unlabeled)
VTC-R1 2026.01		48.5	9.77	9.57
VTC-R1 2026.01		46	10.73	6.96
SFT 2026.01		38.4	13.91	8.35
SFT 2026.01		37.4	14.78	26.88
TokenSkip 2026.01		35.9	15.45	9.93
Base SFT 2026.01		26.3	19.74	14.43