General Language Capability Evaluation on General Capability Suite Aggregate

General Capability Avg. Accuracy: 62.51

Base

Updated 1mo ago

Evaluation Results

Method	Links	General Capability Avg. Accuracy
Base 2026.06		62.51
AlphaToken 2026.06		62.29
TI-DPO 2026.06		61.61
ConfPO 2026.06		61.49
DPO 2026.06		60.80
SePO 2026.06		60.21
Base 2026.06		45.14
AlphaToken 2026.06		44.42
ConfPO 2026.06		44.30
TI-DPO 2026.06		44.28
DPO 2026.06		43.55
SePO 2026.06		42.69
Base 2026.06		41.91
AlphaToken 2026.06		41.37
TI-DPO 2026.06		40.93
ConfPO 2026.06		40.46
DPO 2026.06		40.18
SePO 2026.06		38.59