
General Language Model Evaluation on Average of 10 tasks

Overall Performance: 45.02

T-SPIN

Updated 5mo ago

Evaluation Results

Method	Links	Overall Performance
T-SPIN 2026.01		45.02	0.12
T-SPIN 2026.01		44.9	1.34
SFT 2026.01		44.17	-
T-SPIN 2026.01		43.56	0.24
T-SPIN 2026.01		43.32	0.61
T-SPIN 2026.01		42.71	-
SPIN 2026.01		42.6	-
SPIN 2026.01		42.52	-0.08
Mistral-7B 2026.01		42.51	-
SPIN 2026.01		42.32	1.02
SPIN 2026.01		42.3	-0.22
SPIN 2026.01		41.3	-1