General Language Capability Evaluation on General Capability Suite Aggregate
[Chart: General Capability Avg. Accuracy over time. Top result: 62.51 (Base), Jun 1, 2026.]
Evaluation Results
Method      Details                     Date     General Capability Avg. Accuracy
Base        Backbone=Qwen-3.5-9B,...    2026.06  62.51
AlphaToken  Backbone=Qwen-3.5-9B,...    2026.06  62.29
TI-DPO      Backbone=Qwen-3.5-9B,...    2026.06  61.61
ConfPO      Backbone=Qwen-3.5-9B,...    2026.06  61.49
DPO         Backbone=Qwen-3.5-9B,...    2026.06  60.8
SePO        Backbone=Qwen-3.5-9B,...    2026.06  60.21
Base        Backbone=Gemma-3-4B, W...   2026.06  45.14
AlphaToken  Backbone=Gemma-3-4B, W...   2026.06  44.42
ConfPO      Backbone=Gemma-3-4B, W...   2026.06  44.3
TI-DPO      Backbone=Gemma-3-4B, W...   2026.06  44.28
DPO         Backbone=Gemma-3-4B, W...   2026.06  43.55
SePO        Backbone=Gemma-3-4B, W...   2026.06  42.69
Base        Backbone=Llama-3.2-3B,...   2026.06  41.91
AlphaToken  Backbone=Llama-3.2-3B,...   2026.06  41.37
TI-DPO      Backbone=Llama-3.2-3B,...   2026.06  40.93
ConfPO      Backbone=Llama-3.2-3B,...   2026.06  40.46
DPO         Backbone=Llama-3.2-3B,...   2026.06  40.18
SePO        Backbone=Llama-3.2-3B,...   2026.06  38.59