Overall Performance on Average (HellaSwag, PiQA, OBQA, COPA, LogiQA, WinoG, SciQ, ARC-E, Lambada)
[Chart: Accuracy by date. Highlighted result: DoGraph, 42.5 Accuracy, Apr 9, 2026.]
Evaluation Results
Method              Configuration              Date     Accuracy
DoGraph             Backbone=LLaMA-1.1B, P...  2026.04  42.5
RegMix              Backbone=LLaMA-1.1B, P...  2026.04  42.1
DoReMi              Backbone=LLaMA-1.1B, P...  2026.04  41.7
Uniform             Backbone=LLaMA-1.1B, P...  2026.04  40.9
Data Mixing Law     Backbone=LLaMA-1.1B, P...  2026.04  40.6
DOGE                Backbone=LLaMA-1.1B, P...  2026.04  40.5
Dynamic Loss-Based  Backbone=LLaMA-1.1B, P...  2026.04  39.9
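
To make the spread between methods easier to read, the reported accuracies can be compared against a single reference row. The following is a minimal sketch, not part of the leaderboard itself: the values are copied from the results listed above, while the function name `gaps_vs_baseline` and the choice of the Uniform row as the reference are assumptions made here for illustration.

```python
# Accuracy values copied from the Evaluation Results listed above.
scores = {
    "DoGraph": 42.5,
    "RegMix": 42.1,
    "DoReMi": 41.7,
    "Uniform": 40.9,
    "Data Mixing Law": 40.6,
    "DOGE": 40.5,
    "Dynamic Loss-Based": 39.9,
}


def gaps_vs_baseline(scores, baseline="Uniform"):
    """Return each method's accuracy minus the baseline's, in points.

    Using "Uniform" as the baseline is an illustrative assumption,
    not something the leaderboard specifies.
    """
    base = scores[baseline]
    # Round to one decimal to match the precision of the reported scores.
    return {method: round(score - base, 1) for method, score in scores.items()}


gaps = gaps_vs_baseline(scores)

# Print methods from largest to smallest gap relative to the baseline.
for method, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{method:<20}{scores[method]:>6.1f}{gap:>+6.1f}")
```

Under this reading, the three highest-ranked methods sit above the Uniform row and the remaining three sit below it. All of the differences are under two accuracy points.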