General Language Understanding on General Downstream Tasks Aggregate
[Chart: Average Accuracy and Accuracy Delta. Highlighted point: PonderLM-2-Pythia-1.4B, 59.5 Average Accuracy, Sep 27, 2025.]
Evaluation Results
Method                  Evaluation Protocol  Date     Average Accuracy  Accuracy Delta
PonderLM-2-Pythia-1.4B  5-...                2025.09  59.5              5.4
PonderLM-2-Pythia-1.4B  0-...                2025.09  58.5              4.4
Ponder-1.4B             0-...                2025.09  56.5              -
Pythia-1.4B             0-...                2025.09  54.1              -
PonderLM-2-Pythia-410M  0-...                2025.09  51.9              4.3
PonderLM-2-Pythia-410M  5-...                2025.09  51.9              4.3
Ponder-410M             0-...                2025.09  50.4              -
Pythia-410M             0-...                2025.09  47.6              -
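The page does not say how Accuracy Delta is computed. The reported values appear consistent with the gain over the Pythia model of the same size, so the sketch below checks that reading against the numbers listed above. The baseline choice, the `delta_vs_baseline` helper, and the size labels are assumptions made for illustration, not something the leaderboard states.

```python
# Scores copied from the results above. ASSUMPTION: the baseline for each
# PonderLM-2 entry is the Pythia model of the same size.
BASELINES = {"1.4B": 54.1, "410M": 47.6}

# (method, size, evaluation protocol as truncated on the page, accuracy, reported delta)
REPORTED = [
    ("PonderLM-2-Pythia-1.4B", "1.4B", "5-...", 59.5, 5.4),
    ("PonderLM-2-Pythia-1.4B", "1.4B", "0-...", 58.5, 4.4),
    ("PonderLM-2-Pythia-410M", "410M", "0-...", 51.9, 4.3),
    ("PonderLM-2-Pythia-410M", "410M", "5-...", 51.9, 4.3),
]


def delta_vs_baseline(accuracy: float, size: str) -> float:
    """Hypothetical helper: accuracy gain over the same-size Pythia baseline."""
    return round(accuracy - BASELINES[size], 1)


def check_reported_deltas() -> bool:
    """Return True if every reported delta matches the assumed definition."""
    return all(
        delta_vs_baseline(accuracy, size) == reported
        for _, size, _, accuracy, reported in REPORTED
    )


if __name__ == "__main__":
    for method, size, protocol, accuracy, reported in REPORTED:
        computed = delta_vs_baseline(accuracy, size)
        print(f"{method} ({protocol}): computed {computed}, reported {reported}")
```

Under this reading, the Ponder-1.4B and Ponder-410M rows would also have a nonzero gain over Pythia, yet the page shows "-" for them, so the delta seems to be reported only for the PonderLM-2 entries.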