Downstream Task Evaluation on Downstream Tasks Aggregate
Best result: 60.49 Accuracy (MeSH)
[Chart: Accuracy of leaderboard entries over time, Oct 9, 2025 to Feb 12, 2026. Updated 1mo ago.]
Evaluation Results
| Method | Configuration | Date | Accuracy |
|---|---|---|---|
| MeSH | Model Backbone=Pythia-... | 2025.10 | 60.49 |
| Recursive baseline | Model Backbone=Pythia-... | 2025.10 | 59.43 |
| MeSH | Model Backbone=Pythia-... | 2025.10 | 58.83 |
| MeSH | Model Backbone=Pythia-... | 2025.10 | 56.85 |
| Recursive baseline | Model Backbone=Pythia-... | 2025.10 | 56.67 |
| Recursive baseline | Model Backbone=Pythia-... | 2025.10 | 54.92 |
| MeSH | Model Backbone=Pythia-... | 2025.10 | 54.71 |
| SPIRALFORMER-L | Model Scale=1.4B, Topo... | 2026.02 | 54.37 |
| Recursive baseline | Model Backbone=Pythia-... | 2025.10 | 52.49 |
| SPIRALFORMER-L | Model Scale=1.4B, Topo... | 2026.02 | 51.75 |
| BASELINE (PYTHIA) | Model Scale=160M, Eval... | 2026.02 | 39.88 |
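To compare methods rather than individual entries, the leaderboard rows above can be reduced to the best score per method. The following is a minimal Python sketch, assuming the accuracy values are transcribed exactly from the leaderboard; `ROWS` and `best_per_method` are hypothetical names used only for illustration. The truncated configuration column is dropped, so a method's best score may come from a different setting (for example, a different model scale) than its other entries.

```python
from collections import defaultdict

# Leaderboard entries transcribed as (method, date, accuracy).
# Hypothetical data structure; the truncated configuration column is omitted,
# so rows sharing a method name may come from different settings.
ROWS = [
    ("MeSH", "2025.10", 60.49),
    ("Recursive baseline", "2025.10", 59.43),
    ("MeSH", "2025.10", 58.83),
    ("MeSH", "2025.10", 56.85),
    ("Recursive baseline", "2025.10", 56.67),
    ("Recursive baseline", "2025.10", 54.92),
    ("MeSH", "2025.10", 54.71),
    ("SPIRALFORMER-L", "2026.02", 54.37),
    ("Recursive baseline", "2025.10", 52.49),
    ("SPIRALFORMER-L", "2026.02", 51.75),
    ("BASELINE (PYTHIA)", "2026.02", 39.88),
]


def best_per_method(rows):
    """Return {method: highest accuracy}, ordered from best to worst."""
    best = defaultdict(float)  # accuracies are positive, so 0.0 is a safe start
    for method, _date, acc in rows:
        best[method] = max(best[method], acc)
    # Sort methods by their best accuracy, highest first.
    return dict(sorted(best.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    for method, acc in best_per_method(ROWS).items():
        print(f"{method:<20} {acc:6.2f}")
```

Grouping by method name alone is a deliberate simplification; a fuller comparison would key on the configuration as well, once the untruncated values are known.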