Our new X account is live! Follow @wizwand_team for updates
SOTA Multi-task performance evaluation benchmarks and papers with code | Wizwand
Multi-task performance evaluation
Benchmarks
Dataset Name | SOTA Method | Metric | Results | Last Updated
GPQA-Diamond, GSM8K, MATH-500, AIME’24, and IFEval Aggregate | ProFit | Avg Score: 58.72 | 25 | 4d ago
EVA | Virchow2 | Mean All: 79.4 | 11 | 4d ago