SOTA Instruction following and reasoning benchmarks and papers with code | Wizwand
Instruction following and reasoning
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| Low-resource languages evaluation suite (am, arz, ars, as, ast, az, ba, bn, bo, ceb, cv, cy, fo, ga, gd, gl, gn, ha, ht, ig, jv, kmr, sdh, ky, lb, lo, lus, mg, mi, mn, mt, ny, oc, pap, ps, rn, rw, sd, si, sm, sn, st, su, sw, te, tg, ti, tk, tt, ug, xh, yi, yo, zu) | Kakugo | Wins | 5 | 54 | 1mo ago |
| Average of 9 tasks (DollyEval, VicunaEval, GSM8K, MATH, AIME2024, HumanEval, MBPP, LiveCodeBench, GPQA-D) | IOA | Average Performance | 31.19 | 9 | 1mo ago |
| Chat and Instruction-following Suite (IFEval, AE2, MTB, GSM8K) | S2FT (Down) | IFEval | 0.695 | 5 | 1mo ago |