Instruction following and reasoning

Benchmarks

Dataset Name	SOTA Method	Metric	Score		Updated
Low-resource languages evaluation suite (am, arz, ars, as, ast, az, ba, bn, bo, ceb, cv, cy, fo, ga, gd, gl, gn, ha, ht, ig, jv, kmr, sdh, ky, lb, lo, lus, mg, mi, mn, mt, ny, oc, pap, ps, rn, rw, sd, si, sm, sn, st, su, sw, te, tg, ti, tk, tt, ug, xh, yi, yo, zu)	Kakugo	Wins	5	54	4mo ago
Average of 9 tasks (DollyEval, VicunaEval, GSM8K, MATH, AIME2024, HumanEval, MBPP, LiveCodeBench, GPQA-D)	IOA	Average Performance	31.19	9	4mo ago
Chat and Instruction-following Suite (IFEval, AE2, MTB, GSM8K)	S2FT (Down)	IFEval	0.695	5	4mo ago
