
Instruction Following on IFBench

68.1 Pass@1 (Strict)

Olmo 3.1 Think 32B

[Chart: scores over time, Dec 15, 2025 to Feb 12, 2026]
Updated 4d ago

Evaluation Results

Date      Pass@1 (Strict)
2025.12   68.1
2025.12   55.1
2025.12   47.6
2026.01   47
2025.12   46.3
2026.01   44.3
2026.01   43.2
2026.01   40.5
2026.01   37.8
2025.12   37.3
2025.12   37
2026.01   36.7
2026.01   34.8
2025.12   34.4
2025.12   34
2025.12   32.3
2025.12   29.3
2025.12   29.3
2025.12   28.4
2025.12   27.4
2025.12   26.7
2025.12   23.8
2025.12   22.3
2025.12   22.1
2026.02   20.8
2026.02   20.8
2026.02   20.8
2026.02   20.4
2026.02   20.4
2026.02   20.4
2026.02   20.4
2026.02   20.4
2026.02   20
2026.02   19.6
2026.02   19.6
2026.02   18.8
2026.02   18.8
2026.02   18.8
2026.02   18.8
2026.02   18.4
2026.02   18.4
2026.02   18.4
2026.02   18
2026.02   18
2026.02   18
2026.02   18
2026.02   18
2026.02   17.6
2026.02   17.6
2026.02   17.6
2026.02   16.8
2026.02   15.2
2026.02   14
2026.02   14
2026.02   12.24
2026.02   12
2026.02   11.6
2026.02   11.6
2026.02   11.6
2026.02   11.6
2026.02   11.34
2026.02   11.2
2026.02   10.48
2026.02   10.48
2026.02   9.55
2026.02   8.36
2026.02   7.46
2026.02   5.07