Instruction Following on Human Instructions
[Figure: Accuracy chart. Series: Baseline. Y-axis ticks from 0.0328 to 0.22. Date shown: May 13, 2026. Metric toggles: Accuracy, Latency.]
Updated 20d ago
Evaluation Results
Method    Setting                    Date     Accuracy  Latency
Baseline  Model Backbone=Qwen2.5...  2026.05  0.22      9.3
Baseline  Model Backbone=Llama-3...  2026.05  0.192     5.6
AsyncIO   Model Backbone=Llama-3...  2026.05  0.153     9.3
AsyncIO   Model Backbone=Qwen2.5...  2026.05  0.136     14.3
Baseline  Model Backbone=Qwen2.5...  2026.05  0.04      -
Baseline  Model Backbone=Llama-3...  2026.05  0.04      -