Instruction Following Evaluation on Dolly Out-of-Distribution
[Chart: GPT-4o Score by date; leading entry SODA at 49.9, Apr 4, 2026]
Updated 11d ago
Evaluation Results
| Method | Model | Date | GPT-4o Score |
|---|---|---|---|
| SODA | Llama-3.1-8B-Ins... | 2026.04 | 49.9 |
| Teacher | GPT-5-Chat | 2026.04 | 49.8 |
| SODA | Qwen2.5-7B-Instruct | 2026.04 | 49.6 |
| SODA | Llama-3.2-3B-Ins... | 2026.04 | 49.5 |
| GAD | Llama-3.1-8B-Ins... | 2026.04 | 48.8 |
| GAD | Qwen2.5-7B-Instruct | 2026.04 | 48.5 |
| GAD | Llama-3.2-3B-Ins... | 2026.04 | 48.5 |
| SeqKD | Llama-3.1-8B-Ins... | 2026.04 | 47.7 |
| Base | Qwen2.5-7B-Instruct | 2026.04 | 47.6 |
| SeqKD | Qwen2.5-7B-Instruct | 2026.04 | 47.2 |
| SeqKD | Llama-3.2-3B-Ins... | 2026.04 | 47 |
| GAD | Qwen2.5-3B-Instruct | 2026.04 | 46.7 |
| Base | Llama-3.1-8B-Ins... | 2026.04 | 46.6 |
| SODA | Qwen2.5-3B-Instruct | 2026.04 | 46.1 |
| Base | Llama-3.2-3B-Ins... | 2026.04 | 45.8 |
| Base | Qwen2.5-3B-Instruct | 2026.04 | 45.1 |
| SeqKD | Qwen2.5-3B-Instruct | 2026.04 | 44.8 |