Instruction Following Evaluation on SelfInst Out-of-Distribution
Top result: SODA, GPT-4o Score 51.6 (Apr 4, 2026)
Updated 11d ago
Evaluation Results
| Method | Model | Date | GPT-4o Score |
|---|---|---|---|
| SODA | Llama-3.1-8B-Ins... | 2026.04 | 51.6 |
| SODA | Qwen2.5-7B-Instruct | 2026.04 | 50.8 |
| SODA | Llama-3.2-3B-Ins... | 2026.04 | 50.5 |
| GAD | Qwen2.5-7B-Instruct | 2026.04 | 50.1 |
| Teacher | GPT-5-Chat | 2026.04 | 49.7 |
| GAD | Llama-3.1-8B-Ins... | 2026.04 | 49.5 |
| GAD | Llama-3.2-3B-Ins... | 2026.04 | 49.1 |
| SeqKD | Llama-3.1-8B-Ins... | 2026.04 | 48.7 |
| Base | Llama-3.1-8B-Ins... | 2026.04 | 48.4 |
| Base | Qwen2.5-7B-Instruct | 2026.04 | 48.3 |
| SeqKD | Qwen2.5-7B-Instruct | 2026.04 | 48.3 |
| SODA | Qwen2.5-3B-Instruct | 2026.04 | 48.2 |
| GAD | Qwen2.5-3B-Instruct | 2026.04 | 47.7 |
| SeqKD | Llama-3.2-3B-Ins... | 2026.04 | 47.1 |
| Base | Llama-3.2-3B-Ins... | 2026.04 | 47.0 |
| SeqKD | Qwen2.5-3B-Instruct | 2026.04 | 45.7 |
| Base | Qwen2.5-3B-Instruct | 2026.04 | 45.6 |