Instruction Following Evaluation on Vicuna Out-of-Distribution
[Chart: GPT-4o Score. Best result: 51.9 (SODA), Apr 4, 2026]
Updated 11d ago
Evaluation Results
Method    Model                  Date      GPT-4o Score
SODA      Llama-3.1-8B-Ins...    2026.04   51.9
GAD       Qwen2.5-7B-Instruct    2026.04   51.4
SODA      Qwen2.5-7B-Instruct    2026.04   51.4
GAD       Llama-3.1-8B-Ins...    2026.04   50.2
Teacher   GPT-5-Chat             2026.04   49.9
SODA      Llama-3.2-3B-Ins...    2026.04   49.9
SODA      Qwen2.5-3B-Instruct    2026.04   49.8
SeqKD     Qwen2.5-7B-Instruct    2026.04   49.5
GAD       Qwen2.5-3B-Instruct    2026.04   49.4
Base      Qwen2.5-7B-Instruct    2026.04   49.1
GAD       Llama-3.2-3B-Ins...    2026.04   48.9
SeqKD     Llama-3.1-8B-Ins...    2026.04   48.7
SeqKD     Llama-3.2-3B-Ins...    2026.04   48.1
SeqKD     Qwen2.5-3B-Instruct    2026.04   48
Base      Llama-3.1-8B-Ins...    2026.04   47.9
Base      Qwen2.5-3B-Instruct    2026.04   47.3
Base      Llama-3.2-3B-Ins...    2026.04   46.9