General Instruction Following Evaluation on Aggregated Multi-Task Suite
Top Average Normalized Score: 8.57 (EPI)
[Chart: Average Normalized Score over time; latest point Apr 15, 2026]
Evaluation Results
Method            Configuration               Date      Average Normalized Score
EPI               Base Model=Gemma-2-9B,...   2026.04   8.57
Static Isolation  Base Model=Gemma-2-9B,...   2026.04   8.21
EPI               Base Model=LLaMA-3-8B,...   2026.04   7.98
Full SFT          Base Model=Gemma-2-9B,...   2026.04   7.78
Static Isolation  Base Model=LLaMA-3-8B,...   2026.04   7.68
Full SFT          Base Model=LLaMA-3-8B,...   2026.04   7.32