News Classification on AG News
Top result: 87.2 Accuracy (Gemini-2.5-Flash)
Figure: Accuracy by date (Mar 11, 2026). Available metrics: Accuracy, PEEM Accuracy, Response Overall Score, Prompt Effectiveness Score.
Evaluation Results
Method | Configuration | Date | Accuracy | PEEM Accuracy | Response Overall Score | Prompt Effectiveness Score
Gemini-2.5-Flash | Task Model=Gemini-2.5-... | 2026.03 | 87.2 | 4.58 | 4.695 | 4.41
GPT-4o-mini | Task Model=GPT-4o-mini | 2026.03 | 85.5 | 4.55 | 4.764 | 4.457
Gemma-2-9B-IT | Task Model=Gemma-2-9B-IT | 2026.03 | 59 | 4.063 | 4.223 | 4.061
Qwen-2.5-7B-IT | Task Model=Qwen-2.5-7B-IT | 2026.03 | 56.3 | 4.048 | 4.75 | 4.209
LLaMA-3.1-8B-IT | Task Model=LLaMA-3.1-8... | 2026.03 | 55.6 | 3.845 | 4.656 | 3.958
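The page does not define how Accuracy is computed. A minimal sketch, assuming the standard definition (the percentage of test headlines whose predicted topic label exactly matches the gold label among AG News's four classes: World, Sports, Business, Sci/Tech), is shown below. It also ranks the Accuracy values copied from the Evaluation Results table. The helper names `accuracy` and `rank_by_accuracy` are hypothetical and not part of any benchmark tooling; the PEEM and score columns are not modeled because the page does not describe them.

```python
# Hypothetical helper, assuming Accuracy = percentage of exact label matches.
def accuracy(predictions, labels):
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Accuracy values copied from the Evaluation Results table.
REPORTED_ACCURACY = {
    "Gemini-2.5-Flash": 87.2,
    "GPT-4o-mini": 85.5,
    "Gemma-2-9B-IT": 59.0,
    "Qwen-2.5-7B-IT": 56.3,
    "LLaMA-3.1-8B-IT": 55.6,
}


def rank_by_accuracy(results):
    """Return (method, accuracy) pairs sorted from highest to lowest accuracy."""
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy example: 3 of 4 predictions match the gold AG News labels.
    gold = ["World", "Sports", "Business", "Sci/Tech"]
    pred = ["World", "Sports", "Sci/Tech", "Sci/Tech"]
    print(f"Toy accuracy: {accuracy(pred, gold):.1f}")  # Toy accuracy: 75.0

    ranking = rank_by_accuracy(REPORTED_ACCURACY)
    for rank, (name, acc) in enumerate(ranking, start=1):
        print(f"{rank}. {name}: {acc:.1f}")

    # Spread between the best and worst reported Accuracy.
    spread = ranking[0][1] - ranking[-1][1]
    print(f"Spread: {spread:.1f} points")  # Spread: 31.6 points
```

The ranking makes the split in the table explicit: the two hosted models score above 85, while the three 7B to 9B instruction-tuned open models fall between 55.6 and 59.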