Language Generation on PT-Exams Open Questions
Best score: 77.3 (Qwen 3-8B), Mar 27, 2026
Evaluation Results
Method          Model Type       Date     Score
Qwen 3-8B       Open weight...   2026.03  77.3
Gemma 3-12B     Open weight...   2026.03  76.6
Gemma 2-9B      Open weight...   2026.03  69.7
AMALIA-9B-DPO   Fully open...    2026.03  66
AMALIA-9B-SFT   Fully open...    2026.03  62
Ministral-8B    Open weight...   2026.03  62
Qwen 2.5-7B     Open weight...   2026.03  56.8
EuroLLM-9B      Fully open...    2026.03  56.1
Apertus-8B      Fully open...    2026.03  54.7
Llama 3.1-8B    Open weight...   2026.03  53.8
Gervasio-8B     Open weight...   2026.03  53.2
Mistral-7B      Open weight...   2026.03  44.6
OLMo 2-7B       Fully open...    2026.03  43
Salamandra-7B   Fully open...    2026.03  34.1