General LLM Evaluation on Instruction-Following, Mathematics, and Commonsense Reasoning Combined
[Chart: Average Score. Leading method: Qwen2.5 7B-PC, 57 (May 23, 2025). Updated 1mo ago.]
Evaluation Results
| Method | Method details | Date | Average Score |
|---|---|---|---|
| Qwen2.5 7B-PC | Backbone=Qwen2.5 7B, S... | 2025.05 | 57 |
| Granite 3.1 8B-PC | Backbone=Granite 3.1 8... | 2025.05 | 56 |
| Olmo 3 7B-PC | Backbone=Olmo 3 7B, St... | 2025.05 | 55 |
| Llama 3.1 8B-PC | Backbone=Llama 3.1 8B,... | 2025.05 | 54 |
| Qwen2.5 7B-NL | Backbone=Qwen2.5 7B, S... | 2025.05 | 53 |
| Olmo 3 7B-NL | Backbone=Olmo 3 7B, St... | 2025.05 | 50 |
| Mistral 7B v0.3-PC | Backbone=Mistral 7B v0... | 2025.05 | 48 |
| Qwen2.5 7B-CP | Backbone=Qwen2.5 7B, S... | 2025.05 | 47 |
| Llama 3.1 8B-NL | Backbone=Llama 3.1 8B,... | 2025.05 | 45 |
| Granite 8B Code-PC | Backbone=Granite 8B Co... | 2025.05 | 43 |
| Olmo 3 7B-CP | Backbone=Olmo 3 7B, St... | 2025.05 | 39 |
| Granite 3.1 8B-CP | Backbone=Granite 3.1 8... | 2025.05 | 37 |
| Granite 8B Code-NL | Backbone=Granite 8B Co... | 2025.05 | 36 |
| Mistral 7B v0.3-NL | Backbone=Mistral 7B v0... | 2025.05 | 35 |
| Llama 3.1 8B-CP | Backbone=Llama 3.1 8B,... | 2025.05 | 34 |
| Mistral 7B v0.3-CP | Backbone=Mistral 7B v0... | 2025.05 | 33 |
| Granite 8B Code-CP | Backbone=Granite 8B Co... | 2025.05 | 29 |
| Granite 3.1 8B-NL | Backbone=Granite 3.1 8... | 2025.05 | 26 |