General Capability on Aggregated 7-Metric Suite (test)
Top Average Score: 43.9 (G-Zero)
[Chart: Average Score over time, May 11, 2026]
Evaluation Results

Method     | Backbone          | Date    | Average Score
G-Zero     | Llama-3.1-8B-...  | 2026.05 | 43.9
G-Zero     | Llama-3.1-8B-...  | 2026.05 | 43.08
base model | Llama-3.1-8B-...  | 2026.05 | 42.77
R-Zero     | Llama-3.1-8B-...  | 2026.05 | 40.89
G-Zero     | Qwen3-8B-Base...  | 2026.05 | 35.43
G-Zero     | Qwen3-8B-Base...  | 2026.05 | 34.96
base model | Qwen3-8B-Base...  | 2026.05 | 33.95
R-Zero     | Qwen3-8B-Base...  | 2026.05 | 32.59
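To compare each method against the base model on the same backbone, a minimal sketch can subtract the base-model score from every other entry. The scores are copied from the table above. The helper name deltas_vs_base is hypothetical. The sketch assumes that rows sharing the same (truncated) backbone label really use the same backbone. It does not try to explain why G-Zero appears twice per backbone, because the page does not say what distinguishes those two runs.

```python
# Scores copied from the leaderboard table; backbone labels are kept truncated
# exactly as listed, and rows with the same label are ASSUMED to share a backbone.
RESULTS = [
    ("G-Zero", "Llama-3.1-8B-...", 43.9),
    ("G-Zero", "Llama-3.1-8B-...", 43.08),
    ("base model", "Llama-3.1-8B-...", 42.77),
    ("R-Zero", "Llama-3.1-8B-...", 40.89),
    ("G-Zero", "Qwen3-8B-Base...", 35.43),
    ("G-Zero", "Qwen3-8B-Base...", 34.96),
    ("base model", "Qwen3-8B-Base...", 33.95),
    ("R-Zero", "Qwen3-8B-Base...", 32.59),
]


def deltas_vs_base(results):
    """Hypothetical helper: return (method, backbone, score, delta) tuples, where
    delta = score minus the base-model score on the same backbone label."""
    # Look up the base-model score for each backbone label.
    base = {backbone: score for method, backbone, score in results if method == "base model"}
    # Compare every non-base entry to its backbone's base score, rounded to 2 decimals.
    return [
        (method, backbone, score, round(score - base[backbone], 2))
        for method, backbone, score in results
        if method != "base model"
    ]


if __name__ == "__main__":
    for method, backbone, score, delta in deltas_vs_base(RESULTS):
        print(f"{method:<8} {backbone:<18} {score:6.2f} {delta:+.2f}")
```

Run this way, the sketch shows both G-Zero entries above the base model on each backbone and R-Zero below it. For example, 43.9 - 42.77 gives +1.13 on the Llama backbone, and 32.59 - 33.95 gives -1.36 on the Qwen backbone.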