Commonsense Generation on CommonGen Hard
[Chart: leaderboard results; selectable metrics: Accuracy, Average Score, Improvement Overhead. Top entry: Qwen3-235B-A22B, 87.12 Accuracy (Aug 19, 2025).]
Updated 1mo ago
Evaluation Results
| Method | Framework | Date | Accuracy | Average Score | Improvement Overhead |
|---|---|---|---|---|---|
| Qwen3-235B-A22B | Reference Mo... | 2025.08 | 87.12 | 78.22 | - |
| COCO Qwen3-8B with coco(Qwen3-8B) | COCO Framewo... | 2025.08 | 84.77 | 74.18 | 6.2 |
| COCO Qwen3-8B with coco(Llama-3.1-8B) | COCO Framewo... | 2025.08 | 84.5 | 74.37 | 6.5 |
| Aflow-Qwen3-8B | Multi-Agent... | 2025.08 | 83.89 | 69.86 | - |
| Qwen3-8B | Reference Mo... | 2025.08 | 81.15 | 68.52 | - |
| COCO Llama-3.1-8B with coco(Llama-3.1-8B) | COCO Framewo... | 2025.08 | 80.81 | 58.46 | 0.63 |
| COCO Llama-3.1-8B with coco(Qwen3-8B) | COCO Framewo... | 2025.08 | 80.28 | 63.59 | 9.5 |
| Llama-3.1-8B | Reference Mo... | 2025.08 | 77 | 55.48 | - |
| Aflow-Llama3.1-8B | Multi-Agent... | 2025.08 | 74.92 | 58.09 | - |