Compositional Generalization on COGS
[Chart: Exact Match Accuracy over time. Best result: GRPO-Binary, 83.9 (May 6, 2026). Updated 27d ago.]
Evaluation Results
Method          Base Model        Date      Exact Match Accuracy
GRPO-Binary     Llama-3.1-8...    2026.05   83.9
GRPO-Binary     Qwen-2.5-7B...    2026.05   83.23
GRPO-Composite  Qwen-2.5-7B...    2026.05   83.15
GRPO-Composite  Llama-3.1-8...    2026.05   82.88
SFT             Qwen-2.5-7B...    2026.05   82.09
SFT             Llama-3.1-8...    2026.05   81.12