Compositional Generalization on SCAN
[Chart: Length scores on SCAN. Highlighted entry: GRPO-Composite, Length 23.44, May 6, 2026. Metrics: Length, Turn Left Accuracy.]
Updated 27d ago
Evaluation Results
| Method | Base Model | Date | Length | Turn Left Accuracy |
|---|---|---|---|---|
| GRPO-Composite | Llama-3.1-8... | 2026.05 | 23.44 | 70.74 |
| GRPO-Binary | Qwen-2.5-7B... | 2026.05 | 22.68 | 79.13 |
| GRPO-Composite | Qwen-2.5-7B... | 2026.05 | 21.81 | 78.76 |
| GRPO-Binary | Llama-3.1-8... | 2026.05 | 18.49 | 71.52 |
| SFT | Qwen-2.5-7B... | 2026.05 | 18.41 | 78.06 |
| SFT | Llama-3.1-8... | 2026.05 | 11.88 | 69.03 |