Multistep Reasoning on MuSR
[Chart: Accuracy by date. Leading entry: SePT, 41.5 accuracy, Oct 21, 2025.]
Evaluation Results
Method   Backbone          Date      Accuracy
SePT     Qwen2.5-Math-7B   2025.10   41.5
Base     Qwen2.5-Math-7B   2025.10   41.4
GRPO     Qwen2.5-Math-7B   2025.10   41.4