Multistep Reasoning on MUSR (Accuracy)
[Chart: Accuracy over time, May 27, 2025 to Feb 17, 2026. Highlighted result: 61.67 (Base).]
Updated 4d ago
Evaluation Results
| Method | Details | Date | Accuracy |
|---|---|---|---|
| Base | Backbone=Qwen3-4B | 2025.05 | 61.67 |
| Sigmoid capability boundaries | FLOPs budget=10^24, Bo... | 2026.02 | 53.5 |
| Vanilla LoRA | Backbone=Qwen3-4B | 2025.05 | 32.31 |
| Base | Backbone=Llama-3.2-3B | 2025.05 | 30.95 |
| SHE-LoRA | Backbone=Qwen3-4B | 2025.05 | 30.71 |
| Vanilla LoRA | Backbone=Llama-3.2-3B | 2025.05 | 18.05 |
| SHE-LoRA | Backbone=Llama-3.2-3B | 2025.05 | 17.86 |