
MuSR

Benchmarks

Task Name                 Dataset Name  Metric            SOTA Result  Trend
Math & Logic              MUSR          MUSR Performance  42.12        24
Reasoning                 MuSR (test)   Accuracy          73.9         14
Multistep Soft Reasoning  MUSR          Accuracy (%)      43.1         12
Reasoning                 MuSR          Accuracy          71.89        11
Multi-hop Reasoning       MuSR          Accuracy          43.12        10
Adding Mistake            MuSR          AOC               0.731        7
Truncated CoT Answering   MuSR          AOC               33.6         7
Multistep Reasoning       MUSR          Accuracy          61.67        7
Multistep Reasoning       MUSR-fr       Average Score     33.79        6