Multi-subject Knowledge on MMLU-Redux
[Chart: Accuracy by method (Accuracy / Length views); top result ReLIFT, 67 accuracy, Jun 9, 2025]
Evaluation Results
| Method | Date | Accuracy | Length |
|---|---|---|---|
| ReLIFT | 2025.06 | 67 | 1,511 |
| RL | 2025.06 | 65.7 | 863 |
| LUFFY | 2025.06 | 65.2 | 1,516 |
| SFT then RL (v2) | 2025.06 | 62 | 1,501 |
| SFT | 2025.06 | 60.1 | 2,479 |
| RL w/ SFT Loss | 2025.06 | 59.1 | 2,514 |
| SFT then RL (v1) | 2025.06 | 56.2 | 1,217 |
| Qwen-Math-Instruct | 2025.06 | 48.1 | 3,628 |
| Qwen-Math | 2025.06 | 47.4 | 555 |