Academic Reasoning on MMLU-Pro
Top result: TRAPO, 50.7 Pass@1 (Dec 15, 2025)
Evaluation Results
| Method | Training Paradigm | Date | Pass@1 |
|---|---|---|---|
| TRAPO | Semi... | 2025.12 | 50.7 |
| Fully Supervised | Supe... | 2025.12 | 49.3 |
| Fully Supervised | Supe... | 2025.12 | 48.2 |
| TRAPO | Semi... | 2025.12 | 46.8 |
| Sentence-level Entropy | Semi... | 2025.12 | 44.5 |
| Token-level Entropy | Semi... | 2025.12 | 44 |
| Fully Supervised | Supe... | 2025.12 | 43.6 |
| Sentence-level Entropy | Unsu... | 2025.12 | 42.7 |
| TTRL | Semi... | 2025.12 | 42.7 |
| Self-certainty | Semi... | 2025.12 | 41.6 |
| Self-certainty | Unsu... | 2025.12 | 41.4 |
| TTRL | Unsu... | 2025.12 | 41.3 |
| Token-level Entropy | Unsu... | 2025.12 | 40.9 |
| Qwen-Instruct | Orig... | 2025.12 | 34.1 |
| Qwen-Base | Orig... | 2025.12 | 16.9 |