Performance Shift Prediction on MMLU-Pro
[Chart: R-squared over time; top result STS_act, 0.8, Mar 3, 2026]
Evaluation Results
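| Method  | Configuration               | Date (YYYY.MM) | R-squared |
|---------|-----------------------------|----------------|-----------|
| STS_act | Model=Qwen2.5-7B, SFT...    | 2026.03        | 0.80      |
| STS_ICL | Model=LLaMA3-8B, SFT D...   | 2026.03        | 0.66      |
| STS_ICL | Model=Qwen2.5-7B, SFT...    | 2026.03        | 0.61      |
| STS_ICL | Model=Gemma2-9B, SFT D...   | 2026.03        | 0.60      |
| STS_act | Model=LLaMA3-8B, SFT D...   | 2026.03        | 0.50      |
| STS_act | Model=Gemma2-9B, SFT D...   | 2026.03        | 0.36      |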
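The page does not define how R-squared is computed. As a rough illustration only, the sketch below assumes it is the standard coefficient of determination, 1 - SS_res / SS_tot, between predicted and observed performance shifts on MMLU-Pro. It could instead be a squared correlation, which behaves differently. The function name `r_squared` and the sample values are hypothetical and are not taken from the leaderboard or from the STS_act / STS_ICL methods.

```python
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: this is the metric the leaderboard reports; the page does not say.
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean = sum(actual) / len(actual)
    # Residual sum of squares: error of the predictor.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values are constant; R-squared is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical observed vs. predicted performance shifts (not real leaderboard data).
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    print(round(r_squared(observed, predicted), 4))  # 0.98
```

Under this definition, 1.0 means every shift is predicted exactly and 0.0 means the predictor does no better than the mean observed shift. Values can fall below zero for a predictor worse than that baseline, so a score such as 0.36 indicates weaker but still above-baseline agreement.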