Post-argument rating regression on Anthropic dataset (September 2024)
[Chart: MSE by method over time. Best result: MS-PS-MLP, MSE 0.67 (Jan 15, 2026). Selectable metrics: MSE, RMSE, MAE, R², Accuracy.]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | MSE | RMSE | MAE | R² | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| MS-PS-MLP | Prompting Strategy=MS-... | 2026.01 | 0.67 | 0.8185 | 0.6339 | 0.811 | 46.2 |
| Baseline-2-o3 | LLM Backbone=OpenAI-o3... | 2026.01 | 0.9882 | 0.9941 | 0.731 | 0.721 | 37.9 |
| Baseline-1-o3 | LLM Backbone=OpenAI-o3... | 2026.01 | 1.0068 | 1.0034 | 0.7394 | 0.716 | 37.4 |
| Baseline-2-Gemma | LLM Backbone=Gemma, Pr... | 2026.01 | 3.5854 | 1.8935 | 1.5347 | -0.011 | 20.1 |
| Baseline-1-Gemma | LLM Backbone=Gemma, Pr... | 2026.01 | 4.3553 | 2.0869 | 1.6988 | -0.229 | 18.4 |
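The leaderboard reports five metrics without defining them. The sketch below uses the standard definitions of MSE, RMSE, MAE and R² (coefficient of determination, 1 − SS_res/SS_tot). How this benchmark computes Accuracy is not stated, so the rounding-based version here (a prediction counts as correct if it rounds to the same integer as the gold rating) is an assumption. The second part checks the table rows: each RMSE matches √MSE to the table's precision, and if R² = 1 − MSE/Var(y), all five rows imply a target variance of roughly 3.54 to 3.55. The names `regression_metrics` and `ROWS` are hypothetical.

```python
import math
from typing import Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """Compute MSE, RMSE, MAE, R² and an assumed rounding-based accuracy (%).

    ASSUMPTION: "Accuracy" = share of predictions whose rounded value equals the
    rounded gold rating. Python's round() rounds exact halves to the even integer.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_t = sum(y_true) / n
    # Population variance of the targets, so R² = 1 - SS_res / SS_tot = 1 - MSE / Var(y)
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    r2 = 1.0 - mse / var_t if var_t > 0 else float("nan")
    hits = sum(round(p) == round(t) for t, p in zip(y_true, y_pred))
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": mae,
        "R2": r2,
        "Accuracy": 100.0 * hits / n,
    }


# Rows copied from the results table: (method, MSE, RMSE, R²)
ROWS = [
    ("MS-PS-MLP", 0.67, 0.8185, 0.811),
    ("Baseline-2-o3", 0.9882, 0.9941, 0.721),
    ("Baseline-1-o3", 1.0068, 1.0034, 0.716),
    ("Baseline-2-Gemma", 3.5854, 1.8935, -0.011),
    ("Baseline-1-Gemma", 4.3553, 2.0869, -0.229),
]

for name, mse, rmse, r2 in ROWS:
    # RMSE should equal sqrt(MSE) up to the 4-decimal rounding used in the table
    assert abs(math.sqrt(mse) - rmse) < 1e-3, name
    # Under R² = 1 - MSE / Var(y), the implied target variance is MSE / (1 - R²)
    print(f"{name}: implied Var(y) = {mse / (1 - r2):.3f}")
```

A toy call such as `regression_metrics([1, 2, 3, 4], [1, 2, 3, 5])` gives MSE 0.25, RMSE 0.5, MAE 0.25, R² 0.8 and Accuracy 75.0. Note that negative R² values, as for the Gemma baselines, mean the model does worse than always predicting the mean rating.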