Individualized Feedback Simulation on Individualized Feedback (210 instances, 50 players; test)
[Chart: MAE by method. Lowest MAE: 1.19, BG-Persona (Jun 1, 2026).]
Updated 1d ago
Evaluation Results
| Method | Setting | Date | MAE | Pair Score | Extraction Score | Comment Quality Score (P) | Comment Quality Score (R) | Comment Quality Score (S) |
|---|---|---|---|---|---|---|---|---|
| BG-Persona | decoding=greedy | 2026.06 | 1.19 | 84.3 | 76 | 5.7 | 5 | 4.3 |
| GPT-5.4 | decoding=greedy | 2026.06 | 1.24 | 73.4 | 68 | 5.4 | 4.7 | 3.9 |
| Qwen3.5-397B | decoding=greedy | 2026.06 | 1.45 | 64.6 | 60 | 5 | 4.2 | 3.4 |
| Gemini-3.1-Flash | decoding=greedy | 2026.06 | 1.49 | 70.7 | 70 | 5.3 | 4.5 | 3.6 |
| Qwen3.5-27B | decoding=greedy | 2026.06 | 1.9 | 65.7 | 52 | 5 | 3.8 | 3.2 |