Peer Review Feedback Generation on ICLR papers
Chart: Combined Success Rate by method, Apr 13, 2026 (top result: GPT-5.2, 45.8). Plotted series: Combined Success, Validity Only, and Author Action Only rates, each with its confidence interval (CI).
Evaluation Results
Method                Method details             Date     Combined Success Rate  CI   Validity Only Rate  CI   Author Action Only Rate  CI
GPT-5.2               Bootstrap iterations (...  2026.04  45.8                   1.0  46.3                1.0  45.8                     1.0
Gemini-3-flash        Bootstrap iterations (...  2026.04  37.9                   0.9  39.4                0.9  37.9                     0.9
GOODPOINT-DPO         Training Protocol=DPO,...  2026.04  14.7                   0.5  14.9                0.5  14.7                     0.5
GOODPOINT-SFT         Training Protocol=SFT,...  2026.04  9.2                    0.5  9.7                 0.5  9.2                      0.5
Qwen3-8b (Base)       Model Scale=8b, Traini...  2026.04  8.0                    0.6  8.1                 0.6  8.0                      0.6
Llama3.1-8b-Instruct  Model Scale=8b, Traini...  2026.04  1.8                    0.3  1.8                 0.3  1.8                      0.3
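The leaderboard reports each rate alongside a confidence interval, and the method details mention bootstrap iterations, but the exact count and procedure are truncated. The following is only a minimal sketch of how such numbers could be produced, not the benchmark's actual scoring code. It assumes (1) each generated feedback point carries two hypothetical binary labels, `valid` and `author_action`; (2) "Combined" means both labels hold for the same point; and (3) the CI column is the half-width of a 95% percentile-bootstrap interval. The names `Judgement`, `rates`, and `bootstrap_half_width`, and the default of 1000 iterations, are all illustrative assumptions.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgement:
    # Hypothetical per-item labels; the leaderboard does not publish its schema.
    valid: bool          # assumed: the feedback point was judged valid
    author_action: bool  # assumed: the point would prompt an author action


def rates(items):
    """Return (combined, validity_only, author_action_only) as percentages.

    Assumes "Combined" means both labels hold for the same item.
    """
    if not items:
        raise ValueError("need at least one judged item")
    n = len(items)
    combined = sum(j.valid and j.author_action for j in items) / n
    validity = sum(j.valid for j in items) / n
    action = sum(j.author_action for j in items) / n
    return 100 * combined, 100 * validity, 100 * action


def bootstrap_half_width(items, metric_index, iterations=1000, alpha=0.05, seed=0):
    """Half-width of a percentile-bootstrap CI for one of the three rates.

    metric_index: 0 = combined, 1 = validity only, 2 = author action only.
    The iteration count is an assumption; the leaderboard truncates it.
    """
    rng = random.Random(seed)
    n = len(items)
    stats = []
    for _ in range(iterations):
        # Resample items with replacement and recompute the chosen rate.
        sample = [items[rng.randrange(n)] for _ in range(n)]
        stats.append(rates(sample)[metric_index])
    stats.sort()
    lo = stats[int(alpha / 2 * iterations)]
    hi = stats[min(iterations - 1, int((1 - alpha / 2) * iterations))]
    return (hi - lo) / 2


# Toy, synthetic example (not benchmark data).
items = [Judgement(True, True), Judgement(True, False),
         Judgement(False, False), Judgement(True, True)]
print(rates(items))  # (50.0, 75.0, 50.0)
print(bootstrap_half_width(items, 0))
```

Under assumption (2), the combined rate can never exceed either single-criterion rate, because every item counted as combined is also counted in both other rates.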