
ICLR

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Paper Quality Evaluation | ICLR 2025 (test) | Kendall Tau Correlation 48.08 | 32 |
| Paper Acceptance Decision | ICLR submissions 2025 | Accuracy 89.8 | 17 |
| Binary deficiency detection | gold-labeled ICLR (test) | Accuracy 86 | 16 |
| Node Retrieval | ICLR 2025 (500 papers) | Recall @ 90.172 | 16 |
| Paper Acceptance Decision | ICLR 2025 (test) | Accuracy 71.92 | 15 |
| Scientific Idea Generation | ICLR 2024 | Absolute Novelty 4.22 | 12 |
| Multi-turn role-play | ICLR | Success Rate (SR) 96.2 | 12 |
| Review Score Generation | ICLR 2025 | Average Review Score 6.4 | 10 |
| Scientific Review Feedback Generation | ICLR LLM-as-a-Judge 2025 (test) | Actionability Score 3.38 | 9 |
| Scientific Review Feedback Generation | ICLR Human Evaluation 2025 (test) | Actionability 3.46 | 9 |
| Fine-grained multi-label classification | ICLR gold-labeled (test) | Jaccard Similarity 74.24 | 8 |
| Holistic Technical Quality Evaluation | ICLR 2025 | Originality 3.35 | 8 |
| Discrimination between Good Faith and Problematic agents (Peer Review) | ICLR 20.2:1 | Cohen's d 1.82 | 6 |
| Insight Discovery and Guidance | ICLR Poster 2025 | Guided Paper Percentage 82.4 | 6 |
| Insight Discovery and Guidance | ICLR Spotlight 2025 | Percentage Guided 82.1 | 6 |
| Insight Discovery and Guidance | ICLR Oral 2025 | Guidance Rate 82.2 | 6 |
| Insight Discovery and Guidance | ICLR Overall 2025 | Percentage Guided 82.4 | 6 |
| Empathetic Dialogue | ICLR | Success Rate (SR) 96.7 | 5 |
| Issue Identification | ICLR 100-paper corpus 2026 | Caught 3,024 | 4 |
| Faithfulness discrimination | ICLR | AUC 54.4 | 4 |
| Research solution evaluation | ICLR problems 2026 N=20 (test) | Feasibility Win% 56 | 4 |
| Citation Coverage Evaluation | ICLR 2025 | Avg Cites 45.73 | 3 |
| Coverage-based Alignment | ICLR 50 submissions 2026 | Str-Cov 88.6 | 3 |
| Score-based Alignment | ICLR 2026 (50 submissions) | R-MSE 0.148 | 3 |
| Research idea quality evaluation | ICLR Rejected Papers 2025 | Mean Score 2.689 | 2 |
Showing 25 of 28 rows