Coverage-based Alignment on ICLR 50 submissions 2026
[Chart: top Str-Cov score 88.6 (PaperAudit), Jan 7, 2026. Selectable metrics: Str-Cov, Weak-Cov, AI-Extra, Sym-Cov. Updated 4d ago.]
Evaluation Results
Method       Settings                    Date      Str-Cov   Weak-Cov   AI-Extra   Sym-Cov
PaperAudit   reviewer=GPT-5, judge=...   2026.01   88.6      59.1       46.8       44.4
Baseline     reviewer=GPT-5, judge=...   2026.01   87.6      56.8       49.5       43.3
DeepReview   reviewer=GPT-5, judge=...   2026.01   84.4      48.2       48.8       41.2