Gap discovery quality on Scientist-Bench 27 tasks
[Chart: Gaps/Task, AI-Supervisor (RWM), Mar 25, 2026]
Updated 2mo ago
Evaluation Results
Method                Setting                    Date     Gaps/Task  Precision  Recall  Best Align
AI-Supervisor (RWM)   backbone=Qwen-72B-Inst...  2026.03  5          80.7       100     4.44
LLM-only brainstorm   backbone=Qwen-72B-Inst...  2026.03  4.9        67.9       92.6    4.15
Divergent-convergent  backbone=Qwen-72B-Inst...  2026.03  2          75.5       92.6    4.04