Task Performance on Scientific Discovery Scenario
[Chart: Average Score by date. Highlighted point: CASTER, 95.3, Jan 27, 2026.]
Updated 3mo ago
Evaluation Results
Method        Strategy      Date     Average Score
CASTER        CASTER        2026.01  95.3
Force Strong  Force Strong  2026.01  95.2
Force Weak    Force Weak    2026.01  90.2