Quality Evaluation on Science reasoning tasks
[Chart: Param & Constraint Acc. Highlighted point: Force Strong, 38.5 (Jan 27, 2026). Metrics shown: Param & Constraint Acc, Scientific Validity, Robustness, Code Quality.]
Updated 3mo ago
Evaluation Results
| Method | Routing Strategy | Date | Param & Constraint Acc | Scientific Validity | Robustness | Code Quality |
|---|---|---|---|---|---|---|
| Force Strong | Force... | 2026.01 | 38.5 | 28.8 | 19.2 | 9.4 |
| CASTER | CASTER | 2026.01 | 38.2 | 29.5 | 18.2 | 9.9 |
| Force Weak | Force... | 2026.01 | 35.2 | 27.5 | 16 | 9.8 |