Scientific Computing Agent Capabilities Checklist Evaluation
Reasoning Rate
No plottable results for Reasoning Rate (PERCENT).
Metric
Reasoning Rate (PERCENT)
Code Expansion Rate (PERCENT)
Debugging Success Rate (PERCENT)
Refinement Success Rate (PERCENT)
Review Success Rate (PERCENT)
Updated 1mo ago
Evaluation Results
Method | Links | Reasoning Rate | Code Expansion Rate | Debugging Success Rate | Refinement Success Rate | Review Success Rate
No evaluation results found.