Software Engineering Evaluation
[Chart: Functional Correctness. Highlighted data point: FrugalGPT, 0.335, Jan 27, 2026. Selectable metrics: Functional Correctness, Robustness & Security, Engineering Quality, Code Style.]
Updated 4d ago
Evaluation Results
Method      Date      Functional Correctness   Robustness & Security   Engineering Quality   Code Style
FrugalGPT   2026.01   0.335                    0.225                   0.158                 0.08
CASTER      2026.01   0.335                    0.23                    0.158                 0.085