Reasoning on PUPA (test)
[Chart: Score by date. Top result: GEPA, 91.85 (Jul 25, 2025).]
Updated 4d ago
Evaluation Results

Method      Configuration              Date     Score
GEPA        Backbone=Qwen3 8B, Opt...  2025.07  91.85
GRPO        Backbone=Qwen3 8B, Opt...  2025.07  86.66
GEPA+Merge  Backbone=Qwen3 8B, Opt...  2025.07  86.26
MIPROv2     Backbone=Qwen3 8B          2025.07  81.55
Baseline    Backbone=Qwen3 8B          2025.07  80.82