| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Prompt Optimization Benchmark | LinGO | Accuracy | 69 | 24 | 4d ago |
| VisEval | PromptAgent | Accuracy (Easy) | 0.77 | 10 | 4d ago |
| DABench | PromptBreeder | Acc (Easy) | 80 | 10 | 4d ago |
| HotpotQA, IFBench, HoVer, PUPA, AIME, and LiveBench-Math 2018-2025 (test) | GEPA | HotpotQA Score | 69 | 8 | 4d ago |
| DSG-1K | CRAFT | DSGScore | 0.91 | 7 | 4d ago |
| P2-hard | Maestro | DSGScore | 92 | 7 | 4d ago |
| Dataset with human annotations (test) | LinGO (RAG) | Accuracy | 69 | 4 | 4d ago |