| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Prompt Optimization Benchmark | LinGO | Accuracy | 69 | 24 | 1mo ago |
| Logical Reasoning, Mathematical Calculation, and Knowledge Intensive tasks Average | MemAPO | Average Performance (%) | 70.7 | 20 | 25d ago |
| VisEval | PromptAgent | Accuracy (Easy) | 0.77 | 10 | 1mo ago |
| DABench | PromptBreeder | Acc (Easy) | 80 | 10 | 1mo ago |
| HotpotQA, IFBench, HoVer, PUPA, AIME, and LiveBench-Math 2018-2025 (test) | GEPA | HotpotQA Score | 69 | 8 | 1mo ago |
| DSG-1K | CRAFT | DSGScore | 0.91 | 7 | 1mo ago |
| P2-hard | Maestro | DSGScore | 92 | 7 | 1mo ago |
| 42 LLM benchmarks Aggregate (overall) | System+Task Optimized | Average Score | 67.14 | 5 | 12d ago |
| Flickr | EDITOR | Mean CLIP Score | 77.62 | 4 | 1mo ago |
| Dataset with human annotations (test) | LinGO (RAG) | Accuracy | 69 | 4 | 1mo ago |