Prompt Optimization

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Results	Last Updated
XSum	EGE	Hypervolume (HV)	0.1626	72	2mo ago
Prompt Optimization Benchmark	LinGO	Accuracy	69	24	4mo ago
Logical Reasoning, Mathematical Calculation, and Knowledge Intensive tasks Average	MemAPO	Average Performance (%)	70.7	20	4mo ago
VisEval	PromptAgent	Accuracy (Easy)	0.77	10	4mo ago
DABench	PromptBreeder	Acc (Easy)	80	10	4mo ago
ST	SCULPT	Best Score	71.7	8	1mo ago
FF	LLMLingua	Best Score	98.9	8	1mo ago
CJ	SCULPT	Best Score	76.9	8	1mo ago
DQA	SCULPT	Best Score	82.3	8	1mo ago
GoE	WPRO0.5	Best Performance	43	8	1mo ago
BT	CRAFT	Best Score	62	8	1mo ago
HotpotQA, IFBench, HoVer, PUPA, AIME, and LiveBench-Math 2018-2025 (test)	GEPA	HotpotQA Score	69	8	4mo ago
DSG-1K	CRAFT	DSGScore	0.91	7	4mo ago
P2-hard	Maestro	DSGScore	92	7	4mo ago
10-task prompt optimization suite GSM8K MMLU BBH	ReElicit	Average Win/Tie Rate	81	5	2mo ago
product-gen (test)	Bayesian	Accuracy	92.2	5	2mo ago
trip-advisory (test)	COPRO-R	Accuracy	81.1	5	2mo ago
code-explain (test)	COPRO-R	Accuracy	84.2	5	2mo ago
42 LLM benchmarks Aggregate (overall)	System+Task Optimized	Average Score	67.14	5	3mo ago
CB	TRAS	Accuracy	85.7	4	1mo ago
Biosses	TRAS	Accuracy	70.4	4	1mo ago
Penguins	TRAS	Accuracy	68.6	4	1mo ago
Geometric Shapes	TRAS	Accuracy	63.3	4	1mo ago
Causal Judgment	TRAS	Accuracy	64.4	4	1mo ago
GEPA Evaluation Suite Aggregate	LEVI	Aggregate Score	62.02	4	2mo ago

Showing 25 of 32 rows