Prompt Optimization on HotpotQA
[Chart: Score vs. Optimization Budget (# Rollouts); top score 63 (LEVI); May 10, 2026]
Updated 22d ago
Evaluation Results
Method     Task Model   Date      Score   Optimization Budget (# Rollouts)
LEVI       Qwen 3 8B    2026.05   63      2,750
GEPA       Qwen 3 8B    2026.05   62.33   6,871
MIPROv2    Qwen 3 8B    2026.05   55.33   -
Baseline   Qwen 3 8B    2026.05   42.33   -