Multi-hop Question Answering on HotPotQA (Task Score, Brier)
[Chart: Task Score. Best result: GEPA-C, 57.6 (May 20, 2026). Available metrics: Task Score, Brier Score.]
Updated 12d ago
Evaluation Results

| Method | Configuration | Date | Task Score | Brier Score |
|--------|---------------|------|------------|-------------|
| GEPA-C | Optimizer LLM=GPT-5 | 2026.05 | 57.6 | 0.41 |
| GEPA-C | Optimizer LLM=GPT-5-mini | 2026.05 | 57.6 | 0.41 |
| GEPA-C | Optimizer LLM=Gemini-3... | 2026.05 | 57.6 | 0.41 |
| GEPA-C | Optimizer LLM=Gemini-3... | 2026.05 | 57.6 | 0.41 |
| RPT | Optimizer LLM=GPT-5 | 2026.05 | 55.5 | 0.438 |
| RPT | Optimizer LLM=GPT-5-mini | 2026.05 | 55.5 | 0.438 |
| RPT | Optimizer LLM=Gemini-3... | 2026.05 | 55.5 | 0.438 |
| RPT | Optimizer LLM=Gemini-3... | 2026.05 | 55.5 | 0.438 |
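The page reports a Brier Score next to the Task Score but does not say how it is computed. As a rough illustration only, the sketch below uses the standard binary Brier score: the mean squared difference between a predicted confidence and the 0/1 correctness of the answer, where lower is better. The function name `brier_score` and the example inputs are hypothetical and are not taken from the benchmark; the leaderboard's actual definition may differ.

```python
from typing import Sequence


def brier_score(confidences: Sequence[float], outcomes: Sequence[int]) -> float:
    """Standard binary Brier score: mean of (confidence - outcome)^2.

    confidences: predicted probability that each answer is correct, in [0, 1].
    outcomes:    1 if the answer was actually correct, 0 otherwise.
    Assumption: this is the textbook definition, not necessarily the one
    used by the leaderboard.
    """
    if len(confidences) != len(outcomes):
        raise ValueError("confidences and outcomes must have the same length")
    if len(confidences) == 0:
        raise ValueError("at least one prediction is required")
    squared_errors = [(p - y) ** 2 for p, y in zip(confidences, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical example: four answers with stated confidences and correctness.
example_confidences = [0.9, 0.6, 0.8, 0.3]
example_outcomes = [1, 0, 1, 0]
example_score = brier_score(example_confidences, example_outcomes)
print(f"Brier score: {example_score:.3f}")
```

Under this definition, a perfectly confident and always correct system scores 0, and a system that always answers with confidence 0.5 scores 0.25.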