Question Answering on BBH Disambiguation QA
[Chart: Accuracy (BBH Disambiguation QA) over time. Best result: TextGrad, 71, as of Nov 25, 2025.]
Evaluation Results
| Method | Method details | Date | Accuracy (BBH Disambiguation QA) |
| --- | --- | --- | --- |
| TextGrad | Backbone=GPT-4o, Optim... | 2025.11 | 71 |
| evaluation-instructed prompt optimization | Backbone=GPT-4o, Optim... | 2025.11 | 69 |
| APE | Backbone=GPT-4o, Optim... | 2025.11 | 67 |
| Self-Refine | Backbone=LLaMA-3.1, Op... | 2025.11 | 66 |
| evaluation-instructed prompt optimization | Backbone=LLaMA-3, Opti... | 2025.11 | 65 |
| Self-Refine | Backbone=LLaMA-3, Opti... | 2025.11 | 65 |
| evaluation-instructed prompt optimization | Backbone=LLaMA-3.1, Op... | 2025.11 | 65 |
| TextGrad | Backbone=LLaMA-3.1, Op... | 2025.11 | 65 |
| TextGrad | Backbone=LLaMA-3, Opti... | 2025.11 | 64 |
| LLM only | Backbone=LLaMA-3.1, Op... | 2025.11 | 64 |
| Pro-Refine | Backbone=LLaMA-3.1, Op... | 2025.11 | 64 |
| LLM only | Backbone=LLaMA-3, Opti... | 2025.11 | 63 |
| Pro-Refine | Backbone=LLaMA-3, Opti... | 2025.11 | 63 |
| APE | Backbone=LLaMA-3, Opti... | 2025.11 | 63 |
| APE | Backbone=LLaMA-3.1, Op... | 2025.11 | 63 |
| Pro-Refine | Backbone=GPT-4o, Optim... | 2025.11 | 63 |
| LLM only | Backbone=GPT-4o, Optim... | 2025.11 | 58 |
| Self-Refine | Backbone=GPT-4o, Optim... | 2025.11 | 56 |