Logical Reasoning on GeoShape BBH
[Chart: Accuracy by date. Top entry: MemAPO, accuracy 90, Mar 23, 2026.]
Evaluation Results
Method         Backbone     Date     Accuracy
MemAPO         GPT-4o-mini  2026.03  90
MemAPO         Qwen3-8B     2026.03  77
TextGrad       GPT-4o-mini  2026.03  76
ProTeGi        GPT-4o-mini  2026.03  71
CoT            Qwen3-8B     2026.03  64
Step-Back      GPT-4o-mini  2026.03  61
Step-Back      Qwen3-8B     2026.03  58
ProTeGi        Qwen3-8B     2026.03  56
SPO            GPT-4o-mini  2026.03  54
RaR            Qwen3-8B     2026.03  53
CoT            GPT-4o-mini  2026.03  47
RaR            GPT-4o-mini  2026.03  43
PromptBreeder  GPT-4o-mini  2026.03  42
TextGrad       Qwen3-8B     2026.03  39
OPRO           Qwen3-8B     2026.03  36
IO             GPT-4o-mini  2026.03  35
OPRO           GPT-4o-mini  2026.03  35
SPO            Qwen3-8B     2026.03  28
IO             Qwen3-8B     2026.03  23
PromptBreeder  Qwen3-8B     2026.03  23