Logical Deduction (5 Objects) on Test Set
[Figure: Accuracy chart. Highlighted entry: TextReg, 61.1 accuracy, May 20, 2026. Updated 13d ago.]
Evaluation Results
| Method   | Test Engine   | Date    | Accuracy |
|----------|---------------|---------|----------|
| TextReg  | Llama-3.1-... | 2026.05 | 61.1     |
| TextGrad | Llama-3.1-... | 2026.05 | 59.8     |
| CoT      | Llama-3.1-... | 2026.05 | 59.7     |
| TextReg  | Phi-3.5-Mi... | 2026.05 | 57.9     |
| REVOLVE  | Llama-3.1-... | 2026.05 | 57.7     |
| CoT      | Phi-3.5-Mi... | 2026.05 | 57       |
| TextReg  | Qwen2-7B-I... | 2026.05 | 55.3     |
| REVOLVE  | Qwen2-7B-I... | 2026.05 | 54.4     |
| TextReg  | Llama-3-8B... | 2026.05 | 53.3     |
| CoT      | Llama-3-8B... | 2026.05 | 52.6     |
| CoT      | Qwen2-7B-I... | 2026.05 | 51.6     |
| REVOLVE  | Llama-3-8B... | 2026.05 | 51.5     |
| TextGrad | Qwen2-7B-I... | 2026.05 | 51.3     |
| TextGrad | Phi-3.5-Mi... | 2026.05 | 46.1     |
| REVOLVE  | Phi-3.5-Mi... | 2026.05 | 43.4     |
| TextGrad | Llama-3-8B... | 2026.05 | 42.1     |