Graph Reasoning on G-Real
[Chart: Success Rate by date. Labeled point: AgentSquare, 100, May 11, 2026.]
Updated 22d ago
Evaluation Results
| Method | Configuration | Date | Success Rate |
| AgentSquare | Base LLM=gpt-5.4-nano,... | 2026.05 | 100 |
| FixedSolver | Base LLM=gpt-5.4-nano,... | 2026.05 | 79.5 |
| w/o Protocol-Reliability Obj | Base LLM=gpt-5.4-nano,... | 2026.05 | 79.5 |
| EGL-SCA (Full) | Base LLM=gpt-5.4-nano | 2026.05 | 79.5 |
| w/o SCA Routing | Base LLM=gpt-5.4-nano,... | 2026.05 | 78 |
| w/o Instruction Evol. | Base LLM=gpt-5.4-nano,... | 2026.05 | 76 |
| MA-GTS | Base LLM=gpt-5.4-nano,... | 2026.05 | 72 |
| w/o Tool Growth | Base LLM=gpt-5.4-nano,... | 2026.05 | 69.5 |
| ReAct | Base LLM=gpt-5.4-nano,... | 2026.05 | 19.5 |
| Few-Shot | Base LLM=gpt-5.4-nano,... | 2026.05 | 6.5 |
| Reflexion | Base LLM=gpt-5.4-nano,... | 2026.05 | 5.5 |
| ExpeL | Base LLM=gpt-5.4-nano,... | 2026.05 | 5.5 |
| w/o Tool Use | Base LLM=gpt-5.4-nano,... | 2026.05 | 3 |
| Chain-of-Thought (CoT) | Base LLM=gpt-5.4-nano,... | 2026.05 | 2 |
| Direct Prompting | Base LLM=gpt-5.4-nano,... | 2026.05 | 1.5 |
| GEPA | Base LLM=gpt-5.4-nano,... | 2026.05 | 0 |
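
The "w/o" entries in the results above read like ablations, and it can help to see how far each one falls from EGL-SCA (Full). The sketch below is only an illustration: it assumes the "w/o" rows are variants of EGL-SCA (Full), which the page does not state explicitly, and the function name `ablation_drops` is hypothetical. The scores are copied from the results listed above.

```python
# Success rates copied from the results above.
# Assumption: the "w/o ..." rows are ablations of EGL-SCA (Full).
FULL = "EGL-SCA (Full)"
scores = {
    "EGL-SCA (Full)": 79.5,
    "w/o Protocol-Reliability Obj": 79.5,
    "w/o SCA Routing": 78.0,
    "w/o Instruction Evol.": 76.0,
    "w/o Tool Growth": 69.5,
    "w/o Tool Use": 3.0,
}


def ablation_drops(scores, full=FULL):
    """Return (variant, drop) pairs, largest drop from the full method first."""
    base = scores[full]
    drops = [(name, base - s) for name, s in scores.items() if name != full]
    return sorted(drops, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, drop in ablation_drops(scores):
        print(f"{name:30s} {drop:5.1f}")
```

Under that assumption, removing tool use accounts for by far the largest difference, while removing the protocol-reliability objective leaves the listed success rate unchanged.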