Agentic Tasks on Frames
Best accuracy: 70.45 (Debate)
[Chart: Accuracy, May 21, 2025]
Updated 1mo ago
Evaluation Results
Method        Configuration               Date      Accuracy
Debate        Backbone LLM=GPT-4o, S...   2025.05   70.45
Self-Refine   Backbone LLM=GPT-4o, S...   2025.05   67.89
MAS-ZERO      Backbone LLM=GPT-4o, S...   2025.05   65.18
CoT-SC        Backbone LLM=GPT-4o, S...   2025.05   63.58
CoT           Backbone LLM=GPT-4o, S...   2025.05   59.76