Reasoning and Planning on Plasma
[Chart: Accuracy over time. Highlighted point: CoT, 85 (May 11, 2026).]
Updated 22d ago
Evaluation Results
Method    Configuration                Date       Accuracy
CoT       Backbone=DeepSeek-V3-6...    2026.05    85
CoT       Backbone=DeepSeek-V3-6...    2026.05    78.4
Vanilla   Backbone=DeepSeek-V3-6...    2026.05    78.1
Vanilla   Backbone=DeepSeek-V3-6...    2026.05    77.4
ANCHOR    Backbone=Qwen2.5-72B,...     2026.05    76.4
CoT       Backbone=Qwen2.5-72B,...     2026.05    74.5
BIRD      Backbone=Qwen2.5-72B,...     2026.05    71.8
Vanilla   Backbone=Qwen2.5-72B,...     2026.05    64.9
Vanilla   Backbone=Qwen2.5-72B,...     2026.05    60.4
CoT       Backbone=Qwen2.5-72B,...     2026.05    56.5