Distractor Generation on Discrete_40
[Chart: Plausibility over time. Highlighted point: 3.25, MCTS-guided reasoning reconstruction framework, Aug 15, 2025. Selectable metrics: Plausibility, Coherence.]
Updated 1mo ago
Evaluation Results
Method                                          Backbone Model  Date     Plausibility  Coherence
MCTS-guided reasoning reconstruction framework  Claude-...      2025.08  3.25          2.64
MCTS-guided reasoning reconstruction framework  Deepsee...      2025.08  3.16          2.59
CoT                                             Deepsee...      2025.08  2.89          2.27
MCTS-guided reasoning reconstruction framework  GPT-4o          2025.08  2.88          2.43
CoT                                             Claude-...      2025.08  2.85          2.39
MCTS-guided reasoning reconstruction framework  GPT-3.5...      2025.08  2.81          2.23
MCTS-guided reasoning reconstruction framework  LLaMA-3...      2025.08  2.77          2.32
CoT                                             GPT-4o          2025.08  2.72          2.25
CoT                                             GPT-3.5...      2025.08  2.64          2.12
CoT                                             LLaMA-3...      2025.08  2.54          2.13