Combinatorial Optimization on Crew Scheduling (test)
[Chart: Average Performance and Validity Rate. Top entry: MCTS-AHD, 63.53 Average Performance (May 17, 2026).]
Updated 15d ago
Evaluation Results
| Method | Budget | Date | Average Performance | Validity Rate |
|---|---|---|---|---|
| MCTS-AHD | B=16 | 2026.05 | 63.53 | 86.21 |
| MEMOIR (GPT-5-mini w/ GPT-5 critic) | B=16 | 2026.05 | 62.4 | 89.66 |
| MEMOIR (GPT-5-mini) | B=16 | 2026.05 | 57.83 | 75.86 |
| GPT-5-mini | B=16 | 2026.05 | 57.35 | 65.52 |
| AIDE | B=16 | 2026.05 | 56 | 72.41 |
| FunSearch | B=16 | 2026.05 | 55.64 | 62.07 |
| GreedyRefine | B=16 | 2026.05 | 55.45 | 62.07 |
| ReEvo | B=16 | 2026.05 | 52.74 | 65.52 |
| GPT-5 Chat | B=16 | 2026.05 | 51.65 | 75.86 |
| Classical Solver | B=16 | 2026.05 | 45.5 | - |
| o3-mini-high | B=16 | 2026.05 | 44.12 | 65.52 |
| MEMOIR (Qwen2.5-Coder-32B) | B=16 | 2026.05 | 31.55 | 55 |