Meeting Planning on Natural Plan
[Chart: Accuracy. Plotted point: MultiGA (GPT-5 seed), 95, Nov 21, 2025. Metrics: Accuracy, Token Usage per Question. Updated 16d ago.]
Evaluation Results
Method | Settings | Date | Accuracy | Token Usage per Question
--- | --- | --- | --- | ---
MultiGA (GPT-5 seed) | Eval Model=Qwen3, Shot... | 2025.11 | 95 | 81,770
MultiGA (Ensemble GA) | Eval Model=Qwen3, Shot... | 2025.11 | 88 | 71,499
GPT-5 | Shot count=N-shot | 2025.11 | 80.6 | 9,873
Qwen3 | Shot count=0-shot | 2025.11 | 79.6 | 7,877
GPT-5 | Shot count=0-shot | 2025.11 | 77.5 | 10,997
MultiGA (Llama-4 seed) | Eval Model=Qwen3, Shot... | 2025.11 | 75 | 107,670
MultiGA (Ensemble GB) | Eval Model=GPT-5, Shot... | 2025.11 | 69 | 92,784
Qwen3 | Shot count=N-shot | 2025.11 | 62.2 | 6,345
MultiGA (Gemma-2 seed) | Eval Model=Qwen3, Shot... | 2025.11 | 32 | 52,673
MultiGA (Phi-4-Mini seed) | Eval Model=Qwen3, Shot... | 2025.11 | 0 | -