Question Answering on BBQ (Accuracy and Token Usage)
[Chart: Accuracy and Tokens per Question by method; top entry MultiGA (Llama-4 seed) at 99.5 accuracy, Nov 21, 2025]
Evaluation Results
Method | Setting | Date | Accuracy (%) | Tokens per Question
MultiGA (Llama-4 seed) | Eval Model=Qwen3 | 2025.11 | 99.5 | 23,596
MultiGA (Ensemble GA) | Eval Model=Qwen3 | 2025.11 | 96.6 | 36,574
Qwen3 | Shot count=0-shot | 2025.11 | 96.1 | 168
MultiGA (GPT-5 seed) | Eval Model=Qwen3 | 2025.11 | 96.1 | 15,355
MultiGA (Gemma-2 seed) | Eval Model=Qwen3 | 2025.11 | 96.1 | 9,523
GPT-5 | Shot count=0-shot | 2025.11 | 95.1 | 128
MultiGA (Phi-4-Mini seed) | Eval Model=Qwen3 | 2025.11 | 92.3 | 33,629
MultiGA (Ensemble GB) | Eval Model=GPT-5 | 2025.11 | 87.4 | 37,723
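Because the table reports both accuracy and tokens per question, one way to read the two columns together is to keep only the entries that no other entry beats on both axes (at least as accurate while using no more tokens). The Python sketch below does this with a single sweep over the figures copied from the table; the names `RESULTS` and `pareto_front` are illustrative choices, not part of the benchmark, and exact duplicates on both axes are not handled specially.

```python
# Figures copied from the table above: (method, accuracy %, tokens per question).
RESULTS = [
    ("MultiGA (Llama-4 seed)", 99.5, 23596),
    ("MultiGA (Ensemble GA)", 96.6, 36574),
    ("Qwen3", 96.1, 168),
    ("MultiGA (GPT-5 seed)", 96.1, 15355),
    ("MultiGA (Gemma-2 seed)", 96.1, 9523),
    ("GPT-5", 95.1, 128),
    ("MultiGA (Phi-4-Mini seed)", 92.3, 33629),
    ("MultiGA (Ensemble GB)", 87.4, 37723),
]


def pareto_front(rows):
    """Return methods not dominated on (higher accuracy, fewer tokens).

    Hypothetical helper: sort by token cost (cheapest first, ties broken by
    higher accuracy), then keep a row only if it is strictly more accurate
    than every cheaper row already seen.
    """
    front = []
    best_acc = float("-inf")
    for name, acc, tokens in sorted(rows, key=lambda r: (r[2], -r[1])):
        if acc > best_acc:
            front.append(name)
            best_acc = acc
    return front


if __name__ == "__main__":
    print(pareto_front(RESULTS))
    # prints ['GPT-5', 'Qwen3', 'MultiGA (Llama-4 seed)']
```

On these numbers, the two 0-shot baselines cover the low-token end, and MultiGA (Llama-4 seed) is the only entry that buys higher accuracy with its larger token budget. The other MultiGA variants are matched or exceeded in accuracy by a cheaper entry.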