Yes/No Question Answering on BoolQ (test)
[Chart: Accuracy over time, Jul 1, 2021 to Dec 19, 2024. Current best: 4o, 79.2 Accuracy.]
Updated 4d ago
Evaluation Results
Method      Attributes              Date     Accuracy  Delta Accuracy  √ → X
4o          Model Family=ChatGPT    2024.12  79.2      4.9             11.3
o1-preview  Model Family=ChatGPT    2024.12  78.7      4.9             13.2
R1          Model Family=DeepSeek   2024.12  78.1      1.6             7.9
o1-mini     Model Family=ChatGPT    2024.12  74.1      4.2             15.6
CLINE       mode=fine-tuned         2021.07  73.9      -               -
RoBERTa     mode=fine-tuned         2021.07  69.6      -               -
V3          Model Family=DeepSeek   2024.12  69        9.2             28.5
3.5-turbo   Model Family=ChatGPT    2024.12  62.5      12.1            34
BERT        mode=fine-tuned         2021.07  60.9      -               -
2-7B        Model Family=Llama      2024.12  52.8      8.7             26.5
3-8B        Model Family=Llama      2024.12  50.1      20.3            58.2
3.1-8B      Model Family=Llama      2024.12  49.2      20.4            58.8