Boolean Question Answering on BoolQ (test)

86.7 Accuracy (Avg)

QWEN3-14B

[Chart: score over time. Y-axis ticks: 51.9016, 60.9358, 69.97, 79.0042. X-axis: Dec 20, 2022 to Jan 30, 2026.]
Updated 21d ago

Evaluation Results

[Method names, links, and the headers of columns 2 to 6 are not preserved; "-" marks an empty cell.]

Date     Accuracy (Avg)  Col 2  Col 3  Col 4  Col 5  Col 6
2026.01  86.7            -      -      -      -      -
2026.01  86.42           -      -      -      -      -
2025.10  86.2            -      -      -      3      -8.8
2026.01  84.92           -      -      -      -      -
2026.01  83.43           -      -      -      -      -
2026.01  83.33           -      -      -      -      -
2026.01  83.09           -      -      -      -      -
2026.01  78.87           -      -      -      -      -
2025.10  77.5            -      -      -      5      -17.2
2026.01  75.47           -      -      -      -      -
2026.01  74.04           -      -      -      -      -
2022.12  69.4            2.1    62.8   -      -      -
2022.12  69.3            3.8    57.3   -      -      -
2022.12  69              2.6    61.5   -      -      -
2026.01  68.65           -      -      -      -      -
2022.12  68.3            2.3    62.7   -      -      -
2022.12  68              2.5    59.8   -      -      -
2025.10  67.2            -      -      -      5      1.4
2026.01  66.21           -      -      -      -      -
2022.12  66.2            3.4    54.6   -      -      -
2026.01  65.69           -      -      -      -      -
2022.12  65.5            4.9    51.8   -      -      -
2022.12  65.5            5.2    50.4   -      -      -
2022.12  65.2            0.9    63.4   -      -      -
2022.12  65.2            5.6    49.7   -      -      -
2022.12  65.1            1.6    61.1   -      -      -
2022.12  64.8            5.3    49.3   -      -      -
2022.12  64.7            6.4    49.3   -      -      -
2022.12  63.8            2.7    56.4   -      -      -
2022.12  63.7            2.2    56     -      -      -
2022.12  63.5            6.3    51     -      -      -
2022.12  62.6            3.3    55.6   -      -      -
2022.12  62.3            3      54.3   -      -      -
2022.12  61.2            3.9    50.4   -      -      -
2022.12  61.2            4      51.1   -      -      -
2022.12  61.2            3.3    51.9   -      -      -
2022.12  61              3.8    49.7   -      -      -
2022.12  60.8            3.5    49.6   -      -      -
2022.12  60              4.3    49.5   -      -      -
2026.01  57.4            -      -      -      -      -
2026.01  53.24           -      -      -      -      -
2026.03  -               -      -      74.7   -      -
2026.03  -               -      -      83.8   -      -
2026.03  -               -      -      55.48  -      -
2026.03  -               -      -      96.91  -      -
2026.03  -               -      -      58.82  -      -
2026.03  -               -      -      78.56  -      -
2026.03  -               -      -      56.79  -      -
2026.03  -               -      -      89.85  -      -
2026.03  -               -      -      83.65  -      -
2026.03  -               -      -      85.05  -      -
2026.03  -               -      -      58.24  -      -
2026.03  -               -      -      64.22  -      -
2026.03  -               -      -      50.31  -      -
2026.03  -               -      -      68.72  -      -
2026.03  -               -      -      66.5   -      -
2026.03  -               -      -      77.77  -      -
2026.03  -               -      -      55.77  -      -
2026.03  -               -      -      63.85  -      -