Question Answering on Consumer QA (Cos. QA)
[Chart: Accuracy. Highlighted point: Qwen3-30B-A3B-Instruct, 98 (Apr 26, 2026). Metrics: Accuracy, F1 Score, Judge Score.]
Updated 1mo ago
Evaluation Results
| Method | Evaluation Protocol | Date | Accuracy | F1 Score | Judge Score |
|---|---|---|---|---|---|
| Qwen3-30B-A3B-Instruct | Ze... | 2026.04 | 98 | 97 | 92 |
| GPT-4o | Ze... | 2026.04 | 98 | 98 | 81 |
| LegalDrill-1.7B | Di... | 2026.04 | 96 | 96 | 89 |
| LegalDrill-1.7B | Di... | 2026.04 | 94 | 94 | 88 |
| LegalDrill-0.6B | Di... | 2026.04 | 86 | 84 | 75 |
| LegalDrill-0.6B | Di... | 2026.04 | 84 | 83 | 77 |
| DeepSeek ESFT-16B | Ze... | 2026.04 | 81 | 80 | 74 |
| Qwen3-1.7B | Ze... | 2026.04 | 79 | 58 | 70 |
| Qwen3-0.6B | Ze... | 2026.04 | 69 | 34 | 66 |
| Law-LLM-13B | Ze... | 2026.04 | 49 | 38 | 36 |
| DiscLaw-13B | Ze... | 2026.04 | 24 | 18 | 11 |