Entailment on Sara Entailment
[Chart: Accuracy. Highlighted point: Qwen3-30B-A3B-Instruct, Accuracy 86 (Apr 26, 2026). Metric tabs: Accuracy, F1 Score, Judge Score.]
Updated 1mo ago
Evaluation Results

| Method | Method (details) | Links | Accuracy | F1 Score | Judge Score |
|---|---|---|---|---|---|
| Qwen3-30B-A3B-Instruct | Evaluation Protocol=Ze... | 2026.04 | 86 | 43 | 51 |
| GPT-4o | Evaluation Protocol=Ze... | 2026.04 | 83 | 81 | 62 |
| LegalDrill-0.6B | Evaluation Protocol=Di... | 2026.04 | 75 | 42 | 41 |
| LegalDrill-1.7B | Evaluation Protocol=Di... | 2026.04 | 75 | 46 | 43 |
| LegalDrill-0.6B | Evaluation Protocol=Di... | 2026.04 | 74 | 45 | 44 |
| LegalDrill-1.7B | Evaluation Protocol=Di... | 2026.04 | 73 | 39 | 42 |
| Qwen3-1.7B | Evaluation Protocol=Ze... | 2026.04 | 66 | 38 | 37 |
| Qwen3-0.6B | Evaluation Protocol=Ze... | 2026.04 | 59 | 27 | 18 |
| Law-LLM-13B | Evaluation Protocol=Ze... | 2026.04 | 51 | 38 | 18 |
| DeepSeek ESFT-16B | Evaluation Protocol=Ze... | 2026.04 | 46 | 44 | 27 |
| DiscLaw-13B | Evaluation Protocol=Ze... | 2026.04 | 10 | 7 | 5 |