Diagnostic Reasoning on ER-Reason
Top result: SEA (Qwen-8b), Final Accuracy 72.14 (Apr 8, 2026)
[Chart: Final Accuracy over time; selectable metrics: Final Accuracy, ΔAcc@50, ΔAcc@100]
Updated 9d ago
Evaluation Results
| Method | Setting | Date | Final Accuracy | ΔAcc@50 | ΔAcc@100 |
|---|---|---|---|---|---|
| SEA (Qwen-8b) | Backbone Model=Qwen-8b | 2026.04 | 72.14 | 20 | 35 |
| GPT-5.2 (Zeroshot + Dual Memory) | Evaluation Protocol=Me... | 2026.04 | 69.44 | 15 | 17 |
| SEA (Qwen-4b) | Backbone Model=Qwen-4b | 2026.04 | 68.5 | 23 | 32 |
| Qwen-8b (RL-DiagnosticRewardOnly) | Evaluation Protocol=Tr... | 2026.04 | 59.06 | 1 | 3 |
| Qwen-8b (SFT) | Evaluation Protocol=Tr... | 2026.04 | 56.82 | -2 | 1 |
| Qwen-8b (SFT + Dual Memory) | Evaluation Protocol=Me... | 2026.04 | 54.21 | 19 | 21 |
| GPT-5.2 | Evaluation Protocol=Ze... | 2026.04 | 53.16 | 8 | 13 |
| Qwen-8b | Evaluation Protocol=Ze... | 2026.04 | 44.61 | 1 | 6 |
| Qwen-4b | Evaluation Protocol=Ze... | 2026.04 | 41.88 | -5 | 8 |