
Commonsense Reasoning on HellaSwag (Accuracy, Delta, and Confidence Intervals)

89.3 Accuracy

MoE-Sieve (Qwen1.5-MoE-A2.7B)

[Chart: HellaSwag accuracy, Mar 25, 2026 to Apr 17, 2026]
Updated 1mo ago

Evaluation Results

Date     Accuracy  Delta  CI    Links
2026.03  89.3      0.73   0.53  -
2026.03  88.5      -      -     -
2026.03  80.7      0.17   0.71  -
2026.03  80.5      -      -     -
2026.04  60.6      -      -     -
2026.04  59.8      -      -     -
2026.04  58.6      -      -     -
2026.04  57.8      -      -     -
2026.04  57.5      -      -     -
-        55.3      -      -     -
2026.04  55.2      -      -     -
2026.04  54.9      -      -     -
2026.04  51.6      -      -     -
2026.04  49.3      -      -     -
2026.03  39        -      -     -
2026.03  30        -      -     -
2026.03  29        -      -     -
2026.03  20.3      -      -     -
2026.03  17.3      -      -     -