
Out-of-Domain Reasoning Aggregation on OOD Average

Accuracy: 63.57

Qwen3-4B-Thinking-2507 + BET

[Chart: Accuracy; date shown: May 12, 2026]
Updated 21d ago

Evaluation Results

Date      Accuracy   (unlabeled)   (unlabeled)
2026.05   63.57      2,865         2.338
2026.05   62.41      5,434         1.224
2026.05   62.39      6,573         1
2026.05   61.16      3,428         1.88
2026.05   60.37      6,717         0.947
2026.05   58.83      3,899         1.59
2026.05   54.2       5,547         1.03
2026.05   52.59      4,789         1.157
2026.05   50.34      5,305         1
2026.05   45.5       4,949         1
2026.05   45.43      2,308         2.141
2026.05   44.76      3,181         1.531
2026.05   43.66      4,093         1.16
2026.05   38.73      5,540         1
2026.05   38.33      2,299         2.385
2026.05   38.24      3,836         1.426
2026.05   36.79      5,261         1
2026.05   36.66      5,065         1.035
2026.05   35.12      4,463         1.132
2026.05   34.55      5,001         0.988
2026.05   33.31      4,923         0.968
2026.05   29.99      5,050         0.85