Output Length Prediction on GSM8K (test)
Top entry: ProD-D, MAE 19.57
[Chart: MAE (y-axis) by date; data point at Apr 9, 2026]
Updated 9d ago
Evaluation Results
| Method | Served Model | Date | MAE |
|---|---|---|---|
| ProD-D | Llama-3-8B | 2026.04 | 19.57 |
| ProD-M | Llama-3-8B | 2026.04 | 20.32 |
| Noise Radius | Llama-3-8B | 2026.04 | 20.39 |
| S^3 | Llama-3-8B | 2026.04 | 24.41 |
| TRAIL-last | Llama-3-8B | 2026.04 | 24.51 |
| TRAIL-mean | Llama-3-8B | 2026.04 | 26.29 |
| EGTP | Llama-3-8B | 2026.04 | 28.25 |
| ProD-D | Qwen-2.5-7B | 2026.04 | 30.35 |
| ProD-M | Qwen-2.5-7B | 2026.04 | 30.8 |
| Constant Median | Llama-3-8B | 2026.04 | 32.07 |
| Noise Radius | Qwen-2.5-7B | 2026.04 | 32.94 |
| TRAIL-last | Qwen-2.5-7B | 2026.04 | 35.13 |
| S^3 | Qwen-2.5-7B | 2026.04 | 41.6 |
| TRAIL-mean | Qwen-2.5-7B | 2026.04 | 44.04 |
| EGTP | Qwen-2.5-7B | 2026.04 | 49.5 |
| Constant Median | Qwen-2.5-7B | 2026.04 | 59.59 |