Code Execution Output Prediction on LiveCodeBench
Top result: Vanilla, 90.92 Accuracy
[Chart: Accuracy and Token Count by method, Jun 1, 2026]
Evaluation Results
Method    Settings                   Date      Accuracy (%)  Token Count
Vanilla   Model=Qwen3-4B-Thinkin...  2026.06   90.92         2,898
DEER      Model=Qwen3-4B-Thinkin...  2026.06   86.6          2,683
CUSUM     Model=Qwen3-4B-Thinkin...  2026.06   86.6          2,621
Dynasor   Model=Qwen3-4B-Thinkin...  2026.06   85.9          2,596
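The table reports each method's accuracy next to its token count, so the trade-off against the Vanilla baseline can be read as "tokens saved" versus "accuracy lost". The short sketch below computes both from the four rows above. It assumes that Token Count values are directly comparable across methods (same model, same task) and that Accuracy is a percentage; the `Result` class and `compare_to_baseline` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    method: str
    accuracy: float  # Accuracy as reported on the leaderboard (assumed to be %)
    tokens: int      # Token Count as reported (assumed comparable across methods)


# Values copied from the Evaluation Results table.
RESULTS = [
    Result("Vanilla", 90.92, 2898),
    Result("DEER", 86.6, 2683),
    Result("CUSUM", 86.6, 2621),
    Result("Dynasor", 85.9, 2596),
]


def compare_to_baseline(results, baseline_name="Vanilla"):
    """Return (method, token reduction in %, accuracy change in points) for each
    non-baseline method, relative to the named baseline row (hypothetical helper)."""
    base = next(r for r in results if r.method == baseline_name)
    rows = []
    for r in results:
        if r.method == baseline_name:
            continue
        # Fraction of the baseline's tokens that this method avoids, in percent.
        saved_pct = 100.0 * (base.tokens - r.tokens) / base.tokens
        # Negative values mean lower accuracy than the baseline.
        delta_acc = r.accuracy - base.accuracy
        rows.append((r.method, round(saved_pct, 2), round(delta_acc, 2)))
    return rows


if __name__ == "__main__":
    for method, saved, delta in compare_to_baseline(RESULTS):
        print(f"{method:8s} tokens -{saved:.2f}%  accuracy {delta:+.2f} pts")
```

On these rows the sketch reports token reductions of about 7.42% (DEER), 9.56% (CUSUM) and 10.42% (Dynasor) relative to Vanilla, at accuracy costs of 4.32, 4.32 and 5.02 points respectively.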