Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Test-Time Scaling Selection on RouterBench Cost-Sensitive

0.8337Reward

Oracle

0.5437480.6190240.69430.769576May 29, 2026
Updated 2d ago

Evaluation Results

MethodLinks
2026.05
0.833757.3810.6
2026.05
0.829754.8710.5
2026.05
0.753546.523.3
2026.05
0.718443.7554
2026.05
0.707946.8849.4
2026.05
0.702244.567.9
2026.05
0.700641.7548.6
2026.05
0.69844270.6
2026.05
0.696636.549
2026.05
0.693742.1276.7
2026.05
0.65941.38326
2026.05
0.630147644
2026.05
0.58850.751,558.2
2026.05
0.5731531,358.7
2026.05
0.558945660.6
2026.05
0.554952.121,762.8