
Claim-level Confidence Calibration on BeyondAIME

SNR Gain: 0.301

Qwen3-4B-Instruct-confidence-min

[Chart: SNR Gain by date; Dec 22, 2025]
Updated 4d ago

Evaluation Results

Method    Links

Date      SNR Gain
2025.12   0.301     78.8   61.6   25.8   25.1   0.81    53.7
2025.12   0.183     74.1   52.6   31.1   30.5   0.93    49.7
2025.12   0.068     60.1   77.4   21     21.1   0.88777
2025.12   0.029     67.6   71.9   29.4   26     1.027   71.9
          0.027     61.7   56.3   45.6   41.9   1.809   55.8
          0.019     71.9   89.1   13     10.4   0.449   89.1
2025.12   0.018     71.7   88.9   13.2   10.6   0.469   88.9