Model Utility Evaluation on XSTest

Best result: 96.4 CR (Qwen2.5-7B-Instruct)

Evaluation Results

Method	Date	CR
Qwen2.5-7B-Instruct	2026.04	96.4
FineSteer	2026.04	96.0
AlphaSteer	2026.04	95.7
FineSteer	2026.04	95.5
Llama-3.1-8B-Instruct	2026.04	92.8
FineSteer	2026.04	90.6
FineSteer	2026.04	89.9
AlphaSteer	2026.04	88.0
AlphaSteer	2026.04	70.4
TruthFlow	2026.04	66.4
AlphaSteer	2026.04	60.0
BiPO	2026.04	8.4
TruthFlow	2026.04	7.2
BiPO	2026.04	7.2