Model Utility Evaluation on XSTest
Top CR: 96.4 (Qwen2.5-7B-Instruct, Apr 16, 2026)
Evaluation Results

Method                 Details                    Date     CR
Qwen2.5-7B-Instruct    Base Model=Qwen2.5-7B-...  2026.04  96.4
FineSteer              Base Model=Qwen2.5-7B-...  2026.04  96
AlphaSteer             Base Model=Qwen2.5-7B-...  2026.04  95.7
FineSteer              Base Model=Qwen2.5-7B-...  2026.04  95.5
Llama-3.1-8B-Instruct  Base Model=Llama-3.1-8...  2026.04  92.8
FineSteer              Base Model=Llama-3.1-8...  2026.04  90.6
FineSteer              Base Model=Llama-3.1-8...  2026.04  89.9
AlphaSteer             Base Model=Llama-3.1-8...  2026.04  88
AlphaSteer             Base Model=Qwen2.5-7B-...  2026.04  70.4
TruthFlow              Base Model=Llama-3.1-8...  2026.04  66.4
AlphaSteer             Base Model=Llama-3.1-8...  2026.04  60
BiPO                   Base Model=Llama-3.1-8...  2026.04  8.4
TruthFlow              Base Model=Qwen2.5-7B-...  2026.04  7.2
BiPO                   Base Model=Qwen2.5-7B-...  2026.04  7.2