Safety and Utility Evaluation on XSTest
[Chart: Utility Score. Top entry: ASTRA, 98.8 (Mar 23, 2026).]
Evaluation Results
Method      Base Model    Date      Utility Score
ASTRA       LLaVA-v1.5    2026.03   98.8
NullSteer   LLaVA-v1.5    2026.03   98.8
Vanilla     LLaVA-v1.5    2026.03   98
NullSteer   MiniGPT-4     2026.03   87.8
Vanilla     MiniGPT-4     2026.03   87.6
ASTRA       MiniGPT-4     2026.03   87.6
NullSteer   Qwen2-VL      2026.03   74.5
ASTRA       Qwen2-VL      2026.03   74
Vanilla     Qwen2-VL      2026.03   73.6