Over-Safety Evaluation on XSTest (XST Metric)
Figure: XST Over-Safety score by method over time (top result: TELLME, 23.2, Feb 7, 2025).
Evaluation Results
Method          Backbone          Date     XST Over-Safety
TELLME          Qwen2.5-7B        2025.02  23.2
TELLME NT-Xent  Qwen2.5-7B        2025.02  22.4
TELLME NT-Xent  Llama-3.1-8B      2025.02  21.2
TELLME          Llama-3.1-8B      2025.02  18.0
SFT             Llama-3.1-8B      2025.02  16.4
Origin          Qwen2.5-7B        2025.02  16.0
SFT             Mistral-7B-v0.3   2025.02  15.6
Origin          Mistral-7B-v0.3   2025.02  14.4
SFT             Qwen2.5-7B        2025.02  12.0
TELLME NT-Xent  Mistral-7B-v0.3   2025.02  10.8
TELLME          Mistral-7B-v0.3   2025.02  9.2
Origin          Llama-3.1-8B      2025.02  6.4
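To compare methods within each backbone, a minimal sketch can hard-code the evaluation results above and compute each method's score change relative to the unmodified Origin model on the same backbone. The helper name `deltas_vs_origin` is hypothetical. This page does not say whether a higher XST Over-Safety score is better, so read the sign of a delta only as a direction of change, not as an improvement or a regression, unless you have checked the metric's definition.

```python
from collections import defaultdict

# (method, backbone, XST Over-Safety score) rows copied from the results table
RESULTS = [
    ("TELLME", "Qwen2.5-7B", 23.2),
    ("TELLME NT-Xent", "Qwen2.5-7B", 22.4),
    ("TELLME NT-Xent", "Llama-3.1-8B", 21.2),
    ("TELLME", "Llama-3.1-8B", 18.0),
    ("SFT", "Llama-3.1-8B", 16.4),
    ("Origin", "Qwen2.5-7B", 16.0),
    ("SFT", "Mistral-7B-v0.3", 15.6),
    ("Origin", "Mistral-7B-v0.3", 14.4),
    ("SFT", "Qwen2.5-7B", 12.0),
    ("TELLME NT-Xent", "Mistral-7B-v0.3", 10.8),
    ("TELLME", "Mistral-7B-v0.3", 9.2),
    ("Origin", "Llama-3.1-8B", 6.4),
]


def deltas_vs_origin(rows):
    """Hypothetical helper: return {backbone: {method: score - Origin score}}.

    Differences are rounded to one decimal place to match the table's precision
    and to hide floating-point noise (e.g. 23.2 - 16.0 -> 7.2).
    """
    # Group scores by backbone so each method is compared with its own baseline.
    by_backbone = defaultdict(dict)
    for method, backbone, score in rows:
        by_backbone[backbone][method] = score

    deltas = {}
    for backbone, scores in by_backbone.items():
        baseline = scores["Origin"]  # unmodified model on this backbone
        deltas[backbone] = {
            method: round(score - baseline, 1)
            for method, score in scores.items()
            if method != "Origin"
        }
    return deltas


if __name__ == "__main__":
    for backbone, changes in deltas_vs_origin(RESULTS).items():
        print(backbone, changes)
```

Grouping by backbone matters here because the Origin baselines differ widely (6.4 for Llama-3.1-8B versus 16.0 for Qwen2.5-7B), so raw scores from different backbones are not directly comparable.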