Risk Identification on IS-Bench
Best Step Accuracy: 69.9 (GPT-5.1)
[Chart: Step Accuracy over time, with a data point at May 29, 2026. Selectable metrics: Step Accuracy, Precision, Recall, F1 Score.]
Updated 2d ago
Evaluation Results
Method          Model size  Date     Step Accuracy  Precision  Recall  F1 Score
GPT-5.1         -           2026.05  69.9           29.1       64.2    38.9
Qwen-3-VL-32B   32B         2026.05  66.7           27.8       70.8    41
EMBGUARD-4B     4B          2026.05  63.1           25.7       71.7    38.3
Gemini-2.5-Pro  -           2026.05  49.9           22.2       88.2    40.7
EMBGUARD-2B     2B          2026.05  45.7           19.1       76.7    30.5