Safety Evaluation on Redteaming Resistance Benchmark (RRB) 919-example subset
[Chart: HarmR. Highlighted entry: Base LLM, 5.9 (Oct 19, 2025). Available metrics: HarmR, Help@S.]
Updated 26d ago
Evaluation Results
| Method | Backbone | Date | HarmR | Help@S |
|---|---|---|---|---|
| Base LLM | Mistral-NeMo-... | 2025.10 | 5.9 | 3.25 |
| Naive RAG | Mistral-NeMo-... | 2025.10 | 9.9 | 3.29 |
| Base LLM | Qwen-2.5-14B-... | 2025.10 | 10.5 | 2.14 |
| Base Agent | Qwen-2.5-14B-... | 2025.10 | 11.8 | 2.15 |
| Base Agent | Mistral-NeMo-... | 2025.10 | 12.7 | 3.1 |
| Base LLM | Qwen-2.5-7B-I... | 2025.10 | 16.2 | 2.49 |
| Naive RAG | Qwen-2.5-14B-... | 2025.10 | 21.7 | 2.2 |
| Base Agent | Qwen-2.5-7B-I... | 2025.10 | 29.1 | 2.52 |
| Base LLM | Qwen-2.5-3B-I... | 2025.10 | 29.8 | 2.51 |
| Ft. Agent | Qwen-2.5-7B-I... | 2025.10 | 32.4 | 2.55 |
| Base Agent | Qwen-2.5-3B-I... | 2025.10 | 39.8 | 2.6 |
| Naive RAG | Qwen-2.5-7B-I... | 2025.10 | 44.3 | 2.6 |
| Naive RAG | Qwen-2.5-3B-I... | 2025.10 | 50.9 | 2.42 |
| Ft. Agent | Qwen-2.5-3B-I... | 2025.10 | 51.4 | 2.58 |