Reasoning-Level Denial-of-Service on OS Environment Injection (test)
[Chart: E2E Success over time; series: OTora; date shown: May 9, 2026. Selectable metrics: E2E Success, RTI, Hit Rate, Accuracy.]
Updated 22d ago
Evaluation Results
Method  Configuration              Date     E2E Success  RTI   Hit Rate  Accuracy
OTora   Model=LLaMA-3.1-70B, A...  2026.05  80           10.8  86        95.7
OTora   Model=DeepSeek-V2-67B,...  2026.05  77           10.1  84        95
OTora   Model=GPT-OSS-120B, At...  2026.05  76           10    84        96
OTora   Model=Qwen-2.5-32B, At...  2026.05  75           9.5   83        94.8