Reasoning-Level Denial-of-Service on WebShop Environment Injection (test)
[Chart: E2E Success. Top result: OTora, 87 (May 9, 2026). Metrics: E2E Success, RTI Multiplier, Hit Rate, Accuracy.]
Updated 22d ago
Evaluation Results
Method  Configuration              Date     E2E Success  RTI Multiplier  Hit Rate  Accuracy
OTora   Model=LLaMA-3.1-70B, A...  2026.05  87           9.7             92        96.1
OTora   Model=GPT-OSS-120B, At...  2026.05  85           9.1             90        96.3
OTora   Model=DeepSeek-V2-67B,...  2026.05  84           9.2             90        95.5
OTora   Model=Qwen-2.5-32B, At...  2026.05  82           8.6             89        95.2