Our new X account is live! Follow @wizwand_team for updates
SOTA Robustness Evaluation benchmarks and papers with code | Wizwand
Robustness Evaluation
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| SI-Score size synthetic | RobustViT | R@1 | 66.5 | 31 | 4d ago |
| SI-Score rotation synthetic | RobustViT | R@1 | 58 | 31 | 4d ago |
| SI-Score location synthetic | RobustViT | R@1 | 48.3 | 31 | 4d ago |
| MultiRLVR | Master-RM | FPR (%) | 0.02 | 20 | 4d ago |
| MATH | AdvJudge-Zero | FPR (%) | 0 | 20 | 4d ago |
| GSM8K | Master-RM | FPR (%) | 0 | 20 | 4d ago |
| AIME | Master-RM | FPR | 0 | 20 | 4d ago |
| SA-1b photos | CIN | Identity Bit Accuracy | 100 | 9 | 4d ago |
| Meta AI images | CIN | Identity Bit Acc | 100 | 9 | 4d ago |
| LLMBar | Qwen3-30B-A3B-Thinking-2507 | Accuracy | 83.07 | 8 | 4d ago |
| BiasBench | Qwen2.5-32B-Instruct | Accuracy | 82.5 | 8 | 4d ago |
| Lexical Variation (abbr.) | Mamba | Jensen-Shannon Divergence | 0.0476 | 8 | 4d ago |
| CIFAR-100-C | Deep ens. (LPBN) | mCE | 43.15 | 8 | 4d ago |
| CartPole A=9.5 (test) | +DR | Average Reward | 231.8 | 6 | 4d ago |
| CartPole A=9.0 (test) | +ESN-OA-PT | Average Reward | 830.9 | 6 | 4d ago |
| CartPole A=8.5 (test) | +ESN-OA-PT | Average Reward | 810.1 | 6 | 4d ago |
| CartPole A=8.0 (test) | +ESN-OA-PT | Average Reward | 1,000 | 6 | 4d ago |
| ImageNet Robustness Variants (IN-A, IN-R, IN-Sketch, IN-C) (test) | SiameseIM | IN-A Top-1 Acc | 43.8 | 5 | 4d ago |
| SA-V | Video Seal | Identity Bit Accuracy | 100 | 4 | 4d ago |
| MovieGen | Video Seal | Identity Bit Acc | 100 | 4 | 4d ago |
| Stress Tests | SAT | Quantity Stress Score | 58.1 | 4 | 4d ago |
| Lexical Variation typos | Mamba | Jensen-Shannon Divergence | 0.0761 | 4 | 4d ago |
| Lexical Variation synonym | Mamba | Jensen-Shannon Divergence | 0.013 | 4 | 4d ago |
| Lexical Variation spelling | Mamba | Jensen-Shannon Divergence | 0.0054 | 4 | 4d ago |
| Lexical Variation (punctuation) | LLaMA | Jensen-Shannon Divergence | 0.174 | 4 | 4d ago |
Showing 25 of 39 rows