SOTA Robustness Evaluation benchmarks and papers with code | Wizwand
Robustness Evaluation
Benchmarks
| Dataset Name | SOTA Method | Metric | Score | Results | Last Updated |
|---|---|---|---|---|---|
| SI-Score size synthetic | RobustViT | R@1 | 66.5 | 31 | 3mo ago |
| SI-Score rotation synthetic | RobustViT | R@1 | 58 | 31 | 3mo ago |
| SI-Score location synthetic | RobustViT | R@1 | 48.3 | 31 | 3mo ago |
| MultiRLVR | Master-RM | FPR (%) | 0.02 | 20 | 3mo ago |
| MATH | AdvJudge-Zero | FPR (%) | 0 | 20 | 3mo ago |
| GSM8K | Master-RM | FPR (%) | 0 | 20 | 3mo ago |
| AIME | Master-RM | FPR | 0 | 20 | 3mo ago |
| HellaSwag SAGE-generated | GPT-4o | Overall Accuracy (OA) | 74.17 | 12 | 21d ago |
| SA-1B photos | CIN | Identity Bit Accuracy | 100 | 9 | 3mo ago |
| Meta AI images | CIN | Identity Bit Accuracy | 100 | 9 | 3mo ago |
| Perturbation Dataset | L4L | Change Accuracy | 62.56 | 8 | 2mo ago |
| LLMBar | Qwen3-30B-A3B-Thinking-2507 | Accuracy | 83.07 | 8 | 3mo ago |
| BiasBench | Qwen2.5-32B-Instruct | Accuracy | 82.5 | 8 | 3mo ago |
| Lexical Variation (abbr.) | Mamba | Jensen-Shannon Divergence | 0.0476 | 8 | 3mo ago |
| CIFAR-100-C | Deep ens. (LPBN) | mCE | 43.15 | 8 | 3mo ago |
| VizWiz | Latent Denoising | Accuracy | 70.9 | 6 | 1mo ago |
| RWQA | Latent Denoising | Accuracy | 72.9 | 6 | 1mo ago |
| NaturalBench | Latent Denoising | GACC | 33.5 | 6 | 1mo ago |
| CartPole A=9.5 (test) | +DR | Average Reward | 231.8 | 6 | 3mo ago |
| CartPole A=9.0 (test) | +ESN-OA-PT | Average Reward | 830.9 | 6 | 3mo ago |
| CartPole A=8.5 (test) | +ESN-OA-PT | Average Reward | 810.1 | 6 | 3mo ago |
| CartPole A=8.0 (test) | +ESN-OA-PT | Average Reward | 1,000 | 6 | 3mo ago |
| OmniEarth | SkyNative | ICA | 50 | 5 | 15d ago |
| ImageNet Robustness Variants (IN-A, IN-R, IN-Sketch, IN-C) (test) | SiameseIM | IN-A Top-1 Acc | 43.8 | 5 | 3mo ago |
| CommonsenseQA | DeepSeek-R1-Distill-LLaMA-8B | VAcc | 79.07 | 4 | 1mo ago |
Showing 25 of 48 rows