Natural Language Inference on MultiNLI controlled shortcut injection
[Chart: Accuracy by method over time (x-axis: date, Apr 14, 2026; y-axis: Accuracy). Best result: SG, 32.3. Selectable metrics: Accuracy, Worst-Group Accuracy (WGA).]
Evaluation Results
Method  Date     Accuracy  Worst-Group Accuracy (WGA)
SG      2026.04  32.3      1
DFR     2026.04  27.3      1
ERM     2026.04  27        1
JTT     2026.04  26.8      3
NFL     2026.04  4.8       3
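The leaderboard reports two metrics. Overall accuracy is the percentage of test examples classified correctly. Worst-group accuracy, in the usual robustness convention, is the lowest per-group accuracy over predefined groups of test examples, so a model that relies on an injected shortcut can score reasonably overall yet fail badly on the group where the shortcut is absent or misleading. The following is a minimal Python sketch of both metrics. It assumes each example carries a group id, illustrated here as a (label, shortcut-present) pair. This page does not say how this benchmark defines its groups, so that grouping and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Overall accuracy, in percent."""
    if not preds or len(preds) != len(labels):
        raise ValueError("preds and labels must be non-empty and equal length")
    correct = sum(p == y for p, y in zip(preds, labels))
    return 100.0 * correct / len(labels)


def worst_group_accuracy(
    preds: Sequence[int], labels: Sequence[int], groups: Sequence[Hashable]
) -> float:
    """Minimum per-group accuracy, in percent.

    `groups[i]` is the group id of example i. The (label, shortcut-present)
    grouping used below is a hypothetical convention, not this benchmark's.
    """
    if not preds or not (len(preds) == len(labels) == len(groups)):
        raise ValueError("inputs must be non-empty and equal length")
    correct: dict = defaultdict(int)
    total: dict = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    # Only groups that actually occur in the data are considered.
    return min(100.0 * correct[g] / total[g] for g in total)


if __name__ == "__main__":
    # Toy 3-class NLI labels: 0 = entailment, 1 = neutral, 2 = contradiction.
    labels = [0, 0, 1, 1, 2, 2]
    shortcut = [1, 0, 1, 0, 1, 0]  # hypothetical: 1 if the injected cue is present
    preds = [0, 0, 1, 0, 2, 1]     # wrong on two cue-absent examples
    groups = list(zip(labels, shortcut))

    print(round(accuracy(preds, labels), 2))              # → 66.67
    print(worst_group_accuracy(preds, labels, groups))   # → 0.0
```

In this toy run the model is right on four of six examples, but the groups (1, 0) and (2, 0) each contain a single misclassified example, so worst-group accuracy drops to 0.0. This gap between the two metrics is the kind of behavior the table's two columns are meant to separate.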