
NLI

Benchmarks

Task Name                          Dataset Name                      SOTA Result                                Trend
Multi-fidelity Multi-armed Bandit  NLI shared evaluation pool        Mean Cost-Weighted Pseudo-Regret: 3,277.7  18
Natural Language Inference         NLI adversarial benchmark (test)  Average Score: 75.4                        18
Natural Language Inference         NLI                               Accuracy: 91.2                             14
Natural Language Inference         NLI ANLI and HANS (unseen)        ANLI Score: 32.4                           9
Prompt Injection Detection         NLI                               Detection Rate (TPR/FPR): 100              8
Natural Language Inference         NLI domain average                Best Accuracy: 87.5                        8
Prompt Localization                NLI                               RL Score: 97.9                             3
Natural Language Inference         NLI (test)                        Relative CPU Speed: 2.89                   2