SOTA Natural Language Processing benchmarks and papers with code | Wizwand
Natural Language Processing
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| T0 MTest11 P3 (test) | Full FT | Accuracy | 61.4 | 42 | 4d ago |
| 7 NLP Tasks (test) | Fine-tuned | Average Accuracy | 88.9 | 20 | 4d ago |
| T0 benchmark | T0* | RTE | 85.8 | 18 | 4d ago |
| NLP | SAIR | Cost per 1K Requests ($) | 0.007 | 15 | 4d ago |
| T0 Without SCloze dataset HyperT5 variant (test) | FiD-ICL | Accuracy | 60.6 | 14 | 4d ago |
| decaNLP Tasks (unseen) | Diana | AN' | 34.69 | 14 | 4d ago |
| decaNLP Tasks seen (test) | Multitask | AN Score | 77.97 | 14 | 4d ago |
| SuperGLUE Full, excl. ReCoRD (dev) | Multi-CLS BERT | Macro Avg Score | 70.03 | 13 | 4d ago |
| SuperGLUE 1k samples, excl. ReCoRD (dev) | Multi-CLS BERT | Macro Avg Score | 65.84 | 13 | 4d ago |
| SuperGLUE 100 samples, excl. ReCoRD (dev) | Multi-CLS BERT | Macro Avg Score | 59.88 | 13 | 4d ago |
| GLUE 1k samples (dev) | Multi-CLS BERT | Macro Avg Score | 76.27 | 13 | 4d ago |
| GLUE 100 samples (dev) | Multi-CLS BERT | Macro Avg Score | 64.24 | 13 | 4d ago |
| BERT NLP Task Suite (ANLI, Rotten Tomatoes, CoLA, SMS) (test) | TA | ANLI Accuracy | 51.5 | 12 | 4d ago |
| 7 NLP Tasks Aggregate T5-Base T5-Large (average) | Fine-tuned | Accuracy (%) | 88.9 | 12 | 4d ago |
| 8 NLP Tasks (avg) | TSV-M | Accuracy | 82.21 | 10 | 4d ago |
| eleven NLP tasks | Traditional MTL | Average Accuracy | 73.1 | 10 | 4d ago |
| GLUE | RMT-KD | Red Score | 80.9 | 8 | 4d ago |
| FLAN 8-task subset: arc_challenge, cosmos_qa, definite_pronoun_resolution, glue_qqp, hellaswag, mnli, squad_v1, sst2 | FFA-LoRA | Closed-book QA | 71 | 7 | 4d ago |
| Federated Dataset Personalization 2 | FedDPA-T | Paraphrasing Accuracy | 90.5 | 6 | 4d ago |
| Federated Dataset 1 (Personalization) | FedDPA-T | Paraphrasing Score | 0.805 | 6 | 4d ago |
| 29 public NLP benchmarks (average) | GLaM | Accuracy | 68.1 | 6 | 4d ago |
| Federated Dataset Test-Time Personalization 2 | FedDPA-F | Paraphrasing | 71.64 | 4 | 4d ago |
| Federated Dataset 1 Test-Time Personalization | FedDPA-F | Paraphrase Accuracy | 78.1 | 4 | 4d ago |
| 12 NLP task categories Average across distributions | FlexLoRA | Avg Improvement (%) | 1.56 | 4 | 4d ago |
| 12 NLP task categories Heavy-Tail (S) distribution | FlexLoRA | Avg Percentage Improvement | 0.0166 | 4 | 4d ago |
Showing 25 of 30 rows