SOTA Natural Language Processing benchmarks and papers with code | Wizwand
Benchmarks
| Dataset Name | SOTA Method | Metric | Score | Results | Last Updated |
|---|---|---|---|---|---|
| T0 MTest11 P3 (test) | Full FT | Accuracy | 61.4 | 42 | 1mo ago |
| 7 NLP Tasks (test) | Fine-tuned | Average Accuracy | 88.9 | 38 | 15d ago |
| T0 benchmark | T0* | RTE | 85.8 | 18 | 1mo ago |
| NLP | SAIR | Cost per 1K Requests ($) | 0.007 | 15 | 1mo ago |
| T0 Without SCloze dataset HyperT5 variant (test) | FiD-ICL | Accuracy | 60.6 | 14 | 1mo ago |
| decaNLP Tasks (unseen) | Diana | AN' | 34.69 | 14 | 1mo ago |
| decaNLP Tasks seen (test) | Multitask | AN Score | 77.97 | 14 | 1mo ago |
| SuperGLUE Full, excl. ReCoRD (dev) | Multi-CLS BERT | Macro Avg Score | 70.03 | 13 | 1mo ago |
| SuperGLUE 1k samples, excl. ReCoRD (dev) | Multi-CLS BERT | Macro Avg Score | 65.84 | 13 | 1mo ago |
| SuperGLUE 100 samples, excl. ReCoRD (dev) | Multi-CLS BERT | Macro Avg Score | 59.88 | 13 | 1mo ago |
| GLUE 1k samples (dev) | Multi-CLS BERT | Macro Avg Score | 76.27 | 13 | 1mo ago |
| GLUE 100 samples (dev) | Multi-CLS BERT | Macro Avg Score | 64.24 | 13 | 1mo ago |
| BERT NLP Task Suite (ANLI, Rotten Tomatoes, CoLA, SMS) (test) | TA | ANLI Accuracy | 51.5 | 12 | 1mo ago |
| 7 NLP Tasks Aggregate T5-Base T5-Large (average) | Fine-tuned | Accuracy (%) | 88.9 | 12 | 1mo ago |
| 8 NLP Tasks (avg) | TSV-M | Accuracy | 82.21 | 10 | 1mo ago |
| eleven NLP tasks | Traditional MTL | Average Accuracy | 73.1 | 10 | 1mo ago |
| GLUE | RMT-KD | Red Score | 80.9 | 8 | 1mo ago |
| FLAN 8-task subset: arc_challenge, cosmos_qa, definite_pronoun_resolution, glue_qqp, hellaswag, mnli, squad_v1, sst2 | FFA-LoRA | Closed-book QA | 71 | 7 | 1mo ago |
| Federated Dataset Personalization 2 | FedDPA-T | Paraphrasing Accuracy | 90.5 | 6 | 1mo ago |
| Federated Dataset 1 (Personalization) | FedDPA-T | Paraphrasing Score | 0.805 | 6 | 1mo ago |
| 29 public NLP benchmarks (average) | GLaM | Accuracy | 68.1 | 6 | 1mo ago |
| Federated Dataset Test-Time Personalization 2 | FedDPA-F | Paraphrasing | 71.64 | 4 | 1mo ago |
| Federated Dataset 1 Test-Time Personalization | FedDPA-F | Paraphrase Accuracy | 78.1 | 4 | 1mo ago |
| 12 NLP task categories Average across distributions | FlexLoRA | Avg Improvement (%) | 1.56 | 4 | 1mo ago |
| 12 NLP task categories Heavy-Tail (S) distribution | FlexLoRA | Avg Percentage Improvement | 0.0166 | 4 | 1mo ago |
Showing 25 of 30 rows