SOTA Natural Language Processing benchmarks and papers with code | Wizwand
Natural Language Processing
Benchmarks
| Dataset Name | SOTA Method | Metric | SOTA Value | Results | Last Updated |
|---|---|---|---|---|---|
| T0 MTest11 P3 (test) | Full FT | Accuracy | 61.4 | 42 | 3mo ago |
| 7 NLP Tasks (test) | Fine-tuned | Average Accuracy | 88.9 | 38 | 2mo ago |
| T0 benchmark | T0* | RTE | 85.8 | 18 | 3mo ago |
| NLP | SAIR | Cost per 1K Requests ($) | 0.007 | 15 | 3mo ago |
| T0 Without SCloze dataset HyperT5 variant (test) | FiD-ICL | Accuracy | 60.6 | 14 | 3mo ago |
| decaNLP Tasks (unseen) | Diana | AN' | 34.69 | 14 | 3mo ago |
| decaNLP Tasks seen (test) | Multitask | AN Score | 77.97 | 14 | 3mo ago |
| SuperGLUE Full, excl. ReCoRD (dev) | Multi-CLS BERT | Macro Avg Score | 70.03 | 13 | 3mo ago |
| SuperGLUE 1k samples, excl. ReCoRD (dev) | Multi-CLS BERT | Macro Avg Score | 65.84 | 13 | 3mo ago |
| SuperGLUE 100 samples, excl. ReCoRD (dev) | Multi-CLS BERT | Macro Avg Score | 59.88 | 13 | 3mo ago |
| GLUE 1k samples (dev) | Multi-CLS BERT | Macro Avg Score | 76.27 | 13 | 3mo ago |
| GLUE 100 samples (dev) | Multi-CLS BERT | Macro Avg Score | 64.24 | 13 | 3mo ago |
| BERT NLP Task Suite (ANLI, Rotten Tomatoes, CoLA, SMS) (test) | TA | ANLI Accuracy | 51.5 | 12 | 3mo ago |
| 7 NLP Tasks Aggregate T5-Base T5-Large (average) | Fine-tuned | Accuracy (%) | 88.9 | 12 | 3mo ago |
| 8 NLP Tasks (avg) | TSV-M | Accuracy | 82.21 | 10 | 3mo ago |
| eleven NLP tasks | Traditional MTL | Average Accuracy | 73.1 | 10 | 3mo ago |
| BigBench II | PromptCOS | Accuracy Degradation (%) | -0.37 | 9 | 9d ago |
| GLUE | RMT-KD | Red Score | 80.9 | 8 | 3mo ago |
| FLAN 8-task subset: arc_challenge, cosmos_qa, definite_pronoun_resolution, glue_qqp, hellaswag, mnli, squad_v1, sst2 | FFA-LoRA | Closed-book QA | 71 | 7 | 3mo ago |
| LLaMA Language Task Suite (SST-2, RTE, CB, BoolQ, WSC, WiC, MultiRC, COPA, ReCoRD, SQuAD, DROP) 7B | FO | SST-2 Accuracy | 95 | 6 | 1mo ago |
| Federated Dataset Personalization 2 | FedDPA-T | Paraphrasing Accuracy | 90.5 | 6 | 3mo ago |
| Federated Dataset 1 (Personalization) | FedDPA-T | Paraphrasing Score | 0.805 | 6 | 3mo ago |
| 29 public NLP benchmarks (average) | GLaM | Accuracy | 68.1 | 6 | 3mo ago |
| Hebrew NLP Benchmarks | Hebatron | SNLI Accuracy | 91.2 | 4 | 21d ago |
| Federated Dataset Test-Time Personalization 2 | FedDPA-F | Paraphrasing | 71.64 | 4 | 3mo ago |
Showing 25 of 33 rows