Natural Language Processing

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Count	Updated
T0 MTest11 P3 (test)		Accuracy	61.4	42	4mo ago
7 NLP Tasks (test)		Average Accuracy	88.9	38	3mo ago
T0 benchmark	T0*	RTE	85.8	18	4mo ago
NLP	SAIR	Cost per 1K Requests ($)	0.007	15	4mo ago
T0 Without SCloze dataset HyperT5 variant (test)		Accuracy	60.6	14	4mo ago
decaNLP Tasks (unseen)	Diana	AN'	34.69	14	4mo ago
decaNLP Tasks seen (test)		AN Score	77.97	14	4mo ago
NLP-LR	TTT+CT-KV	Accuracy	47.6	13	18d ago
SuperGLUE Full, excl. ReCoRD (dev)	Multi-CLS BERT	Macro Avg Score	70.03	13	4mo ago
SuperGLUE 1k samples, excl. ReCoRD (dev)	Multi-CLS BERT	Macro Avg Score	65.84	13	4mo ago
SuperGLUE 100 samples, excl. ReCoRD (dev)	Multi-CLS BERT	Macro Avg Score	59.88	13	4mo ago
GLUE 1k samples (dev)	Multi-CLS BERT	Macro Avg Score	76.27	13	4mo ago
GLUE 100 samples (dev)	Multi-CLS BERT	Macro Avg Score	64.24	13	4mo ago
BERT NLP Task Suite (ANLI, Rotten Tomatoes, CoLA, SMS) (test)	TA	ANLI Accuracy	51.5	12	4mo ago
7 NLP Tasks Aggregate T5-Base T5-Large (average)		Accuracy (%)	88.9	12	4mo ago
8 NLP Tasks (avg)	TSV-M	Accuracy	82.21	10	4mo ago
eleven NLP tasks		Average Accuracy	73.1	10	4mo ago
BigBench II	PromptCOS	Accuracy Degradation (%)	-0.37	9	2mo ago
GLUE	RMT-KD	Red Score	80.9	8	4mo ago
FLAN 8-task subset: arc_challenge, cosmos_qa, definite_pronoun_resolution, glue_qqp, hellaswag, mnli, squad_v1, sst2	FFA-LoRA	Closed-book QA	71	7	4mo ago
LLaMA Language Task Suite (SST-2, RTE, CB, BoolQ, WSC, WiC, MultiRC, COPA, ReCoRD, SQuAD, DROP) 7B		SST-2 Accuracy	95	6	2mo ago
Federated Dataset Personalization 2	FedDPA-T	Paraphrasing Accuracy	90.5	6	4mo ago
Federated Dataset 1 (Personalization)	FedDPA-T	Paraphrasing Score	0.805	6	4mo ago
29 public NLP benchmarks (average)	GLaM	Accuracy	68.1	6	4mo ago
PAWS (test)	cLA	Accuracy	94.77	4	1mo ago

Showing 25 of 37 rows