
Bio

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Question Answering | Bio (test) | LLM-Judge Score: 82.9 | 105 |
| Long-form Generation | Bio | LLM-Judge Score: 81 | 59 |
| Factuality Correction | BIO (test) | Precision: 51 | 44 |
| Uncertainty Quantification | BIO | PCC: -0.129 | 32 |
| Factuality Correction | BIO dataset | Factual Precision: 93 | 24 |
| Question Answering | Bio | Few-Shot Accuracy: 84.3 | 17 |
| Long-form Biography Generation | Bio FactScore | FactScore: 81.2 | 17 |
| Question Answering | Bio poison @ Position 10, k=10 (test) | Robustness Score (LLM-J): 79.9 | 15 |
| Question Answering | Bio poison @ Position 1, k=10 (test) | Rob. LLM-J Score: 79.3 | 15 |
| Conformal Prediction | bio (test) | Marginal Coverage: 90 | 14 |
| Topic Modeling | Bio | IRBO: 100 | 13 |
| Topic Modeling | Bio | NPMI: 0.191 | 13 |
| Document Clustering | Bio (test) | NMI: 0.557 | 13 |
| Tabular Classification | BIO M (test) | Macro F1: 80.1 | 9 |
| Regression | bio | Coverage: 90.57 | 8 |
| Factuality Evaluation | BIO (test) | FS Score: 88.9 | 8 |
| AMR Parsing | BIO | Smatch: 62.8 | 8 |
| Factuality Evaluation | Bio | Precision: 14.1 | 6 |
| Long-form generation | Bio | PIA RLLMJ Score: 69.8 | 6 |
| Retrieval Question Answering | Bio | MRR: 0.15 | 6 |
| Conjunctive Query Answering | Bio queries (test) | AUC: 91 | 6 |
| Conformal Prediction | Bio | Empirical Coverage: 90 | 4 |
| Regression | bio (test) | Max Conditional Coverage Deviation: 4.7 | 4 |
| Tabular Regression | Bio | Cmarg: 0.9 | 3 |
| AMP Classification | BIO (test) | Error Rate: 2.4 | 1 |