
Bio

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Question Answering | Bio (test) | LLM-Judge Score: 82.9 | 105 |
| Long-form Generation | Bio | LLM-Judge Score: 81 | 45 |
| Factuality Correction | BIO (test) | Precision: 51 | 44 |
| Factuality Correction | BIO dataset | Factual Precision: 93 | 24 |
| Long-form Biography Generation | Bio FactScore | FactScore: 81.2 | 17 |
| Question Answering | Bio poison @ Position 10, k=10 (test) | Robustness Score (LLM-J): 79.9 | 15 |
| Question Answering | Bio poison @ Position 1, k=10 (test) | Rob. LLM-J Score: 79.3 | 15 |
| Conformal Prediction | bio (test) | Marginal Coverage: 90 | 14 |
| Topic Modeling | Bio | IRBO: 100 | 13 |
| Topic Modeling | Bio | NPMI: 0.191 | 13 |
| Document Clustering | Bio (test) | NMI: 0.557 | 13 |
| Tabular Classification | BIO M (test) | Macro F1: 80.1 | 9 |
| Factuality Evaluation | BIO (test) | FS Score: 88.9 | 8 |
| AMR Parsing | BIO | Smatch: 62.8 | 8 |
| Retrieval Question Answering | Bio | MRR: 0.15 | 6 |
| Conjunctive Query Answering | Bio queries (test) | AUC: 91 | 6 |
| Conformal Prediction | Bio | Empirical Coverage: 90 | 4 |
| Regression | bio (test) | Max Conditional Coverage Deviation: 4.7 | 4 |