
LFQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Watermarking | LFQA | TPR (FPR < 10^-4): 100 | 40 |
| Pairwise Ranking | LFQA | Pairwise Preference Accuracy: 77.24 | 13 |
| Sycophancy | LFQA | Sycophancy (PD, L): 0.276 | 6 |
| Answer quality evaluation | LFQA | GPT-4o Score: 4.115 | 4 |
| Multi-bit Watermarking | LFQA | Perplexity: 2.636 | 4 |
| Long-form Question Answering | LFQA | AIS (Decomposition): 90.9 | 4 |
| Long-Form Question Answering | LFQA (test) | R-L: 38.2 | 3 |
| Machine-Generated Text Detection | LFQA (10% Editing) | TPR: 99.9 | 3 |
| Machine-Generated Text Detection | LFQA No Editing | TPR: 100 | 3 |