
EXPERTQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Fact-checking | ExpertQA | Balanced Accuracy: 61.1 | 25 |
| Long-form Question Answering | ExpertQA | ROUGE-L: 23.34 | 18 |
| Attributable Text Generation | ExpertQA v1 (test) | AutoAIS: 0.6612 | 9 |
| General Question Answering | ExpertQA | Reward: 0.2385 | 8 |
| Question Answering | EXPERTQA (test) | Claim Recall: 19.27 | 6 |
| Retrieval-Augmented Generation | ExpertQA | Faithfulness: 73.9 | 5 |
| Medical Long-form Answering | ExpertQA Biomed | Relevance: 3.7 | 4 |
| URL Health and Self-Correction | ExpertQA | Total URLs: 7,985 | 3 |