
Long-form QA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| AI-generated text detection | Long-form QA 3K generations corpus | Detection Accuracy (1% FPR): 100 | 42 |
| AI-generated text detection | Long-form QA 46K ShareGPT-augmented corpus | Detection Accuracy (1% FPR): 100 | 18 |
| AI-generated text detection | Long-form QA 9K pooled generations corpus | Detection Accuracy (at 1% FPR): 100 | 18 |
| Long-form QA | Long-form QA Short Q, Long A (test) | GPT4 Score: 6.182 | 15 |
| Long-form Question Answering | Long-form QA (test) | Win Rate vs. Holistic Reward: 61.7 | 13 |
| Faithfulness Evaluation | Long-Form QA | Correlation (Human Judgment): 0.795 | 2 |
| AI-generated text detection | Long-form QA 46K ShareGPT-augmented corpus 1.0 (test) | Detection Accuracy (1% FPR): - | 0 |
| AI-generated text detection | Long-form QA 9K pooled generations corpus 1.0 (test) | Accuracy (1% FPR): - | 0 |
| AI-generated text detection | Long-form QA 3K generations corpus 1.0 (test) | Detection Acc (1% FPR): - | 0 |