Generative Question Answering

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Count	Updated
Bolmo Evaluation Suite GenQA 7B	Llama 3.1 70B	GenQA Average	81.6	39	3mo ago
MsMARCO (test)	Match-LSTM	ROUGE Score	40.7	18	4mo ago
MsMARCO (dev)	RAG	ROUGE Score	57.2	11	4mo ago
SimpleQA	RCSP	HALL Score	92	10	1mo ago
MedHallu		HALL Score	67.33	10	1mo ago
WebQuestions		HALL Score	52	10	1mo ago
TruthfulQA (test)	Plan-and-Solve	HALL Score	82.33	10	1mo ago
HaluEval (test)	Plan-and-Solve	HALL Rate	50.33	10	1mo ago
Lu Xun's essay collections	CharacterBot	Content Score	3.758	10	4mo ago
SQuAD Clean (test)	SDBN-p	Exact Match (EM)	72.84	8	1mo ago
Amazon (test)	Prior-Aug	EM	57.99	8	4mo ago
Reddit (test)		EM	61.19	8	4mo ago
BioASQ (test)	SWEP	EM	43.01	8	4mo ago
NYT (test)	SWEP	EM	76.42	8	4mo ago
Wiki (test)	SWEP	EM	73.34	8	4mo ago
FatwaQA	Gemini-3-Pro	Accuracy	67	7	4mo ago
SQuAD Double-Char (test)	SDBN-p	EM	69.52	5	1mo ago
SQuAD Keyboard-Char (test)	SDBN-p	EM	69.04	5	1mo ago
DriveLM (test)	DriveLM-Agent	BLEU-4	53.09	5	4mo ago
TruthfulQA	KLAS	ROUGE-1	64.5	4	1mo ago
SQuAD Del-Word (test)	SDBN	EM	51.7	3	1mo ago
SQuAD Del-Char (test)	SDBN	EM	54.1	3	1mo ago
SQuAD	Blended RAG	EM	57.63	3	4mo ago
