Question Answering Utility Evaluation

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Trend
CAPID (test)	Llama-3.1-8B (FT)	GPT-4 Score	79		2	4mo ago
Reddit (test)	Llama-3.1-8B (FT)	GPT-4 Score	0.8		2	4mo ago
