ConFiQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Multiple Choice Question Answering | ConFiQA MC | F1 Score 91.2 | 42 |
| Open-ended Question Answering | ConFiQA (test) | F1 Score 95.7 | 36 |
| Multi-step Reasoning Question Answering | ConFiQA MR (test) | F1 Score 91.3 | 36 |
| Open-book generation under knowledge conflict | ConFiQA 1,500 subset | Ps Score 81.07 | 32 |
| Context-faithful Question Answering | ConFiQA | MR 13.21 | 24 |
| Retrieval Following | ConFiQA MC 1.0 (test) | Pc 54.9 | 20 |
| Retrieval Following | ConFiQA MR 1.0 (test) | Pc 61.2 | 20 |
| Retrieval Following | ConFiQA QA 1.0 (test) | Pc 92.3 | 20 |
| Open-book generation under knowledge conflict | ConFiQA MR 1,500 | Ps Score 59.8 | 16 |
| Question Answering | ConFiQA (out-of-domain) | Hit 84.63 | 12 |
| Context-faithful Reasoning | ConFiQA MC | Pc 38.8 | 8 |
| Context-faithful Multi-hop Reasoning | ConFiQA MR | Pc 45.4 | 8 |
| Question Answering | ConFiQA-QA counterfactual contexts | Accuracy 81.2 | 7 |
| Question Answering | ConFiQA | F1 Score 94.3 | 6 |
| Question Answering | ConFiQA MR | F1 Score 89.6 | 6 |
| Multiple Choice | ConFiQA MC | Ps Score 53.4 | 4 |
| Machine Reading | ConFiQA MR | Ps Score 54.47 | 4 |
| Question Answering | ConFiQA QA | Ps 74.73 | 4 |