SOTA Open-ended Question Answering benchmarks and papers with code | Wizwand
Open-ended Question Answering
Benchmarks
| Dataset Name | SOTA Method | Metric | SOTA Score | Results | Last Updated |
|---|---|---|---|---|---|
| ConFiQA (test) | ProbeRAG | F1 Score | 95.7 | 36 | 4d ago |
| ActivityNet | LLaVA-OneVision | Accuracy | 62.3 | 29 | 1mo ago |
| Vad-Reasoning-Plus | Qwen3VL-Thinking | BLEU-3 | 0.106 | 27 | 1mo ago |
| MSVD | MiniGPT4-Video | Accuracy | 73.92 | 22 | 1mo ago |
| TruthfulQA | MoLaCE | Neutral Accuracy | 74.24 | 15 | 1mo ago |
| SAGE Web Search | GPT-5 | Weighted Recall (Com. Sci.) | 35.1 | 12 | 1mo ago |
| MMAD (test) | MAU-GPT | ROUGE-1 | 0.7026 | 12 | 1mo ago |
| HybridQA (test) | ToT | Accuracy | 91 | 11 | 1mo ago |
| MoreHopQA (test) | RouteGoT | Accuracy | 77 | 11 | 1mo ago |
| HotpotQA (test) | RouteGoT | Accuracy | 88 | 11 | 1mo ago |
| TREC-DL-NF (S5) | MinosEval | Kendall's Tau (K) | 68.61 | 11 | 1mo ago |
| ANTIQUE (S5) | MinosEval | Kendall's Tau (K) | 65.97 | 11 | 1mo ago |
| Proposed LLM-based evaluation benchmark OEQ | GPT-4o-Mini-Audio | Completeness | 96.9 | 9 | 1mo ago |
| QAEGO4D (test) | GroundVQAB | ROUGE | 30.4 | 9 | 1mo ago |
| CrossAlpaca-Eval en 2.0 | Qwen2.5-7B-Instruction | GPT-4o Score | 8.58 | 8 | 4d ago |
| Earth Observation | Qwen3 | Judge Score | 97.05 | 7 | 3d ago |
| TGIF | MiniGPT4-Video | Accuracy | 0.7222 | 6 | 1mo ago |
| Satcom Open-Ended | Qwen3 | Judge Score | 83 | 5 | 3d ago |
| EO and Earth Sciences Open-Ended QA with Context | Qwen3 | Judge Score | 81.81 | 5 | 3d ago |
| EO and Earth Sciences Open-Ended QA | EVE-Instruct | Judge Score | 96.4 | 5 | 3d ago |
| VinDr subset | LobA | F1 Score | 54.2 | 5 | 1mo ago |
| CrossAlpaca-Eval zh 2.0 | Qwen2.5-7B-AdaMCOT | GPT-4o Score | 8.53 | 4 | 4d ago |
| AGMMU Open-Ended (test) | AgriChat | BLEU-4 | 43 | 4 | 1mo ago |