WebQA

Benchmarks

Task Name                                        Dataset Name                    Metric     SOTA Result  Trend
Question Answering                               WebQA                           CACC       46.7         40
Uncertainty Estimation                           WebQA                           AUROC      73.57        30
Multimodal Question Answering                    WebQA                           F1-Recall  90.92        22
Multi-modal Retrieval (Image Query)              WebQA                           Recall@20  43.55        21
Multi-modal Retrieval (Text Query)               WebQA                           Recall@20  76.52        21
Multi-modal Retrieval (Text to Text/Image-Text)  WebQA                           Recall@5   84.7         19
Poisoned Sample Detection                        WebQA (IID)                     Recall     100          16
Poisoned Sample Detection                        WebQA NIID-1                    Recall     99.12        16
Watermark Detection                              WebQA                           Rank       1.05         16
Image-based Question Answering                   WebQA                           Accuracy   53.9         14
Narrative Reasoning                              WebQA (test)                    BLEURT     0.623        14
Visual Question Answering                        WebQA image segment 1.0 (test)  Accuracy   49.8         12
Multimodal Question Answering                    WebQA k=2                       ROrig@k    64.8         8
Multimodal Retrieval                             WebQA 2                         Recall@5   83.15        6
Multimodal Retrieval                             WebQA 1                         Recall@5   95.19        6
Retrieval                                        WebQA (test)                    Recall@5   74.9         5
Open-domain Question Answering                   WebQA (test)                    Accuracy   53.1         5
Text-to-Text Retrieval (qt -> Ct)                WebQA                           Recall@5   84.7         4
Visual Question Answering                        WebQA (val)                     FL         53           4