
CWQ

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Knowledge Graph Question Answering | CWQ | Hit@1: 79.3 | 212 |
| Knowledge Graph Question Answering | CWQ (test) | Hits@1: 76.9 | 125 |
| Multi-Hop Knowledge Graph Question Answering | CWQ | Hits@1: 81.4 | 64 |
| Knowledge Base Question Answering | CWQ (test) | F1 Score: 81.3 | 44 |
| Knowledge Base Question Answering | CWQ Freebase (test) | Hits@1: 86 | 38 |
| Multi-hop Question Answering | CWQ | Pass@1: 55.8 | 36 |
| Knowledge Base Question Answering | CWQ | Hits@1: 86.3 | 30 |
| Question Answering | CWQ | Accuracy: 23.62 | 30 |
| Discriminative Evaluation | CWQ (test) | Binary Accuracy: 92.88 | 24 |
| Knowledge Base Question Answering | CWQ | Answer F1: 51.74 | 18 |
| Question Answering | CWQ | Hits@1: 72.5 | 17 |
| Knowledge Base Completion | CWQ 50% KB | MRR: 61.4 | 16 |
| Knowledge Base Completion | CWQ (30% KB) | MRR: 58.8 | 16 |
| Knowledge Graph Question Answering | CWQ Switzerland | Accuracy: 68 | 14 |
| Entity Linking | CWQ (test) | Precision: 82.43 | 13 |
| Knowledge Graph Question Answering | CWQ | # Calls: 1 | 12 |
| Knowledge Base Question Answering | CWQ 50% KB | Hits@1: 50.8 | 12 |
| Knowledge Base Question Answering | CWQ 30% KB | Hits@1: 50.2 | 12 |
| Hallucination Detection | CWQ | F1 Score: 87.4 | 11 |
| Hallucination Detection | CWQ | F1 Score: 87.4 | 11 |
| Multi-hop Reasoning | CWQ | Hits@1: 82.2 | 10 |
| Knowledge Graph Question Answering | CWQ multi-entity | F1 Score: 64.9 | 7 |
| Knowledge Base Question Answering | CWQ (hidden test) | Accuracy: 67.1 | 7 |
| Hallucination Detection | CWQ | LLM Calls: 0 | 6 |
| Complex Question Answering | CWQ | Total Score: 14.85 | 4 |
Showing 25 of 28 rows