bAbI

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Question Answering | bAbI (test) | Mean Error 0.032 | 54 |
| sys-bAbI task | sys-bAbI original (test) | Gap 7.95 | 22 |
| Spatial Reasoning | bAbI (test) | Accuracy 24 | 20 |
| Question Answering | bAbI | Accuracy 26.8 | 16 |
| Question Answering | bAbI 10k (test) | Task 1: 1 Supporting Fact Error 0 | 15 |
| Question Answering | bAbI 1k (train test) | Task: 1 Supporting Fact Acc 50 | 12 |
| Question Answering | bAbI 10k 1.0 (test) | Mean Error Rate 21 | 10 |
| Question Answering | bAbI 1.0 (test) | Task 1 Accuracy 0.4 | 10 |
| Dialogue Response Generation | bAbI Dialogue Task 4 OOV | Per-response Accuracy 100 | 9 |
| Dialogue Response Generation | bAbI Dialogue Task 5 | Per-response Accuracy 99.6 | 9 |
| Dialogue Response Generation | bAbI Dialogue Task 3 | Accuracy (Per-response) 96.3 | 9 |
| Question Answering | bAbI 1K examples 1.0 (test) | Average Error Rate 4.55 | 8 |
| Dialog Reasoning | bAbI dialog tasks (test) | Issuing API calls Error Rate 0 | 8 |
| Reading Comprehension | bAbi 1K (test) | Maximum Accuracy 90.1 | 7 |
| Dialog | bAbI dialog | Average Error Rate 1.5 | 7 |
| Textual Question Answering | bAbI English 10k (test) | Failed Tasks Count (Error > 5%) 0 | 7 |
| Reasoning | BABI 2-Choice | Delta Accuracy 28.9 | 6 |
| Spatial Reasoning | bAbI original (test) | Task 17 Accuracy 99.88 | 6 |
| Question Answering | bAbI QA 1k | Failed Tasks Count 5 | 6 |
| Spatial reasoning | bAbI Task 1 qa1 50 samples | Accuracy 19 | 5 |
| Reasoning | bAbI (test) | Acc (Single Supporting Fact) 96.27 | 5 |
| Question Answering | bAbI 10K synthesized samples | Accuracy 26.8 | 3 |
| Question Answering | bAbi-Mix | Average Error Rate 11.8 | 3 |
| Question Answering | bAbI v1.2 (test) | Task 1: Single Supporting Fact 1 | 2 |