Short-text Multi-doc Question Answering on RGB noise robustness testbed (test)
[Chart: EM (Noise 0) by date. Best result: 96, Our model (PAM QA), Nov 15, 2023. Available metrics: EM (Noise 0), EM (Noise 0.2), EM (Noise 0.4), EM (Noise 0.6), EM (Noise 0.8).]
Updated 4d ago
Evaluation Results
Method               Date      EM (Noise 0)   EM (Noise 0.2)   EM (Noise 0.4)   EM (Noise 0.6)   EM (Noise 0.8)
Our model (PAM QA)   2023.11   96             90.67            90               85.5             67.33
GPT3.5-Turbo         2023.11   95.67          94.67            91               87.67            70.67
Qwen-14B-Chat        2023.11   94.67          92               88               85.3             69.67
Baichuan2-13B-Chat   2023.11   93             90.33            89               82.33            63.33
ChatGLM3-6B          2023.11   91.67          90               89               84.67            66.33
ChatGLM2-6B          2023.11   86.67          82.33            76.67            72.33            54

Note on ChatGLM3-6B: status=Released after...
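
To compare how much each method degrades as the noise ratio rises, the results above can be summarized with a short script. This is a minimal sketch, not part of the benchmark itself: the names `SCORES`, `em_drop`, and `mean_em` are hypothetical, and the only data used are the EM values listed above.

```python
# EM scores copied from the results above, ordered by noise ratio
# (0, 0.2, 0.4, 0.6, 0.8). All names here are illustrative assumptions.
SCORES = {
    "Our model (PAM QA)": [96, 90.67, 90, 85.5, 67.33],
    "GPT3.5-Turbo": [95.67, 94.67, 91, 87.67, 70.67],
    "Qwen-14B-Chat": [94.67, 92, 88, 85.3, 69.67],
    "Baichuan2-13B-Chat": [93, 90.33, 89, 82.33, 63.33],
    "ChatGLM3-6B": [91.67, 90, 89, 84.67, 66.33],
    "ChatGLM2-6B": [86.67, 82.33, 76.67, 72.33, 54],
}


def em_drop(scores):
    """Absolute EM drop between noise ratio 0 and noise ratio 0.8."""
    return round(scores[0] - scores[-1], 2)


def mean_em(scores):
    """Mean EM across the five noise ratios."""
    return round(sum(scores) / len(scores), 2)


if __name__ == "__main__":
    # Print methods from the smallest to the largest drop.
    for method, scores in sorted(SCORES.items(), key=lambda kv: em_drop(kv[1])):
        print(f"{method:<20} drop={em_drop(scores):6.2f}  mean={mean_em(scores):6.2f}")
```

On these numbers, GPT3.5-Turbo and Qwen-14B-Chat lose the least between noise ratio 0 and 0.8 (25.0 points each), while ChatGLM2-6B loses the most (32.67 points).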