Short-text Multi-doc Question Answering on RGB noise robustness testbed (test)
[Chart: EM (Noise 0) by date. Highlighted point: Our model (PAM QA), 96, Nov 15, 2023. Available metrics: EM (Noise 0), EM (Noise 0.2), EM (Noise 0.4), EM (Noise 0.6), EM (Noise 0.8).]
Updated 1mo ago
Evaluation Results
| Method | Date | EM (Noise 0) | EM (Noise 0.2) | EM (Noise 0.4) | EM (Noise 0.6) | EM (Noise 0.8) |
|---|---|---|---|---|---|---|
| Our model (PAM QA) | 2023.11 | 96 | 90.67 | 90 | 85.5 | 67.33 |
| GPT3.5-Turbo | 2023.11 | 95.67 | 94.67 | 91 | 87.67 | 70.67 |
| Qwen-14B-Chat | 2023.11 | 94.67 | 92 | 88 | 85.3 | 69.67 |
| Baichuan2-13B-Chat | 2023.11 | 93 | 90.33 | 89 | 82.33 | 63.33 |
| ChatGLM3-6B (status: Released after...) | 2023.11 | 91.67 | 90 | 89 | 84.67 | 66.33 |
| ChatGLM2-6B | 2023.11 | 86.67 | 82.33 | 76.67 | 72.33 | 54 |
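As a rough way to read the scores listed above, the sketch below computes two summary figures per method: the drop in EM between noise 0 and noise 0.8, and the mean EM across the five noise levels. Neither figure is reported on this page; the names `EM_SCORES`, `noise_drop`, `mean_em`, and `summarize` are hypothetical helpers introduced only for illustration, and the scores are copied from the results above.

```python
# Noise ratios used by the five EM columns in the results above.
NOISE_LEVELS = [0.0, 0.2, 0.4, 0.6, 0.8]

# EM scores per method, in the same order as NOISE_LEVELS.
EM_SCORES = {
    "Our model (PAM QA)": [96, 90.67, 90, 85.5, 67.33],
    "GPT3.5-Turbo": [95.67, 94.67, 91, 87.67, 70.67],
    "Qwen-14B-Chat": [94.67, 92, 88, 85.3, 69.67],
    "Baichuan2-13B-Chat": [93, 90.33, 89, 82.33, 63.33],
    "ChatGLM3-6B": [91.67, 90, 89, 84.67, 66.33],
    "ChatGLM2-6B": [86.67, 82.33, 76.67, 72.33, 54],
}


def noise_drop(scores):
    """EM lost between the lowest and highest noise ratio (illustrative summary)."""
    return scores[0] - scores[-1]


def mean_em(scores):
    """Unweighted mean EM across all noise ratios (illustrative summary)."""
    return sum(scores) / len(scores)


def summarize(table):
    """Return {method: {"drop": ..., "mean": ...}} rounded to two decimals."""
    return {
        method: {
            "drop": round(noise_drop(scores), 2),
            "mean": round(mean_em(scores), 2),
        }
        for method, scores in table.items()
    }


if __name__ == "__main__":
    for method, stats in summarize(EM_SCORES).items():
        print(f"{method:22s} drop={stats['drop']:6.2f}  mean={stats['mean']:6.2f}")
```

Both figures are simple arithmetic on the listed scores, so they inherit whatever rounding the page applied. A smaller drop suggests a method degrades less as the noise ratio rises, but the page itself does not rank methods this way.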