Short-text Multi-doc Question Answering on RGB noise robustness testbed (test)

96 EM (Noise 0)

Our model (PAM QA)

Updated 5mo ago

Evaluation Results

Method	Date	EM (Noise 0)	EM (Noise 0.2)	EM (Noise 0.4)	EM (Noise 0.6)	EM (Noise 0.8)
Our model (PAM QA)	2023.11	96	90.67	90	85.5	67.33
GPT3.5-Turbo	2023.11	95.67	94.67	91	87.67	70.67
Qwen-14B-Chat	2023.11	94.67	92	88	85.3	69.67
Baichuan2-13B-Chat	2023.11	93	90.33	89	82.33	63.33
ChatGLM3-6B	2023.11	91.67	90	89	84.67	66.33
ChatGLM2-6B	2023.11	86.67	82.33	76.67	72.33	54