Question Answering on PQAref 908 samples (test)
[Chart: Max References Per Answer Count. Plotted point: M2, value 3 (Jan 16, 2026). Other selectable metrics: Count of Answers with No References, Count of Hallucinated PMIDs, BERTScore (F1).]
Updated 6d ago
Evaluation Results
| Method | Details | Date | Max References Per Answer Count | Count of Answers with No References | Count of Hallucinated PMIDs | BERTScore (F1) |
|---|---|---|---|---|---|---|
| M2 | Model=Mistral-7B-Instr... | 2026.01 | 3 | 5 | 3 | 90 |
| GPT-4 T | Model=GPT-4 Turbo | 2026.01 | 1 | 2 | 0 | 90 |
| 0-M2 | Model=Mistral-7B-Instr... | 2026.01 | 0 | 165 | 26 | 84 |