String-level response similarity on RA-QA Multiple-choice, Discriminative tasks
Top BERTScore: 0.85 (CareAQA-operaCT, Mar 6, 2026)
[Chart: BERTScore on the y-axis; selectable metrics: BERTScore, METEOR]
Updated 1mo ago
Evaluation Results
Method           Details            Date     BERTScore  METEOR
CareAQA-operaCT  Backbone=OPERA-CT  2026.03  0.85       82.96
RAMoEA-QA        -                  2026.03  0.85       83.17
CareAQA-operaGT  Backbone=OPERA-GT  2026.03  0.84       81.72
PENGI            -                  2026.03  -0.08      0