RA-QA

Benchmarks

Task Name                         Dataset Name                                 SOTA Result     Trend
String-level response similarity  RA-QA Global, Discriminative tasks           BERTScore 0.9   8
String-level response similarity  RA-QA Multiple-choice, Discriminative tasks  BERTScore 0.85  4
String-level response similarity  RA-QA Single-Verify, Discriminative tasks    BERTScore 94    4
Discriminative tasks              RA-QA                                        Accuracy 72     4
Regression tasks                  RA-QA                                        MAE 2.29        3