Question Answering Sufficiency Prediction on CouldAsk Benchmark
[Chart: score by method over time. Highlighted point: Identify-then-Verify, BBC Score 0.7878, Dec 6, 2025. Selectable metrics: BBC Score, Yelp Score, Reddit Score, QA2 Score, BanditQA Score, SQuAD v2 Score.]
Updated 1mo ago
Evaluation Results
| Method | Date | BBC Score | Yelp Score | Reddit Score | QA2 Score | BanditQA Score | SQuAD v2 Score |
|---|---|---|---|---|---|---|---|
| Identify-then-Verify | 2025.12 | 0.7878 | 0.6848 | 0.6965 | 0.7846 | 0.8333 | 0.695 |
| MIGRES | 2025.12 | 0.6691 | 0.5636 | 0.5815 | 0.7703 | 0.8182 | 0.821 |
| Autorater | 2025.12 | 0.6389 | 0.6181 | 0.5949 | 0.6793 | 0.8065 | 0.8287 |