Distractor Generation on Human Evaluation Set (test)
Chart: Relevance by date, Apr 19, 2026 (metrics: Relevance, Difficulty, Fluency). Best Relevance: 4.14 (GPT-3).
Evaluation Results
Method       | Setting                   | Date    | Relevance | Difficulty | Fluency
-------------|---------------------------|---------|-----------|------------|--------
GPT-3        | Selection Strategy=k-NN   | 2026.04 | 4.14      | 3.51       | 4.78
GPT-3        | Selection Strategy=COT... | 2026.04 | 3.74      | 3.15       | 4.65
Ground-truth | -                         | 2026.04 | 3.73      | 3.28       | 4.58
GPT-3        | Selection Strategy=clu... | 2026.04 | 3.45      | 2.93       | 4.56
TinyLlama    | -                         | 2026.04 | 3.34      | 2.97       | 4.47
T5           | Method=beam               | 2026.04 | 3.00      | 2.51       | 3.90
T5           | Method=contrast           | 2026.04 | 2.54      | 2.18       | 3.58
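The results are listed in order of Relevance. To compare the methods under each of the three metrics, the minimal Python sketch below re-ranks the reported scores. It assumes that a higher score is better for every metric; the page does not state this for Difficulty or Fluency. The helper name rank_by and the row labels are hypothetical, chosen only for illustration.

```python
# Scores as reported in the evaluation results: (Relevance, Difficulty, Fluency).
# Row labels are illustrative; truncated settings ("COT...", "clu...") are kept as-is.
RESULTS = {
    "GPT-3 (Selection Strategy=k-NN)": (4.14, 3.51, 4.78),
    "GPT-3 (Selection Strategy=COT...)": (3.74, 3.15, 4.65),
    "Ground-truth": (3.73, 3.28, 4.58),
    "GPT-3 (Selection Strategy=clu...)": (3.45, 2.93, 4.56),
    "TinyLlama": (3.34, 2.97, 4.47),
    "T5 (Method=beam)": (3.00, 2.51, 3.90),
    "T5 (Method=contrast)": (2.54, 2.18, 3.58),
}
METRICS = ("Relevance", "Difficulty", "Fluency")


def rank_by(metric):
    """Return method labels sorted best-first by one metric.

    Assumption: higher is better for every metric.
    """
    idx = METRICS.index(metric)  # raises ValueError for an unknown metric
    return sorted(RESULTS, key=lambda name: RESULTS[name][idx], reverse=True)


if __name__ == "__main__":
    for metric in METRICS:
        print(metric)
        for rank, name in enumerate(rank_by(metric), start=1):
            print(f"  {rank}. {name}")
```

Under this assumption, the Relevance and Fluency orderings match the table. The Difficulty ordering differs: Ground-truth (3.28) moves ahead of GPT-3 with Selection Strategy=COT... (3.15), and TinyLlama (2.97) moves ahead of GPT-3 with Selection Strategy=clu... (2.93).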