Distractor Generation on CLOTH Easy
[Chart: Invalid Ratio by date. Plotted point: Qwen 2.5 7B, Nov 3, 2025.]
Updated 14d ago
Evaluation Results
Method        Setting                Date     Invalid Ratio
Qwen 2.5 7B   training=DCDG-trained  2025.11  0
Gemma 2 9B    training=DCDG-trained  2025.11  0.2
Llama 3.1 8B  training=DCDG-trained  2025.11  0.2
GPT-4o        prompting=5-shot       2025.11  1.6
GPT-4o        prompting=0-shot       2025.11  6.8