Distractor Generation on CLOTH Hard
[Chart: Invalid Ratio over time. Highlighted entry: Qwen 2.5 7B, Invalid Ratio 4.2, Nov 3, 2025.]
Evaluation Results
Method         Setting                  Date      Invalid Ratio
Qwen 2.5 7B    training=DCDG-trained    2025.11   4.2
Gemma 2 9B     training=DCDG-trained    2025.11   5.1
Llama 3.1 8B   training=DCDG-trained    2025.11   5.2
GPT-4o         prompting=5-shot         2025.11   6.8
GPT-4o         prompting=0-shot         2025.11   16.9