Distractor Effectiveness on EduAgent (test)
[Chart: Agreement Accuracy and Consistency Accuracy over time. Top point: QG-SMS, Agreement Accuracy 0.7933, Mar 7, 2025.]
Updated 4d ago
Evaluation Results
Method     Method settings            Links    Agreement Accuracy  Consistency Accuracy
QG-SMS     Scoring Protocol=Pairw...  2025.03  0.7933              74.67
KDA_large  Scoring Protocol=Indiv...  2025.03  0.7733              -
Vanilla    Scoring Protocol=Pairw...  2025.03  0.7333              64
Metrics    Scoring Protocol=Pairw...  2025.03  0.72                62.67
Reference  Scoring Protocol=Pairw...  2025.03  0.6933              56
ChatEval   Scoring Protocol=Pairw...  2025.03  0.6933              56
QSalience  Scoring Protocol=Indiv...  2025.03  0.68                -
Swap       Scoring Protocol=Pairw...  2025.03  0.68                53.33
BERTScore  Scoring Protocol=Indiv...  2025.03  0.6533              -
CoT        Scoring Protocol=Pairw...  2025.03  0.6                 28