Reasoning and Question Answering on BoolQ, RTE, HellaSwag, ARC, OpenBookQA, and PIQA
[Chart: Avg Accuracy over time; data point at Jun 12, 2024. Top entry: Before finetune, 67.24 Avg Accuracy. Updated 4d ago.]
Evaluation Results
| Method          | Backbone         | Date    | Avg Accuracy |
|-----------------|------------------|---------|--------------|
| Before finetune | Mistral-7B-in... | 2024.06 | 67.24        |
| Target LLM      | Mistral-7B-in... | 2024.06 | 66.93        |
| ULD             | Mistral-7B-in... | 2024.06 | 66.85        |
| NPO+GD          | Mistral-7B-in... | 2024.06 | 61.77        |
| NPO+KL          | Mistral-7B-in... | 2024.06 | 61.14        |
| Offset-NPO+KL   | Mistral-7B-in... | 2024.06 | 58.72        |
| GA+GD           | Mistral-7B-in... | 2024.06 | 58.34        |
| Offset-DPO+KL   | Mistral-7B-in... | 2024.06 | 56.59        |
| DPO+KL          | Mistral-7B-in... | 2024.06 | 56.34        |
| GA+KL           | Mistral-7B-in... | 2024.06 | 55.41        |
| NPO             | Mistral-7B-in... | 2024.06 | 54.73        |
| DPO+GD          | Mistral-7B-in... | 2024.06 | 53.91        |
| Offset-GA+KL    | Mistral-7B-in... | 2024.06 | 53.78        |
| DPO             | Mistral-7B-in... | 2024.06 | 48.12        |
| GA              | Mistral-7B-in... | 2024.06 | 35.59        |