Judgment Consistency on CoinFlip (unseen)
[Chart: Baseline Score over time (also selectable: Metric M Value, Metric M Rate). Top result: UNWAVERING-FQ (+ SFT + DPO), 51.8, Oct 3, 2023.]
Evaluation Results
| Method | Type | Date | Baseline Score | Metric M Value | Metric M Rate |
|---|---|---|---|---|---|
| UNWAVERING-FQ (+ SFT + DPO) | O | 2023.10 | 51.8 | 18.2 | 35.14 |
| Vicuna-7B + SFT | L | 2023.10 | 51.4 | 18 | 35.02 |
| UNWAVERING-FQ (+ SFT + DPO) | L | 2023.10 | 50.8 | 27.2 | 53.54 |
| Vicuna-7B + SFT | C | 2023.10 | 50.6 | 2.8 | 5.53 |
| Vicuna-7B + SFT | O | 2023.10 | 50.6 | 37.2 | 73.52 |
| UNWAVERING-FQ (+ SFT + DPO) | C | 2023.10 | 50.4 | 0.2 | 0.4 |
| Vicuna-7B | C | 2023.10 | 50.2 | 0 | 0 |
| Vicuna-7B | O | 2023.10 | 49 | 49 | 100 |
| Vicuna-7B | L | 2023.10 | 48.6 | 17 | 34.98 |
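Every row in the table is consistent with Metric M Rate being Metric M Value expressed as a percentage of the Baseline Score, rounded to two decimals (e.g. 100 × 18.2 / 51.8 ≈ 35.14). The page does not state this formula, so treat it as an inferred relation. The sketch below checks it against the transcribed rows; `m_rate` and `check_rows` are hypothetical helper names, not part of any benchmark tooling.

```python
# Rows transcribed from the Evaluation Results table:
# (method, type, baseline score, Metric M value, Metric M rate as listed).
ROWS = [
    ("UNWAVERING-FQ (+ SFT + DPO)", "O", 51.8, 18.2, 35.14),
    ("Vicuna-7B + SFT", "L", 51.4, 18.0, 35.02),
    ("UNWAVERING-FQ (+ SFT + DPO)", "L", 50.8, 27.2, 53.54),
    ("Vicuna-7B + SFT", "C", 50.6, 2.8, 5.53),
    ("Vicuna-7B + SFT", "O", 50.6, 37.2, 73.52),
    ("UNWAVERING-FQ (+ SFT + DPO)", "C", 50.4, 0.2, 0.4),
    ("Vicuna-7B", "C", 50.2, 0.0, 0.0),
    ("Vicuna-7B", "O", 49.0, 49.0, 100.0),
    ("Vicuna-7B", "L", 48.6, 17.0, 34.98),
]


def m_rate(baseline: float, m_value: float) -> float:
    """Inferred relation (assumption): M value as a percentage of the baseline score."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return 100.0 * m_value / baseline


def check_rows(rows, tol=0.005):
    """Return rows whose listed rate differs from the inferred rate rounded to 2 decimals."""
    mismatches = []
    for method, qtype, baseline, m_value, listed_rate in rows:
        computed = m_rate(baseline, m_value)
        if abs(round(computed, 2) - listed_rate) > tol:
            mismatches.append((method, qtype, computed, listed_rate))
    return mismatches


if __name__ == "__main__":
    for method, qtype, baseline, m_value, listed in ROWS:
        computed = m_rate(baseline, m_value)
        print(f"{method:30s} Type={qtype}  computed={computed:6.2f}  listed={listed:6.2f}")
    print("mismatches:", check_rows(ROWS))
```

An empty mismatch list means the inferred formula reproduces every listed rate; a non-empty one would point to a transcription error or a different definition.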