Symbolic Reasoning on Coin Flip (Accuracy, Average Accuracy)
[Chart: Accuracy and Average Accuracy. Top point: RM-Primed, Accuracy 80.75, Mar 20, 2026.]
Updated 25d ago
Evaluation Results
| Method | Configuration | Date | Accuracy | Average Accuracy |
|---|---|---|---|---|
| RM-Primed | Model=Llama3-8B | 2026.03 | 80.75 | 68.23 |
| RM-Primed | Model=GPT-3.5 | 2026.03 | 79.75 | 66.01 |
| RM-Primed (R+) | Model=Llama3-8B, R+ se... | 2026.03 | 78.75 | 65.94 |
| RM-Primed (R+) | Model=GPT-3.5, R+ sele... | 2026.03 | 78 | 64.99 |
| Few-shot CoT | Model=GPT-3.5 | 2026.03 | 74.25 | 61.15 |
| Contrastive CoT | Model=GPT-3.5, Reflect... | 2026.03 | 69.25 | 60.33 |
| Contrastive CoT | Model=Llama3-8B, Refle... | 2026.03 | 66.25 | 61.13 |
| Few-shot CoT | Model=Llama3-8B | 2026.03 | 65.75 | 64.19 |
| Contrastive CoT | Model=Llama3-8B, Refle... | 2026.03 | 60.75 | 59.56 |
| Contrastive CoT | Model=GPT-3.5, Reflect... | 2026.03 | 59 | 57.01 |