Mathematical Reasoning on GSM8K (PPL)
[Chart: Perplexity (PPL) by date. ShadowCoT: 24.8 (Apr 8, 2025).]
Updated 1mo ago
Evaluation Results

Method      Configuration             Date      Perplexity (PPL)
ShadowCoT   Target Model=Mistral-7B   2025.04   24.8
DarkMind    Target Model=Mistral-7B   2025.04   34.2
BadChain    Target Model=Mistral-7B   2025.04   42.1