Mathematical Reasoning on GSM8k (Usage/Attack Accuracy)

Attack Accuracy: 55.95

No-shield

Oct 16, 2024
Updated 1mo ago

Evaluation Results

Date      Attack Accuracy
2024.10   55.95             -    -
2024.10   55.07             -    -
2024.10   54.91             -    -
2024.10   54.55             -    -
2024.10   53.83             -    -
2024.10   53.67             -    -
2024.10   53.07             -    -
2024.10   51.31             -    -
2024.10   49.75             -    -
2024.10   47.79             -    -
2024.10   40.94             -    -
2024.10   40.5              -    -
2024.10   37.07             -    -
2024.10   35.18             -    -
2024.10   32.67             -    -
2024.10   21.53             -    -
2024.10   20.92             -    -
2024.10   16.81             -    -
2024.10   14.96             -    -
2024.10   12.5              -    -
2024.10   10.81             -    -
2024.10   6.22              0    33.11
2024.10   6.09              -    -
2024.10   5.61              -    -
2024.10   4.58              -    -
2024.10   4.56              -    -
2024.10   4.15              -    -
2024.10   4.05              -    -
2024.10   3.91              0    30.1
2024.10   2.84              -    -
2024.10   2.41              0    15.51
2024.10   2.36              -    -
2024.10   1.74              -    -
2024.10   1.43              -    -
2024.10   1.34              -    -
2024.10   1.29              -    -
2024.10   1.04              0    37.13
2024.10   0.93              -    -
2024.10   0.43              -    -
2024.10   0.23              -    -