Symbolic Reasoning on Symbolic Equal
[Chart: Acc (Clean, Avg). Top result: CD-CoT, 42.7, Oct 31, 2024.]
Updated 4d ago
Evaluation Results
| Method | Date | Acc (Clean, Avg) | Acc (Irrelevant, Easy) | Acc (Irrelevant, Medium) | Acc (Irrelevant, Hard) | Acc (Irrelevant, Avg) | Acc (Inaccurate, Easy) | Acc (Inaccurate, Medium) | Acc (Inaccurate, Hard) | Acc (Inaccurate, Avg) |
|---|---|---|---|---|---|---|---|---|---|---|
| CD-CoT | 2024.10 | 42.7 | 44.7 | 42.7 | 44 | 43.8 | 42.6 | 41.3 | 42.7 | 42.2 |
| SCO | 2024.10 | 38.5 | 34.9 | 33.4 | 32.7 | 33.7 | 34 | 34.1 | 34.5 | 34.2 |
| CC | 2024.10 | 37.8 | 33.8 | 32.7 | 32 | 32.8 | 31.3 | 33 | 29.9 | 31.4 |
| Base | 2024.10 | 32.7 | 28.1 | 25.1 | 23 | 25.4 | 29.1 | 26.1 | 22.7 | 26 |
| BT | 2024.10 | 31.8 | 26 | 22.7 | 22.6 | 23.8 | 26.3 | 22.7 | 22.9 | 24 |
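
The page does not say how the "Avg" columns are computed. Judging only from the numbers above, each Irrelevant and Inaccurate average looks like the plain mean of the Easy, Medium, and Hard scores, rounded to one decimal. The following Python sketch checks that assumption against the listed results; the names `RESULTS`, `mean_of_levels`, and `check_averages` are illustrative and do not come from the benchmark.

```python
# Scores copied from the results above, as (Easy, Medium, Hard, reported Avg).
# Assumption being tested: Avg = mean(Easy, Medium, Hard), rounded to 1 decimal.
RESULTS = {
    "CD-CoT": {"Irrelevant": (44.7, 42.7, 44.0, 43.8), "Inaccurate": (42.6, 41.3, 42.7, 42.2)},
    "SCO":    {"Irrelevant": (34.9, 33.4, 32.7, 33.7), "Inaccurate": (34.0, 34.1, 34.5, 34.2)},
    "CC":     {"Irrelevant": (33.8, 32.7, 32.0, 32.8), "Inaccurate": (31.3, 33.0, 29.9, 31.4)},
    "Base":   {"Irrelevant": (28.1, 25.1, 23.0, 25.4), "Inaccurate": (29.1, 26.1, 22.7, 26.0)},
    "BT":     {"Irrelevant": (26.0, 22.7, 22.6, 23.8), "Inaccurate": (26.3, 22.7, 22.9, 24.0)},
}


def mean_of_levels(easy, medium, hard):
    """Mean of the three difficulty levels, rounded to one decimal place."""
    return round((easy + medium + hard) / 3, 1)


def check_averages(results, tol=0.05):
    """Return (method, noise_type) pairs whose reported Avg differs from the mean."""
    mismatches = []
    for method, by_noise in results.items():
        for noise_type, (easy, medium, hard, reported) in by_noise.items():
            if abs(mean_of_levels(easy, medium, hard) - reported) > tol:
                mismatches.append((method, noise_type))
    return mismatches


if __name__ == "__main__":
    print(check_averages(RESULTS))
```

An empty list means every reported average matches the mean of its three difficulty levels, which is the case for the five methods listed here. That is consistent with the assumption but does not confirm how the benchmark authors actually computed the column.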