Incorrect Reasoning Path Detection on DeepScaleR

64.24 Accuracy

TokUR (AU)

[Chart: Accuracy, May 16, 2025]

Evaluation Results

Date     Accuracy  Metric 2  Metric 3
2025.05  64.24     76.39     71.73
2025.05  63.89     76.05     71.43
2025.05  61.47     73.55     68.91
2025.05  56.6      66.31     58.18
2025.05  55.83     64.91     56.71
2025.05  54.61     77.52     67.3
2025.05  53.52     76.26     65.47
2025.05  52.64     60.55     50.61
2025.05  52.43     60.23     50.27
2025.05  52        74.22     62.55
2025.05  51.93     73.85     60.75
2025.05  51.35     73.33     61.08
2025.05  48.09     66.57     46.44
2025.05  47.37     65.66     45.43
2025.05  44.93     -         -
2025.05  43.91     85.33     65.25
2025.05  43.89     85.31     65.2
2025.05  43.89     84.92     65.57
2025.05  39.03     76.72     56.15
2025.05  37.51     73.01     42.89
2025.05  37.48     73.05     48.76
2025.05  35.87     -         -
2025.05  35.55     67.68     35.18
2025.05  34.13     67.05     33.83
2025.05  29.75     59.14     32.64
2025.05  28.65     55.9      26.8
2025.05  28.37     55.42     26.46
2025.05  26.85     52.82     24.48
2025.05  25.71     83.55     47.56
2025.05  25.71     83.52     47.48
2025.05  25.52     82.87     46.76
2025.05  25.48     50.16     25.08
2025.05  24.86     -         -
2025.05  22        71.65     29.99
2025.05  21.76     71.93     33.81
2025.05  17.52     59.58     17.48
2025.05  17.33     56.09     14.74
2025.05  17.3      55.76     14.55
2025.05  16.83     53.84     13.93
2025.05  16.5      56.88     18.04
2025.05  16.3      54.73     15.5
2025.05  16.23     33.64     18.06
2025.05  14.25     -         -
2025.05  14.23     48.68     13.77
2025.05  12.58     46.3      12.94
2025.05  11.14     43.14     12.34