
Medical Reasoning on MedCaseReasoning (test)

72.5 Accuracy

GPT-5-mini

[Chart: axis ticks 27.26, 39.005, 50.75, 62.495; Nov 18, 2025]
Updated 16d ago

Evaluation Results

Date      Accuracy  (unlabeled)  (unlabeled)  (unlabeled)
2025.11   72.5      -            -            -
2025.11   72        73.7         0.192        0.12
2025.11   70.2      70.2         0.195        0.084
2025.11   69.1      72.3         0.196        0.103
2025.11   66.5      62.9         0.284        0.263
2025.11   65.5      66.3         0.211        0.03
2025.11   64.4      -            -            -
2025.11   62.3      65.4         0.248        0.165
2025.11   57.5      67.3         0.308        0.283
2025.11   56.8      -            -            -
2025.11   55.1      68.4         0.227        0.071
2025.11   54.3      64.4         0.292        0.242
2025.11   44.9      58.3         0.472        0.48
2025.11   44.3      58.7         0.506        0.512
2025.11   42.9      67.3         0.289        0.252
2025.11   40.4      -            -            -
2025.11   40.4      56.7         0.381        0.338
2025.11   40.4      59.5         0.561        0.563
2025.11   31.7      -            -            -
2025.11   31.5      64.7         0.585        0.612
2025.11   31.5      63.9         0.509        0.546
2025.11   31.1      64.8         0.33         0.34
2025.11   30.9      65.9         0.426        0.471
2025.11   30.1      62.3         0.527        0.568
2025.11   29.8      -            -            -
2025.11   29.8      49.2         0.519        0.513
2025.11   29.8      65.1         0.661        0.665
          29        66.2         0.356        0.394