
Medical Decision-Making on MMLUPH (test)

78.48 Accuracy

MAC (Ours w/ GPTs)

Jul 25, 2025
Updated 21d ago

Evaluation Results

Method  Links
Date     Accuracy
2025.07  78.48  78.47  78.72  97.6   76.05
2025.07  77.34  77.34  77.55  97.47  74.77
2025.07  71.35  71.45  72.25  96.79  68.09
2025.07  71.35  71.45  72.25  96.79  68.09
2025.07  70.46  70.4   71.17  96.69  67.07
2025.07  68.77  68.74  68.99  96.62  65.18
2025.07  67.93  67.95  69.49  96.4   64.33
2025.07  67.81  67.91  68.49  96.52  64.14
2025.07  67.77  67.8   68.08  96.4   64.08
2025.07  67.62  67.76  68.37  96.5   63.94
2025.07  66.06  66.02  66.3   96.22  62.18
2025.07  65.31  65.57  66.98  96.12  61.38
2025.07  62.71  63.29  65.01  95.99  58.64
2025.07  62     62     62.4   95.75  57.62
2025.07  60.68  61.19  64.11  95.61  56.57
2025.07  60.19  60.86  63.34  95.56  55.99
2025.07  59.05  59.1   60.31  95.43  54.39
2025.07  45.36  48.73  61.13  93.94  42.06
2025.07  44.84  45.44  48.82  94.03  38.82
2025.07  44.42  46.14  51.93  93.83  39.11
2025.07  38.96  41.45  53.08  93.4   34.44
2025.07  32.49  32.91  44.03  92.68  27
2025.07  15.29  13.19  48.32  90.64  12.05