Health-related dialogue and decision-making on HealthBench Main

Average Score: 46.38

GPT-5

Oct 5, 2025
Updated 1mo ago

Evaluation Results

Method    Links
2025.10
46.3863.2234.857.1237.1240.1637.8454.2644.6862.561.3153.3250.02
2025.10
36.2954.4429.1747.1624.7433.7126.3934.2537.8464.1554.3949.2440.93
2025.10
33.1620.168.922.1118.417.3128.2424.4533.9558.0152.446.840.03
2025.10
33.0312.521.7924.2519.4618.2920.816.0537.9561.3548.5545.6234.84
2025.10
32.6151.2121.3539.0522.9232.7825.2232.6134.4359.9947.6946.3538.8
2025.10
31.6947.2425.338.4921.5831.7924.1530.5535.9659.1951.7545.3136.23
2025.10
31.1853.9823.5137.0322.3629.421.7945.9334.7860.6554.3244.8134.84
2025.10
26.3838.5923.2531.4219.4218.1316.4445.2332.650.6241.2245.4927.61
2025.10
26.0441.4924.1735.6413.4720.8615.5933.7330.6156.0851.4541.3626.95
2025.10
25.6951.5421.8728.616.5223.4218.3838.2328.7858.3744.5941.8131.65
25.1345.4216.527.9815.2625.3416.4230.6928.5749.3543.514327.24
2025.10
22.1946.5419.5226.1213.7819.3118.2826.7225.557.444.6840.2627.4
2025.10
21.2136.5818.2123.9712.0119.7512.8729.1526.3554.9349.9837.9723.16
2025.10
19.3544.659.1214.6614.0520.4513.0933.6922.4546.9835.8435.8121.56
2025.10
18.6534.212.4622.879.6115.959.7630.3424.3648.0738.3437.2920.37
2025.10
16.2534.1518.0317.738.1415.1210.9126.5421.1154.2441.5733.8117.05
2025.10
15.7733.4814.9717.587.6915.298.2230.0120.350.0538.5431.7318.55
2025.10
14.9728.418.5914.087.4314.288.8632.4820.6745.6842.4531.0917.79
2025.10
8.1718.36.46.344.817.832.516.4612.9625.7224.323.7710.67
2025.10
6.2512.53.025.623.147.13.2811.9411.819.6523.9423.138.63
2025.10
5.9310.042.076.123.326.083.0213.7511.5423.3717.8322.49.31
2025.10
2.310.480.471.640.853.980.5812.597.089.4720.6615.635.19