
High-level Diagnostic Reasoning on EndoAgentBench

Top result: GPT-4o, 76.03 CAP (CAR)

[Chart: results over time; x-axis date Aug 10, 2025]

Evaluation Results

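Method     Date     CAP (CAR)  Metric 2  Metric 3
GPT-4o     2025.08  76.03      50.19     63.1
(missing)  2025.08  64         78.24     71.13
(missing)  2025.08  61.18      54.69     57.93
(missing)  2025.08  53.29      35.83     44.55
(missing)  2025.08  52.91      20.54     36.71
(missing)  2025.08  47.09      25.52     36.29
(missing)  2025.08  41.07      20.54     30.8
(missing)  2025.08  3.2        2.25      2.72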
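The page does not name the second and third numeric columns, but in every row the third value lies within about 0.02 of the plain average of the first two (for GPT-4o, (76.03 + 50.19) / 2 = 63.11 against the listed 63.1). The sketch below checks that pattern and shows how the ordering changes with the sort column. It rests on assumptions: the first numeric column is taken to be CAP (CAR), since that is the metric shown next to GPT-4o's 76.03; the names `cap_car`, `metric_2`, `metric_3` and the helpers `max_mean_gap` and `ranked_by` are hypothetical; and "third column = mean of the first two" is an inference from the numbers, not something the page states.

```python
# Values copied from the EndoAgentBench table; only the first method name
# (GPT-4o) is recoverable, so the others are stored as None.
# Assumed (hypothetical) column order: (method, date, cap_car, metric_2, metric_3).
ROWS = [
    ("GPT-4o", "2025.08", 76.03, 50.19, 63.1),
    (None, "2025.08", 64.0, 78.24, 71.13),
    (None, "2025.08", 61.18, 54.69, 57.93),
    (None, "2025.08", 53.29, 35.83, 44.55),
    (None, "2025.08", 52.91, 20.54, 36.71),
    (None, "2025.08", 47.09, 25.52, 36.29),
    (None, "2025.08", 41.07, 20.54, 30.8),
    (None, "2025.08", 3.2, 2.25, 2.72),
]


def max_mean_gap(rows):
    """Largest |metric_3 - (cap_car + metric_2) / 2| over all rows."""
    return max(abs(m3 - (a + b) / 2) for _, _, a, b, m3 in rows)


def ranked_by(rows, index):
    """Return rows sorted from highest to lowest on the given column index."""
    return sorted(rows, key=lambda row: row[index], reverse=True)


if __name__ == "__main__":
    # A small gap supports (but does not prove) the "average" reading.
    print(f"largest gap to the two-metric mean: {max_mean_gap(ROWS):.3f}")
    leader = ranked_by(ROWS, 4)[0]
    print("leader by metric_3:", leader[0] or "(unnamed)", leader[4])
```

The listed order matches a descending sort on CAP (CAR). Sorting on the third column instead would put the second-listed entry (71.13) ahead of GPT-4o (63.1), so which column a reader treats as primary changes the ranking.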