Knowledge-Driven Structured EHR Understanding and Reasoning on Synthea
[Chart: K-R1 AUC; best result 58.7 (Gemini 2.5), Nov 11, 2025. Metrics: K-R1 AUC, K-U1 AUC, K-R2 AUC, K-R3 AUC]
Updated 17d ago
Evaluation Results
| Method | Model Type | Date | K-R1 AUC | K-U1 AUC | K-R2 AUC | K-R3 AUC |
|---|---|---|---|---|---|---|
| Gemini 2.5 | General LLM... | 2025.11 | 58.7 | - | 54.1 | - |
| Qwen-32B | General LLM... | 2025.11 | 58.3 | - | 51 | - |
| GPT-3.5 Turbo | General LLM... | 2025.11 | 58.1 | - | 55.4 | 52.9 |
| Gemini-2.0 | General LLM... | 2025.11 | 57.7 | 52 | 56.2 | 51.6 |
| GPT-4o | General LLM... | 2025.11 | 55.6 | 55 | 53.2 | 51 |
| Gemini 1.5 | General LLM... | 2025.11 | 55.6 | - | - | - |
| DeepSeek-V3 | General LLM... | 2025.11 | 52.8 | - | - | - |
| DeepSeek-V2.5 | General LLM... | 2025.11 | - | 51 | - | - |
| Qwen-72B | General LLM... | 2025.11 | - | - | - | 52.2 |