Biomedical Temporal Reasoning on ChronoTQA 1.0 (120-question stratified subsample)
[Chart: Cross-disease Comparison. Displayed point: GPT-4o-mini, 100 (May 21, 2026).]
Updated 12d ago
Evaluation Results
| Method | Details | Date | Cross-disease Comparison | Temporal Window Performance | Temporal Differential Dx | Phenopackets Onset (Free-text) | Static Drug (Control) | Static Gene (Control) | Temporal Mean (Per-Model) | Static Mean (Per-Model) | Gap (Static - Temporal) (pp) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GPT-4o-mini | Access=Web chat, Retri... | 2026.05 | 100 | 58 | 82 | 0 | 80 | 75 | 52.9 | 77.8 | 24.8 |
| Gemini | Access=Web chat, Retri... | 2026.05 | 100 | 83 | 75 | 12 | 100 | 100 | 64 | 100 | 36 |
| Claude | Access=Web chat, Retri... | 2026.05 | 100 | 75 | 55 | 0 | 80 | 75 | 51 | 77.8 | 26.8 |
| DeepSeek V3 | Access=Web chat, Retri... | 2026.05 | 91 | 67 | 36 | 6 | 80 | 75 | 45.1 | 77.8 | 32.7 |
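The gap column is labeled "Static - Temporal" in percentage points, so it can be recomputed from the two per-model means reported in the results. The sketch below does that in Python. The names `reported_means` and `static_temporal_gap` are hypothetical, introduced only for illustration, and the inputs are the rounded means as displayed. A recomputed gap can therefore differ from the published one by about 0.1 pp. For GPT-4o-mini it gives 24.9 against the listed 24.8, presumably because the published gap was derived from unrounded means. That explanation is an assumption; the page does not state it.

```python
# Rounded means copied from the results above: (temporal mean, static mean).
# The dictionary and function names are illustrative, not part of the benchmark.
reported_means = {
    "GPT-4o-mini": (52.9, 77.8),
    "Gemini": (64.0, 100.0),
    "Claude": (51.0, 77.8),
    "DeepSeek V3": (45.1, 77.8),
}


def static_temporal_gap(temporal_mean: float, static_mean: float) -> float:
    """Return static minus temporal accuracy, in percentage points."""
    return round(static_mean - temporal_mean, 1)


gaps = {
    model: static_temporal_gap(temporal, static)
    for model, (temporal, static) in reported_means.items()
}

# List models from the largest gap to the smallest.
for model, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{model}: {gap:.1f} pp")
```

A larger gap means the model loses more accuracy on the temporal tasks relative to the static controls.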