Citation URL Validity Analysis on DRBench
[Chart: Non-resolving Rate, Hallucination Rate, and Stale Link Rate by method, Apr 3, 2026. Lowest Non-resolving Rate: gemini-2.5-flash-search (5.4).]
Evaluation Results
| Method | Details | Date | Non-resolving Rate (%) | Hallucination Rate (%) | Stale Link Rate (%) |
|---|---|---|---|---|---|
| gemini-2.5-flash-search | Provider=Google, URLs=... | 2026.04 | 5.4 | 4.6 | 0.8 |
| gpt-4.1 | Provider=OpenAI, URLs=336 | 2026.04 | 5.4 | 5.4 | 0 |
| gemini-2.5-pro-search | Provider=Google, URLs=... | 2026.04 | 5.9 | 4.8 | 1.1 |
| gpt-4.1-mini | Provider=OpenAI, URLs=296 | 2026.04 | 7.4 | 7.4 | 0 |
| claude-3-5-sonnet-search | Provider=Anthropic, UR... | 2026.04 | 7.8 | 3 | 4.8 |
| claude-3-7-sonnet-search | Provider=Anthropic, UR... | 2026.04 | 8.5 | 3.2 | 5.2 |
| gpt-4o-mini-search-prev. | Provider=OpenAI, URLs=402 | 2026.04 | 8.7 | 8.7 | 0 |
| gpt-4o-search-preview | Provider=OpenAI, URLs=387 | 2026.04 | 8.8 | 8.8 | 0 |
| openai-deepresearch | Provider=OpenAI, URLs=... | 2026.04 | 10.1 | 3.5 | 6.6 |
| gemini-2.5-pro-deepres. | Provider=Google, URLs=... | 2026.04 | 18.5 | 13.3 | 5.2 |
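In every row of the table, the Non-resolving Rate equals the Hallucination Rate plus the Stale Link Rate to within one-decimal rounding. For example, claude-3-7-sonnet-search has 3.2 + 5.2 = 8.4 against a reported 8.5. This pattern suggests that non-resolving URLs are split into those two categories, but the page does not state it. The sketch below checks that pattern against the published values. It also estimates how many URLs failed to resolve for rows whose URL count is shown, under the assumption (not confirmed by the page) that the rates are percentages of that count. The `Row` type, `TOLERANCE`, and both helper functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    method: str
    urls: Optional[int]  # None where the leaderboard truncates the URL count
    non_resolving: float
    hallucination: float
    stale: float


# Values copied from the Evaluation Results table (truncated names kept as-is).
ROWS = [
    Row("gemini-2.5-flash-search", None, 5.4, 4.6, 0.8),
    Row("gpt-4.1", 336, 5.4, 5.4, 0.0),
    Row("gemini-2.5-pro-search", None, 5.9, 4.8, 1.1),
    Row("gpt-4.1-mini", 296, 7.4, 7.4, 0.0),
    Row("claude-3-5-sonnet-search", None, 7.8, 3.0, 4.8),
    Row("claude-3-7-sonnet-search", None, 8.5, 3.2, 5.2),
    Row("gpt-4o-mini-search-prev.", 402, 8.7, 8.7, 0.0),
    Row("gpt-4o-search-preview", 387, 8.8, 8.8, 0.0),
    Row("openai-deepresearch", None, 10.1, 3.5, 6.6),
    Row("gemini-2.5-pro-deepres.", None, 18.5, 13.3, 5.2),
]

# Each of the three values is rounded to one decimal (error <= 0.05 each),
# so NR and HR + SLR can legitimately differ by up to 0.15.
TOLERANCE = 0.15


def decomposition_gap(row: Row) -> float:
    """Absolute difference between NR and HR + SLR for one row."""
    return abs(row.non_resolving - (row.hallucination + row.stale))


def estimated_bad_urls(row: Row) -> Optional[int]:
    """Hypothetical estimate: assumes NR is a percentage of the URL count."""
    if row.urls is None:
        return None
    return round(row.non_resolving / 100 * row.urls)


if __name__ == "__main__":
    # Sort by Non-resolving Rate (lower is better); ties keep table order.
    for row in sorted(ROWS, key=lambda r: r.non_resolving):
        consistent = decomposition_gap(row) <= TOLERANCE
        print(
            f"{row.method:26} NR={row.non_resolving:5.1f} "
            f"HR+SLR={row.hallucination + row.stale:5.1f} "
            f"consistent={consistent} est_bad={estimated_bad_urls(row)}"
        )
```

Under the percentage assumption, gpt-4.1's 5.4 on 336 URLs corresponds to about 18 non-resolving citations. Rows whose URL count is truncated on the page return `None` instead of a guessed count.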