Summarization on TL;DR (Completeness, Groundedness, Relevance)
[Chart: Completeness by model. Best Completeness: 43 (Gemini 2.5 Pro, Dec 1, 2025). Metrics charted: Completeness, Groundedness, Relevance.]
Updated 4d ago
Evaluation Results
| Method | Variant / configuration | Date | Completeness | Groundedness | Relevance |
| --- | --- | --- | --- | --- | --- |
| Gemini 2.5 Pro | Model Variant=Gemini 2... | 2025.12 | 43 | 42 | - |
| Gemini 2.5 Flash | Model Variant=Gemini 2... | 2025.12 | 40 | 40 | - |
| GPT-OSS 120B | Model Variant=GPT-OSS... | 2025.12 | 40 | 39 | - |
| Gemini 2.0 Flash | Model Variant=Gemini 2... | 2025.12 | 39 | 41 | - |
| Jury-on-Demand | Jury Configuration=Jur... | 2025.12 | 38 | 43 | - |
| Claude 3.7 | Model Variant=Claude 3.7 | 2025.12 | 37 | 39 | - |
| GPT-OSS 20B | Model Variant=GPT-OSS 20B | 2025.12 | 34 | 29 | - |
| Gemma 3 | Model Variant=Gemma 3 | 2025.12 | 14 | 42 | - |
| LLAMA 3.2 | Model Variant=LLAMA 3.2 | 2025.12 | 9 | 10 | - |
| DeepSeek R1 | Model Variant=DeepSeek R1 | 2025.12 | 5 | 13 | - |
| Phi 4 | Model Variant=Phi 4 | 2025.12 | 1 | 11 | - |