Answer Generation on CORAL
Best F1: 28.9 (Ours), Jan 19, 2026
[Chart: F1 and LLM Judgement Score]
Evaluation Results
Method      LLM           Date     F1    LLM Judgement Score
----------  ------------  -------  ----  -------------------
Ours        Qwen-2.5-7b   2026.01  28.9  46.8
Claude      Sonnet-3.5    2026.01  27.4  -
ChatGPT     GPT-3.5       2026.01  26.8  -
EvoRAG      Qwen-2.5-7b   2026.01  25.1  -
UniConv     Mistral-2-7b  2026.01  24.3  -
AgenticLM   Qwen-3-8b     2026.01  24.1  44.7
Ours        Qwen-2.5-3b   2026.01  22.4  43.2
AgenticLM   Qwen-3-4b     2026.01  22.1  42.6
ChatQA      LLaMA-3-8b    2026.01  20.3  -
SFT         Qwen-2.5-7b   2026.01  18.8  43
SFT         Qwen-2.5-3b   2026.01  15.2  42.3
Search-R1   Qwen-2.5-3b   2026.01  3.9   41.2
Search-R1   Qwen-2.5-7b   2026.01  3.8   43