Conversational response generation on MD2Dial
Best F1 Score: 31.2 (ChatR1-7b)
[Chart: F1 Score of submitted methods over time (metric selectable: F1 Score, BERTScore); most recent entry Oct 15, 2025. Updated 1mo ago.]
Evaluation Results
| Method | Settings | Date | F1 Score | BERTScore |
|---|---|---|---|---|
| ChatR1-7b | RAG=RAG, LLM=Qwen-7b,... | 2025.10 | 31.2 | 84.5 |
| ChatR1 (w/o Rint.) | RAG=RAG, LLM=Qwen-3b,... | 2025.10 | 26.4 | 77.4 |
| ChatR1-3b | RAG=RAG, LLM=Qwen-3b,... | 2025.10 | 26.0 | 83.1 |
| SFT | RAG=No, LLM=Qwen-3b, T... | 2025.10 | 25.4 | 84.2 |
| QR Search R1 | RAG=RAG, LLM=Qwen-3b,... | 2025.10 | 23.1 | 82.1 |
| ChatGPT (DI) | RAG=No, LLM=GPT-3.5, T... | 2025.10 | 21.6 | 81.7 |
| Qwen-Instr. (RAG) | RAG=RAG, LLM=Qwen-3b,... | 2025.10 | 18.8 | 75.1 |
| CoT R1 | RAG=No, LLM=Qwen-3b, T... | 2025.10 | 18.0 | 80.2 |
| IRCoT | RAG=RAG, LLM=Qwen-3b,... | 2025.10 | 13.3 | 67.5 |
| Qwen-Instr. (DI) | RAG=No, LLM=Qwen-3b, T... | 2025.10 | 13.2 | 64.4 |
| Qwen-Instr. (CoT) | RAG=No, LLM=Qwen-3b, T... | 2025.10 | 10.5 | 63.6 |
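This page does not say how the F1 Score column is computed. In conversational response generation it is commonly a token-level (unigram) F1 between each generated response and its reference, averaged over the test set and reported on a 0-100 scale. The sketch below assumes that definition together with SQuAD-style normalization (lowercasing, punctuation and article removal). The normalization and the helper names `normalize`, `token_f1`, and `corpus_f1` are illustrative assumptions, not the benchmark's official scorer.

```python
import re
import string
from collections import Counter

# Assumed SQuAD-style normalization; the benchmark's exact rules may differ.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower().translate(_PUNCT_TABLE)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 between one generated response and one reference (0-1)."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # Convention: two empty strings match perfectly; otherwise no credit.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def corpus_f1(predictions: list[str], references: list[str]) -> float:
    """Mean per-response F1, scaled to 0-100 to match the table's scale."""
    if len(predictions) != len(references) or not predictions:
        raise ValueError("need equally sized, non-empty prediction/reference lists")
    scores = [token_f1(p, r) for p, r in zip(predictions, references)]
    return 100 * sum(scores) / len(scores)
```

For example, `token_f1("You can renew your license online.", "Renew the license online at the DMV website.")` shares three of six normalized tokens on each side, so precision and recall are both 0.5 and F1 is 0.5. Because this metric rewards exact word overlap, it can diverge from BERTScore, which credits semantically similar wording; that is one plausible reason rows such as SFT rank much higher on BERTScore than on F1.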