Conversational Response Generation on QReCC
[Chart: F1 Score over time. Best result shown: ChatR1-7b, F1 Score 31, Oct 15, 2025. Available metrics: F1 Score, BERTScore.]
Updated 1mo ago
Evaluation Results
| Method | Method details | Date | F1 Score | BERTScore |
|---|---|---|---|---|
| ChatR1-7b | RAG=RAG, LLM=Qwen-7b,... | 2025.10 | 31 | 80.7 |
| ChatR1-3b | RAG=RAG, LLM=Qwen-3b,... | 2025.10 | 28 | 79.2 |
| ChatR1 (w/o Rint.) | RAG=RAG, LLM=Qwen-3b,... | 2025.10 | 27 | 78.5 |
| ChatRetriever +Mis. | RAG=RAG, LLM=Mistral 7... | 2025.10 | 26.3 | - |
| UniConv | RAG=RAG, LLM=Mistral 7... | 2025.10 | 26.2 | - |
| conv-ANCE +Mis. | RAG=RAG, LLM=Mistral 7... | 2025.10 | 25.9 | - |
| Claude (DI) | RAG=No, LLM=Claude, Tr... | 2025.10 | 25 | - |
| SFT | RAG=No, LLM=Qwen-3b, T... | 2025.10 | 23.3 | 80 |
| ChatGPT (DI) | RAG=No, LLM=GPT-3.5, T... | 2025.10 | 22.6 | 75.6 |
| QR Search R1 | RAG=RAG, LLM=Qwen-3b,... | 2025.10 | 20.4 | 79.6 |
| CoT R1 | RAG=No, LLM=Qwen-3b, T... | 2025.10 | 17.7 | 72.6 |
| Qwen-Instr. (RAG) | RAG=RAG, LLM=Qwen-3b,... | 2025.10 | 15.5 | 64.5 |
| Qwen-Instr. (DI) | RAG=No, LLM=Qwen-3b, T... | 2025.10 | 13.3 | 55.3 |
| IRCoT | RAG=RAG, LLM=Qwen-3b,... | 2025.10 | 13.1 | 55.6 |
| Qwen-Instr. (CoT) | RAG=No, LLM=Qwen-3b, T... | 2025.10 | 12.8 | 58.5 |
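
The page does not say how the F1 Score column is computed. For generated responses, F1 is commonly a token-overlap score between the generated answer and the reference answer, and the values listed here appear to be on a 0-100 scale. The sketch below illustrates that common definition only. It is an assumption, not the benchmark's official scorer: `token_f1` is a hypothetical helper, and lowercasing plus whitespace splitting stand in for whatever normalization the actual evaluation uses.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a generated answer and a reference answer.

    Assumption: tokens are lowercased, whitespace-separated words.
    Returns a value in [0, 1]; multiply by 100 for a percentage.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both the prediction and the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A corpus-level figure would then typically be the mean of `token_f1` over all test turns, scaled by 100.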