Conversational response generation on INSCIT
Best F1: 33.2 (UniConv)
[Chart: F1 and BERTScore results over time, Oct 15, 2025]
Evaluation Results

Method              | Configuration             | Date    | F1   | BERTScore
------------------- | ------------------------- | ------- | ---- | ---------
UniConv             | RAG=RAG, LLM=Mistral 7... | 2025.10 | 33.2 | -
ChatR1-3b           | RAG=RAG, LLM=Qwen-3b,...  | 2025.10 | 33.2 | 85.5
ChatR1-7b           | RAG=RAG, LLM=Qwen-7b,...  | 2025.10 | 32.8 | 85.5
ChatR1 (w/o Rint.)  | RAG=RAG, LLM=Qwen-3b,...  | 2025.10 | 31.3 | 84.4
ChatRetriever +Mis. | RAG=RAG, LLM=Mistral 7... | 2025.10 | 30.3 | -
QR Search R1        | RAG=RAG, LLM=Qwen-3b,...  | 2025.10 | 27.5 | 84.0
Claude (DI)         | RAG=No, LLM=Claude, Tr... | 2025.10 | 27.0 | -
conv-ANCE +Mis.     | RAG=RAG, LLM=Mistral 7... | 2025.10 | 24.8 | -
CoT R1              | RAG=No, LLM=Qwen-3b, T... | 2025.10 | 24.1 | 84.0
ChatGPT (DI)        | RAG=No, LLM=GPT-3.5, T... | 2025.10 | 22.8 | 81.1
IRCoT               | RAG=RAG, LLM=Qwen-3b,...  | 2025.10 | 20.4 | 67.3
Qwen-Instr. (DI)    | RAG=No, LLM=Qwen-3b, T... | 2025.10 | 17.9 | 58.1
SFT                 | RAG=No, LLM=Qwen-3b, T... | 2025.10 | 16.9 | 56.9
Qwen-Instr. (CoT)   | RAG=No, LLM=Qwen-3b, T... | 2025.10 | 16.4 | 62.4
Qwen-Instr. (RAG)   | RAG=RAG, LLM=Qwen-3b,...  | 2025.10 | 13.0 | 49.3

Configuration strings are truncated in the source. "-" means no BERTScore value is reported.