Dialogue Agent Interaction on DEMO Collaboration set
Best result: Llama3.1-8B-Instruct w/ AMPO, Goal Achievement Score 8.65 (May 4, 2025)
Evaluation Results
Method                        Setting                    Date     Goal Achievement Score
Llama3.1-8B-Instruct w/ AMPO  Interaction Partner=GP...  2025.05  8.65
Llama3.1-8B-Instruct          Interaction Partner=GP...  2025.05  8.56
Llama3.1-8B-Instruct w/ GRPO  Interaction Partner=GP...  2025.05  8.35
Llama3.1-8B-Instruct w/ AMPO  Interaction Partner=Cl...  2025.05  8.12
Llama3.1-8B-Instruct w/ GRPO  Interaction Partner=Cl...  2025.05  8.09
Qwen2.5-7B-Instruct w/ AMPO   Interaction Partner=GP...  2025.05  7.95
Qwen2.5-7B-Instruct w/ GRPO   Interaction Partner=GP...  2025.05  7.90
Llama3.1-8B-Instruct          Interaction Partner=Cl...  2025.05  7.89
Qwen2.5-7B-Instruct w/ AMPO   Interaction Partner=Cl...  2025.05  7.82
Qwen2.5-7B-Instruct w/ GRPO   Interaction Partner=Cl...  2025.05  7.81
Qwen2.5-7B-Instruct           Interaction Partner=GP...  2025.05  7.73
Qwen2.5-7B-Instruct           Interaction Partner=Cl...  2025.05  7.57