Dialogue Agent Interaction on DEMO Non-Collaboration set
Best Goal Achievement Score: 8.03, Llama3.1-8B-Instruct w/ AMPO (May 4, 2025)
Evaluation Results

| Method | Setting | Date | Goal Achievement Score |
|---|---|---|---|
| Llama3.1-8B-Instruct w/ AMPO | Interaction Partner=Cl... | 2025.05 | 8.03 |
| Llama3.1-8B-Instruct w/ GRPO | Interaction Partner=GP... | 2025.05 | 7.96 |
| Llama3.1-8B-Instruct w/ AMPO | Interaction Partner=GP... | 2025.05 | 7.95 |
| Qwen2.5-7B-Instruct w/ AMPO | Interaction Partner=Cl... | 2025.05 | 7.87 |
| Qwen2.5-7B-Instruct w/ GRPO | Interaction Partner=Cl... | 2025.05 | 7.84 |
| Llama3.1-8B-Instruct w/ GRPO | Interaction Partner=Cl... | 2025.05 | 7.81 |
| Qwen2.5-7B-Instruct w/ AMPO | Interaction Partner=GP... | 2025.05 | 7.77 |
| Qwen2.5-7B-Instruct w/ GRPO | Interaction Partner=GP... | 2025.05 | 7.7 |
| Llama3.1-8B-Instruct | Interaction Partner=GP... | 2025.05 | 7.68 |
| Qwen2.5-7B-Instruct | Interaction Partner=GP... | 2025.05 | 7.62 |
| Qwen2.5-7B-Instruct | Interaction Partner=Cl... | 2025.05 | 7.62 |
| Llama3.1-8B-Instruct | Interaction Partner=Cl... | 2025.05 | 7.54 |