Dialogue Agent Interaction on DEMO Average
[Chart: Goal Achievement Score over time. Top result: Llama3.1-8B-Instruct w/ AMPO, 8.14 (May 4, 2025).]
Evaluation Results
| Method | Setting | Date | Goal Achievement Score |
|---|---|---|---|
| Llama3.1-8B-Instruct w/ AMPO | Interaction Partner=GP... | 2025.05 | 8.14 |
| Llama3.1-8B-Instruct w/ GRPO | Interaction Partner=GP... | 2025.05 | 8.06 |
| Llama3.1-8B-Instruct w/ AMPO | Interaction Partner=Cl... | 2025.05 | 8.05 |
| Llama3.1-8B-Instruct | Interaction Partner=GP... | 2025.05 | 7.92 |
| Llama3.1-8B-Instruct w/ GRPO | Interaction Partner=Cl... | 2025.05 | 7.89 |
| Qwen2.5-7B-Instruct w/ AMPO | Interaction Partner=Cl... | 2025.05 | 7.86 |
| Qwen2.5-7B-Instruct w/ GRPO | Interaction Partner=Cl... | 2025.05 | 7.83 |
| Qwen2.5-7B-Instruct w/ AMPO | Interaction Partner=GP... | 2025.05 | 7.81 |
| Qwen2.5-7B-Instruct w/ GRPO | Interaction Partner=GP... | 2025.05 | 7.72 |
| Qwen2.5-7B-Instruct | Interaction Partner=GP... | 2025.05 | 7.65 |
| Llama3.1-8B-Instruct | Interaction Partner=Cl... | 2025.05 | 7.63 |
| Qwen2.5-7B-Instruct | Interaction Partner=Cl... | 2025.05 | 7.61 |