Conversation Evaluation on Robot Domain
[Chart: Human Score and LLM Score by date; best Human Score 88.57 (GOOD), Aug 20, 2025]
Evaluation Results
Method        Configuration              Date     Human Score  LLM Score
GOOD          inference_type=prompt inf  2025.08  88.57        86.9
GOOD          inference_type=prob inf    2025.08  87.22        89.1
Full Context  -                          2025.08  66.54        84.63