Conversation Evaluation on Grocery Domain
[Chart: Human Score (%) and LLM Score (%). Top result: GOOD, 75.78 Human Score (%), Aug 20, 2025.]
Updated 26d ago
Evaluation Results
Method        Setting                    Date     Human Score (%)  LLM Score (%)
GOOD          inference_type=prob inf    2025.08  75.78            79.44
GOOD          inference_type=prompt inf  2025.08  72.39            79.86
Full Context  -                          2025.08  68.55            76.94