Task-Oriented Dialogue on STAR
[Chart: F1 Score. Highlighted entry: AnyTOD XXL, 68, Jun 2, 2025. Metrics shown: F1 Score, Accuracy, BLEU Score.]
Updated 1mo ago
Evaluation Results
Method                      Details                     Date      F1 Score  Accuracy  BLEU Score
AnyTOD XXL                  Interpretable=true, Gr...   2025.06   68        68        44.3
CoDial (4o, 5-mini:l)       Interpretable=true, Gr...   2025.06   59.2      60.2      46.5
CoDial (4o, 4o-mini)        Interpretable=true, Gr...   2025.06   58.5      60.1      45.2
SGP-TOD                     Interpretable=false, G...   2025.06   53.5      53.2      -
SAM                         Interpretable=false, G...   2025.06   51.2      49.8      -
CoDial (4o, 4o-mini) − RI   Interpretable=true, Gr...   2025.06   36.6      36.1      23
BERT + Schema               Interpretable=false, G...   2025.06   29.7      32.4      -