Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Question Answering | en multifield | F1 Score: 44.29 | 21 |
| Dialect Robustness | EN | Success Rate: 57 | 11 |
| Graph parsing | en | LF Score: 94.15 | 7 |
| Spoken Dialogue Generation | en short (test) | WER: 2.79 | 3 |
| Spoken Dialogue Generation | en (test) | cpSIM: 43.7 | 3 |
| Text-to-Speech | EN | WER: 3.1 | 3 |
| Function Invocation | EN Ver. (Dual) | Token Usage: 1,300.7 | 3 |
| Function Invocation | EN Ver. (Single) | Invocation Accuracy: 0.9 | 3 |