
English scenario

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Hallucination Mitigation | English Scenario Aggregated | Hallucination Rate: 0.7 | 8 |
| Automatic Speech Recognition | English Scenario Aggregate | Hallucination Rate: 0.7 | 8 |
| Mobile device operation | English scenario Multi-app Advanced Instruction | Completion Rate (CR): 100 | 3 |
| Mobile device operation | English scenario External app Advanced Instruction | Completion Rate (CR): 97.1 | 3 |
| Mobile device operation | English scenario External app Basic Instruction | Completion Rate (CR): 100 | 3 |
| Mobile device operation | English scenario System app Advanced Instruction | Completion Rate (CR): 85.3 | 3 |
| Mobile device operation | English scenario System app Basic Instruction | Completion Rate (CR): 100 | 3 |
| Mobile device operation | English scenario Multi-app Basic Instruction | Completion Rate (CR): 100 | 2 |