
TEXTCRAFT-SYNTH

Benchmarks

Task Name      | Dataset Name                                       | Metric       | SOTA Result | Trend
---------------|----------------------------------------------------|--------------|-------------|------
Task Execution | TEXTCRAFT-SYNTH Hard (eval)                        | Success Rate | 88          | 2
Task Execution | TEXTCRAFT-SYNTH Medium (eval)                      | Success Rate | 98          | 2
Task Execution | TEXTCRAFT-SYNTH All (eval)                         | Success Rate | 96          | 2
Task Execution | TEXTCRAFT-SYNTH 8K context Hard (evaluation set)   | Success Rate | 88          | 2
Task Execution | TEXTCRAFT-SYNTH 8K context Medium (evaluation set) | Success Rate | 96          | 2
Task Execution | TEXTCRAFT-SYNTH 8K context All (test)              | Success Rate | 95          | 2