
NoisyToolBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Tool Learning under Instructions Beyond Tool Capabilities | NoisyToolBench IBTC 1.0 (test) | A1 Score: 98 | 32 |
| Tool Learning under Instruction with Error | NoisyToolBench IwE 1.0 (test) | A1 Success Rate: 74 | 32 |
| Tool Learning under Instruction with Multiple Requests | NoisyToolBench IMR 1.0 (test) | A1 Score: 90 | 32 |
| Tool Learning under Instruction with Missing Key Information | NoisyToolBench IMKI 1.0 (test) | A1 Success Rate: 94 | 32 |
| Tool-using | NoisyToolBench IBTC | Average Steps: 1 | 32 |
| Tool-using | NoisyToolBench IwE | Average Steps: 1.3 | 32 |
| Tool-using | NoisyToolBench IMR | Average Steps: 1.03 | 32 |
| Tool-using | NoisyToolBench IMKI | Average Steps: 1.26 | 32 |