
ToolHop

Benchmarks

Task Name           Dataset Name            SOTA Result                 Trend
Multi-hop tool use  ToolHop                 Answer Correctness: 53.97   43
Multi-hop Tool-use  ToolHop unseen (test)   Accuracy: 43.1              4
Multi-hop Tool-use  ToolHop rand (test)     Accuracy: 31.1              4