Skill retrieval on ToolBench
[Chart: Precision@R over time. Best result: ShardMemo, 97 (Jan 29, 2026). Updated 3mo ago.]
Evaluation Results

| Method | Setting | Date | Precision@R | StepRed |
|---|---|---|---|---|
| ShardMemo | Model=GPT-OSS-120B, R=... | 2026.01 | 97 | 1.94 |
| Embedding similarity | Model=GPT-OSS-120B, R=3 | 2026.01 | 88 | 1.81 |
| Static skill library | Model=GPT-OSS-120B, R=3 | 2026.01 | 86 | 1.76 |
| BM25 retrieval | Model=GPT-OSS-120B, R=3 | 2026.01 | 85 | 1.72 |
| Recency | Model=GPT-OSS-120B, R=3 | 2026.01 | 79 | 1.45 |
| Random | Model=GPT-OSS-120B, R=3 | 2026.01 | 71 | 1.58 |
| Trace kNN | Model=GPT-OSS-120B, R=3 | 2026.01 | 68 | 1.67 |