Tool selection on MetaTool similar choices subtask (test)
Best result: 83.4 Accuracy (OATS-S1)
[Chart: Accuracy over time; data point dated Mar 13, 2026]
Updated 1mo ago
Evaluation Results
| Method | Details | Links | Accuracy |
|---|---|---|---|
| OATS-S1 | Latency=3.5 ms, Hardwa... | 2026.03 | 83.4 |
| Vicuna-7b | Latency=∼2–5 s, Hardwa... | 2026.03 | 73.5 |
| ChatGPT (GPT-3.5) | Latency=∼1–3 s, Hardwa... | 2026.03 | 69.1 |
| Static Embedding (SE) | Latency=3.7 ms, Hardwa... | 2026.03 | 66.4 |
| Vicuna-13b | Latency=∼3–8 s, Hardwa... | 2026.03 | 58.2 |
| Average (9 LLMs) | Hardware=GPU | 2026.03 | 57 |
| LLaMA2-13b | Latency=∼2–5 s, Hardwa... | 2026.03 | 44.1 |
| BM25 | Latency=0.4 ms, Hardwa... | 2026.03 | 42.7 |