Recommendation on Instruments (test)
[Figure: Hit@1 on Instruments (test) over time; current best is CoARS at 34.7 Hit@1, Apr 11, 2026]
Evaluation Results
| Method | Backbone | Date | Hit@1 |
| --- | --- | --- | --- |
| CoARS | Qwen3-8B | 2026.04 | 34.7 |
| CoARS | Qwen3-4B | 2026.04 | 27.12 |
| RecoWorld | Qwen3-8B | 2026.04 | 22.22 |
| RecoWorld | Qwen3-4B | 2026.04 | 18.54 |
| iAgent | GPT-5.4-mini | 2026.04 | 15.3 |
| iAgent | Qwen3-8B | 2026.04 | 12.54 |
| iAgent | Qwen3-4B | 2026.04 | 8.9 |
| AFL | GPT-5.4-mini | 2026.04 | 8.11 |
| AFL | Qwen3-8B | 2026.04 | 6.24 |
| Reflexion | GPT-5.4-mini | 2026.04 | 5.48 |
| AFL | Qwen3-4B | 2026.04 | 5.15 |
| Reflexion | Qwen3-8B | 2026.04 | 2.65 |
| Reflexion | Qwen3-4B | 2026.04 | 1.58 |
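Hit@1 is not defined on this page. Under the usual convention for next-item recommendation, it is the percentage of test cases in which the single top-ranked item is the held-out ground-truth item; the scores above appear to be reported on a 0 to 100 scale. The sketch below assumes one ground-truth item per test case and a ranked list of recommendations per case. The function `hit_at_k` and the toy data are hypothetical illustrations, not the benchmark's evaluation code.

```python
from typing import Sequence


def hit_at_k(ranked_lists: Sequence[Sequence[str]], targets: Sequence[str], k: int = 1) -> float:
    """Return Hit@k as a percentage: share of cases whose target is in the top-k items.

    Assumes exactly one ground-truth item per test case (hypothetical setup).
    """
    if len(ranked_lists) != len(targets):
        raise ValueError("ranked_lists and targets must have the same length")
    if not targets:
        raise ValueError("need at least one test case")
    # Count cases where the held-out item appears among the first k recommendations.
    hits = sum(1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k])
    return 100.0 * hits / len(targets)


# Hypothetical toy data: four test cases, each with one held-out item.
ranked = [
    ["guitar_strings", "capo", "tuner"],
    ["drumsticks", "metronome", "snare"],
    ["reeds", "mouthpiece", "ligature"],
    ["picks", "strap", "cable"],
]
held_out = ["guitar_strings", "metronome", "reeds", "amp"]

print(hit_at_k(ranked, held_out, k=1))  # 50.0
print(hit_at_k(ranked, held_out, k=3))  # 75.0
```

Hit@1 is the strictest setting of Hit@k: only the first recommendation counts, so on the same ranked lists it can never exceed Hit@k for any larger k.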