Action Prediction on Human Evaluation User Actions Dataset (test)
[Chart: Win Rate over time. Top result: LongNAP, Win Rate 79, Mar 6, 2026.]
Evaluation Results

Method         Model         Date      Win Rate
LongNAP        Qwen 2.5 VL   2026.03   79
Few-shot RAG   Gemini        2026.03   58.5
Zero-shot      Gemini        2026.03   55
Few-shot RAG   Qwen 2.5 VL   2026.03   45
Zero-shot      Qwen 2.5 VL   2026.03   33
SFT            Qwen 2.5 VL   2026.03   29.5