Future Action Prediction on LongNAP Unseen Users (test)
[Chart: User Metric MU (Zero-shot: 0.16), Mar 6, 2026. Selectable metrics: User Metric MU, 2, 6, 12, 14, 16.]
Updated 1mo ago
Evaluation Results
                                       User Metric
Method        Model    Date     MU    2     6     12    14    16
Zero-shot     Qwen VL  2026.03  0.16  0.22  0.13  0.14  0.16  0.14
Few-shot RAG  Qwen VL  2026.03  0.17  0.23  0.15  0.18  0.14  0.18
SFT           Qwen VL  2026.03  0.17  0.22  0.14  0.16  0.15  0.17
Zero-shot     Gemini   2026.03  0.22  0.27  0.22  0.2   0.2   0.21
Few-shot RAG  Gemini   2026.03  0.23  0.26  0.23  0.22  0.21  0.22
LongNAP       Qwen VL  2026.03  0.26  0.31  0.28  0.22  0.21  0.26