Interactive Agent Tasks on Mind2Web
[Chart: Task Success Rate; top entry MNL at 18.86, Dec 12, 2025]
Evaluation Results
Method         Base Model      Date     Task Success Rate  Success Accuracy  Memory Usage  Trajectory Length
MNL            DeepSeek-V3...  2025.12  18.86              67.55             12            395
ACE            DeepSeek-V3...  2025.12  15.82              57.8              580           58,602
Vanilla model  DeepSeek-V3...  2025.12  15.49              66.32             -             -
MNL            Qwen3-8B        2025.12  2.02               15.64             695           556
Vanilla model  Qwen3-8B        2025.12  1.35               11.54             -             -
Memento        DeepSeek-V3...  2025.12  0.34               12.6              1,707         4,822
Memento        Qwen3-8B        2025.12  0                  0.18              1,707         4,749
ACE            Qwen3-8B        2025.12  0                  0                 363           24,284