Interactive Agent Tasks on Mind2Web
[Chart: Task Success Rate over time. Best result: MNL, 18.86 (Dec 12, 2025).]
Updated 1mo ago
Evaluation Results
| Method | Base Model | Date | Task Success Rate | Success Accuracy | Memory Usage | Trajectory Length |
|---|---|---|---|---|---|---|
| MNL | DeepSeek-V3... | 2025.12 | 18.86 | 67.55 | 12 | 395 |
| ACE | DeepSeek-V3... | 2025.12 | 15.82 | 57.8 | 580 | 58,602 |
| Vanilla model | DeepSeek-V3... | 2025.12 | 15.49 | 66.32 | - | - |
| MNL | Qwen3-8B | 2025.12 | 2.02 | 15.64 | 695 | 556 |
| Vanilla model | Qwen3-8B | 2025.12 | 1.35 | 11.54 | - | - |
| Memento | DeepSeek-V3... | 2025.12 | 0.34 | 12.6 | 1,707 | 4,822 |
| Memento | Qwen3-8B | 2025.12 | 0 | 0.18 | 1,707 | 4,749 |
| ACE | Qwen3-8B | 2025.12 | 0 | 0 | 363 | 24,284 |
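One way to read the results above is to compare each method against the vanilla model on the same base model. The sketch below is only an illustration: the Task Success Rate values are copied from the listing, the truncated base model name is kept as it appears on the page, and the function name `delta_vs_vanilla` is hypothetical. The differences are in whatever units the leaderboard reports, which the page does not state.

```python
# Task Success Rate values copied from the results listed above.
# Keys are (method, base model); "DeepSeek-V3..." is left truncated as in the source.
task_success_rate = {
    ("MNL", "DeepSeek-V3..."): 18.86,
    ("ACE", "DeepSeek-V3..."): 15.82,
    ("Vanilla model", "DeepSeek-V3..."): 15.49,
    ("Memento", "DeepSeek-V3..."): 0.34,
    ("MNL", "Qwen3-8B"): 2.02,
    ("Vanilla model", "Qwen3-8B"): 1.35,
    ("Memento", "Qwen3-8B"): 0.0,
    ("ACE", "Qwen3-8B"): 0.0,
}


def delta_vs_vanilla(scores):
    """Return each method's score minus the vanilla score for the same base model."""
    deltas = {}
    for (method, base), score in scores.items():
        if method == "Vanilla model":
            continue  # the baseline is not compared with itself
        baseline = scores[("Vanilla model", base)]
        deltas[(method, base)] = round(score - baseline, 2)
    return deltas


deltas = delta_vs_vanilla(task_success_rate)

# Print the largest gains first.
for (method, base), d in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{method:8s} {base:15s} {d:+.2f}")
```

On these numbers, MNL is ahead of the vanilla model on both base models (by 3.37 and 0.67), ACE is ahead only on DeepSeek-V3... (by 0.33), and Memento is below the vanilla model on both.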