Step success rate on AndroidControl (Cat-Unseen)
[Chart: Step Success Rate (SSR) by date. Best result: T3A* (Qwen2.5-7B-Instruct-ASL), 61.2, Jun 21, 2025.]
Evaluation Results
Method                          | Details                     | Date    | Step Success Rate (SSR)
--------------------------------|-----------------------------|---------|------------------------
T3A* (Qwen2.5-7B-Instruct-ASL)  | Type=Fine-Tuned, Agent...   | 2025.06 | 61.2
M3A (GPT-4o)                    | Type=Prompt-Driven, Ag...   | 2025.06 | 60.8
T3A* (Qwen2.5-7B-Instruct-SFT)  | Type=Fine-Tuned, Agent...   | 2025.06 | 59.7
T3A (GPT-4o)                    | Type=Prompt-Driven, Ag...   | 2025.06 | 56.5
M3A (Gemini-2.5-Flash)          | Type=Prompt-Driven, Ag...   | 2025.06 | 47.9
T3A* (Gemini-2.5-Flash)         | Type=Prompt-Driven, Ag...   | 2025.06 | 44.7
T3A (Gemini-2.5-Flash)          | Type=Prompt-Driven, Ag...   | 2025.06 | 44.4
SeeAct (Gemini-2.5-Flash)       | Type=Prompt-Driven, Ag...   | 2025.06 | 33.7
SeeAct (GPT-4o)                 | Type=Prompt-Driven, Ag...   | 2025.06 | 30.6
T3A* (Qwen2.5-7B-Instruct)      | Type=Prompt-Driven, Ag...   | 2025.06 | 26.7