Step success rate on AndroidControl (IDD)
[Chart: Step Success Rate (SSR) over time. Top entry: T3A* (Qwen2.5-7B-Instruct-ASL), 69.4, Jun 21, 2025.]
Updated 11d ago
Evaluation Results
| Method                         | Details                   | Date    | Step Success Rate (SSR) |
|--------------------------------|---------------------------|---------|-------------------------|
| T3A* (Qwen2.5-7B-Instruct-ASL) | Type=Fine-Tuned, Agent... | 2025.06 | 69.4                    |
| T3A* (Qwen2.5-7B-Instruct-SFT) | Type=Fine-Tuned, Agent... | 2025.06 | 67.4                    |
| M3A (GPT-4o)                   | Type=Prompt-Driven, Ag... | 2025.06 | 60.8                    |
| T3A (GPT-4o)                   | Type=Prompt-Driven, Ag... | 2025.06 | 56.1                    |
| M3A (Gemini-2.5-Flash)         | Type=Prompt-Driven, Ag... | 2025.06 | 49.4                    |
| T3A (Gemini-2.5-Flash)         | Type=Prompt-Driven, Ag... | 2025.06 | 49.1                    |
| T3A* (Gemini-2.5-Flash)        | Type=Prompt-Driven, Ag... | 2025.06 | 46.8                    |
| SeeAct (Gemini-2.5-Flash)      | Type=Prompt-Driven, Ag... | 2025.06 | 36.7                    |
| SeeAct (GPT-4o)                | Type=Prompt-Driven, Ag... | 2025.06 | 31.5                    |
| T3A* (Qwen2.5-7B-Instruct)     | Type=Prompt-Driven, Ag... | 2025.06 | 26.1                    |