Decision Inference on AI2-THOR
[Chart: Success Rate by date (Feb 18, 2025). Best result: 70.2, Ours.]
Evaluation Results
Method     Framework   Date      Success Rate
Ours       -           2025.02   70.2
o3-mini    -           2025.02   67.7
ReFT       Reasoning   2025.02   64.2
KTO        RLHF        2025.02   62.8
Skywork    RLAIF       2025.02   48.3
DeepSeek   -           2025.02   47.6
SFT        -           2025.02   44.7