Interactive Reasoning on ScienceWorld 30 tasks
Top score: 84.7 (SwiftSage), Mar 18, 2026
Evaluation Results
| Method | Details | Date | Score |
|---|---|---|---|
| SwiftSage | Backbone=SWIFT + SAGE... | 2026.03 | 84.7 |
| CLIN | Backbone=GPT-4, Regime... | 2026.03 | 59.8 |
| Retrospex | Backbone=Flan-T5-large... | 2026.03 | 56 |
| Retrospex: IL-T5 | Backbone=Flan-T5-large... | 2026.03 | 48.8 |
| SwiftSage: Reflexion | Backbone=GPT-4, Regime... | 2026.03 | 45.3 |
| SwiftSage: ReAct | Backbone=GPT-4, Regime... | 2026.03 | 36.4 |