Language-driven scene representation on ALFRED Object Shift [OS]
Top F1 Score: 83.92 (Falcon-7B), May 7, 2026
Metrics: F1 Score, iRecall, GED
Updated 26d ago
Evaluation Results
Method          Approach   Date      F1 Score   iRecall   GED
Falcon-7B       CLM        2026.05   83.92      66.38     2.34
Mistral-7B      CLM        2026.05   82.72      63.46     2.75
LLaMA 3.1-8B    CLM        2026.05   80.44      58.86     4.27
Vicuna-7B       IM         2026.05   80.03      51.67     4.4
Alpaca          IM         2026.05   77.27      43.98     4.64
Flan-T5         Seq2Seq    2026.05   76.67      47.43     3.48
T5              Seq2Seq    2026.05   68.17      36.99     4.51