Language-driven scene representation on ALFRED Template Shift [TS]
Best F1 Score: 84.9 (Falcon-7B, May 7, 2026)
Evaluation Results
| Method       | Approach | Date    | F1 Score | iRecall | GED  |
|--------------|----------|---------|----------|---------|------|
| Falcon-7B    | CLM      | 2026.05 | 84.9     | 65.07   | 2.52 |
| Mistral-7B   | CLM      | 2026.05 | 83.97    | 62.93   | 2.75 |
| LLaMA 3.1-8B | CLM      | 2026.05 | 81.87    | 59.65   | 4.08 |
| Vicuna-7B    | IM       | 2026.05 | 81.37    | 53.39   | 4    |
| Alpaca       | IM       | 2026.05 | 80.01    | 46.92   | 4.29 |
| Flan-T5      | Seq2Seq  | 2026.05 | 79.57    | 46.32   | 3.54 |
| T5           | Seq2Seq  | 2026.05 | 73.28    | 35.16   | 4.45 |