Language-driven scene representation on ALFRED In-Distribution [ID]
[Chart: F1 Score by result date (May 7, 2026); selectable metrics: F1 Score, Instance Recall, GED. Top F1 Score: 84.28 (Mistral-7B).]
Updated 26d ago
Evaluation Results
Method        Approach  Date     F1 Score  Instance Recall  GED
Mistral-7B    CLM       2026.05  84.28     68.58            2.58
Falcon-7B     CLM       2026.05  84.2      64.93            2.51
LLaMA 3.1-8B  CLM       2026.05  82.16     64.46            3.88
Vicuna-7B     IM        2026.05  81.3      54.47            4.09
Alpaca        IM        2026.05  80.59     45.36            4
Flan-T5       Seq2Seq   2026.05  78.05     39.31            3.81
T5            Seq2Seq   2026.05  71.11     30.38            5.04
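The sketch below shows one way to compare the reported numbers programmatically. It copies the values from the evaluation results and finds the best method per metric. It also averages F1 Score within each Approach group. This is our own illustration and not part of the benchmark. The helper names `best` and `mean_by_approach` are hypothetical. The sketch assumes that GED is a distance (for example, graph edit distance), so lower values are treated as better. For F1 Score and Instance Recall, higher values are treated as better.

```python
from collections import defaultdict

# Values copied from the evaluation results table:
# (method, approach, F1 score, instance recall, GED)
RESULTS = [
    ("Mistral-7B", "CLM", 84.28, 68.58, 2.58),
    ("Falcon-7B", "CLM", 84.20, 64.93, 2.51),
    ("LLaMA 3.1-8B", "CLM", 82.16, 64.46, 3.88),
    ("Vicuna-7B", "IM", 81.30, 54.47, 4.09),
    ("Alpaca", "IM", 80.59, 45.36, 4.00),
    ("Flan-T5", "Seq2Seq", 78.05, 39.31, 3.81),
    ("T5", "Seq2Seq", 71.11, 30.38, 5.04),
]

F1, RECALL, GED = 2, 3, 4  # column indices into each result tuple


def best(metric_index, lower_is_better=False):
    """Return the method name with the best value for one metric (hypothetical helper)."""
    pick = min if lower_is_better else max
    return pick(RESULTS, key=lambda row: row[metric_index])[0]


def mean_by_approach(metric_index):
    """Average one metric over all methods that share an Approach label."""
    groups = defaultdict(list)
    for row in RESULTS:
        groups[row[1]].append(row[metric_index])
    return {approach: sum(vals) / len(vals) for approach, vals in groups.items()}


if __name__ == "__main__":
    print("Best F1 Score:", best(F1))
    print("Best Instance Recall:", best(RECALL))
    print("Lowest GED (assumed better):", best(GED, lower_is_better=True))
    print("Mean F1 by approach:", mean_by_approach(F1))
```

Under these assumptions, Mistral-7B leads on F1 Score and Instance Recall, while Falcon-7B has the lowest GED. The CLM group also has the highest mean F1 Score.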