Free-language reasoning on RoboFAC Simulation
[Chart: ROUGE-L (TI) over time (Apr 8, 2026). Top result: 32.6, KITE+Qwen2.5-7B+QLoRA. Other selectable metrics: ROUGE-L (FE), ROUGE-L (HL), ROUGE-L (LL), SBERT Cosine Similarity (TI), SBERT Cosine Similarity (FE), SBERT Cosine Similarity (HL), SBERT Cosine Similarity (LL).]
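ROUGE-L scores a generated answer against a reference answer by the length of their longest common subsequence (LCS) of tokens. The following is a minimal sketch of the standard sentence-level ROUGE-L F-measure, assuming lowercase whitespace tokenization and beta = 1. The function names `lcs_length` and `rouge_l_f` are hypothetical, and this page does not state the benchmark's actual tokenizer or settings.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists (O(len(a) * len(b)) DP)."""
    # prev[j] holds the LCS length of the prefix of `a` processed so far and b[:j]
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0] * (len(b) + 1)
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr[j] = prev[j - 1] + 1  # extend the common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a token on one side
        prev = curr
    return prev[-1]


def rouge_l_f(reference: str, candidate: str, beta: float = 1.0) -> float:
    """Sentence-level ROUGE-L F-measure in [0, 1].

    Assumption: lowercase + whitespace tokenization and beta = 1; the benchmark's
    real preprocessing is not given on the leaderboard page.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)
    precision = lcs / len(cand)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    ref = "the gripper missed the red cube"
    cand = "the robot gripper missed the cube"
    # Scaled by 100, assuming the leaderboard reports scores on a 0 to 100 scale
    print(round(100 * rouge_l_f(ref, cand), 1))  # prints 83.3
```

In the toy pair, the LCS is "the gripper missed the cube" (5 tokens) and both sentences have 6 tokens, so precision = recall = 5/6 and the F-measure is about 0.833.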
Evaluation Results
| Method | Links | Date | ROUGE-L (TI) | ROUGE-L (FE) | ROUGE-L (HL) | ROUGE-L (LL) | SBERT Cosine Similarity (TI) | SBERT Cosine Similarity (FE) | SBERT Cosine Similarity (HL) | SBERT Cosine Similarity (LL) |
|---|---|---|---|---|---|---|---|---|---|---|
| KITE+Qwen2.5-7B+QLoRA | Evidence Representatio... | 2026.04 | 32.6 | 31.4 | 30.2 | 29.6 | 69.8 | 84.5 | 80.6 | 80.3 |
| RoboFAC-7B (Finetuned=true) | | 2026.04 | 32.3 | 29.9 | 30.1 | 24.5 | 70.1 | 84.2 | 80.8 | 79.4 |
| KITE + Qwen2.5-VL-7B | Evidence Representatio... | 2026.04 | 29.5 | 24.8 | 24.1 | 19 | 68 | 82.9 | 79.8 | 77.9 |
| Qwen2.5-VL-7B | | 2026.04 | 20.6 | 19.4 | 23 | 15.7 | 54.6 | 44.8 | 68.3 | 65.7 |
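The results can also be compared programmatically. The following sketch copies the scores from the Evaluation Results table and computes per-metric differences between two entries, along with the top entry for each column. The helper names `deltas` and `best_per_metric` are hypothetical, and "SBERT" abbreviates "SBERT Cosine Similarity" in the column labels.

```python
from typing import Dict

# Column labels; "SBERT" is short for "SBERT Cosine Similarity"
METRICS = [
    "ROUGE-L (TI)", "ROUGE-L (FE)", "ROUGE-L (HL)", "ROUGE-L (LL)",
    "SBERT (TI)", "SBERT (FE)", "SBERT (HL)", "SBERT (LL)",
]

# Scores copied from the Evaluation Results table (RoboFAC Simulation)
SCORES = {
    "KITE+Qwen2.5-7B+QLoRA": [32.6, 31.4, 30.2, 29.6, 69.8, 84.5, 80.6, 80.3],
    "RoboFAC-7B": [32.3, 29.9, 30.1, 24.5, 70.1, 84.2, 80.8, 79.4],
    "KITE + Qwen2.5-VL-7B": [29.5, 24.8, 24.1, 19.0, 68.0, 82.9, 79.8, 77.9],
    "Qwen2.5-VL-7B": [20.6, 19.4, 23.0, 15.7, 54.6, 44.8, 68.3, 65.7],
}


def deltas(method: str, baseline: str) -> Dict[str, float]:
    """Per-metric score difference (method minus baseline), rounded to one decimal."""
    return {
        metric: round(a - b, 1)
        for metric, a, b in zip(METRICS, SCORES[method], SCORES[baseline])
    }


def best_per_metric() -> Dict[str, str]:
    """Name of the highest-scoring entry for each metric column."""
    return {
        metric: max(SCORES, key=lambda name: SCORES[name][i])
        for i, metric in enumerate(METRICS)
    }


if __name__ == "__main__":
    print(deltas("KITE + Qwen2.5-VL-7B", "Qwen2.5-VL-7B"))
    print(deltas("KITE+Qwen2.5-7B+QLoRA", "RoboFAC-7B"))
    print(best_per_metric())
```

With these values, KITE + Qwen2.5-VL-7B scores higher than Qwen2.5-VL-7B in every column, with the largest gain in SBERT (FE) (+38.1). KITE+Qwen2.5-7B+QLoRA leads six of the eight columns, while RoboFAC-7B keeps the top SBERT (TI) and SBERT (HL) scores.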