Failure Reasoning and Correction on Dream2Fix (test)
[Chart: ROUGE-L. Best result: Dream2Fix-VLM, 91.3 (Mar 13, 2026). Selectable metrics: ROUGE-L, Cosine Similarity, Binary Success Rate, Fuzzy Match Score, Accuracy.]
Updated 1mo ago
Evaluation Results
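| Method           | Date    | ROUGE-L | Cosine Similarity | Binary Success Rate | Fuzzy Match Score | Accuracy |
|------------------|---------|---------|-------------------|---------------------|-------------------|----------|
| Dream2Fix-VLM    | 2026.03 | 91.3    | 94.1              | 95                  | 77.1              | 81.3     |
| GPT-4o           | 2026.03 | 50.2    | 53.8              | 62.3                | 61.5              | 19.7     |
| Gemini-1.5-Flash | 2026.03 | 40.6    | 51.8              | 67.9                | 49.6              | 17.9     |
| Qwen2-VL-72B     | 2026.03 | 24.8    | 38.6              | 56.7                | 24.8              | 14.3     |
| Qwen2.5-VL-7B    | 2026.03 | 17      | 39.1              | 55.1                | 24.1              | 11.3     |
| LLaVA-NeXT-34B   | 2026.03 | 13      | 12.7              | 15.3                | 45.3              | 4.4      |
| LLaVA-NeXT-7B    | 2026.03 | 8       | 9.9               | 11.1                | 32.2              | 2.2      |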