Dialogue-level Guidance Quality Evaluation on Dialogue-level evaluation (N=54)
[Chart: State Tracking score by method; leading entry VIGiA at 3.3, dated Feb 22, 2026. Metrics shown: State Tracking, Instruction Clarity, Plan Adherence.]
Updated 1mo ago
Evaluation Results
Method         Date     State Tracking  Instruction Clarity  Plan Adherence
VIGiA          2026.02  3.3             4.06                 4.26
Qwen 3 VL      2026.02  2.85            2.35                 3.39
InternVL 3.5   2026.02  2.52            2.44                 3.13
Qwen 2.5 VL    2026.02  2.31            2.41                 2.74
Llava OV       2026.02  1.87            2.59                 3
MM-PlanLLM     2026.02  1.61            2.11                 2.24