Belief reasoning on EgoToM
[Chart: Accuracy by date. Best entry: Humans, 72. Date shown: Mar 25, 2026.]
Evaluation Results
| Method | Setting | Date | Accuracy |
| --- | --- | --- | --- |
| Humans | Baseline, Nfra... | 2026.03 | 72 |
| Humans | Baseline, Nfra... | 2026.03 | 71 |
| Gemini-2.5-Flash | Baseline, Nfra... | 2026.03 | 46.7 |
| Video-Llama2-72B | Baseline, Nfra... | 2026.03 | 46 |
| LLaVA-Next-Video-7B | w/o \delta _{V... | 2026.03 | 45.3 |
| LLaVA-Next-Video-7B | +\alpha \Delta... | 2026.03 | 45.3 |
| GPT-4-Turbo | Baseline, Nfra... | 2026.03 | 45 |
| Qwen2.5-VL-7B | +\alpha \Delta... | 2026.03 | 42 |
| Qwen2.5-VL-7B | w/o \delta _{V... | 2026.03 | 40.6 |
| LLaVA-Next-Video-7B | w/o \delta _{T... | 2026.03 | 39.2 |
| LLaVA-Next-Video-7B | Rnd-\Delta, Nf... | 2026.03 | 39.2 |
| CogVLM2 | Baseline, Nfra... | 2026.03 | 39 |
| LLaVA-Next-Video-7B | Baseline, Nfra... | 2026.03 | 38.9 |
| Qwen2.5-VL-7B | Rnd-\Delta, Nf... | 2026.03 | 36 |
| Qwen2.5-VL-7B | Baseline, Nfra... | 2026.03 | 35.6 |
| Qwen2.5-VL-7B | w/o \delta _{T... | 2026.03 | 35.6 |
| Qwen2.5-VL-7B | -\alpha \Delta... | 2026.03 | 24.3 |
| LLaVA-Next-Video-7B | -\alpha \Delta... | 2026.03 | 20.6 |
| GPT-4o | Baseline, Nfra... | 2026.03 | 20.4 |