
Document Question Answering on MMLongBench-Doc

69.6 Accuracy (all)

GPT-4o-200b-128

4.39221.32138.2555.179 Oct 8, 2025
Updated 1mo ago

Evaluation Results

Method  Links
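| Date | Accuracy (all) | | | | | |
|---|---|---|---|---|---|---|
| 2025.10 | 69.6 | - | - | - | 69.3 | 56.5 |
| 2025.10 | 62.5 | - | - | - | 67.6 | 45 |
| 2025.10 | 61 | - | - | - | 63.7 | 37.2 |
| 2025.10 | 59.3 | - | - | - | 62.9 | 40.9 |
| 2025.10 | 57.5 | - | - | - | 60.7 | 30.3 |
| 2025.10 | 49.8 | - | - | - | 53.7 | 34.3 |
| 2025.10 | 44.8 | - | - | - | 49.5 | 24.2 |
| 2025.10 | 41.5 | - | - | - | 46.2 | 26.1 |
| 2025.10 | 39.9 | - | - | - | 50.6 | 25 |
| 2025.10 | 39.1 | - | - | - | 41.4 | 20.2 |
| 2025.10 | 38.9 | - | - | - | 39.5 | 20 |
| 2025.10 | 38.3 | - | - | - | 39.4 | 21.3 |
| 2025.10 | 38.3 | - | - | - | 49.2 | 22.7 |
| 2025.10 | 34.5 | - | - | - | 32.7 | 17.8 |
| 2025.10 | 33.6 | - | - | - | 40 | 19.9 |
| 2025.10 | 31.9 | - | - | - | 39.5 | 20.8 |
| 2025.10 | 31.5 | - | - | - | 39.9 | 19.2 |
| 2025.10 | 30.4 | - | - | - | 36.5 | 21.2 |
| 2025.10 | 30.2 | - | - | - | 38.1 | 18.9 |
| 2025.10 | 28.7 | - | - | - | 37.2 | 16.4 |
| 2025.10 | 8.5 | - | - | - | 9.5 | 9.5 |
| 2025.10 | 7 | - | - | - | 7.7 | 7.2 |
| 2025.10 | 6.9 | - | - | - | 7.4 | 6.4 |
| 2026.01 | - | 42.8 | 44.9 | - | - | - |
| 2026.01 | - | - | - | 58.1 | - | - |
| 2026.01 | - | 30.1 | 30.5 | - | - | - |
| 2026.01 | - | 39.6 | 37.2 | - | - | - |
| 2026.01 | - | 29 | 27.8 | - | - | - |
| 2026.01 | - | 32.2 | 30.8 | - | - | - |
| 2026.01 | - | 31.4 | 36.5 | - | - | - |
| 2026.01 | - | 42.8 | - | - | - | - |
| 2026.01 | - | 38.1 | 38.3 | - | - | - |
| 2026.01 | - | 42 | - | - | - | - |
| 2026.01 | - | 47.4 | - | - | - | - |
| 2026.01 | - | - | - | 58.6 | - | - |
| 2026.01 | - | - | - | 56.6 | - | - |
| 2026.01 | - | - | - | 63.3 | - | - |
| 2026.01 | - | - | - | 67.6 | - | - |
| 2026.01 | - | 51.8 | 49.1 | - | - | - |
| 2026.01 | - | 57.3 | 54.1 | - | - | - |
| 2026.01 | - | 52.3 | 50.8 | 59.2 | - | - |
| 2026.01 | - | 56.3 | 55.3 | 65.9 | - | - |
| 2026.01 | - | 57 | 56.8 | 67.6 | - | - |
| 2026.01 | - | 48.4 | 49.2 | 59.4 | - | - |
| 2026.01 | - | 54.4 | 53.9 | 65.3 | - | - |
| | - | 65.8 | 66 | - | - | - |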