
Long-context Multimodal Evaluation on MileBench (test)

25.34 TN Score

Full Cache

1.58647.753213.9220.0868  Jun 6, 2025
Updated 1mo ago

Evaluation Results

Method  Links
2025.06
25.34  9.01  40.5  15.7  18.56  15.98  55.5  74.5  52  34.12
2025.06
23.43  9.41  38.5  15.62  17.76  16.76  55  74.5  51.5  33.61
2025.06
11.25  29.45  44.5  28.36  37.53  42.46  62.5  63  61  63.34
2025.06
10.73  29.39  43.5  27.3  37.95  40.57  62.5  62  62.5  62.74
2025.06
10  8.77  37.5  15.46  15.25  14.12  53  74  50.5  30.96
2025.06
9.68  7.98  33.5  16.32  15.18  16.62  46  63.5  48.5  28.59
2025.06
9.38  6.97  31  15.85  15.06  16.76  47  64  48  28.22
2025.06
7.92  8.52  35  15.53  16.84  15.37  53.5  74.5  51  30.91
2025.06
7.812.6328.514.051.726.5549.8685025.45
2025.06
7.81  8.7  34.5  15.21  15.03  15.06  53  74  51  30.48
2025.06
4.55  29.25  41.5  26.54  37.41  39.41  61.5  60  60.5  60.11
2025.06
4.23  28.73  42  28.97  36.66  38.66  62  61.5  61.5  60.71
2025.06
3.97  29.13  40.5  26.79  37.78  36.03  62.5  61.5  61  59.7
2025.06
3.46  25.12  40  26.14  34.07  27.58  60  59  60.5  55.98
2025.06
3.34  6.51  29.5  15.79  13.96  14.12  45.5  63  46.5  26.47
2025.06
3.27  6.03  29  14.82  14.4  15.37  45.5  64  45.5  26.43
2025.06
3.12  3.59  26  11.77  3.73  10.44  42.5  43  44  20.91
2025.06
2.5  5.51  28  15.73  14.86  14.07  44  63.5  44  25.8