
Machine Translation on anaphora benchmark (val+test)

54.41 BLEU

gpt-4o

Oct 20, 2025
Updated 1mo ago

Evaluation Results

Method    Links
2025.10  54.41  71.49  92.5   91.18  5.35
2025.10  52.86  71.35  92.75  91.67  3.78
2025.10  49.19  67.81  91.42  90.36  5.22
2025.10  49.17  67.41  89.9   90.3   4.17
2025.10  49.08  69.58  92.42  91.61  -
2025.10  49.06  69.53  92.34  91.01  -
2025.10  49.01  68.28  91.8   90.52  5.58
2025.10  46.58  67.38  91.33  89.76  2.68
2025.10  45     66.51  91.63  91.14  -
2025.10  43.97  65.66  91.25  91.16  -
2025.10  43.9   66.3   91.79  90.19  -
2025.10  43.43  63.89  90.84  87.82  -
2025.10  39.36  58.81  83.79  84.5   4.34
2025.10  35.02  58.85  89.3   87.36  -
2025.10  34.13  57.76  88.91  85.94  -
2025.10  34.04  58.2   88.55  85.51  -
2025.10  32.98  53.29  79.28  79.04  -1.06
2025.10  30.35  54.41  86.78  84.64  -3.78
2025.10  26.69  50.59  85.53  82.29  -
2025.10  26.67  51.24  86.67  81.63  -
2025.10  26.66  49.79  84.69  80.47  -0.03
2025.10  23.74  49.96  85.86  80.1   -
2025.10  23.19  47.41  82.77  75.84  -3.48
2025.10  21.26  45.21  76.78  77.12  -2.48