
Knowledge-intensive image editing on KrisBench (test)

79.8 Factual Accuracy

GPT-4o

Feb 2, 2026
Updated 1mo ago

Evaluation Results

Date      Factual Accuracy   (unlabeled)   (unlabeled)   (unlabeled)
2026.02   79.8               81.37         78.32         80.09
2026.02   71.85              67.16         63.68         68
2026.02   70.67              72.38         56.89         68.23
2026.02   66.69              63.5          52.38         61.85
2026.02   66.18              61.92         49.02         60.18
2026.02   65.26              59.65         62.9          62.41
2026.02   57.36              44.2          47.79         49.71
2026.02   47.71              44.8          47.92         50.27