
Open-ended Question Answering on PhysReason v2 (test)

51.1 Subpart-AND (v2)

GPT-4o

[Chart: axis ticks 11.788, 21.994, 32.2, 42.406; May 13, 2026]
Updated 19d ago

Evaluation Results

Method    Links
2026.05   51.1
2026.05   49.1
2026.05   39.6
          38.8
2026.05   32.2
          25.1
          23.9
2026.05   23.3
          13.3