
Goal reconstruction on OneStop (New Item)

Top result: Incorrect Human (same critical span), BERTScore 0.651


Evaluation Results

Method	Date	BERTScore	Second score (unlabeled in source)
Incorrect Human (same critical span)	2025.05	0.651	67.7
Gemini few-shot	2025.05	0.642	68.3
DalEye-Llama	2025.05	0.631	64.8
DalEye-GPT	2025.05	0.630	65.8
Gemini zero-shot	2025.05	0.629	66.4
Text-only GPT-4o-mini	2025.05	0.619	61.9
DalEye-LLaVA	2025.05	0.618	61.0
Text-only Llama 3.1	2025.05	0.617	60.9
Arbitrary Gemini 3	2025.05	0.612	63.6
Incorrect Human (different critical span)	2025.05	0.603	49.0
Text-only LLaVA OneVision	2025.05	0.595	63.6