
Open-ended Question Answering on Proposed LLM-based evaluation benchmark OEQ

96.9 Completeness

GPT-4o-Mini-Audio

[Chart: Completeness score by date; data point at Dec 2, 2025]
Updated 4d ago

Evaluation Results

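| Method | Date | Completeness | | | |
|---|---|---|---|---|---|
| | 2025.12 | 96.9 | 49 | 98.1 | 68.4 |
| | 2025.12 | 96.8 | 47.4 | 97.2 | 67.2 |
| | 2025.12 | 89.1 | 45.2 | 88.9 | 62.7 |
| | 2025.12 | 77.8 | 62.2 | 76.9 | 68.3 |
| | 2025.12 | 77.7 | 28.4 | 76.9 | 48 |
| | 2025.12 | 77.7 | 73.5 | 71.3 | 73.9 |
| | 2025.12 | 77 | 51.5 | 56.9 | 57.7 |
| | 2025.12 | 68.5 | 44.8 | 48.1 | 50.2 |
| | 2025.12 | 55.9 | 24.7 | 57.4 | 37.5 |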