Macro-average of HallusionBench, AMBER, CRPE, R-Bench, and BLINK

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| General Multimodal Evaluation | Macro-average of HallusionBench, AMBER, CRPE, R-Bench, and BLINK | Overall Score: 63.35 | 13 |
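
As a rough illustration of what a macro-average over these five benchmarks means, the sketch below takes the unweighted mean of one score per benchmark. It assumes each benchmark contributes a single score on a common 0-100 scale; the page does not say which sub-metric of each benchmark is used. The function name `macro_average` and the score values are hypothetical placeholders, not reported results.

```python
def macro_average(scores: dict[str, float]) -> float:
    """Return the unweighted mean of per-benchmark scores.

    Every benchmark counts equally, regardless of how many
    samples it contains (this is what makes it a macro-average).
    """
    if not scores:
        raise ValueError("at least one benchmark score is required")
    return sum(scores.values()) / len(scores)


# Placeholder values for illustration only; these are NOT real results.
placeholder_scores = {
    "HallusionBench": 50.0,
    "AMBER": 60.0,
    "CRPE": 70.0,
    "R-Bench": 80.0,
    "BLINK": 90.0,
}

overall = macro_average(placeholder_scores)
print(f"Overall Score: {overall:.2f}")
```

Because each benchmark is weighted equally, a large benchmark cannot dominate the Overall Score the way it would in a sample-weighted (micro) average.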