
Reefknot

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Multiple Choice Question (MCQ) Answering | Reefknot | Perc. Score: 74.05 | 24 |
| Yes/No Question Answering | Reefknot | Accuracy: 45.31 | 24 |
| Relation Hallucination Evaluation | Reefknot 1.0 (full) | Perception Y/N Accuracy: 46.7 | 12 |
| Relational Hallucination Evaluation | Reefknot | F1 Score: 67.9 | 5 |