
VerifyBench

Benchmarks

Task Name      Dataset Name                  SOTA Result             Trend
Verification   VerifyBench Hard 1.0 (test)   Mean@3 Accuracy: 91.9   18
Verification   VerifyBench 1.0 (test)        m@3 Accuracy: 96.6      18