
PBSBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Cell-Level Visual Question Answering | PBSBench Out of Domain | Accuracy (T/F): 97 | 17 |
| Cell-Level Visual Question Answering | PBSBench (In-domain) | True/False Accuracy: 77 | 17 |
| Open-ended Question Answering | PBSBench Slide-level 1.0 (test) | BLEU-1: 36 | 6 |
| Fill-in-the-blank Question Answering | PBSBench Slide-level 1.0 (test) | Exact Match (EMatch): 27 | 6 |
| Multiple Choice Question Answering | PBSBench Slide-level 1.0 (test) | Accuracy: 85 | 6 |
| True/False Question Answering | PBSBench Slide-level 1.0 (test) | Accuracy: 86 | 6 |