QA dataset

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Defense against Indirect Prompt Injection | Filtered QA dataset | ASR (Naive) | 97.65 | 30 |
| Question Answering | QA Dataset 50 questions Long condition | Accuracy | 72 | 8 |
| Question Answering | QA Dataset 100 questions - Short condition | Accuracy | 94 | 8 |
| Question Answering | QA Dataset 150 questions Overall | Accuracy | 86.7 | 8 |
| Question Answering | QA dataset Reverse direction | Exact Match Accuracy | 87 | 2 |
| Question Answering | QA dataset Same direction | Exact Match Accuracy | 100 | 2 |