
Probe

Benchmarks

Task Name           Dataset Name        SOTA Result      Trend
Question Answering  Probe 2 1.0 (test)  Accuracy 65      42
Knowledge Probing   Probe 1             Accuracy 96      42
Anomaly Detection   probe               ROC-AUC 98.88    14
Anomaly Detection   probe               PR-AUC 92.07     14
Question Answering  Probe 2             Accuracy 70      4
Question Answering  Probe 1             Accuracy 99      4