
BIXBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Biomedical Intelligence Evaluation | BixBench 205 (Evaluation) | Accuracy 85.9 | 25 |
| Automated auditing | BIXBench (Verified-50) | Recall (A) 83.3 | 6 |
| Quantitative reasoning and autonomous analysis | BixBench Human Verified-50 | Accuracy 83.33 | 3 |
| Quantitative reasoning and autonomous analysis | BixBench-Verified-50 Full set | Accuracy 90 | 3 |