
Mercury

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Code Efficiency | Mercury | Beyond@1: 83.6 | 29
Out-of-distribution AI-generated text detection | Mercury Out-of-distribution (OOD) 2 (test unseen domains) | Legal Accuracy: 95.7 | 16
Code Generation | Mercury (test) | Pass Rate (Easy): 90.3 | 8
Black-box Vulnerability Detection | Mercury | Vulnerabilities Found Count: 22 | 4