General Reasoning on GPQA Diamond (Mean@16)
[Chart: Mean@16 by date. FlashMLA, 84.15 Mean@16, Feb 11, 2026]
Evaluation Results

Method    Configuration              Date     Mean@16
FlashMLA  Backbone=DeepSeek-V3.1...  2026.02  84.15
SnapMLA   Backbone=DeepSeek-V3.1...  2026.02  82.57
FlashMLA  Backbone=LongCat-Flash...  2026.02  81.5
SnapMLA   Backbone=LongCat-Flash...  2026.02  80.24