
Complex Multi-constraint Reasoning on MER-Bench Complex

Chart: 2-P Accuracy by date (Feb 7, 2026). Top method: Gemini-3.0-Flash, 80.

Evaluation Results

Date     2-P Accuracy  Metric 2  Metric 3  Metric 4  Metric 5
2026.02  80            53        19.5      50.8      0.594
2026.02  78            49.5      21.5      48.3      0.549
2026.02  75            46.5      17        46.2      0.546
2026.02  72            39        16        42.3      0.472
2026.02  70            49        15.5      44.8      0.49
2026.02  66            29        12.5      35.8      0.395
2026.02  61            34        10        35        0.393
2026.02  39.5          6         2.5       16        0.182
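The fourth value in each result row appears to be the unweighted mean of the first three. For the top row, (80 + 53 + 19.5) / 3 ≈ 50.8. The page does not label these columns, so this is an assumption. The Python sketch below recomputes that mean for every row and lists the rows where it differs from the reported fourth value by more than one-decimal rounding would allow. With the values listed here, only the second row (78, 49.5, 21.5) is flagged: its mean is about 49.67, while 48.3 is reported.

```python
# Values transcribed from the Evaluation Results rows, in page order:
# (first, second, third, fourth, fifth). Column meanings are not given on the page.
# Assumption: the fourth value is intended to be the mean of the first three.
ROWS = [
    (80, 53, 19.5, 50.8, 0.594),
    (78, 49.5, 21.5, 48.3, 0.549),
    (75, 46.5, 17, 46.2, 0.546),
    (72, 39, 16, 42.3, 0.472),
    (70, 49, 15.5, 44.8, 0.49),
    (66, 29, 12.5, 35.8, 0.395),
    (61, 34, 10, 35, 0.393),
    (39.5, 6, 2.5, 16, 0.182),
]


def mean_mismatches(rows, tol=0.05):
    """Return (row_index, computed_mean, reported) for each row whose reported
    fourth value differs from the mean of the first three by more than tol.
    tol=0.05 is the largest gap that rounding to one decimal can explain."""
    mismatches = []
    for i, (a, b, c, reported, _fifth) in enumerate(rows):
        computed = (a + b + c) / 3
        if abs(computed - reported) > tol:
            mismatches.append((i, round(computed, 2), reported))
    return mismatches


if __name__ == "__main__":
    for i, computed, reported in mean_mismatches(ROWS):
        # Report 1-based row numbers to match the table as read top to bottom.
        print(f"row {i + 1}: mean of first three = {computed}, reported = {reported}")
```

The tolerance is deliberately tight. Every other row matches its computed mean to one decimal place, so a gap larger than 1 point points to a transcription or source inconsistency rather than to rounding.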