
Model Stability and Performance Evaluation on Benchmark Dataset (n=80)

0.9895

Grok-3

Apr 27, 2026
Updated 1mo ago

Evaluation Results

Method  Links
2026.04
0.9895  0.012   0.7968  0.9069  1.8518  0.9775  0.983   0.0055
2026.04
0.9845  0.044   0.9597  0.9482  1.9539  0.9406  0.962   0.0215
0.9695  0.0517  0.8594  0.953   1.9062  0.9178  0.9424  0.0246
2026.04
0.9545  0.148   0.8981  0.799   1.8485  0.8065  0.8744  0.0679