Quantitative reasoning and autonomous analysis on BixBench Human Verified-50

Accuracy: 83.33

CellType

[Chart: accuracy over time; y-axis ticks 81.7596, 82.1673, 82.575, 82.9827; x-axis label May 7, 2026]
Updated 26d ago

Evaluation Results

Date     Accuracy
2026.05  83.33
2026.05  83.33
2026.05  81.82