
Aggregated LLM Evaluation on 8 Standard Benchmarks Aggregate

73.7 Average Accuracy

Full model

Updated 1mo ago

Evaluation Results

Method	Date	Links	Average Accuracy
Full model	2026.05		73.7
RCO	2026.05		71
EvoESAP	2026.05		66.5
RCO	2026.05		60.5
EvoESAP	2026.05		58.5