Research Solution Evaluation on ICLR problems 2026 (test)

Feasibility Win%: 56

Mistral-24B

Updated 2mo ago

Evaluation Results

Method	Links	Feasibility Win%
Mistral-24B 2025.10		56	0.48	70.7	0.01	58.7	0.3
Combined LLM 2025.10		52.6	0.68	73.5	0	65.2	0.006
GPT-OSS-120B 2025.10		50	1	45.5	0.73	67.9	0.09
GPT-OSS-120B 2025.10		48.9	1	76.2	0.0009	72.1	0.005