Multiple Choice Question Answering on CTIBench MCQA

Best score: 0.819 (GPT-5)

Updated 4mo ago

Evaluation Results

Method	Links	Score
GPT-5 2026.01		0.819
GPT-4.1 2026.01		0.76
GPT-5-Mini 2026.01		0.753
o3-Mini 2026.01		0.716
GPT-OSS-120B 2026.01		0.714
Llama-Primus-Nemotron-70B-Instruct 2026.01		0.705
Llama-3.3-70B-Instruct 2026.01		0.692
Foundation-Sec-8B-Reasoning 2026.01		0.691
GPT-5-Nano 2026.01		0.688
Qwen-3-14B 2026.01		0.664
Phi-4 2026.01		0.658
GPT-OSS-20B 2026.01		0.655
Foundation-Sec-8B-Instruct 2026.01		0.65
Qwen-3-8B 2026.01		0.649
Llama-3.1-8B-Instruct 2026.01		0.607
Llama-Primus-Merged 2026.01		0.604
DeepHat-V1-7B 2026.01		0.493