
Advanced Reasoning on ruGPQA Diamond

Top result: o4-mini (medium), accuracy 0.773

Updated 4mo ago

Evaluation Results

Method	Date	Accuracy
o4-mini (medium)	2025.12	0.773
DeepSeek-R1	2025.12	0.763
DeepSeek-V3	2025.12	0.657
DeepSeek-R1-Distill-Qwen-32B	2025.12	0.631
Qwen3-32B	2025.12	0.606
T-pro 2.0	2025.12	0.591
RuadaptQwen3-32B-Instruct	2025.12	0.591
GPT-4o	2025.12	0.510
GigaChat 2 Max	2025.12	0.475
Gemma 3 27B	2025.12	0.439
YandexGPT5-Pro	2025.12	0.354
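Each score is an accuracy: the fraction of benchmark questions a model answers correctly. The sketch below shows that computation. It assumes that every ruGPQA Diamond item is a single-answer multiple-choice question scored by exact match on the option letter. The function name `accuracy` and the toy answers are hypothetical and do not come from the benchmark's own evaluation code.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of items whose predicted option letter equals the gold letter.

    Assumption (not from the source): single-answer multiple choice,
    scored by exact match with no partial credit.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Hypothetical toy answers (not benchmark data): 3 of 4 match.
preds = ["A", "C", "B", "D"]
gold = ["A", "C", "D", "D"]
print(accuracy(preds, gold))  # 0.75
```

Under this scoring, a value such as 0.773 is simply the number of correct answers divided by the number of questions. Ties like the two 0.591 entries mean the models answered the same number of questions correctly, though not necessarily the same questions.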