User simulation on Synthetic Exposure Overall

Best accuracy: 55% (Llama-3.2-3B-Instruct +SFT+DPO)

Updated 3mo ago

Evaluation Results

Method	Date	Accuracy (%)
Llama-3.2-3B-Instruct +SFT+DPO	2025.08	55
Qwen2.5-3B-Instruct +SFT+DPO	2025.08	54.7
Qwen2.5-3B-Instruct +SFT+GRPO	2025.08	53.1
Llama-3.2-3B-Instruct +SFT+GRPO	2025.08	52.7
Gemini-3.0-Pro-Preview (2025-11-18)	2025.08	47.7
Qwen2.5-3B-Instruct +SFT	2025.08	46.4
Llama-3.2-3B-Instruct +SFT	2025.08	46
GPT-5.1 (2025-11-13)	2025.08	42.6
Gemini-2.5-Flash (2025-06-17)	2025.08	42.5
GPT-5 (2025-08-07)	2025.08	42.2
GPT-5-Mini (2025-08-07)	2025.08	40.7
Qwen2.5-32B-Instruct (Teacher)	2025.08	39.7
GPT-5-Nano (2025-08-07)	2025.08	36.8
Gemma-3-12B-it	2025.08	36
Qwen2.5-14B-Instruct	2025.08	35.6
Qwen2.5-7B-Instruct	2025.08	33.7
Gemma-3-4B-it	2025.08	29.6
Qwen2.5-3B-Instruct	2025.08	27.2
Llama-3.2-3B-Instruct	2025.08	22.7