
User Simulation on MIND Real Exposure

34.8 Accuracy

Gemini-3.0-Pro-Preview (2025-11-18)

Updated 3mo ago

Evaluation Results

Method	Date	Accuracy
Gemini-3.0-Pro-Preview (2025-11-18)	2025.08	34.8
Llama-3.2-3B-Instruct +SFT+DPO	2025.08	34
Qwen2.5-3B-Instruct +SFT+DPO	2025.08	33.5
GPT-5 (2025-08-07)	2025.08	33.2
Qwen2.5-3B-Instruct +SFT+GRPO	2025.08	32.8
Llama-3.2-3B-Instruct +SFT+GRPO	2025.08	32.2
Gemini-2.5-Flash (2025-06-17)	2025.08	31.8
GPT-5-Mini (2025-08-07)	2025.08	31.4
GPT-5.1 (2025-11-13)	2025.08	31
Qwen2.5-3B-Instruct +SFT	2025.08	30.9
Gemma-3-12B-it	2025.08	29.1
Qwen2.5-32B-Instruct (Teacher)	2025.08	28.7
GPT-5-Nano (2025-08-07)	2025.08	28.4
Qwen2.5-14B-Instruct	2025.08	27.9
Gemma-3-4B-it	2025.08	27.5
Llama-3.2-3B-Instruct +SFT	2025.08	27.3
Qwen2.5-3B-Instruct	2025.08	27.2
Qwen2.5-7B-Instruct	2025.08	25.9
Llama-3.2-3B-Instruct	2025.08	19.9