User Simulation on MIND Real Exposure
Best result: 34.8 Accuracy, Gemini-3.0-Pro-Preview (2025-11-18)
[Chart: Accuracy (y-axis) by date; data point at Aug 25, 2025]
Updated 1mo ago
Evaluation Results
| Method | Model Category | Date | Accuracy |
|---|---|---|---|
| Gemini-3.0-Pro-Preview (2025-11-18) | Proprie... | 2025.08 | 34.8 |
| Llama-3.2-3B-Instruct +SFT+DPO | Fine-tu... | 2025.08 | 34 |
| Qwen2.5-3B-Instruct +SFT+DPO | Fine-tu... | 2025.08 | 33.5 |
| GPT-5 (2025-08-07) | Proprie... | 2025.08 | 33.2 |
| Qwen2.5-3B-Instruct +SFT+GRPO | Fine-tu... | 2025.08 | 32.8 |
| Llama-3.2-3B-Instruct +SFT+GRPO | Fine-tu... | 2025.08 | 32.2 |
| Gemini-2.5-Flash (2025-06-17) | Proprie... | 2025.08 | 31.8 |
| GPT-5-Mini (2025-08-07) | Proprie... | 2025.08 | 31.4 |
| GPT-5.1 (2025-11-13) | Proprie... | 2025.08 | 31 |
| Qwen2.5-3B-Instruct +SFT | Fine-tu... | 2025.08 | 30.9 |
| Gemma-3-12B-it | Base Mo... | 2025.08 | 29.1 |
| Qwen2.5-32B-Instruct (Teacher) | Base Mo... | 2025.08 | 28.7 |
| GPT-5-Nano (2025-08-07) | Proprie... | 2025.08 | 28.4 |
| Qwen2.5-14B-Instruct | Base Mo... | 2025.08 | 27.9 |
| Gemma-3-4B-it | Base Mo... | 2025.08 | 27.5 |
| Llama-3.2-3B-Instruct +SFT | Fine-tu... | 2025.08 | 27.3 |
| Qwen2.5-3B-Instruct | Fine-tu... | 2025.08 | 27.2 |
| Qwen2.5-7B-Instruct | Base Mo... | 2025.08 | 25.9 |
| Llama-3.2-3B-Instruct | Fine-tu... | 2025.08 | 19.9 |