User simulation on Synthetic Exposure Overall
[Chart: Accuracy (%) by date. Plotted point: Llama-3.2-3B-Instruct +SFT+DPO, 55, Aug 25, 2025]
Evaluation Results
| Method | Model Category | Date | Accuracy (%) |
|---|---|---|---|
| Llama-3.2-3B-Instruct +SFT+DPO | Fine-tu... | 2025.08 | 55 |
| Qwen2.5-3B-Instruct +SFT+DPO | Fine-tu... | 2025.08 | 54.7 |
| Qwen2.5-3B-Instruct +SFT+GRPO | Fine-tu... | 2025.08 | 53.1 |
| Llama-3.2-3B-Instruct +SFT+GRPO | Fine-tu... | 2025.08 | 52.7 |
| Gemini-3.0-Pro-Preview (2025-11-18) | Proprie... | 2025.08 | 47.7 |
| Qwen2.5-3B-Instruct +SFT | Fine-tu... | 2025.08 | 46.4 |
| Llama-3.2-3B-Instruct +SFT | Fine-tu... | 2025.08 | 46 |
| GPT-5.1 (2025-11-13) | Proprie... | 2025.08 | 42.6 |
| Gemini-2.5-Flash (2025-06-17) | Proprie... | 2025.08 | 42.5 |
| GPT-5 (2025-08-07) | Proprie... | 2025.08 | 42.2 |
| GPT-5-Mini (2025-08-07) | Proprie... | 2025.08 | 40.7 |
| Qwen2.5-32B-Instruct (Teacher) | Base Mo... | 2025.08 | 39.7 |
| GPT-5-Nano (2025-08-07) | Proprie... | 2025.08 | 36.8 |
| Gemma-3-12B-it | Base Mo... | 2025.08 | 36 |
| Qwen2.5-14B-Instruct | Base Mo... | 2025.08 | 35.6 |
| Qwen2.5-7B-Instruct | Base Mo... | 2025.08 | 33.7 |
| Gemma-3-4B-it | Base Mo... | 2025.08 | 29.6 |
| Qwen2.5-3B-Instruct | Fine-tu... | 2025.08 | 27.2 |
| Llama-3.2-3B-Instruct | Fine-tu... | 2025.08 | 22.7 |