Tulu Generation Evaluation on Tulu
Chart (not shown): Grammar Accuracy and Contamination Rate. Highlighted point: Gemini 2.0 Flash, Grammar Accuracy 85, Feb 17, 2026.
Evaluation Results
| Method | Prompt Strategy | Date | Grammar Accuracy | Contamination Rate |
|---|---|---|---|---|
| Gemini 2.0 Flash | Full S... | 2026.02 | 85 | 5 |
| GPT-4o | Full S... | 2026.02 | 82 | 7 |
| Llama 3.1 70B | Full S... | 2026.02 | 78 | 6 |
| Gemini 2.0 Flash | +Gram+... | 2026.02 | 72 | 10 |
| GPT-4o | +Gram+... | 2026.02 | 70 | 12 |
| Llama 3.1 70B | +Gram+... | 2026.02 | 65 | 15 |
| Gemini 2.0 Flash | +Grammar | 2026.02 | 63 | 28 |
| GPT-4o | +Grammar | 2026.02 | 62 | 32 |
| Llama 3.1 70B | +Grammar | 2026.02 | 58 | 35 |
| Gemini 2.0 Flash | Baseline | 2026.02 | 25 | 75 |
| GPT-4o | Baseline | 2026.02 | 20 | 80 |
| Llama 3.1 70B | Baseline | 2026.02 | 15 | 82 |