Tulu Generation Evaluation on Tulu
[Chart: Grammar Accuracy and Contamination Rate. Highlighted point: Gemini 2.0 Flash, Grammar Accuracy 85, Feb 17, 2026.]
Updated 1mo ago
Evaluation Results
| Model | Method | Date | Grammar Accuracy | Contamination Rate |
|---|---|---|---|---|
| Gemini 2.0 Flash | Prompt Strategy=Full S... | 2026.02 | 85 | 5 |
| GPT-4o | Prompt Strategy=Full S... | 2026.02 | 82 | 7 |
| Llama 3.1 70B | Prompt Strategy=Full S... | 2026.02 | 78 | 6 |
| Gemini 2.0 Flash | Prompt Strategy=+Gram+... | 2026.02 | 72 | 10 |
| GPT-4o | Prompt Strategy=+Gram+... | 2026.02 | 70 | 12 |
| Llama 3.1 70B | Prompt Strategy=+Gram+... | 2026.02 | 65 | 15 |
| Gemini 2.0 Flash | Prompt Strategy=+Grammar | 2026.02 | 63 | 28 |
| GPT-4o | Prompt Strategy=+Grammar | 2026.02 | 62 | 32 |
| Llama 3.1 70B | Prompt Strategy=+Grammar | 2026.02 | 58 | 35 |
| Gemini 2.0 Flash | Prompt Strategy=Baseline | 2026.02 | 25 | 75 |
| GPT-4o | Prompt Strategy=Baseline | 2026.02 | 20 | 80 |
| Llama 3.1 70B | Prompt Strategy=Baseline | 2026.02 | 15 | 82 |