Task Semantic Diversity on RoboGene Generated Task Descriptions 1.0 (test)
[Chart: BLEU-1 by method. Highlighted value: Rule-based, 80.32 (Feb 18, 2026). Selectable metrics: BLEU-1, BLEU-2, BLEU-3, BLEU-4, ROUGE-L, Cosine Similarity.]
Updated 3mo ago
Evaluation Results
Method          Date     BLEU-1  BLEU-2  BLEU-3  BLEU-4  ROUGE-L  Cosine Similarity
Rule-based      2026.02  80.32   67.82   62.11   56.45   57.86    69.84
Human           2026.02  72.8    65.62   60.19   55.55   48.86    68.88
Gemini 2.5 Pro  2026.02  38.97   9.76    5.73    3.23    27.67    34.63
GPT-4o          2026.02  36.61   9.84    4.82    2.71    25.12    34.64
RoboGene        2026.02  28.93   6.62    3.15    1.75    19.18    29.07
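
For readers unfamiliar with the BLEU-n metrics listed here, the following is a minimal sketch of clipped unigram precision, the quantity at the core of BLEU-1. It rests on assumptions: the page does not say how the scores were computed, which texts serve as references, or whether a lower or higher value indicates more diversity. The function name `clipped_unigram_precision` and the whitespace tokenization are hypothetical choices made for illustration, and the brevity penalty of full BLEU is left out.

```python
from collections import Counter


def clipped_unigram_precision(candidate: str, reference: str) -> float:
    """Fraction of candidate tokens that also appear in the reference.

    Each candidate token is credited at most as many times as it occurs
    in the reference (the "clipping" used by BLEU). Tokenization is
    plain lowercase whitespace splitting, an assumption for this sketch.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens:
        return 0.0  # an empty candidate has no matching tokens

    ref_counts = Counter(ref_tokens)
    cand_counts = Counter(cand_tokens)
    # Clip each token's match count by its count in the reference.
    overlap = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    return overlap / len(cand_tokens)


if __name__ == "__main__":
    a = "pick up the red cup"
    b = "pick up the blue cup"
    # Four of the five candidate tokens occur in the reference.
    print(round(100 * clipped_unigram_precision(a, b), 2))
```

Under this reading, two task descriptions that share most of their words score high, so a set of near-duplicate descriptions would produce high values. Whether the benchmark aggregates pairwise scores this way is not stated on the page.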