Image Generation on LLM-Judge Evaluation Set
[Chart: Quality over time. Highlighted point: Tuna-R, Quality 37.2, Apr 27, 2026. Metrics: Quality, Diversity.]
Updated 1mo ago
Evaluation Results
| Method | LLM Judge | Date | Quality | Diversity |
|--------|-----------|------|---------|-----------|
| Tuna-R | Claude Opus 4.7 | 2026.04 | 37.2 | 29.9 |
| Tuna-R | GPT-5.4 | 2026.04 | 35.7 | 30.9 |
| Tuna-2 | Claude Opus 4.7 | 2026.04 | 34.8 | 41.9 |
| Tuna-2 | GPT-5.4 | 2026.04 | 32.1 | 48.4 |
| Tuna | Claude Opus 4.7 | 2026.04 | 28.1 | 28.2 |
| Tuna | GPT-5.4 | 2026.04 | 22.3 | 20.6 |