Realism assessment on 200-sample
[Chart not reproduced. Metric shown: Visual Integration. Highlighted point: MIRAGE, 3.06, May 27, 2026.]
Updated 6d ago
Evaluation Results
Method       Settings                  Date     Visual Integration  Layout Plausibility  Semantic Consistency  Detectability  Overall Score  Krippendorff's Alpha  Spearman Rho (Human-LLM)
MIRAGE       evaluation_type=Human...  2026.05  3.06                3.21                 2.77                  3.03           3.02           -                     -
AgentHazard  evaluation_type=Human...  2026.05  2.59                2.68                 2.28                  2.52           2.52           -                     -
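
The page does not say how Overall Score is computed. As a quick sanity check, the sketch below tests one assumption: that Overall Score is the unweighted mean of the four dimension scores (Visual Integration, Layout Plausibility, Semantic Consistency, Detectability). The helper name `mean_of_dims` and the 0.01 tolerance are illustrative choices, not part of the benchmark.

```python
# Scores copied from the Evaluation Results entries for the two methods.
# Dimension order: Visual Integration, Layout Plausibility,
# Semantic Consistency, Detectability.
scores = {
    "MIRAGE": {"dims": [3.06, 3.21, 2.77, 3.03], "overall": 3.02},
    "AgentHazard": {"dims": [2.59, 2.68, 2.28, 2.52], "overall": 2.52},
}


def mean_of_dims(dims):
    """Unweighted mean of the dimension scores (assumed aggregation rule)."""
    return sum(dims) / len(dims)


def is_consistent(dims, overall, tol=0.01):
    """True if the reported overall score is within tol of the mean.

    The tolerance allows for the published values being rounded
    to two decimal places.
    """
    return abs(mean_of_dims(dims) - overall) < tol


for name, row in scores.items():
    print(name, is_consistent(row["dims"], row["overall"]))
```

Both rows pass this check, so the reported Overall Scores are consistent with a simple average of the four dimensions. Because the dimension scores are themselves rounded, this does not rule out other aggregation rules.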