
Evaluating Generative Models via One-Dimensional Code Distributions

About

Most evaluations of generative models rely on feature-distribution metrics such as FID, which operate on continuous recognition features that are explicitly trained to be invariant to appearance variations, and thus discard cues critical for perceptual quality. We instead evaluate models in the space of discrete visual tokens, where modern 1D image tokenizers compactly encode both semantic and perceptual information and quality manifests as predictable token statistics. We introduce Codebook Histogram Distance (CHD), a training-free distribution metric in token space, and Code Mixture Model Score (CMMS), a no-reference quality metric learned from synthetic degradations of token sequences. To stress-test metrics under broad distribution shifts, we further propose VisForm, a benchmark of 210K images spanning 62 visual forms and 12 generative models with expert annotations. Across AGIQA, HPDv2/3, and VisForm, our token-based metrics achieve state-of-the-art correlation with human judgments. All code and datasets will be released to facilitate future research; the code is publicly available at https://github.com/zexiJia/1d-Distance.
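The abstract does not give CHD's exact formulation, but the core idea — comparing codebook-usage statistics of tokenized real and generated images — can be sketched in a few lines. The sketch below is illustrative only: it assumes each image is already encoded into a sequence of discrete token indices by a 1D tokenizer, and it uses total variation distance between normalized token histograms as a stand-in for whatever distance the paper actually uses.

```python
import numpy as np

def codebook_histogram_distance(real_tokens, gen_tokens, codebook_size):
    """Illustrative token-space distribution distance (NOT the paper's
    exact CHD): compare normalized codebook-usage histograms of two
    image sets via total variation distance.

    real_tokens, gen_tokens: array-likes of discrete token indices
        (e.g., concatenated 1D token sequences from a tokenizer).
    codebook_size: number of entries in the tokenizer's codebook.
    Returns a value in [0, 1]; 0 means identical token distributions.
    """
    # Count how often each codebook entry is used in each set.
    h_real = np.bincount(np.asarray(real_tokens).ravel(),
                         minlength=codebook_size).astype(float)
    h_gen = np.bincount(np.asarray(gen_tokens).ravel(),
                        minlength=codebook_size).astype(float)
    # Normalize counts into probability distributions over the codebook.
    h_real /= h_real.sum()
    h_gen /= h_gen.sum()
    # Total variation distance between the two histograms.
    return 0.5 * np.abs(h_real - h_gen).sum()
```

Because the metric operates on token counts alone, it is training-free in the same sense the paper describes: no auxiliary network is fit, only the frozen tokenizer is needed to produce the token sequences.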

Zexi Jia, Pengcheng Luo, Yijia Zhong, Jinchao Zhang, Jie Zhou • 2026

Related benchmarks

Task                         Dataset  Metric                     Result  Rank
Generative Model Evaluation  HPD v3   Real Score                 0.629   13
Generative Model Evaluation  AGIQA    AttnGAN Performance Score  0.57    13
Human preference prediction  AGIQA    Accuracy                   71.5    8
Human preference prediction  HPD v2   Accuracy                   74.9    8
Human preference prediction  HPD v3   Accuracy                   61.3    8
Human preference prediction  VisForm  Accuracy                   66.7    8
