FGSVQA: Frequency-Guided Short-form Video Quality Assessment

About

Short-form video poses new challenges to the quality assessment of user-generated content (UGC) due to its complex generation pipeline, rapid content variation, and mixed distortions. To address these challenges, we propose an end-to-end video quality assessment (VQA) framework that employs a dense visual encoder based on CLIP and incorporates compression priors derived from the frequency domain to generate artifact- and structure-aware weight maps for feature aggregation. By explicitly decomposing the representation into artifact, structure, and original visual feature branches and adaptively fusing them over time through a learned gating module, the proposed method achieves accurate and efficient quality prediction. Experimental results show that our method achieves strong performance on short-form video datasets in terms of average rank and linear correlation (SRCC: 0.736, PLCC: 0.787), while keeping inference runtime low. The code and additional results are available at: https://github.com/xinyiW915/FGSVQA.
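
The frequency-guided aggregation and gated branch fusion described above can be illustrated with a small, stdlib-only sketch. It is not the FGSVQA implementation and rests on several assumptions: the per-patch "compression prior" is taken to be the share of 8x8 DCT energy in high-frequency coefficients, the structure weights use the complementary low-frequency share, CLIP patch features are replaced by plain Python lists, and the learned gating module is replaced by gate logits supplied per frame. All function names and the cutoff parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Block = List[List[float]]  # an N x N luma block
Vec = List[float]          # hypothetical stand-in for a patch feature vector (not real CLIP output)


def dct2(block: Block) -> Block:
    """Orthonormal 2-D DCT-II of an N x N block (naive, for illustration only)."""
    n = len(block)

    def alpha(k: int) -> float:
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def high_freq_ratio(block: Block, cutoff: int = 4) -> float:
    """Hypothetical compression prior: share of AC energy at frequencies u + v >= cutoff."""
    coeffs = dct2(block)
    ac = hf = 0.0
    for u, row in enumerate(coeffs):
        for v, c in enumerate(row):
            if u == 0 and v == 0:
                continue  # skip the DC term
            ac += c * c
            if u + v >= cutoff:
                hf += c * c
    return hf / ac if ac > 1e-12 else 0.0  # flat blocks carry no AC energy


def softmax(xs: Sequence[float]) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def weighted_pool(feats: Sequence[Vec], weights: Sequence[float]) -> Vec:
    """Weighted sum of patch features (weights are expected to sum to 1)."""
    return [sum(w * f[d] for w, f in zip(weights, feats)) for d in range(len(feats[0]))]


def gated_fusion(art: Vec, struct: Vec, orig: Vec, gate_logits: Sequence[float]) -> Vec:
    """Convex combination of the three branches; the logits stand in for a learned gate."""
    g = softmax(gate_logits)
    return [g[0] * a + g[1] * s + g[2] * o for a, s, o in zip(art, struct, orig)]


Frame = Tuple[List[Vec], List[Block], List[float]]  # (patch features, patch luma blocks, gate logits)


def frame_feature(feats: List[Vec], blocks: List[Block], gate_logits: List[float]) -> Vec:
    hf = [high_freq_ratio(b) for b in blocks]
    w_art = softmax(hf)                         # favour patches with more high-frequency energy
    w_struct = softmax([1.0 - r for r in hf])   # favour patches whose AC energy is mostly low-frequency
    w_orig = [1.0 / len(feats)] * len(feats)    # plain average pooling for the original branch
    return gated_fusion(weighted_pool(feats, w_art),
                        weighted_pool(feats, w_struct),
                        weighted_pool(feats, w_orig),
                        gate_logits)


def video_feature(frames: List[Frame]) -> Vec:
    """Per-frame gated fusion followed by temporal mean pooling."""
    per_frame = [frame_feature(*f) for f in frames]
    return [sum(f[d] for f in per_frame) / len(per_frame) for d in range(len(per_frame[0]))]
```

In a full model, a regression head would map the fused video vector to a quality score; that step is omitted here, as is any training of the gate.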

Xinyi Wang, Angeliki Katsenou, Junxiao Shen, David Bull • 2026

Related benchmarks

Task	Dataset	Result
No-Reference Video Quality Assessment	YT-SFV SDR_ANIMAL_5NGJ.MP4 (sample)	Inference Time (s) = 0.313	16
Video Quality Assessment	YouTube-SFV HDR2SDR (test)	SRCC = 0.543	14
No-Reference Video Quality Assessment	KVQ (test)	SRCC = 0.877	4
No-Reference Video Quality Assessment	YT-SFV SDR (test)	SRCC = 78.8	4
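
The SRCC and PLCC values above measure agreement between predicted scores and subjective mean opinion scores (MOS). A minimal stdlib sketch of both metrics follows: SRCC is Pearson correlation computed on ranks, with tied values given their average rank, and PLCC is computed directly on the raw predictions. Many VQA evaluations first fit a nonlinear (for example logistic) mapping from predictions to MOS before computing PLCC; whether that is done here is not stated, so the step is left out. Both metrics lie in [-1, 1], so the YT-SFV SDR entry of 78.8 presumably uses a ×100 scale. The function names are illustrative.

```python
import math
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to xs[order[i]]
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def plcc(pred: Sequence[float], mos: Sequence[float]) -> float:
    """Pearson linear correlation coefficient (no nonlinear mapping applied)."""
    n = len(pred)
    mp, mm = sum(pred) / n, sum(mos) / n
    cov = sum((p - mp) * (m - mm) for p, m in zip(pred, mos))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    sm = math.sqrt(sum((m - mm) ** 2 for m in mos))
    return cov / (sp * sm)


def srcc(pred: Sequence[float], mos: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return plcc(average_ranks(pred), average_ranks(mos))
```

Because SRCC depends only on ordering, a model whose predictions are a monotonic but nonlinear function of MOS still reaches SRCC = 1 while its raw PLCC stays below 1.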
