
VIAFormer: Voxel-Image Alignment Transformer for High-Fidelity Voxel Refinement

About

We propose VIAFormer, a Voxel-Image Alignment Transformer model designed for Multi-view Conditioned Voxel Refinement: the task of repairing incomplete, noisy voxels using calibrated multi-view images as guidance. Its effectiveness stems from a synergistic design: an Image Index that provides explicit 3D spatial grounding for 2D image tokens, a Correctional Flow objective that learns a direct voxel-refinement trajectory, and a Hybrid Stream Transformer that enables robust cross-modal fusion. Experiments show that VIAFormer establishes a new state of the art in correcting both severe synthetic corruptions and realistic artifacts in voxel shapes obtained from powerful Vision Foundation Models (VFMs). Beyond benchmarking, we demonstrate VIAFormer as a practical and reliable bridge in real-world 3D creation pipelines, paving the way for voxel-based methods to thrive in the large-model, big-data wave.

Tiancheng Fang, Bowen Pan, Lingxi Chen, Jiangjing Lyu, Chengfei Lyu, Chaoyue Niu, Fan Wu • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Refining VFM-derived artifacts | Toys4k | mIoU 44.6 | 13 |
| Refining VFM-derived artifacts | Dora | mIoU 45.85 | 13 |
| Refinement of VFM-derived artifacts | Toys4k (synthetically corrupted) | mIoU 0.858 | 8 |
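The benchmark results above are reported as mIoU, but this listing does not say how the metric is computed or whether a given value is a percentage or a fraction. As a rough illustration only, the sketch below assumes mIoU means the intersection-over-union of binary voxel occupancy grids, averaged over shapes. The function names `voxel_iou` and `mean_iou` are hypothetical and are not taken from the paper.

```python
import numpy as np


def voxel_iou(pred, target):
    """IoU between two binary occupancy grids of identical shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("grids must have the same shape")
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both grids empty: scored as a perfect match (an assumed convention).
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(intersection) / float(union)


def mean_iou(preds, targets):
    """Average the per-shape IoU over paired predictions and references."""
    scores = [voxel_iou(p, t) for p, t in zip(preds, targets)]
    if not scores:
        raise ValueError("need at least one pair of grids")
    return sum(scores) / len(scores)


# Two 4x4x4 grids, each filling two slabs that overlap in one slab.
a = np.zeros((4, 4, 4), dtype=bool)
a[:2] = True   # slabs 0 and 1 occupied (32 voxels)
b = np.zeros((4, 4, 4), dtype=bool)
b[1:3] = True  # slabs 1 and 2 occupied (32 voxels)

iou_ab = voxel_iou(a, b)               # 16 shared voxels / 48 in the union
miou = mean_iou([a, a], [b, a])        # average of iou_ab and a perfect match
```

Under this definition a score lies in [0, 1]; multiplying by 100 gives a percentage, which may explain why the rows above appear on different scales, though the listing does not confirm this.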
