Multimodal Crystal Flow: Any-to-Any Modality Generation for Unified Crystal Modeling

About

Crystal modeling spans a family of conditional and unconditional generation tasks, including crystal structure prediction (CSP) and de novo generation (DNG). While recent deep generative models have shown promising performance, they remain largely task-specific, lacking a unified framework that shares crystal representations across tasks. To address this limitation, we propose Multimodal Crystal Flow (MCFlow), a unified multimodal flow model that realizes multiple crystal generation tasks as distinct inference trajectories via independent time variables for atom types and crystal structures. To enable multimodal flow in a standard transformer model, we introduce a composition- and symmetry-aware atom ordering with hierarchical permutation augmentation, injecting compositional and crystallographic priors without explicit structural templates. Experiments on the MP-20 and MPTS-52 benchmarks show that a single MCFlow model is competitive with task-specific baselines across CSP, DNG, and structure-conditioned atom type generation.
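The abstract frames each task as a different inference trajectory over two independent time variables, one for atom types and one for crystal structure. Below is a minimal sketch of that idea under our own assumptions. We assume t = 0 is noise and t = 1 is clean data, a conditioned modality is simply held at t = 1, and DNG advances both clocks together. The names `TASKS`, `time_grid`, and `euler_sample` are hypothetical, not taken from the MCFlow code. Both modalities are also treated as plain float lists, whereas a real model would handle discrete atom types differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Assumed convention (hypothetical): t = 0 is pure noise, t = 1 is clean data.


@dataclass(frozen=True)
class TaskSchedule:
    """Which modality clocks advance during inference for one task."""
    types_move: bool   # does the atom-type time t_A advance?
    struct_move: bool  # does the structure time t_X advance?


# Hypothetical task table: a conditioned (given) modality stays fixed at t = 1.
TASKS: Dict[str, TaskSchedule] = {
    "csp": TaskSchedule(types_move=False, struct_move=True),            # types given
    "dng": TaskSchedule(types_move=True, struct_move=True),             # nothing given
    "atom_type_gen": TaskSchedule(types_move=True, struct_move=False),  # structure given
}


def time_grid(task: str, num_steps: int) -> List[Tuple[float, float]]:
    """Return the (t_types, t_struct) pairs visited by a uniform-step trajectory."""
    if num_steps < 1:
        raise ValueError("num_steps must be >= 1")
    sched = TASKS[task]
    grid = []
    for k in range(num_steps + 1):
        s = k / num_steps
        t_types = s if sched.types_move else 1.0
        t_struct = s if sched.struct_move else 1.0
        grid.append((t_types, t_struct))
    return grid


# A velocity field maps (types, structure, t_types, t_struct) to per-modality velocities.
Velocity = Callable[[List[float], List[float], float, float],
                    Tuple[List[float], List[float]]]


def euler_sample(task: str, x_types: List[float], x_struct: List[float],
                 velocity: Velocity, num_steps: int) -> Tuple[List[float], List[float]]:
    """Integrate one shared velocity field along the task's time grid with Euler steps."""
    grid = time_grid(task, num_steps)
    for (ta0, tx0), (ta1, tx1) in zip(grid[:-1], grid[1:]):
        v_types, v_struct = velocity(x_types, x_struct, ta0, tx0)
        # A frozen modality has a zero time increment, so it is never updated.
        x_types = [a + (ta1 - ta0) * v for a, v in zip(x_types, v_types)]
        x_struct = [x + (tx1 - tx0) * v for x, v in zip(x_struct, v_struct)]
    return x_types, x_struct


if __name__ == "__main__":
    # Toy constant velocity, only to show which modality moves for CSP.
    toy_velocity = lambda xa, xs, ta, tx: ([1.0] * len(xa), [1.0] * len(xs))
    print(euler_sample("csp", [3.0, 5.0], [0.0], toy_velocity, num_steps=4))
    # prints ([3.0, 5.0], [1.0])
```

In this sketch, switching tasks changes only the time grid, not the model: the given modality sits at t = 1 with a zero step size, so the same velocity network serves CSP, DNG, and structure-conditioned atom type generation.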

Kiyoung Seong, Sungsoo Ahn, Sehui Han, Changyoung Park • 2026

Related benchmarks

Task	Dataset	Result
De Novo Generation	MP-20	Structural Validity: 0.9961	21
Crystal Structure Prediction	MPTS-52 (test)	Match Rate (MR): 41.45	13
Crystal Structure Prediction	MP-20 July 2021 (test)	Match Rate (MR): 77.84	13
Crystal Structure Prediction	MP-20 (test)	Match Rate @ 1: 64.08	9
De Novo Crystal Generation	LeMat GenBench	Validity: 98.6	8
Atom Type Generation	MP-20	Compositional Accuracy: 90.23	7
Atom Type Generation	MPTS-52	Compositional Accuracy: 84.25	7
De Novo Generation	MPTS-52 (test)	Structural Validity: 98.34	7
Crystal Structure Prediction	MP-20 polymorph (test)	METRe: 70.7	6

