Summer-22B: A Systematic Approach to Dataset Engineering and Training at Scale for Video Foundation Model

About

We describe our experience training Summer-22B, a video foundation model developed from scratch. This report documents the engineering challenges, design decisions, and lessons learned while scaling from raw footage collection to a functional model trained on approximately 50 million clips. We outline our approach combining metadata-driven dataset curation, multi-stage filtering, $\mu$P parameterization, and hypersphere-constrained optimization. We developed the Lavender Data system for dataset management and adopted inference-aware architectural choices. We share observations on what worked in our setting: dataset engineering consumed the majority of effort, architectural variants showed smaller differences than we expected, and $\mu$P hyperparameter transfer appeared effective even under geometric constraints. We hope this account proves useful to others undertaking similar projects.

Simo Ryu, Chunghwan Han • 2026

Related benchmarks

Task	Dataset	Result	Rank
Video Generation	VBench 2.0	Human Fidelity: 0.745	26
Video Generation	VBench 2.0 (overall)	Total Score: 0.539	4
