
Summer-22B: A Systematic Approach to Dataset Engineering and Training at Scale for Video Foundation Model

About

We describe our experience training Summer-22B, a video foundation model developed from scratch. This report documents the engineering challenges, design decisions, and lessons learned while scaling from raw footage collection to a functional model trained on approximately 50 million clips. We outline our approach combining metadata-driven dataset curation, multi-stage filtering, $\mu$P parameterization, and hypersphere-constrained optimization. We developed the Lavender Data system for dataset management and adopted inference-aware architectural choices. We share observations on what worked in our setting: dataset engineering consumed the majority of effort, architectural variants showed smaller differences than we expected, and $\mu$P hyperparameter transfer appeared effective even under geometric constraints. We hope this account proves useful to others undertaking similar projects.
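The abstract names $\mu$P hyperparameter transfer and hypersphere-constrained optimization but does not say how either is implemented. The sketch below is a rough, hypothetical illustration of the two generic ideas. It is not the authors' method, and the report's actual parameterization, optimizer, and choice of constrained tensors are not stated here. It applies the textbook Adam-style $\mu$P rule, which scales a hidden weight matrix's learning rate by base_width / width relative to a tuned proxy model. It then takes a step and retracts each weight row back onto the unit hypersphere. The function names, the per-row normalization, and the example widths are all assumptions.

```python
import math


def mup_hidden_lr(base_lr: float, base_width: int, width: int) -> float:
    """Adam-style muP rule (assumed setting): hidden-matrix LR scales as 1/width
    relative to a proxy model of width `base_width` whose LR was tuned to `base_lr`."""
    return base_lr * base_width / width


def project_rows_to_sphere(weight: list[list[float]], eps: float = 1e-12) -> list[list[float]]:
    """Rescale each row to unit L2 norm, i.e. retract it onto the unit hypersphere.
    Normalizing per row (rather than per column or per tensor) is an assumption."""
    projected = []
    for row in weight:
        norm = math.sqrt(sum(x * x for x in row))
        projected.append([x / max(norm, eps) for x in row])
    return projected


def apply_update_and_project(
    weight: list[list[float]], update: list[list[float]], lr: float
) -> list[list[float]]:
    """Take one step along `update` (assumed to be an Adam-style normalized direction,
    the setting the 1/width LR rule targets), then project back onto the sphere."""
    stepped = [
        [w - lr * u for w, u in zip(w_row, u_row)]
        for w_row, u_row in zip(weight, update)
    ]
    return project_rows_to_sphere(stepped)


if __name__ == "__main__":
    # Hypothetical transfer: LR tuned at width 256, reused at width 4096.
    lr = mup_hidden_lr(base_lr=1e-2, base_width=256, width=4096)  # about 6.25e-4
    w = apply_update_and_project([[3.0, 4.0], [1.0, -1.0]], [[0.1, -0.2], [0.3, 0.0]], lr)
    print(lr, [math.sqrt(sum(x * x for x in row)) for row in w])  # each row norm is ~1.0
```

The projection keeps every constrained row at norm 1 regardless of the step size. Under that assumption, only the learning rate's width dependence has to transfer from the proxy model, which is the kind of interaction the abstract reports as working.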

Simo Ryu, Chunghwan Han • 2026

Related benchmarks

Task              Dataset               Metric          Result  Rank
Video Generation  VBench 2.0            Human Fidelity  0.745   26
Video Generation  VBench 2.0 (overall)  Total Score     0.539   4
