Advancing Open-source World Models
About
We present LingBot-World, an open-source world simulator stemming from video generation. Positioned as a top-tier world model, LingBot-World offers the following features. (1) It maintains high fidelity and robust dynamics across a broad spectrum of environments, including realistic scenes, scientific contexts, cartoon styles, and beyond. (2) It enables a minute-level horizon while preserving contextual consistency over time, a property also known as "long-term memory". (3) It supports real-time interactivity, achieving a latency of under 1 second when producing 16 frames per second. We provide public access to the code and model in an effort to narrow the divide between open-source and closed-source technologies. We believe our release will empower the community with practical applications in areas such as content creation, gaming, and robot learning.
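To make the figures quoted above concrete, the following is a rough back-of-the-envelope sketch rather than anything taken from the LingBot-World code. It assumes that "minute-level" means exactly 60 seconds and that the long horizon runs at the same 16 frames per second as the interactive mode; both are assumptions, and all variable names are illustrative.

```python
# Back-of-the-envelope numbers implied by the abstract's figures.
FPS = 16               # frames per second, as stated in the abstract
HORIZON_S = 60         # ASSUMPTION: "minute-level" taken as exactly one minute
LATENCY_BOUND_S = 1.0  # upper bound on latency, as stated in the abstract

# Number of frames that must stay consistent over the assumed horizon.
frames_per_horizon = FPS * HORIZON_S

# Time between consecutive frames at the stated frame rate, in milliseconds.
frame_interval_ms = 1000 / FPS

# Frames produced during one latency bound at the stated frame rate.
frames_within_latency = int(FPS * LATENCY_BOUND_S)

print(frames_per_horizon, frame_interval_ms, frames_within_latency)
```

Under these assumptions, a one-minute rollout spans 960 frames, consecutive frames are 62.5 ms apart, and a latency just under 1 second corresponds to at most 16 frames of delay between a user action and its visible effect.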
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video Generation | VBench (test) | -- | 48 |
| Interactive Video World Model Evaluation | RealWM120K VBench (val) | Latency (s): 1.92e+3 | 9 |
| World Simulation | Ann-Arbor-City-Bench | FID: 57.99 | 8 |
| World Simulation | Busan-City-Bench | FID: 62.14 | 8 |
| Camera-level spatial editing | SpatialEdit-Bench | Camera Viewpoint Error: 0.696 | 6 |
| Interactive World Modeling | General Game World Modeling | Resolution: 720 | 6 |
| Long-term Image-to-Video Generation | RE10K Long (test) | FID: 64.84 | 4 |