Kairos: A Scalable Serving System for Physical AI
About
Physical AI is growing rapidly, with frontier foundation models extending its capabilities across general environments. Physical AI tasks have inference properties that differ markedly from those of digital AI: they consist of multiple rounds of inference and action execution, generate a chunk of actions in each inference round, and interleave inference and execution asynchronously. As a result, existing digital AI serving systems are ill-suited to physical AI. Addressing this shortcoming is critical for the wide adoption of physical AI, given the size of these models and the scale of the robot fleets they must serve. To fill this gap, we design Kairos, the first multi-robot serving system that makes the generate-execute loop a first-class citizen and actively participates in the execution phase. Across a wide range of physical AI models and robots, Kairos reduces average end-to-end task latency by 31.8–66.5% compared with state-of-the-art digital AI serving practices, and its gains grow with robot fleet size.
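The generate-execute loop described above can be illustrated for a single robot with the minimal sketch below. It is a hypothetical illustration, not Kairos's implementation: `infer_chunk`, `execute_action`, `CHUNK_SIZE`, and the `prefetch_at` trigger point are assumptions, and the sleeps stand in for model latency and actuation time. Each inference round produces a chunk of actions; once part of the current chunk has executed, the next round's inference is submitted in the background so that it overlaps with the remaining actions.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Optional

CHUNK_SIZE = 4   # hypothetical: actions produced per inference round
INFER_S = 0.02   # hypothetical model inference latency (seconds)
STEP_S = 0.01    # hypothetical time to execute one action (seconds)


def infer_chunk(round_idx: int) -> list[str]:
    """Hypothetical policy call: returns one chunk of CHUNK_SIZE actions."""
    time.sleep(INFER_S)
    return [f"r{round_idx}a{i}" for i in range(CHUNK_SIZE)]


def execute_action(action: str, log: list[str]) -> None:
    """Hypothetical actuator: 'executes' an action and records it."""
    time.sleep(STEP_S)
    log.append(action)


def run_task(num_rounds: int, prefetch_at: int = 2) -> list[str]:
    """Run num_rounds generate-execute rounds for one robot.

    After `prefetch_at` actions of the current chunk have been executed,
    the next round's inference is submitted to a background worker so it
    overlaps with the rest of the current chunk.
    """
    executed: list[str] = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        chunk = infer_chunk(0)  # the first round has nothing to overlap with
        for r in range(num_rounds):
            pending: Optional[Future] = None
            for i, action in enumerate(chunk):
                if i == prefetch_at and r + 1 < num_rounds:
                    pending = pool.submit(infer_chunk, r + 1)
                execute_action(action, executed)
            if r + 1 < num_rounds:
                # Use the overlapped result if one was started; otherwise
                # fall back to a blocking inference call.
                chunk = pending.result() if pending is not None else infer_chunk(r + 1)
    return executed


if __name__ == "__main__":
    start = time.perf_counter()
    actions = run_task(num_rounds=3)
    print(len(actions), actions[:4], f"{time.perf_counter() - start:.2f}s")
```

The sketch deliberately ignores observations: a real policy would condition each new chunk on the robot's latest state and would have to reconcile actions that overlap between chunks, and a multi-robot server would also batch and schedule inference requests across many such loops.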
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Robotic Manipulation | LIBERO SmolVLA | P25 Success Rate | 41.9 | 3 |
| Robotic Manipulation | LIBERO XVLA | P25 Success Rate | 74.9 | 3 |
| Robotic Manipulation | LIBERO Pi0.5 | P25 Success Rate | 77.4 | 3 |
| Robotic Manipulation | MetaWorld SmolVLA | P25 Success Rate | 88.4 | 3 |
| Robotic Manipulation | Isaac GR00T N1.5 | P25 Score | 61.6 | 3 |
| Robotic Manipulation | Bimanual Pi0.5 | P25 Success Rate | 55.9 | 3 |
| Robotic Manipulation | RoboTwin Fast-WAM | P25 | 62.4 | 3 |
| Robotic Manipulation | Bridge minic-video | P25 Score | 54.6 | 3 |
| Robotic Manipulation Serving | LIBERO SmolVLA | P25 Latency Reduction | 39.5 | 3 |
| Robotic Manipulation Serving | LIBERO XVLA | P25 Latency Reduction | 74.5 | 3 |
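The page does not define the serving metrics. One plausible reading, stated here as an assumption, is that a latency reduction is the percentage drop in end-to-end task latency relative to the baseline, 100 × (L_base - L_Kairos) / L_base, and that "P25" is the 25th percentile of that quantity across tasks or episodes. The helper names `latency_reduction_pct` and `p25` and the latency values below are hypothetical and illustrative only; they are not results from the paper.

```python
import statistics


def latency_reduction_pct(baseline_s: float, kairos_s: float) -> float:
    """Percentage drop in end-to-end task latency relative to the baseline."""
    return 100.0 * (baseline_s - kairos_s) / baseline_s


def p25(values: list[float]) -> float:
    """25th percentile with linear interpolation between order statistics
    (method="inclusive", equivalent to NumPy's default percentile)."""
    return statistics.quantiles(values, n=4, method="inclusive")[0]


# Hypothetical per-task latencies in seconds (illustrative only).
baseline = [10.0, 20.0, 40.0, 50.0, 8.0]
kairos = [6.0, 10.0, 12.0, 25.0, 6.0]

reductions = [latency_reduction_pct(b, k) for b, k in zip(baseline, kairos)]
print(reductions)       # [40.0, 50.0, 70.0, 50.0, 25.0]
print(p25(reductions))  # 40.0
```

Under this reading, a P25 value reports the reduction that at least 75% of tasks meet or exceed, which is why it can differ from the average reduction quoted in the abstract.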