TensorFlow Lite Micro: Embedded Machine Learning on TinyML Systems
About
Deep learning inference on embedded devices is a burgeoning field with myriad applications because tiny embedded devices are omnipresent. But we must overcome major challenges before we can benefit from this opportunity. Embedded processors are severely resource constrained: their nearest mobile counterparts exhibit at least a 100x to 1,000x difference in compute capability, memory availability, and power consumption. As a result, the machine-learning (ML) models and the associated ML inference framework must not only execute efficiently but also operate in a few kilobytes of memory. Moreover, the embedded-device ecosystem is heavily fragmented. To maximize efficiency, system vendors often omit features that are common in mainstream systems and that enable cross-platform interoperability, such as dynamic memory allocation and virtual memory. The hardware also comes in many flavors (e.g., differing instruction-set architectures, and FPU support or the lack of it). We introduce TensorFlow Lite Micro (TF Micro), an open-source ML inference framework for running deep-learning models on embedded systems. TF Micro tackles both the efficiency requirements imposed by embedded-system resource constraints and the fragmentation challenges that make cross-platform interoperability nearly impossible. The framework adopts a unique interpreter-based approach that provides flexibility while overcoming these challenges. This paper explains the design decisions behind TF Micro and describes its implementation details. We also present an evaluation demonstrating its low resource requirements and minimal run-time performance overhead.
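As a rough illustration of two constraints named in the abstract (no dynamic memory allocation, and an interpreter-based design), here is a minimal C sketch. It assumes that "interpreter-based" means the model is data, a list of operator nodes dispatched through a kernel table at run time, and that all working memory comes from one statically sized buffer instead of `malloc`. Every name here (`Arena`, `arena_alloc`, `interpret`, `run_example`, `OpCode`, the two-op graph) is hypothetical and invented for this example; it is not TF Micro's API or implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not TF Micro source: all working memory comes from
 * one statically sized buffer, so malloc/free are never called. */
#define ARENA_SIZE 512u
#define TENSOR_LEN 4u
#define NUM_TENSORS 4u

typedef struct {
    uint8_t buffer[ARENA_SIZE];
    size_t used; /* bytes consumed so far (bump pointer) */
} Arena;

/* Bump-allocate `bytes` aligned to `align` (a power of two); NULL when full. */
static void *arena_alloc(Arena *arena, size_t bytes, size_t align) {
    assert(align != 0u && (align & (align - 1u)) == 0u);
    uintptr_t base = (uintptr_t)arena->buffer;
    uintptr_t next = base + arena->used;
    uintptr_t aligned = (next + (align - 1u)) & ~(uintptr_t)(align - 1u);
    size_t offset = (size_t)(aligned - base);
    if (offset > ARENA_SIZE || bytes > ARENA_SIZE - offset) return NULL;
    arena->used = offset + bytes;
    return (void *)aligned;
}

/* A tiny "model": operator nodes that refer to tensors by index. */
typedef enum { OP_ADD, OP_RELU, OP_COUNT } OpCode;

typedef struct {
    OpCode op;
    int in0, in1, out; /* tensor indices; unary ops set in1 = in0 */
} Node;

typedef void (*Kernel)(const float *a, const float *b, float *out, size_t n);

static void add_kernel(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

static void relu_kernel(const float *a, const float *b, float *out, size_t n) {
    (void)b; /* unary op: second input unused */
    for (size_t i = 0; i < n; ++i) out[i] = a[i] > 0.0f ? a[i] : 0.0f;
}

/* Kernel table: the interpreter looks ops up at run time, so the same
 * binary can execute any graph built from registered ops. */
static const Kernel kKernels[OP_COUNT] = { add_kernel, relu_kernel };

/* Execute nodes in order; returns 0 on success, -1 on an unknown op. */
static int interpret(const Node *nodes, size_t num_nodes,
                     float *const *tensors, size_t tensor_len) {
    for (size_t i = 0; i < num_nodes; ++i) {
        const Node *node = &nodes[i];
        if ((unsigned)node->op >= (unsigned)OP_COUNT) return -1;
        kKernels[node->op](tensors[node->in0], tensors[node->in1],
                           tensors[node->out], tensor_len);
    }
    return 0;
}

/* Computes result = relu(x + y) with every tensor carved from the arena. */
static int run_example(const float x[TENSOR_LEN], const float y[TENSOR_LEN],
                       float result[TENSOR_LEN], size_t *arena_bytes_used) {
    static Arena arena; /* static storage: no heap involved */
    arena.used = 0;
    float *tensors[NUM_TENSORS];
    for (size_t t = 0; t < NUM_TENSORS; ++t) {
        tensors[t] = arena_alloc(&arena, TENSOR_LEN * sizeof(float),
                                 _Alignof(float));
        if (tensors[t] == NULL) return -1;
    }
    for (size_t i = 0; i < TENSOR_LEN; ++i) {
        tensors[0][i] = x[i];
        tensors[1][i] = y[i];
    }
    /* Graph: t2 = t0 + t1; t3 = relu(t2). */
    static const Node graph[] = { { OP_ADD, 0, 1, 2 }, { OP_RELU, 2, 2, 3 } };
    if (interpret(graph, sizeof graph / sizeof graph[0], tensors,
                  TENSOR_LEN) != 0) return -1;
    for (size_t i = 0; i < TENSOR_LEN; ++i) result[i] = tensors[3][i];
    *arena_bytes_used = arena.used;
    return 0;
}
```

Swapping `graph` for another node list built from registered ops changes the "model" without touching the kernels, which is the flexibility an interpreter buys. A real framework would also need to plan tensor lifetimes to reuse arena space, handle quantized types, and more; this bump allocator only shows why a fixed buffer can replace a heap.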
Related benchmarks
| Task | Model / configuration | Metric | Result | Rank |
|---|---|---|---|---|
| Image Processing Inference | ViT (Vision Transformer) | Average latency (ms) | 91.02 | 16 |
| Image Processing Inference | ResNet50 | Average latency (ms) | 8.96 | 15 |
| Image Processing Inference | MobileNet V3 | Average latency (ms) | 0.7 | 14 |
| LLM Inference | Llama 3.2 1B, Samsung Galaxy S25 Ultra (test) | Prefill throughput, min (tokens/sec) | 185.3 | 13 |
| LLM Inference | Qwen3 0.6B, Samsung Galaxy S25 Ultra (test) | Prefill throughput, min (tokens/sec) | 172.6 | 12 |
| LLM Inference | Qwen3 0.6B, Google Pixel 9 Pro XL (test) | Prefill throughput, min (tokens/sec) | 101.1 | 10 |
| LLM Inference | Llama 3.2 1B, Google Pixel 9 Pro XL (test) | Prefill throughput, min (tokens/sec) | 117 | 10 |