TensorFlow Lite Micro: Embedded Machine Learning on TinyML Systems

About

Deep learning inference on embedded devices is a burgeoning field with myriad applications because tiny embedded devices are omnipresent. But we must overcome major challenges before we can benefit from this opportunity. Embedded processors are severely resource constrained: their nearest mobile counterparts exhibit at least a 100x to 1,000x difference in compute capability, memory availability, and power consumption. As a result, the machine-learning (ML) models and the associated ML inference framework must not only execute efficiently but also operate in a few kilobytes of memory. The embedded-device ecosystem is also heavily fragmented. To maximize efficiency, system vendors often omit many features that are common in mainstream systems and that enable cross-platform interoperability, such as dynamic memory allocation and virtual memory. The hardware itself comes in many flavors, varying in instruction-set architecture and in whether an FPU is present at all. We introduce TensorFlow Lite Micro (TF Micro), an open-source ML inference framework for running deep-learning models on embedded systems. TF Micro tackles the efficiency requirements imposed by embedded-system resource constraints and the fragmentation challenges that make cross-platform interoperability nearly impossible. The framework adopts a unique interpreter-based approach that provides flexibility while overcoming these challenges. This paper explains the design decisions behind TF Micro and describes its implementation details. We also present an evaluation demonstrating its low resource requirements and minimal run-time performance overhead.

Robert David, Jared Duke, Advait Jain, Vijay Janapa Reddi, Nat Jeffries, Jian Li, Nick Kreeger, Ian Nappier, Meghna Natraj, Shlomi Regev, Rocky Rhodes, Tiezhen Wang, Pete Warden • 2020

Related benchmarks

Task	Model / device (split)	Metric	Value	#
Image Processing Inference	ViT (Vision Transformer)	Average latency (ms)	91.02	16
Image Processing Inference	ResNet50	Average latency (ms)	8.96	15
Image Processing Inference	MobileNet V3	Average latency (ms)	0.7	14
LLM Inference	Llama 3.2 1B, Samsung Galaxy S25 Ultra (test)	Prefill throughput, min (tokens/s)	185.3	13
LLM Inference	Qwen3 0.6B, Samsung Galaxy S25 Ultra (test)	Prefill throughput, min (tokens/s)	172.6	12
LLM Inference	Qwen3 0.6B, Google Pixel 9 Pro XL (test)	Prefill throughput, min (tokens/s)	101.1	10
LLM Inference	Llama 3.2 1B, Google Pixel 9 Pro XL (test)	Prefill throughput, min (tokens/s)	117	10
