TensorFlow Lite Micro: Embedded Machine Learning on TinyML Systems

About

Deep learning inference on embedded devices is a burgeoning field with myriad applications because tiny embedded devices are omnipresent. But we must overcome major challenges before we can benefit from this opportunity. Embedded processors are severely resource constrained. Their nearest mobile counterparts exhibit at least a 100x to 1,000x difference in compute capability, memory availability, and power consumption. As a result, the machine-learning (ML) models and the associated ML inference framework must not only execute efficiently but also operate in a few kilobytes of memory. In addition, the embedded-device ecosystem is heavily fragmented. To maximize efficiency, system vendors often omit many features that commonly appear in mainstream systems and that allow for cross-platform interoperability, including dynamic memory allocation and virtual memory. The hardware comes in many flavors (e.g., differing instruction-set architectures and the presence or absence of FPU support). We introduce TensorFlow Lite Micro (TF Micro), an open-source ML inference framework for running deep-learning models on embedded systems. TF Micro tackles the efficiency requirements imposed by embedded-system resource constraints and the fragmentation challenges that make cross-platform interoperability nearly impossible. The framework adopts a unique interpreter-based approach that provides flexibility while overcoming these challenges. This paper explains the design decisions behind TF Micro and describes its implementation details. We also present an evaluation that demonstrates its low resource requirements and minimal run-time performance overhead.
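
The abstract names two constraints that shape the design: no dynamic memory allocation, and an interpreter that runs a model described as data. The sketch below illustrates both ideas in miniature. It is not the TF Micro API: `Arena`, `Op`, `OpCode`, and `Run` are hypothetical names invented for this example, and the two toy operators stand in for real kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names throughout; this is an illustration, not TF Micro code.

// Bump allocator over a caller-provided buffer. It never calls malloc or
// free, so it works on targets that have no heap.
class Arena {
 public:
  Arena(std::uint8_t* buffer, std::size_t size)
      : buffer_(buffer), size_(size), used_(0) {}

  // Returns an aligned block, or nullptr if the arena is exhausted.
  // `alignment` must be a power of two.
  void* Allocate(std::size_t bytes, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buffer_);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    const std::uintptr_t aligned = (base + used_ + mask) & ~mask;
    const std::size_t start = static_cast<std::size_t>(aligned - base);
    if (start > size_ || bytes > size_ - start) return nullptr;
    used_ = start + bytes;
    return buffer_ + start;
  }

  std::size_t used() const { return used_; }

 private:
  std::uint8_t* buffer_;
  std::size_t size_;
  std::size_t used_;
};

// Toy operator set standing in for real kernels.
enum class OpCode : std::uint8_t { kAddScalar, kRelu };

struct Op {
  OpCode code;
  float param;  // Read by kAddScalar only.
};

// Interprets a list of ops, applying each one in place to `data`.
// Returns false if it meets an opcode it does not know.
bool Run(const Op* ops, std::size_t op_count, float* data, std::size_t n) {
  for (std::size_t i = 0; i < op_count; ++i) {
    switch (ops[i].code) {
      case OpCode::kAddScalar:
        for (std::size_t j = 0; j < n; ++j) data[j] += ops[i].param;
        break;
      case OpCode::kRelu:
        for (std::size_t j = 0; j < n; ++j) {
          if (data[j] < 0.0f) data[j] = 0.0f;
        }
        break;
      default:
        return false;
    }
  }
  return true;
}
```

A caller would typically declare the buffer statically (for example, `alignas(16) static std::uint8_t storage[64];`), wrap it in an `Arena`, allocate tensor storage from it once, and then pass an `Op` array to `Run`. Because the model is data rather than generated code, the same interpreter binary can run different op lists, which is one plausible reading of the flexibility the abstract attributes to the interpreter-based approach.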

Robert David, Jared Duke, Advait Jain, Vijay Janapa Reddi, Nat Jeffries, Jian Li, Nick Kreeger, Ian Nappier, Meghna Natraj, Shlomi Regev, Rocky Rhodes, Tiezhen Wang, Pete Warden • 2020

Related benchmarks

Task | Dataset | Result | Rank
Image Processing Inference | ViT (Vision Transformer) | Average Latency (ms): 91.02 | 16
Image Processing Inference | ResNet50 | Average Latency (ms): 8.96 | 15
Image Processing Inference | MobileNet V3 | Average Latency (ms): 0.7 | 14
LLM Inference | Llama 3.2 Samsung Galaxy S25 Ultra 1B (test) | Prefill Min Throughput (tokens/sec): 185.3 | 13
LLM Inference | Qwen3 Samsung Galaxy S25 Ultra 0.6B (test) | Prefill Throughput (min): 172.6 | 12
LLM Inference | Qwen3 Google Pixel 9 Pro XL 0.6B (test) | Prefill Throughput (min, tokens/sec): 101.1 | 10
LLM Inference | Llama 3.2 Google Pixel 9 Pro XL 1B (test) | Prefill Throughput (min) (tokens/sec): 117 | 10