ExecuTorch -- A Unified PyTorch Solution to Run AI Models On-Device

About

Local execution of AI on edge devices is important for low latency and offline operation. However, deploying models on diverse hardware remains fragmented, often requiring model conversion or complete reimplementation outside the PyTorch ecosystem where the model was originally authored. We introduce ExecuTorch, a unified PyTorch-native deployment framework for edge AI. ExecuTorch enables seamless deployment of machine learning models across heterogeneous compute environments. It scales from embedded microcontrollers to complex system-on-chips (SoCs) with dedicated accelerators, powering devices ranging from wearables and smartphones to large compute clusters. ExecuTorch preserves PyTorch semantics while supporting customization, optimizations such as quantization, and pluggable execution "backends". Together, these features enable fast experimentation, allowing researchers to validate deployment behavior entirely within PyTorch and bridging the gap between research and production.

Mergen Nachin, Digant Desai, Sicheng Stephen Jia, Chen Lai, Mengwei Liu, Jacob Szwejbka, Raziel Alvarez, RJ Ascani, Dave Bort, Manuel Candales, Andrew Caples, Yanan Cao, Zhengxu Chen, Soumith Chintala, Gregory Comer, Tanvir Islam, Songhao Jia, Tarun Karuturi, Jack Khuu, Abhinay Kukkadapu, Tugsbayasgalan Manlaibaatar, Andrew Or, Kimish Patel, Siddartha Pothapragada, Lucy Qiu, Supriya Rao, Orion Reblitz-Richardson, Max Ren, Scott Roy, Anthony Shoumikhin, Scott Wolchok, Guang Yang, Angela Yi, Martin Yuan, Hansong Zhang, Jack Zhang, Jerry Zhang, Shunting Zhang, C. Cagatay Bilgin • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Processing Inference | ViT (Vision Transformer) | Average Latency (ms) | 3.81 | 16 |
| Image Processing Inference | ResNet50 | Average Latency (ms) | 0.55 | 15 |
| Image Processing Inference | MobileNet V3 | Average Latency (ms) | 0.24 | 14 |
| LLM Inference | Llama 3.2 Samsung Galaxy S25 Ultra 1B (test) | Prefill Min Throughput (tokens/sec) | 2.81e+3 | 13 |
| LLM Inference | Qwen3 Samsung Galaxy S25 Ultra 0.6B (test) | Prefill Throughput (min) | 1.54e+3 | 12 |
| LLM Inference | Phi4 Mini Samsung Galaxy S25 Ultra 3.8B (test) | Prefill Throughput (min, tokens/sec) | 1.16e+3 | 10 |
| LLM Inference | Qwen3 Google Pixel 9 Pro XL 0.6B (test) | Prefill Throughput (min, tokens/sec) | 591 | 10 |
| LLM Inference | Llama 3.2 Google Pixel 9 Pro XL 1B (test) | Prefill Throughput (min) (tokens/sec) | 530 | 10 |
| Image Processing Inference | Swin T | Average Latency (ms) | 3.38 | 8 |
| LLM Inference | Phi4 Mini Google Pixel 9 Pro XL 3.8B (test) | Prefill Min Throughput (tokens/sec) | 119.6 | 8 |