BitRL: Reinforcement Learning with 1-bit Quantized Language Models for Resource-Constrained Edge Deployment

About

The deployment of intelligent reinforcement learning (RL) agents on resource-constrained edge devices remains a fundamental challenge due to the substantial memory, computational, and energy requirements of modern deep learning systems. While large language models (LLMs) have emerged as powerful architectures for decision-making agents, their multi-billion parameter scale confines them to cloud-based deployment, raising concerns about latency, privacy, and connectivity dependence. We introduce BitRL, a framework for building RL agents using 1-bit quantized language models that enables practical on-device learning and inference under severe resource constraints. Leveraging the BitNet b1.58 architecture with ternary weights (-1, 0, +1) and an optimized inference stack, BitRL achieves 10-16x memory reduction and 3-5x energy efficiency improvements over full-precision baselines while maintaining 85-98 percent of task performance across benchmarks. We provide theoretical analysis of quantization as structured parameter perturbation, derive convergence bounds for quantized policy gradients under frozen-backbone architectures, and identify the exploration-stability trade-off in extreme quantization. Our framework systematically integrates 1-bit quantized language models with reinforcement learning for edge deployment and demonstrates effectiveness on commodity hardware.

Md. Ashiq Ul Islam Sajid, Mohammad Sakib Mahmood, Md. Tareq Hasan, Md Abdur Rahim, Rafat Ara, Md. Arafat Hossain • 2026
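
To make the ternary-weight idea in the abstract concrete, the sketch below applies absmean quantization, the scheme described for BitNet b1.58, to a small list of weights, and then computes the ideal storage ratio of a ternary weight against 16-bit and 32-bit floats. It is an illustration under stated assumptions, not BitRL's implementation: the function names, the `eps` constant, and the sample weights are hypothetical, and the listing above does not say how BitRL packs weights or which components remain in full precision.

```python
import math


def absmean_ternary(weights, eps=1e-8):
    """Map floats to {-1, 0, +1} using one shared scale (absmean scheme).

    Hypothetical helper for illustration; not taken from the BitRL code.
    """
    # The scale is the mean absolute value of the weights.
    gamma = sum(abs(w) for w in weights) / len(weights)
    scale = gamma + eps  # eps guards against division by zero
    # Round to the nearest integer, then clip into [-1, 1].
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, gamma


def dequantize(ternary, gamma):
    """Approximate the original weights from ternary codes and the scale."""
    return [t * gamma for t in ternary]


# Made-up sample weights, chosen only to show all three ternary values.
weights = [0.9, -0.05, 0.4, -1.2, 0.02, -0.6]
ternary, gamma = absmean_ternary(weights)
approx = dequantize(ternary, gamma)

# A three-valued symbol carries log2(3) bits, which is where "1.58" comes from.
bits_per_ternary = math.log2(3)
ratio_vs_fp16 = 16 / bits_per_ternary
ratio_vs_fp32 = 32 / bits_per_ternary

print(ternary)
print(round(bits_per_ternary, 3), round(ratio_vs_fp16, 1), round(ratio_vs_fp32, 1))
```

The ratios computed here are information-theoretic upper bounds for the weights alone. Measured memory savings, such as the 10-16x range quoted in the abstract, also depend on the packing format, activations, and any layers left unquantized, none of which this sketch models.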

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Classic Discrete Control | CartPole v1 | Mean Episodic Return | 476 | 18 |
| Classic Discrete Control | MountainCar v0 | Mean Episodic Return | 108 | 18 |
| Classic Discrete Control | Acrobot v1 | Mean Episodic Return | 94 | 5 |
| Language-Conditioned Tasks | TextWorld Cooking | Mean Episodic Return | 0.75 | 5 |
| Language-Conditioned Tasks | BabyAI GoToRedBall | Mean Episodic Return | 0.88 | 5 |
| Continuous Control (MuJoCo) | HalfCheetah v4 | Mean Episodic Return | 4.21e+3 | 5 |
| Continuous Control (MuJoCo) | Hopper v4 | Mean Episodic Return | 2.89e+3 | 5 |
| Continuous Control (MuJoCo) | Walker2d v4 | Mean Episodic Return | 3.62e+3 | 5 |
| Language-Conditioned Tasks | SmartHome Light | Mean Episodic Return | 0.82 | 5 |
| Edge Deployment | Raspberry Pi 4 | Peak Memory (MB) | 682 | 4 |