Llama Guard 3-1B-INT4: Compact and Efficient Safeguard for Human-AI Conversations

About

This paper presents Llama Guard 3-1B-INT4, a compact and efficient Llama Guard model, which has been open-sourced to the community during Meta Connect 2024. We demonstrate that Llama Guard 3-1B-INT4 can be deployed on resource-constrained devices, achieving a throughput of at least 30 tokens per second and a time-to-first-token of 2.5 seconds or less on a commodity Android mobile CPU. Notably, our experiments show that Llama Guard 3-1B-INT4 attains comparable or superior safety moderation scores to its larger counterpart, Llama Guard 3-1B, despite being approximately 7 times smaller in size (440MB).
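The deployment figures quoted above can be turned into rough budget estimates. The following is a minimal sketch, not taken from the paper: it assumes that total response time is bounded by the time-to-first-token plus the remaining tokens generated at the minimum stated throughput, and it treats "approximately 7 times smaller" as an exact factor of 7. The function names and the output-token count are hypothetical and used only for illustration.

```python
# Back-of-the-envelope estimates from the bounds stated in the abstract.
# Assumption: latency <= TTFT + (remaining tokens) / (minimum decode throughput).

TTFT_MAX_S = 2.5            # time-to-first-token upper bound (from the abstract)
THROUGHPUT_MIN_TPS = 30.0   # decode throughput lower bound (from the abstract)
INT4_SIZE_MB = 440.0        # stated size of Llama Guard 3-1B-INT4
SIZE_RATIO = 7.0            # "approximately 7 times smaller", treated as exact here


def worst_case_latency_s(num_output_tokens: int) -> float:
    """Upper bound on the time to produce a verdict of the given length."""
    if num_output_tokens < 1:
        raise ValueError("at least one output token is required")
    # The first token arrives at TTFT; the rest are decoded at the minimum rate.
    return TTFT_MAX_S + (num_output_tokens - 1) / THROUGHPUT_MIN_TPS


def implied_baseline_size_mb() -> float:
    """Approximate size of the larger model implied by the stated ratio."""
    return INT4_SIZE_MB * SIZE_RATIO


if __name__ == "__main__":
    # Hypothetical verdict length of 4 tokens, chosen only for illustration.
    print(f"worst-case latency: {worst_case_latency_s(4):.2f} s")
    print(f"implied baseline size: {implied_baseline_size_mb():.0f} MB")
```

Under these assumptions, a short verdict is dominated by the time-to-first-token, so prompt processing rather than decoding sets the latency budget on device.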

Igor Fedorov, Kate Plawiak, Lemeng Wu, Tarek Elgamal, Naveen Suda, Eric Smith, Hongyuan Zhan, Jianfeng Chi, Yuriy Hulovatyy, Kimish Patel, Zechun Liu, Changsheng Zhao, Yangyang Shi, Tijmen Blankevoort, Mahesh Pasupuleti, Bilge Soran, Zacharie Delpierre Coudert, Rachad Alao, Raghuraman Krishnamoorthi, Vikas Chandra • 2024

Related benchmarks

Task	Dataset	Result
Violation Detection	PolicyGuardBench	Safety F1: 59.52	30
Violation Detection	HarmBench	Safety F1: 67.96	7
Violation Detection	WildGuard (test)	Safety F1: 68.47	7
Harmful Conversation Detection	ProsocialDialog 652-conversation (test)	Precision: 98	4

