
CryptoTensors: A Light-Weight Large Language Model File Format for Highly-Secure Model Distribution

About

To enhance the performance of large language models (LLMs) in domain-specific applications, sensitive data from fields such as healthcare, law, and finance are being used to privately customize or fine-tune these models. Such privately adapted LLMs are regarded as personal privacy assets or corporate intellectual property, so protecting model weights and maintaining strict confidentiality during deployment and distribution have become critically important. However, existing model formats and deployment frameworks provide little to no built-in support for confidentiality, access control, or secure integration with trusted hardware. Current methods for securing model deployment rely either on computationally expensive cryptographic techniques or on tightly controlled private infrastructure; although these approaches can be effective in specific scenarios, they are difficult and costly to deploy at scale. In this paper, we introduce CryptoTensors, a secure and format-compatible file structure for confidential LLM distribution. Built as an extension to the widely adopted Safetensors format, CryptoTensors incorporates tensor-level encryption and embedded access-control policies while preserving critical features such as lazy loading and partial deserialization. It enables transparent decryption and automated key management, supporting flexible licensing and secure model execution with minimal overhead. We implement a proof-of-concept library, benchmark its performance across serialization and runtime scenarios, and validate its compatibility with existing inference frameworks, including Hugging Face Transformers and vLLM. Our results highlight CryptoTensors as a light-weight, efficient, and developer-friendly solution for safeguarding LLM weights in real-world, widespread deployments.
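The abstract's key design point is tensor-level encryption inside a Safetensors-style container: because each tensor is encrypted independently and indexed by byte offsets in a JSON header, a loader can decrypt a single tensor lazily without reading the whole file. The sketch below illustrates that layout under stated assumptions; the header fields (`__policy__`, `offsets`, `nonce`), function names, and the toy SHA-256 counter-mode cipher (a stdlib stand-in for a real AEAD such as AES-GCM) are all illustrative and are not the paper's actual specification.

```python
import hashlib
import json
import struct


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode) standing in for a real
    AEAD such as AES-GCM. Illustration only; not secure for real use."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "little")).digest()
        out.extend(block)
        counter += 1
    return bytes(a ^ b for a, b in zip(data, out))


def save_encrypted(tensors: dict, key: bytes, policy: dict) -> bytes:
    """Serialize {name: raw_bytes} into a Safetensors-like container:
    8-byte header length, JSON header, then a byte buffer in which each
    tensor is encrypted independently (enabling lazy per-tensor decryption).
    An access-control policy is embedded in the header."""
    header, buf = {"__policy__": policy}, bytearray()
    for i, (name, raw) in enumerate(tensors.items()):
        nonce = i.to_bytes(12, "little")  # unique per tensor
        ct = _keystream_xor(key, nonce, raw)
        header[name] = {
            "offsets": [len(buf), len(buf) + len(ct)],
            "nonce": nonce.hex(),
        }
        buf.extend(ct)
    hdr = json.dumps(header).encode()
    return struct.pack("<Q", len(hdr)) + hdr + bytes(buf)


def load_tensor(blob: bytes, name: str, key: bytes) -> bytes:
    """Decrypt one tensor by its header offsets, touching no other bytes."""
    (hlen,) = struct.unpack_from("<Q", blob, 0)
    header = json.loads(blob[8 : 8 + hlen])
    meta = header[name]
    start, end = meta["offsets"]
    data = blob[8 + hlen + start : 8 + hlen + end]
    return _keystream_xor(key, bytes.fromhex(meta["nonce"]), data)
```

Because offsets and nonces live in the (small) header, a loader can memory-map the file and decrypt only the tensors a given layer requests, which is how such a format can preserve lazy loading and partial deserialization while keeping weights confidential at rest.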

Huifeng Zhu, Shijie Li, Qinfeng Li, Yier Jin• 2025

Related benchmarks

| Task | Dataset | Model Load Time (s) | Rank |
| --- | --- | --- | --- |
| LLM Inference | Qwen3 family, Transformers (PyTorch) inference | 1.16 | 18 |
| vLLM Model Deployment and Inference | Qwen3-0.6B vLLM inference | 0.34 | 3 |
| vLLM Model Deployment and Inference | Qwen3-1.7B vLLM inference | 0.594 | 3 |
| vLLM Model Deployment and Inference | Qwen3-4B vLLM inference | 1.104 | 3 |
| vLLM Model Deployment and Inference | Qwen3-8B vLLM inference | 1.806 | 3 |
| vLLM Model Deployment and Inference | Qwen3-14B vLLM inference | 3.165 | 3 |
| vLLM Model Deployment and Inference | Qwen3-32B vLLM inference | 7.668 | 3 |
