
Language Model Inference on Mobile and IoT Device: Latency Benchmark llama.cpp (inference evaluation)

Top result: TinyLLaMA 1.1B, 4,215 tokens/s (Throughput, Sample)


Evaluation Results

Method	Paper date	Throughput (Sample, tokens/s)	(unlabeled)	(unlabeled)	(unlabeled)
TinyLLaMA 1.1B	2023.12	4,215	39.49	19.75	20.83
MobileLLaMA 2.7B	2023.12	3,932	18.1	14.71	28.3
MobileLLaMA 2.7B	2023.12	3,919	17.59	9.14	44.85
TinyLLaMA 1.1B	2023.12	3,887	44.17	31.54	13.22
MobileLLaMA 1.4B	2023.12	3,870	36.2	28.32	14.76
MobileLLaMA 1.4B	2023.12	3,846	35.46	17.93	22.81
TinyLLaMA 1.1B	2023.12	3,801	306.76	78.83	5.38
MobileLLaMA 1.4B	2023.12	3,738	253.22	66.79	6.33
OpenLLaMA 3B	2023.12	3,604	8.97	7.14	58.04
OpenLLaMA 3B	2023.12	3,340	143.25	32.16	12.83
MobileLLaMA 1.4B	2023.12	3,289	249.56	60.73	6.96
OpenLLaMA 3B	2023.12	3,093	7.32	6.58	63.33
MobileLLaMA 2.7B	2023.12	3,040	133.41	33.28	12.46
TinyLLaMA 1.1B	2023.12	3,007	279.61	72.3	5.89
MobileLLaMA 2.7B	2023.12	2,647	130.97	38.99	10.74
OpenLLaMA 3B	2023.12	2,382	80.34	29.97	13.94
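The benchmark is framed as a latency benchmark, but its headline metric is a throughput in tokens/s. The two are reciprocals: average time per token in milliseconds equals 1000 divided by tokens per second. The Python sketch below makes one assumption. It treats the sample-throughput figures (the first numeric column, from 4,215 down to 2,382) as averages over sampled tokens. It ranks each model by its best row and converts that figure to ms/token. The helper names `ms_per_token` and `best_per_model` are hypothetical and are not part of llama.cpp. The reciprocal covers only the sampling step. It does not account for prompt processing or fixed per-run overhead.

```python
# Sample-throughput values (tokens/s) copied from the results table.
# Each model appears in several rows; the settings that distinguish those rows
# are not labeled on the extracted page, so every row is kept as-is.
ROWS = [
    ("TinyLLaMA 1.1B", 4215),
    ("MobileLLaMA 2.7B", 3932),
    ("MobileLLaMA 2.7B", 3919),
    ("TinyLLaMA 1.1B", 3887),
    ("MobileLLaMA 1.4B", 3870),
    ("MobileLLaMA 1.4B", 3846),
    ("TinyLLaMA 1.1B", 3801),
    ("MobileLLaMA 1.4B", 3738),
    ("OpenLLaMA 3B", 3604),
    ("OpenLLaMA 3B", 3340),
    ("MobileLLaMA 1.4B", 3289),
    ("OpenLLaMA 3B", 3093),
    ("MobileLLaMA 2.7B", 3040),
    ("TinyLLaMA 1.1B", 3007),
    ("MobileLLaMA 2.7B", 2647),
    ("OpenLLaMA 3B", 2382),
]


def ms_per_token(tokens_per_s: float) -> float:
    """Convert an average throughput (tokens/s) into average latency (ms/token)."""
    return 1000.0 / tokens_per_s


def best_per_model(rows):
    """Return {model: highest sample throughput among that model's rows}."""
    best = {}
    for model, tps in rows:
        best[model] = max(tps, best.get(model, 0))
    return best


if __name__ == "__main__":
    # Print models from fastest to slowest by their best sample throughput.
    ranked = sorted(best_per_model(ROWS).items(), key=lambda kv: -kv[1])
    for model, tps in ranked:
        print(f"{model:<18} {tps:>6} tok/s  {ms_per_token(tps):.3f} ms/token")
```

Only each model's best row is kept here, which matches how the leaderboard headline picks TinyLLaMA 1.1B. Averaging each model's rows instead would mix settings that the page does not identify.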