| Task Name | Dataset / Model | SOTA Result | Trend |
|---|---|---|---|
| End-to-end Throughput | LLaMA-2-7B-Chat | Throughput (tokens/sec): 449 | 60 |
| LLM Decoding | Llama 3.1 70B | Throughput: 3,119.55 | 48 |
| Linear Layer Latency Inference | Llama-3-8B decoder block | Latency (µs): 153 | 36 |
| Quantization | LLaMA | Processing Time (hr): 0.25 | 30 |
| Time to First Token | Llama 3.1 8B (Q4 weights) | TTFT (ms): 204 | 28 |
| Attention Operator Throughput | Llama 3.1 405B (128 Q-heads / 8 KV-heads / head dim 128) | TFLOPS: 225.3 | 28 |
| LLM Decoding | Llama 3.1 70B (H100 GPU cluster) | Throughput: 894.32 | 27 |
| Fingerprint Similarity | Llama2 7B | Similarity Score: 1 | 24 |
| Model Retrieval | Llama-8B model tree (test) | Rank: 1 | 21 |
| Decoding | Llama 3.1 70B (inference) | Throughput: 1,410.39 | 21 |
| Jailbreak Attack | Llama 3.1 8B | NR Rate: 96 | 20 |
| Throughput Measurement | LLaMA-2 13B | Throughput (tokens/s): 19.4 | 20 |
| Language Modeling | LLaMA-2 13B | Perplexity (PPL): 4.57 | 20 |
| Representation Injection Performance | Llama2-7B evaluation scenarios (test) | Accuracy: 85.16 | 18 |
| Language Modeling | LLaMA-2-7B | Perplexity: 5.47 | 18 |
| Post-training Performance Evaluation | Llama 3.2-3B (val) | Max P_post: 70.1 | 15 |
| OOD Detection | LLaMA 1 (test) | AUROC: 0.924 | 15 |
| Hallucination Detection | LLaMA 1 (test) | AUROC: 0.894 | 15 |
| Quantization | LLaMA v1 (train) | Processing Time (hr): 0.25 | 15 |
| Constrained LLM Decoding | Llama-3-8B | Inference Time (ms): 11.77 | 14 |
| Watermark Detection | Llama-3-8B-Instruct, 150 tokens (generations) | Mean P: 9 | 13 |
| Watermark Detection | Llama-3-8B-Instruct, 30 tokens (generations) | Mean Precision: 23 | 13 |
| Knowledge Editing | LLaMA-3 | Average Runtime: 8.1 | 12 |
| Latency Measurement | LLaMA 13B linear layers (inference) | Latency (ms): 0.0499 | 12 |
| LLM Generation | Llama-2-7B | Latency (LAN): 22.1 | 12 |