| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| End-to-end throughput | LLaMA-2-7B-Chat | Throughput (tokens/sec): 449 | 60 |
| LLM Decoding | Llama 3.1 70B | Throughput: 3,119.55 | 48 |
| Linear Layer Latency Inference | Llama-3-8B decoder block | Latency (µs): 153 | 36 |
| Quantization | LLaMA | Processing Time (hr): 0.25 | 30 |
| Attention Operator Throughput | Llama 3.1 405B (128 Q-heads / 8 KV-heads / head dim 128) | TFLOPS: 225.3 | 28 |
| LLM Decoding | Llama 3.1 70B (H100 GPU cluster) | Throughput: 894.32 | 27 |
| Model Retrieval | Llama-8B model tree (test) | Rank: 1 | 21 |
| Decoding | Llama 3.1 70B (inference) | Throughput: 1,410.39 | 21 |
| Throughput Measurement | LLaMA-2 13B | Throughput (tokens/s): 19.4 | 20 |
| Language Modeling | LLaMA-2 13B | Perplexity (PPL): 4.57 | 20 |
| Fingerprint Similarity | Llama2 7B | Similarity Score: 0.995 | 17 |
| Post-training Performance Evaluation | Llama 3.2-3B (val) | Max P_post: 70.1 | 15 |
| OOD Detection | LLaMA 1 (test) | AUROC: 0.924 | 15 |
| Hallucination Detection | LLaMA 1 (test) | AUROC: 0.894 | 15 |
| Quantization | LLaMA v1 (train) | Processing Time (hr): 0.25 | 15 |
| Constrained LLM Decoding | Llama-3-8B | Inference Time (ms): 11.77 | 14 |
| Watermark Detection | Llama-3-8B-Instruct, 150 tokens (generations) | Mean P: 9 | 13 |
| Watermark Detection | Llama-3-8B-Instruct, 30 tokens (generations) | Mean Precision: 23 | 13 |
| LLM Generation | Llama-2-7B | Latency (LAN): 22.1 | 12 |
| LLM Training Optimization | Llama 3.2 3B | Training Time Reduction (%): 12.3 | 12 |
| Language Modeling | LLaMA Pretraining | Final Loss: 2.74 | 12 |
| Large Language Model Training Efficiency | Llama 3.2 1.7B | Energy Reduction (Iso-Time): 28.3 | 11 |
| Language Modeling | LLaMA-2-7B | Perplexity: 5.47 | 11 |
| Language Modeling | LLaMA-350M pre-training (val) | Validation Loss: 2.707 | 10 |
| INT2 Quantization | LLaMA-30B | Memory Footprint (GB): 18.59 | 10 |
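
Several of the results above report decoding throughput in tokens per second. For reference, the sketch below shows one way such a number can be measured for a single request; it assumes a Hugging Face `transformers` setup, and the model name, prompt, and generation length are placeholders rather than the configurations used in the table.

```python
# Minimal sketch of a tokens/sec throughput measurement (assumption:
# a Hugging Face `transformers` causal LM; model name and prompt are
# placeholders, not the exact setups behind the table above).
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Explain the difference between latency and throughput."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
elapsed = time.perf_counter() - start

# Throughput = newly generated tokens divided by wall-clock decoding time.
new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"Decoded {new_tokens} tokens in {elapsed:.2f}s "
      f"-> {new_tokens / elapsed:.1f} tokens/sec")
```

A single-request measurement like this will typically sit far below the largest serving figures in the table (e.g., 3,119.55 tokens/sec), since those are usually aggregate throughput across many concurrent requests on multi-GPU deployments.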