| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Attention Operator Throughput | Llama 3.1 405B (128 Q-heads/8 KV-heads/128 Head-dimension) | TFLOPS: 615.39 | 62 | |
| End-to-end throughput | LLaMA-2-7B-Chat | Throughput (tokens/sec): 449 | 60 | |
| Latency Measurement | LLaMA-3.1-8B-Instruct Chunked Prefill (inference) | Attention Latency (ms): 423.1 | 49 | |
| LLM Decoding | Llama 3.1 70B | Throughput: 3,119.55 | 48 | |
| Negative Sentiment | Llama-3-8B n≈200 | ASR: 77 | 42 | |
| Linear Layer Latency Inference | Llama-3-8B decoder block | Latency (µs): 153 | 36 | |
| Large Language Model Inference | LLaMA-2 7B (inference) | P99 Per-Token Latency (ms): 8.23 | 33 | |
| Quantization | LLAMA | Processing Time (hr): 0.25 | 30 | |
| Time to First Token | Llama 3.1 8B Q4 weights | TTFT (ms): 204 | 28 | |
| LLM Decoding | Llama 3.1 70B (H100 GPU Cluster) | Throughput: 894.32 | 27 | |
| Fingerprint Similarity | Llama2 7B | Similarity Score: 1 | 24 | |
| Model Retrieval | Llama-8B model tree (test) | Rank: 1 | 21 | |
| Decoding | Llama 3.1 70B (inference) | Throughput: 1,410.39 | 21 | |
| Jailbreak Attack | Llama 3.1 8B | NR Rate: 96 | 20 | |
| Throughput Measurement | LLaMA-2 13B | Throughput (tokens/s): 19.4 | 20 | |
| Language Modeling | LLaMA 2 13B | Perplexity (PPL): 4.57 | 20 | |
| Persona Discovery | Llama-3.1-70B Large Target | Similarity Score: 98 | 18 | |
| Persona Discovery | Llama 3.1 8B Small Target | Similarity Score: 0.97 | 18 | |
| Representation Injection Performance | Llama2-7B evaluation scenarios (test) | Accuracy: 85.16 | 18 | |
| Language Modeling | LLaMA-2-7B | Perplexity: 5.47 | 18 | |
| Jailbreak Attack Transferability | Llama-3B Target Transferability set | ASR: 81 | 17 | |
| Jailbreak Attack | Llama 2 7B | ASR: 97 | 17 | |
| Persona Discrimination | Llama 3.3-70B Cross-generator | Persona Separability (Δ): 0.427 | 16 | |
| Transferable Adversarial Attack | Llama 3.2 11B-V | Attack Success Rate (ASR): 57.3 | 16 | |
| LLM Inference | LLaMA-7B v1 (serving) | Decode Latency (ms/token): 12.16 | 16 | |