| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Watermark Detection | Llama-2-7b-chat-hf 10 samples UMD watermarking (test) | AUROC (t=0) | 1 | 64 |
| Attention Operator Latency | LLaMA-2 Chat 7B | Attention Latency (ms) | 0.075 | 60 |
| Safety Evaluation | LLaMA-2-7B-CHAT Safety (test) | Safety Score | 0.55 | 60 |
| Jailbreak Attack Transferability | Llama-2-7b-chat finetuned variants v1 (test) | Transfer Success Rate (TSR) | 60.4 | 16 |
| Watermark Attack Success Rate | Llama-2-7b-chat-hf UMD watermarking (10 samples) | ASR | 100 | 15 |
| LLM Quantization | Llama-2-70B | GPU Hours (h) | 2.2 | 13 |
| LLM Inference Verification | Llama-2 7B | Verification Latency (s) | 0.17 | 12 |
| Training Stability Analysis | Llama-2 7B pre-training | Number of Spikes | 0 | 9 |
| Attribute Steering | Llama-2-7b-Chat-hf Open-Ended Generation | Wealth Score | 2.46 | 7 |
| Decoding Latency | Llama-2-7B 32k sequence length v1 (inference) | Decoding Latency (s) | 0.062 | 6 |
| Decoding Latency | Llama-2-7B 16k sequence length v1 (inference) | Decoding Latency (s) | 0.041 | 6 |
| Decoding Throughput | Llama 2 7B inference v1.0 | Decoding Throughput (TOK/s) | 188 | 6 |
| Model Fingerprinting | LLaMA-2 7B fine-tuned variants | U-test p-value | 0 | 5 |
| LLM Inference | LLaMA-2 70B sequence length 2048 | Max Batch Size | 384 | 5 |
| Decoding Latency | Llama-2-7B 64k sequence length v1 (inference) | Decoding Latency (s) | 0.098 | 5 |
| Decoding Throughput | Llama 2 70B v1.0 (inference) | Throughput (TOK/s) | 23.5 | 5 |
| Training Stability Analysis | Llama-2 1.3B (train) | Number of Spikes | 0 | 4 |
| Computational Efficiency | LLaMA-2-7B (test) | Total Time | 1,495 | 4 |
| LLM Serving | LLaMA-2 70B chatbot workload | TTFT (ms) | 45.2 | 4 |
| LLM Decoding | Llama-2-70B | Per-step Decoding Latency | 0.2163 | 4 |
| Machine Translation | Llama-2-13B-chat Seen Languages (tl2en) | BLEU | 32.6 | 3 |
| Machine Translation | Llama-2-13B-chat Seen Languages (en2tl) | BLEU | 18.3 | 3 |
| Machine Translation | Llama-2-13B-chat Seen Languages (sq2en) | BLEU | 17.5 | 3 |
| Machine Translation | Llama-2-13B-chat Seen Languages (en2sq) | BLEU | 9.7 | 3 |
| Machine Translation | Llama-2-13B-chat Seen Languages sk2en | BLEU | 35.4 | 3 |