| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Vulnerability/Inconsistency Labeling | CellularSpecSec-Bench Stage 3 (Vulnerability/Inconsistency Labeling) | Binary F1 | 93.05 | 4 |
| Evidence and Explanations Correctness | CellularSpecSec-Bench Stage 3 | E&E Correctness | 88 | 2 |
| True/False QA (TFQA) | CellularSpecSec-Bench Stage 3 | Score=2 | 98 | 2 |
| Cross-Clause QA (CCQA) | CellularSpecSec-Bench Stage 3 | Score=2 | 95.06 | 2 |
| Evidence-Grounded MCQA | CellularSpecSec-Bench Stage 2 | Accuracy | 100 | 2 |
| Evidence-Grounded Abstractive QA | CellularSpecSec-Bench Stage 2 | Score (Stage 2) | 96.8 | 2 |
| Evidence-Grounded Extractive QA | CellularSpecSec-Bench Stage 2 | Score 2 | 0.968 | 2 |
| Multiple Choice Question Answering (MCQA) | CellularSpecSec-Bench Stage 1 | Accuracy | 100 | 2 |
| Abstractive QA | CellularSpecSec-Bench Stage 1 | Score 2 | 97 | 2 |
| Extractive QA | CellularSpecSec-Bench Stage 1 | Stage 1 Score (Component 2) | 97.75 | 2 |
| Vulnerability/Inconsistency Labeling with Evidence and Explanations | CellularSpecSec-Bench Stage 3 Evidence and Explanations | Evidence Correctness | 8,800 | 2 |
| Vulnerability/Inconsistency Labeling with Evidence and Explanations | CellularSpecSec-Bench Stage 3 (TFQA) | Score Level 2 | 96 | 2 |
| Vulnerability/Inconsistency Labeling with Evidence and Explanations | CellularSpecSec-Bench Stage 3 (CCQA) | Score Level 2 | 98 | 2 |