CellularSpecSec-Bench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Vulnerability/Inconsistency Labeling | CellularSpecSec-Bench Stage 3 (Vulnerability / Inconsistency Labeling) | Binary F1: 93.05 | 4 |
| Evidence and Explanations Correctness | CellularSpecSec-Bench Stage 3 | E&E Correctness: 88 | 2 |
| True/False QA (TFQA) | CellularSpecSec-Bench Stage 3 | Score=2: 98 | 2 |
| Cross-Clause QA (CCQA) | CellularSpecSec-Bench Stage 3 | Score=2: 95.06 | 2 |
| Evidence-Grounded MCQA | CellularSpecSec-Bench Stage 2 | Accuracy: 100 | 2 |
| Evidence-Grounded Abstractive QA | CellularSpecSec-Bench Stage 2 | Score (Stage 2): 96.8 | 2 |
| Evidence-Grounded Extractive QA | CellularSpecSec-Bench Stage 2 | Score 2: 0.968 | 2 |
| Multiple Choice Question Answering (MCQA) | CellularSpecSec-Bench Stage 1 | Accuracy: 100 | 2 |
| Abstractive QA | CellularSpecSec-Bench Stage 1 | Score 2: 97 | 2 |
| Extractive QA | CellularSpecSec-Bench Stage 1 | Stage 1 Score (Component 2): 97.75 | 2 |
| Vulnerability/Inconsistency Labeling with Evidence and Explanations | CellularSpecSec-Bench Stage 3 Evidence and Explanations | Evidence Correctness: 8,800 | 2 |
| Vulnerability/Inconsistency Labeling with Evidence and Explanations | CellularSpecSec-Bench Stage 3 (TFQA) | Score Level 2: 96 | 2 |
| Vulnerability/Inconsistency Labeling with Evidence and Explanations | CellularSpecSec-Bench Stage 3 (CCQA) | Score Level 2: 98 | 2 |