Prefix-risk ranking on TerminalBench (held-out)
[Chart: AUPRC over time. Plotted point: PrefixGuard (GRU), AUPRC 0.557, May 7, 2026.]
Updated 26d ago
Evaluation Results

| Method | Details | Date | AUPRC |
| --- | --- | --- | --- |
| PrefixGuard (GRU) | Input view=StepView, H... | 2026.05 | 0.557 |
| PrefixGuard (Transformer) | Input view=StepView, H... | 2026.05 | 0.555 |
| PrefixGuard (FSM) | Input view=StepView, H... | 2026.05 | 0.447 |
| GRU (Raw-text control) | Input view=Raw text, H... | 2026.05 | 0.37 |
| Transformer (Raw-text control) | Input view=Raw text, H... | 2026.05 | 0.363 |
| FSM (Raw-text control) | Input view=Raw text, H... | 2026.05 | 0.272 |
| PrefixGuard (DFA) | Input view=StepView, H... | 2026.05 | 0.184 |
| DFA (Raw-text control) | Input view=Raw text, H... | 2026.05 | 0.137 |
| GPT-5.4-mini | Input view=Prompt, Hea... | 2026.05 | 0.127 |
| DeepSeek-V4-Pro | Input view=Prompt, Hea... | 2026.05 | 0.107 |
| PPM LSTM | Input view=StepView ac... | 2026.05 | 0.093 |