Cross-Input Privacy Leakage on tool_mock (args_exfil, llm)
[Chart: RN by date. MiniMax-M2.5: RN = 5 (Mar 24, 2026). Metrics available: RN, EN, EE, CER, AER, ExecErr]
Updated 25d ago
Evaluation Results
| Method | Provider | Date | RN | EN | EE | CER | AER | ExecErr |
|---|---|---|---|---|---|---|---|---|
| MiniMax-M2.5 | MiniMax-M2.5 | 2026.03 | 5 | 4.2 | 7 | 26 | 28 | 0 |
| qwen3.5-plus | qwen3.5-plus | 2026.03 | 5 | 4.2 | 7 | 14 | 14 | 0 |
| DeepSeek | DeepSeek | 2026.03 | 5 | 5 | 8.33 | 100 | 100 | 0 |