Robustness on Robustness
Chart: RR by date. Top result: SAC Multi-task, RR 87.97 (Nov 4, 2024).
Updated 4d ago
Evaluation Results
Method             Model                        Date       RR
SAC Multi-task     Qwen2-72B (Instr...          2024.11    87.97
SAC Single-task    Qwen2-7B (Instruct)          2024.11    85.89
SAC Single-task    Qwen2-72B (Instr...          2024.11    84.23
SAC Multi-task     Qwen2-7B (Instruct)          2024.11    82.16
SAC Single-task    Llama2-13B (Chat)            2024.11    78.42
SAC Multi-task     Llama2-13B (Chat)            2024.11    75.93
No Control         Qwen2-72B (Instr...          2024.11    58.51
No Control         Qwen2-7B (Instruct)          2024.11    57.68
No Control         Llama2-13B (Chat)            2024.11    39.42