Adversarial Factuality
Chart: CR by date (best result: SAC Single-task, 99.04, Nov 4, 2024)
Evaluation Results
| Method | Setting | Date | CR |
|---|---|---|---|
| SAC Single-task | Backbone=Qwen2-72B-Ins... | 2024.11 | 99.04 |
| SAC Multi-task | Backbone=Qwen2-72B-Ins... | 2024.11 | 98.56 |
| No Control | Backbone=Qwen2-72B-Ins... | 2024.11 | 98.08 |
| SAC Multi-task | Backbone=Qwen2-7B-Inst... | 2024.11 | 96.65 |
| SAC Single-task | Backbone=Qwen2-7B-Inst... | 2024.11 | 95.22 |
| RepE | Control Dim=Single | 2024.11 | 90.43 |
| SAC | Control Dim=Single | 2024.11 | 89.47 |
| SAC | Control Dim=Multiple | 2024.11 | 86.12 |
| No Control | Control Dim=Single | 2024.11 | 76.56 |
| No Control | Control Dim=Multiple | 2024.11 | 76.56 |
| No Control | Backbone=Qwen2-7B-Inst... | 2024.11 | 76.08 |
| RepE-Mean | Control Dim=Multiple | 2024.11 | 72.59 |
| RepE-Merge | Control Dim=Multiple | 2024.11 | 71.08 |
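One way to read the table is to compare each controlled method against the "No Control" row that shares its setting. The sketch below does that subtraction. It assumes that rows with an identical setting string are directly comparable, which the leaderboard itself does not state. The names `RESULTS` and `gains_over_baseline` are hypothetical helpers introduced here, and the scores are transcribed from the table above, with the truncated setting strings left exactly as shown.

```python
from collections import defaultdict

# CR scores transcribed from the leaderboard: (method, setting, CR).
# Setting strings are kept truncated exactly as the page displays them.
RESULTS = [
    ("SAC Single-task", "Backbone=Qwen2-72B-Ins...", 99.04),
    ("SAC Multi-task", "Backbone=Qwen2-72B-Ins...", 98.56),
    ("No Control", "Backbone=Qwen2-72B-Ins...", 98.08),
    ("SAC Multi-task", "Backbone=Qwen2-7B-Inst...", 96.65),
    ("SAC Single-task", "Backbone=Qwen2-7B-Inst...", 95.22),
    ("RepE", "Control Dim=Single", 90.43),
    ("SAC", "Control Dim=Single", 89.47),
    ("SAC", "Control Dim=Multiple", 86.12),
    ("No Control", "Control Dim=Single", 76.56),
    ("No Control", "Control Dim=Multiple", 76.56),
    ("No Control", "Backbone=Qwen2-7B-Inst...", 76.08),
    ("RepE-Mean", "Control Dim=Multiple", 72.59),
    ("RepE-Merge", "Control Dim=Multiple", 71.08),
]

BASELINE = "No Control"


def gains_over_baseline(results):
    """Return {setting: [(method, CR - baseline CR), ...]}, largest gain first.

    Hypothetical helper. Assumes rows sharing a setting string are comparable
    and that each setting has exactly one "No Control" row.
    """
    baseline = {setting: cr for method, setting, cr in results if method == BASELINE}
    gains = defaultdict(list)
    for method, setting, cr in results:
        # Skip the baseline rows themselves and settings without a baseline.
        if method == BASELINE or setting not in baseline:
            continue
        gains[setting].append((method, round(cr - baseline[setting], 2)))
    for setting in gains:
        gains[setting].sort(key=lambda item: item[1], reverse=True)
    return dict(gains)


if __name__ == "__main__":
    for setting, rows in gains_over_baseline(RESULTS).items():
        print(setting)
        for method, delta in rows:
            print(f"  {method:<16} {delta:+.2f}")
```

Under this assumption, SAC improves on "No Control" in every setting, while RepE-Mean and RepE-Merge score below it in the multi-dimension setting. The gains are much larger on the 7B backbone than on the 72B backbone, where the baseline is already above 98.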