Behavior selection on CAA behaviors
[Chart: COAIS Score by date. Plotted entry: COLD-FD, 4.64 (Mar 6, 2026). Selectable metrics: COAIS Score, Correlation Score, Hallucination Score, MR, Referential Score, Survival Score, SyCo Score. Updated 2mo ago.]
Evaluation Results
| Method | Model | Date | COAIS Score | Correlation Score | Hallucination Score | MR | Referential Score | Survival Score | SyCo Score |
|---|---|---|---|---|---|---|---|---|---|
| COLD-FD | Mistral-7B-v0.1 | 2026.03 | 4.64 | 8.52 | 8.52 | 2.88 | 7.54 | 7.38 | 1.47 |
| DiffMean | Mistral-7B-v0.1 | 2026.03 | 3 | 7.76 | 4.02 | 2 | 1.96 | 7.82 | 1.26 |
| ReFT(vector) | Mistral-7B-v0.1 | 2026.03 | 0.66 | 6.66 | 3.92 | 2.42 | 1.56 | 7.76 | 1.15 |
| Base | Mistral-7B-v0.1 | 2026.03 | 0.48 | 6.08 | 3.74 | 2.14 | 1.1 | 7.66 | 1.11 |
| COLD-Kernel | Mistral-7B-v0.1 | 2026.03 | 0.4 | 6.24 | 3.76 | 2.38 | 1.54 | 7.66 | 1.06 |