Compositional Reasoning Dataset
[Chart: Correction Score (C) by method. Highest result shown: CREME, 43.3 (Feb 22, 2024). Selectable metrics: Correction Score (C), Paraphrasing Score (P), Generalization Score (G), Specificity Score (S).]
Evaluation Results
C = Correction Score, P = Paraphrasing Score, G = Generalization Score, S = Specificity Score. A dash (-) marks a score that is not reported.

Method            Configuration             Date      C      P      G      S
CREME             Backbone=OpenAlpaca-3B    2024.02   43.3   23.71  3.61   1.24
CREME             Backbone=LLAMA-2-7B       2024.02   17     7.99   1.27   0.86
OpenAlpaca-3B     Model Type=Base Model     2024.02   7.2    7      13.5   0.6
LLAMA-2-7B        Model Type=Base Model     2024.02   3.2    2.3    13.1   0.3
Memory Injection  Backbone=LLAMA-2-7B       2024.02   2.21   0.3    0.32   26.72
CoT-PatchScopes   Backbone=LLAMA-2-7B       2024.02   1.2    -      -      -
Memory Injection  Backbone=OpenAlpaca-3B    2024.02   0.98   0.45   0.75   2.93
CoT-PatchScopes   Backbone=OpenAlpaca-3B    2024.02   0.91   -      -      -