Logical Reasoning on BBH 3-shot chain-of-thought
Top result: VAR, 61.35 EM (Feb 16, 2025)
Evaluation Results
Method  Base Model   Date     EM
VAR     Qwen2.5-7B   2025.02  61.35
ALoL    Qwen2.5-7B   2025.02  61.14
DPO     Qwen2.5-7B   2025.02  53.06
DPO     Llama2-7B    2025.02  39.1
ALoL    Llama2-7B    2025.02  38.16
VAR     Llama2-7B    2025.02  37.56
Base    Qwen2.5-7B   2025.02  29.3
Base    Llama2-7B    2025.02  12.77