First-Order Logic translation on FOLIO (test)
Chart: best score over time (metric tabs: BLEU, LE, LE*). Current best BLEU: Qwen3-1.7B-SGRPO, 66 (Dec 16, 2025).
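BLEU scores a predicted first-order logic formula by its n-gram overlap with the reference formula. This leaderboard does not state the tokenization, n-gram order, or smoothing behind its numbers. The sketch below is therefore only an illustration under stated assumptions: unsmoothed sentence-level BLEU, uniform weights up to 4-grams, and whitespace tokenization. The function names `ngrams` and `sentence_bleu` are hypothetical. Reported leaderboard scores are typically corpus-level and may differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU on a 0-100 scale with uniform n-gram weights.

    Assumption (hypothetical setup): FOL strings are tokenized on whitespace;
    real evaluations may split operators and parentheses differently.
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by how often it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing: any zero n-gram precision gives BLEU = 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: 1.0 unless the candidate is shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return 100.0 * bp * math.exp(sum(log_precisions) / max_n)
```

Because BLEU only measures surface overlap, `sentence_bleu("∀ y ( Dog ( y ) )", "∀ x ( Dog ( x ) )")` returns 0.0 under this unsmoothed sketch, even though the two formulas differ only in the name of a bound variable. A string-overlap score says little about whether a translation is logically correct.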
Evaluation Results
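| Method | Setting | Date | BLEU | LE | LE* |
| --- | --- | --- | --- | --- | --- |
| Qwen3-1.7B-SGRPO | Training=SGRPO | 2025.12 | 66 | - | 87.4 |
| Qwen3-1.7B-SFT | Training=Supervised Fi... | 2025.12 | 61.2 | - | 85 |
| ChatGPT-4o | | 2025.12 | 38.4 | 82.6 | 80.9 |
| LogicLLaMA-13B | configuration=RLHF Corre. | 2025.12 | 38.4 | 85.8 | - |
| LogicLLaMA-7B | configuration=RLHF Corre. | 2025.12 | 37.8 | 84.1 | - |
| DeepSeek-V3 | | 2025.12 | 37.6 | 83 | 79.2 |
| ChatGPT-3.5 | | 2025.12 | 37 | 80.2 | 77.6 |

A "-" means no score is reported for that metric.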
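The Qwen3 rows report no LE and the LogicLLaMA rows report no LE*, so no single metric ranks all seven systems. Comparisons therefore work best per metric, skipping missing entries. Below is a minimal Python sketch that transcribes the evaluation results above. The `RESULTS` layout and the helpers `ranking` and `gap` are illustrative names chosen here, not part of the leaderboard.

```python
# Rows transcribed from the evaluation results: (method, BLEU, LE, LE*).
# None stands for a "-" (no score reported).
RESULTS = [
    ("Qwen3-1.7B-SGRPO", 66.0, None, 87.4),
    ("Qwen3-1.7B-SFT", 61.2, None, 85.0),
    ("ChatGPT-4o", 38.4, 82.6, 80.9),
    ("LogicLLaMA-13B", 38.4, 85.8, None),
    ("LogicLLaMA-7B", 37.8, 84.1, None),
    ("DeepSeek-V3", 37.6, 83.0, 79.2),
    ("ChatGPT-3.5", 37.0, 80.2, 77.6),
]

# Column index of each metric inside a RESULTS row.
METRICS = {"BLEU": 1, "LE": 2, "LE*": 3}


def ranking(metric):
    """Return (method, score) pairs for one metric, best first, skipping missing scores."""
    idx = METRICS[metric]
    scored = [(row[0], row[idx]) for row in RESULTS if row[idx] is not None]
    # sorted() is stable, so tied scores keep their table order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def gap(method_a, method_b, metric):
    """Score of method_a minus method_b on one metric, rounded to absorb float noise."""
    idx = METRICS[metric]
    scores = {row[0]: row[idx] for row in RESULTS}
    return round(scores[method_a] - scores[method_b], 2)
```

For example, `gap("Qwen3-1.7B-SGRPO", "Qwen3-1.7B-SFT", "BLEU")` gives 4.8 and the same comparison on LE* gives 2.4. `ranking("LE")` covers only the five systems that report LE and puts LogicLLaMA-13B first.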