Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Learning How to Cube

About

Despite the effectiveness of Cube-and-Conquer (C&C) for solving challenging Boolean Satisfiability (SAT) problems, no prior work has shown that transformer-based models can learn effective cubing heuristics. We introduce a neuro-symbolic post-training framework for this task. We design an MCTS-based data curation pipeline that uses symbolic heuristics to explore splitting decisions over SAT competition formulas, producing preference data grounded in solver statistics and augmented with reasoning traces from a teacher model. Our two-stage post-training, supervised fine-tuning (SFT) followed by direct preference optimization (DPO), enables a 4B-parameter model to achieve a pass@5 score of 53 on 100 SAT competition benchmarks, surpassing frontier LLMs such as Claude-Sonnet-4 (50) and matching the best symbolic heuristic (53). Ablations show that SFT alone improves pass@5 from 46 to 51, with DPO adding 2 additional benchmarks; an entropy/agreement ablation on realized first-cube decisions further shows that SFT, not DPO, accounts for the root-level decision diversity that produces complementary per-run coverage over deterministic symbolic methods. This demonstrates that transformers can be trained to make effective cubing decisions in a domain traditionally dominated by symbolic methods.

Ferhat Erata, Sam Kouteili, Thanos Typaldos, Timos Antonopoulos, Robert B. Jones, Byron Cook, Ruzica Piskac• 2026

Related benchmarks

TaskDatasetResultRank
SAT solving100 SAT competition benchmarks 50 SAT and 50 UNSAT (test)
Pass@149
11
Showing 1 of 1 rows

Other info

Follow for update