Compander-Aligned Query Geometry for Quantized Zeroth-Order Optimization
About
Low-bit forward evaluation is an attractive route to memory-efficient zeroth-order (ZO) adaptation: the optimizer needs only scalar losses, and the model can be queried near deployment precision. The obstacle is that a quantized ZO query is not a continuous finite difference followed by harmless storage rounding. The query chooses endpoints, the low-precision engine rounds them, and the loss difference is measured along the rounded chord. For nonuniform companding quantizers, this means the codebook alone cannot predict ZO behavior: a fixed weight-space radius can collapse in dense cells, over-span sparse cells, or pair a rounded chord with an unrounded update direction. We identify the missing object as query geometry and model scalar nonuniform quantization as $Q = \phi^{-1} \circ U \circ \phi$, where $\phi$ is a monotone compander and $U$ rounds uniformly with step $\Delta$ in the companded coordinate. CAQ-ZO (Compander-Aligned Queries for Zeroth-Order Optimization) forms one-grid-step Rademacher stencils $z \pm \Delta r$ in $z = \phi(x)$, maps the endpoints back through $\phi^{-1}$, and updates in $z$. Our analysis proves the grid-span mismatch, decomposes the estimator residuals caused by endpoint rounding, and gives stationarity bounds in which generic off-grid queries with radius $\mu$ retain a $\Delta^2/\mu^2$ residual channel, while CAQ-ZO makes the query-time residual exactly zero because each endpoint lands on a grid point. Synthetic experiments isolate this channel, and matched NF4 Qwen/Llama fine-tuning shows that CAQ-ZO improves on the trained NF4 baseline under the same quantizer and evaluation budget.
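The query construction above can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption rather than the paper's implementation: a logarithmic mu-law curve stands in for $\phi$ (NF4's codebook is not defined this way), the grid has 15 levels, each stencil is centred on the rounded grid index of a latent $z$, and the helper names `phi`, `phi_inv`, `quantize`, and `caq_zo_step` are hypothetical. It shows the property the residual bound relies on: endpoints built as $z \pm \Delta r$ on the grid are fixed points of $Q$, whereas a generic endpoint $x + \mu$ is moved by rounding before the loss is evaluated.

```python
import math
import random

# HYPOTHETICAL stand-in compander: a mu-law curve on [-1, 1]. NF4 builds its
# codebook from normal quantiles instead; this only illustrates Q = phi^{-1} o U o phi.
LAW = 255.0       # compression strength of the stand-in compander (assumption)
K = 7             # z-space grid indices run over -K..K (assumption)
DELTA = 1.0 / K   # uniform step of U in the companded coordinate z


def phi(x):
    """Compander: weight space x -> companded coordinate z."""
    return math.copysign(math.log1p(LAW * abs(x)) / math.log1p(LAW), x)


def phi_inv(z):
    """Expander: companded coordinate z -> weight space x."""
    return math.copysign(math.expm1(abs(z) * math.log1p(LAW)) / LAW, z)


def grid_index(z):
    """Integer index of the nearest uniform grid point in z, clipped to -K..K."""
    return max(-K, min(K, round(z / DELTA)))


def quantize(x):
    """Q = phi^{-1} o U o phi, where U rounds to the nearest multiple of DELTA."""
    return phi_inv(grid_index(phi(x)) * DELTA)


def caq_zo_step(z, loss, lr, rng):
    """One CAQ-ZO-style step on latent companded coordinates z (list of floats).

    Sketch assumptions, not taken from the source: the stencil is centred on
    the rounded grid index of each coordinate, and the direction estimate is a
    two-point central difference divided by 2 * DELTA.
    """
    # Centre on a grid index that leaves room for a +/-1 step inside -K..K.
    k = [max(-K + 1, min(K - 1, grid_index(zi))) for zi in z]
    r = [rng.choice((-1, 1)) for _ in z]  # Rademacher direction
    # Endpoints are built from integer indices, so both are exact codebook
    # values and the low-precision engine evaluates exactly these points.
    x_plus = [phi_inv((ki + ri) * DELTA) for ki, ri in zip(k, r)]
    x_minus = [phi_inv((ki - ri) * DELTA) for ki, ri in zip(k, r)]
    g = (loss(x_plus) - loss(x_minus)) / (2.0 * DELTA)
    new_z = [zi - lr * g * ri for zi, ri in zip(z, r)]  # update in z, not in x
    return new_z, x_plus, x_minus


if __name__ == "__main__":
    rng = random.Random(0)
    target = [0.3, -0.05, 0.6]

    def loss(w):
        return sum((wi - ti) ** 2 for wi, ti in zip(w, target))

    z = [0.0] * len(target)
    for _ in range(100):
        z, x_plus, x_minus = caq_zo_step(z, loss, 0.05, rng)
        # CAQ-ZO endpoints are fixed points of Q: zero query-time residual.
        assert all(quantize(v) == v for v in x_plus + x_minus)

    # A generic off-grid query x + mu is moved by rounding before evaluation.
    x, mu = 0.3, 0.01
    print("generic endpoint residual:", quantize(x + mu) - (x + mu))
    print("final quantized weights:", [phi_inv(grid_index(zi) * DELTA) for zi in z])
```

In this sketch, building the endpoints from integer grid indices, rather than adding $\Delta$ to an already rounded float, is what makes `quantize(v) == v` hold exactly in floating point. Because the stencil radius is tied to the grid step, no separate weight-space radius has to be tuned.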
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text Classification | BoolQ | Accuracy | 67.2 | 118 |
| Text Classification | RTE | Accuracy | 61.5 | 104 |
| Classification | SST2 | Accuracy | 76.5 | 102 |
| Classification | CB | Accuracy | 60.5 | 70 |
| Generation | SQuAD | F1 Score | 58.6 | 52 |
| Multiple-Choice | COPA | Accuracy | 78.4 | 36 |