Adaptive Lipschitz-Free Conditional Gradient Methods for Stochastic Composite Nonconvex Optimization
About
We propose ALFCG (Adaptive Lipschitz-Free Conditional Gradient), the first adaptive projection-free framework for stochastic composite nonconvex minimization that requires neither global smoothness constants nor line search. Unlike prior conditional gradient methods that use open-loop diminishing stepsizes, conservative Lipschitz constants, or costly backtracking, ALFCG maintains a self-normalized accumulator of historical iterate differences to estimate local smoothness, and minimizes a quadratic surrogate model at each step. This retains the simplicity of Frank-Wolfe while adapting to unknown geometry. We study three variants. ALFCG-FS addresses finite-sum problems with a SPIDER estimator. ALFCG-MVR1 and ALFCG-MVR2 handle stochastic expectation problems by using momentum-based variance reduction with single-batch and two-batch updates, and operate under average and individual smoothness, respectively. To reach an $\epsilon$-stationary point, ALFCG-FS attains $\mathcal{O}(N+\sqrt{N}\epsilon^{-2})$ iteration complexity, while ALFCG-MVR1 and ALFCG-MVR2 achieve $\tilde{\mathcal{O}}(\sigma^2\epsilon^{-4}+\epsilon^{-2})$ and $\tilde{\mathcal{O}}(\sigma\epsilon^{-3}+\epsilon^{-2})$, respectively, where $N$ is the number of components and $\sigma$ is the noise level. In contrast to typical $\mathcal{O}(\epsilon^{-4})$ or $\mathcal{O}(\epsilon^{-3})$ rates, our bounds reduce to the optimal rate $\tilde{\mathcal{O}}(\epsilon^{-2})$, up to logarithmic factors, as the noise level $\sigma \to 0$. Extensive experiments on multiclass classification over nuclear norm balls and $\ell_p$ balls show that ALFCG generally outperforms state-of-the-art conditional gradient baselines.
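The core idea in the abstract, a Frank-Wolfe step whose size comes from minimizing a quadratic surrogate with a smoothness estimate built from accumulated iterate differences, can be illustrated with a minimal deterministic sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the accumulator rule `sqrt(l0**2 + acc)`, the initial value `l0`, the function names `lmo_l1` and `adaptive_fw`, and the toy least-squares objective over an $\ell_1$ ball. The sketch uses exact gradients and omits the SPIDER and momentum-based variance-reduced estimators of the three ALFCG variants.

```python
import math

def lmo_l1(grad, radius):
    """Linear minimization oracle: argmin of <grad, s> over the l1 ball."""
    i = max(range(len(grad)), key=lambda k: abs(grad[k]))
    s = [0.0] * len(grad)
    if grad[i] != 0.0:
        s[i] = -radius * math.copysign(1.0, grad[i])
    return s

def adaptive_fw(grad_fn, x0, radius, steps=200, l0=1.0, tol=1e-10):
    """Frank-Wolfe with a step size from a quadratic surrogate.

    The smoothness estimate l_est grows with the accumulated squared
    iterate differences (an assumed rule, not the paper's exact update).
    """
    x = list(x0)
    acc = 0.0               # running sum of ||x_{k+1} - x_k||^2
    gap = float("inf")
    for _ in range(steps):
        g = grad_fn(x)
        s = lmo_l1(g, radius)
        d = [si - xi for si, xi in zip(s, x)]
        gap = -sum(gi * di for gi, di in zip(g, d))   # Frank-Wolfe gap
        if gap <= tol:
            break
        l_est = math.sqrt(l0 ** 2 + acc)              # assumed accumulator rule
        dd = sum(di * di for di in d)
        # Minimizer over [0, 1] of  -gamma * gap + 0.5 * l_est * gamma^2 * dd
        gamma = min(1.0, gap / (l_est * dd))
        x = [xi + gamma * di for xi, di in zip(x, d)]
        acc += gamma ** 2 * dd
    return x, gap

# Toy problem: minimize 0.5 * ||x - b||^2 over the unit l1 ball.
b = [0.9, -0.4, 0.1]
f = lambda x: 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))
grad_f = lambda x: [xi - bi for xi, bi in zip(x, b)]

x0 = [0.0, 0.0, 0.0]
x_final, gap_final = adaptive_fw(grad_f, x0, radius=1.0)
```

Because every iterate is a convex combination of points in the ball, the method stays feasible without any projection. In this toy setting the objective does not increase, since `l_est` never drops below the true smoothness constant (1); no such guarantee is claimed for the assumed rule in general.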
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Optimization | Deterministic Setting | Complexity (Big O Notation)-2 | 7 |
| Optimization | Finite-Sum Setting | Complexity Bound-2 | 6 |
| Optimization | Expectation Setting | Complexity (Big O)-3 | 6 |