Part of the International Conference on Learning Representations 2025 (ICLR 2025)
Taha EL BAKKALI EL KADI, Omar Saadi
The stochastic three points (STP) algorithm is a derivative-free optimization method designed for unconstrained optimization problems in $\mathbb{R}^d$. In this paper, we analyze this algorithm for three classes of functions: smooth functions that may lack convexity, smooth convex functions, and smooth strongly convex functions. Our work provides the first almost sure convergence results for the STP algorithm, alongside some convergence results in expectation.

For the class of smooth functions, we establish that the smallest gradient norm among the first $T$ iterates of the STP algorithm converges almost surely to zero at a rate of $o(1/T^{\frac{1}{2}-\epsilon})$ for any $\epsilon \in (0,\frac{1}{2})$, where $T$ is the number of iterations. Furthermore, within the same class of functions, we establish that the gradient at the final iterate converges to zero both almost surely and in expectation.

For the class of smooth convex functions, we establish that $f(\theta^T)$ converges to $\inf_{\theta \in \mathbb{R}^d} f(\theta)$ almost surely at a rate of $o(1/T^{1-\epsilon})$ for any $\epsilon \in (0,1)$, and in expectation at a rate of $O(\frac{d}{T})$, where $d$ is the dimension of the space.

Finally, for the class of smooth strongly convex functions, we establish that when the step sizes are obtained by approximating the directional derivatives of the function, $f(\theta^T)$ converges to $\inf_{\theta \in \mathbb{R}^d} f(\theta)$ in expectation at a rate of $O((1-\frac{\mu}{dL})^T)$, and almost surely at a rate of $o((1-s\frac{\mu}{dL})^T)$ for any $s \in (0,1)$, where $\mu$ and $L$ are the strong convexity and smoothness parameters of the function.
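To make the scheme being analyzed concrete, below is a minimal sketch of a standard STP loop, assuming directions drawn uniformly from the unit sphere and an illustrative $1/\sqrt{t}$ step-size schedule; the function name `stp`, the schedule, and the test function are hypothetical choices for illustration, not the paper's exact setup (in particular, the strongly convex analysis above uses step sizes built from finite-difference approximations of directional derivatives rather than a fixed schedule).

```python
import numpy as np

def stp(f, theta0, num_iters=1000, step_size=None, seed=0):
    """Sketch of a Stochastic Three Points (STP) loop.

    At each iteration a random direction s is drawn (here: uniform on the
    unit sphere) and the next iterate is whichever of
    theta, theta + alpha*s, theta - alpha*s has the smallest value of f.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    d = theta.size
    for t in range(1, num_iters + 1):
        # Illustrative schedule (assumption): alpha_t = 1/sqrt(t) unless the
        # caller supplies a schedule step_size(t).
        alpha = step_size(t) if step_size is not None else 1.0 / np.sqrt(t)
        s = rng.standard_normal(d)
        s /= np.linalg.norm(s)          # uniform direction on the unit sphere
        candidates = [theta, theta + alpha * s, theta - alpha * s]
        theta = min(candidates, key=f)  # keep the best of the three points
    return theta

# Illustrative usage on a simple strongly convex quadratic.
if __name__ == "__main__":
    f = lambda x: 0.5 * float(np.dot(x, x))
    theta_T = stp(f, theta0=np.ones(10), num_iters=5000)
    print(float(f(theta_T)))
```

Note that only function values are compared, never gradients, which is what makes the method derivative-free; the convergence rates stated above concern the gradient norm and the optimality gap of the iterates produced by exactly this kind of three-point comparison.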