Part of the International Conference on Learning Representations 2025 (ICLR 2025)
Arthur Jacot, Seok Hoan Choi, Yuxiao Wen
We show that deep neural networks (DNNs) can efficiently learn any composition of functions with bounded $F_{1}$-norm, which allows DNNs to break the curse of dimensionality in ways that shallow networks cannot. More specifically, we derive a generalization bound that combines a covering number argument for compositionality and the $F_{1}$-norm (or the related Barron norm) for large-width adaptivity. We show that the global minimizer of the regularized loss of DNNs can fit, for example, the composition of two functions $f^{*}=h\circ g$ from a small number of observations, assuming $g$ is smooth/regular and reduces the dimensionality (e.g. $g$ could be the quotient map of the symmetries of $f^{*}$), so that $h$ can be learned in spite of its low regularity. The measure of regularity we consider is the Sobolev norm with different levels of differentiability, which is well adapted to the $F_{1}$-norm. We compute scaling laws empirically and observe phase transitions depending on whether $g$ or $h$ is harder to learn, as predicted by our theory.
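As an illustration of the kind of experiment behind the empirical scaling laws, the sketch below fits a synthetic compositional target $f^{*}=h\circ g$, with a smooth dimension-reducing $g$ and a less regular $h$, using a deep ReLU network trained with weight decay (an $L_2$ penalty used here as a rough proxy for controlling $F_{1}$-type norms). The specific choices of `g`, `h`, architecture, and hyperparameters are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: fit a synthetic composition f* = h o g with a
# weight-decay-regularized deep ReLU network and measure test error.
import torch
import torch.nn as nn

torch.manual_seed(0)

d, k, n_train = 16, 2, 512           # ambient dim, intermediate dim, sample size

def g(x):                             # smooth, dimension-reducing inner map (hypothetical choice)
    return torch.stack([x.pow(2).sum(dim=1), x[:, 0] * x[:, 1]], dim=1)

def h(z):                             # less regular outer map (hypothetical choice)
    return torch.abs(z[:, 0] - z[:, 1]).unsqueeze(1)

X_train = torch.randn(n_train, d)
y_train = h(g(X_train))
X_test = torch.randn(2048, d)
y_test = h(g(X_test))

model = nn.Sequential(                # depth lets the network mirror the composition h o g
    nn.Linear(d, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.MSELoss()

for step in range(5000):
    opt.zero_grad()
    loss = loss_fn(model(X_train), y_train)
    loss.backward()
    opt.step()

with torch.no_grad():
    test_mse = loss_fn(model(X_test), y_test).item()
print(f"test MSE: {test_mse:.4f}")    # repeat over several n_train to trace an empirical scaling law
```

Sweeping `n_train` (and the difficulty of `g` versus `h`) and plotting the resulting test error is the kind of procedure that would expose the scaling laws and phase transitions described above; the theory in the paper concerns the global minimizer of the regularized loss rather than the particular optimizer used in this sketch.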