Plastic Learning with Deep Fourier Features

Part of the International Conference on Learning Representations 2025 (ICLR 2025)


Authors

Alex Lewandowski, Dale Schuurmans, Marlos C. Machado

Abstract

Deep neural networks can struggle to learn continually in the face of non-stationarity, a phenomenon known as loss of plasticity. In this paper, we identify underlying principles that lead to plastic algorithms. We provide theoretical results showing that linear function approximation, as well as a special case of deep linear networks, does not suffer from loss of plasticity. We then propose deep Fourier features, which are the concatenation of a sine and cosine in every layer, and we show that this combination provides a dynamic balance between the trainability obtained through linearity and the effectiveness obtained through the nonlinearity of neural networks. Deep networks composed entirely of deep Fourier features are highly trainable and sustain their trainability over the course of learning. Our empirical results show that continual learning performance can be improved by replacing ReLU activations with deep Fourier features combined with regularization. These results hold for different continual learning scenarios (e.g., label noise, class incremental learning, pixel permutations) on all major supervised learning datasets used for continual learning research, such as CIFAR10, CIFAR100, and tiny-ImageNet.
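To make the architectural idea concrete, below is a minimal sketch of a "deep Fourier feature" layer as described in the abstract: a linear map whose output is passed through both sine and cosine, with the two concatenated along the feature dimension. The class name, layer widths, and the small network built from it are illustrative assumptions for this sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn


class DeepFourierLayer(nn.Module):
    """Sketch of a layer using deep Fourier features: a linear map followed by
    the concatenation of sin and cos, so the output width is 2 * out_features.
    Names and defaults here are illustrative, not taken from the paper's code."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.linear(x)
        # Concatenate sin(z) and cos(z) along the last (feature) dimension.
        return torch.cat([torch.sin(z), torch.cos(z)], dim=-1)


# Hypothetical example: a small network composed entirely of deep Fourier
# feature layers, standing in for replacing ReLU activations as the abstract
# describes. Widths and depth are arbitrary choices for illustration.
net = nn.Sequential(
    DeepFourierLayer(32, 64),    # output width 128
    DeepFourierLayer(128, 64),   # output width 128
    nn.Linear(128, 10),          # final linear readout
)

x = torch.randn(8, 32)
print(net(x).shape)  # torch.Size([8, 10])
```

One design point worth noting: because sine and cosine are concatenated rather than applied separately, each layer doubles its pre-activation width, so downstream layer sizes must account for the 2x expansion (as in the widths chosen above).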