Part of the International Conference on Learning Representations 2025 (ICLR 2025)
Johannes Kaiser, Kristian Schwethelm, Daniel Rueckert, Georgios Kaissis
Accurately estimating the informativeness of individual samples in a dataset is an important objective in deep learning, as it can guide sample selection, which can improve model efficiency and accuracy by removing redundant or potentially harmful samples. We propose $\text{\textit{Laplace Sample Information}}$ ($\mathsf{LSI}$), a measure of sample informativeness grounded in information theory that is widely applicable across model architectures and learning settings. $\mathsf{LSI}$ leverages a Bayesian approximation to the weight posterior and the KL divergence to measure the change in the parameter distribution induced by a sample of interest from the dataset. We experimentally show that $\mathsf{LSI}$ is effective in ordering the data with respect to typicality, detecting mislabeled samples, measuring class-wise informativeness, and assessing dataset difficulty. We demonstrate these capabilities of $\mathsf{LSI}$ on image and text data in supervised and unsupervised settings. Moreover, we show that $\mathsf{LSI}$ can be computed efficiently through probes and that it transfers well to the training of large models.
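To make the abstract's construction concrete, the minimal sketch below illustrates the kind of quantity $\mathsf{LSI}$ describes: the KL divergence between two diagonal-Gaussian (Laplace-style) approximations of the weight posterior, one fitted on the full dataset and one with the sample of interest removed. This is not the authors' reference implementation; the diagonal posterior, the leave-one-out setup, and all function names are illustrative assumptions.

```python
import numpy as np

def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL( N(mu_p, diag(var_p)) || N(mu_q, diag(var_q)) ) between diagonal Gaussians."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def diag_laplace_posterior(map_weights, per_param_curvature, prior_precision=1.0):
    """Diagonal Laplace approximation: mean = MAP weights,
    variance = inverse of (loss curvature + prior precision) per parameter."""
    precision = per_param_curvature + prior_precision
    return map_weights, 1.0 / precision

# Hypothetical usage (assumed names): train twice, with and without sample i,
# then score sample i by the divergence between the two approximate posteriors.
# mu_full, var_full = diag_laplace_posterior(w_full, curvature_full)
# mu_loo,  var_loo  = diag_laplace_posterior(w_without_i, curvature_without_i)
# lsi_i = kl_diag_gaussians(mu_loo, var_loo, mu_full, var_full)
```

Under these assumptions, a larger divergence indicates that removing the sample shifts the approximate posterior more, i.e., the sample is more informative; the paper's efficient computation via probes is not reproduced here.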