Oracle efficient truncated statistics

Part of International Conference on Representation Learning 2025 (ICLR 2025) Conference

Bibtex Paper

Authors

Konstantinos Karatapanis, Vasilis Kontonis, Christos Tzamos

Abstract

We study the problem of learning from truncated samples: instead of observingsamples from some underlying population $p^\ast$, we observe only the examples that fall in some survival set $S \subset \mathbb{R}^d$ whose probability mass (measured with respect to $p^\ast$) is at least $\alpha$. Assuming membership oracle access to the truncation set $S$, prior works obtained algorithms for the case where $p^\ast$ is Gaussian or more generally an exponential family with strongly convex likelihood --- albeit with a super-polynomial dependency on the (inverse) survival mass $1/\alpha$both in terms of runtime and in number of oracle calls to the set $S$. In this work we design a new learning method with runtime and query complexity polynomial in $1/\alpha$. Our result significantly improves over the prior works by focusing on efficiently solving the underlying optimization problem using a generalpurpose optimization algorithm with minimal assumptions.