Unlocking the Power of Representations in Long-term Novelty-based Exploration

Saade, Alaa; Kapturowski, Steven; Calandriello, Daniele; Blundell, Charles; Sprechmann, Pablo; Sarra, Leopoldo; Groth, Oliver; Valko, Michal; Piot, Bilal

Alaa Saade, Steven Kapturowski, Daniele Calandriello, Charles Blundell, Pablo Sprechmann, Leopoldo Sarra, Oliver Groth, Michal Valko, Bilal Piot

International Conference on Learning Representations 2024 (ICLR 2024) Conference

Abstract

We introduce Robust Exploration via Clustering-based Online Density Estimation (RECODE), a non-parametric method for novelty-based exploration that estimates visitation counts for clusters of states based on their similarity in a chosen embedding space. By adapting classical clustering to the nonstationary setting of Deep RL, RECODE can efficiently track state visitation counts over thousands of episodes. We further propose a novel generalization of the inverse dynamics loss, which leverages masked transformer architectures for multi-step prediction and which, in conjunction with RECODE, achieves a new state of the art in a suite of challenging 3D-exploration tasks in DM-Hard-8. RECODE also sets a new state of the art in hard exploration Atari games, and is the first agent to reach the end screen in "Pitfall!".
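
To make the idea of cluster-level visitation counts concrete, here is a minimal toy sketch in Python. It is not the RECODE update rule from the paper: the class name `ClusterCounter`, the parameters `radius`, `max_clusters`, and `decay`, the nearest-center assignment, the evict-lowest-count policy, and the 1/sqrt(count) bonus are all illustrative assumptions. It only shows the general pattern of assigning an embedding to a nearby cluster, incrementing that cluster's count, and slowly decaying old counts so the estimate can follow a nonstationary embedding.

```python
import numpy as np


class ClusterCounter:
    """Toy online cluster-count estimator (illustrative only, not RECODE itself)."""

    def __init__(self, radius=1.0, max_clusters=100, decay=0.999):
        self.radius = radius              # assumed: max distance to join a cluster
        self.max_clusters = max_clusters  # assumed: fixed memory budget
        self.decay = decay                # assumed: per-step forgetting factor
        self.centers = []                 # cluster centers in embedding space
        self.counts = []                  # (decayed) visitation count per cluster

    def update(self, embedding):
        """Record one visit and return a novelty bonus of 1 / sqrt(count)."""
        x = np.asarray(embedding, dtype=float)
        # Decay every count so stale clusters gradually lose weight.
        self.counts = [c * self.decay for c in self.counts]

        if self.centers:
            dists = [float(np.linalg.norm(x - c)) for c in self.centers]
            i = int(np.argmin(dists))
            if dists[i] <= self.radius:
                # Close enough: move the center toward x and count the visit.
                n = self.counts[i]
                self.centers[i] = (n * self.centers[i] + x) / (n + 1.0)
                self.counts[i] = n + 1.0
                return float(1.0 / np.sqrt(self.counts[i]))

        # Otherwise open a new cluster, evicting the least-visited one if full.
        if len(self.centers) >= self.max_clusters:
            j = int(np.argmin(self.counts))
            del self.centers[j]
            del self.counts[j]
        self.centers.append(x)
        self.counts.append(1.0)
        return 1.0
```

In an agent loop, one would call `update` on each state's embedding and add the returned bonus to the reward. Embeddings that land in rarely visited clusters receive a larger bonus than those in frequently visited ones.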
