Halton Scheduler for Masked Generative Image Transformer

Part of the International Conference on Learning Representations 2025 (ICLR 2025)


Authors

Victor Besnier, Mickael Chen, David Hurych, Eduardo Valle, Matthieu Cord

Abstract

Masked Generative Image Transformers (MaskGIT) have emerged as a scalable and efficient image generation framework, able to deliver high-quality visuals with low inference costs. However, MaskGIT’s token unmasking scheduler, an essential component of the framework, has not received the attention it deserves. We analyze the sampling objective in MaskGIT, based on the mutual information between tokens, and elucidate its shortcomings. We then propose a new sampling strategy based on our Halton scheduler instead of the original Confidence scheduler. More precisely, our method selects the tokens’ positions according to a quasi-random, low-discrepancy Halton sequence. Intuitively, this method spreads the tokens spatially, progressively covering the image uniformly at each step. Our analysis shows that it reduces non-recoverable sampling errors, leading to simpler hyper-parameter tuning and better-quality images. Our scheduler does not require retraining or noise injection and may serve as a simple drop-in replacement for the original sampling strategy. Evaluation of both class-to-image synthesis on ImageNet and text-to-image generation on the COCO dataset demonstrates that the Halton scheduler outperforms the Confidence scheduler quantitatively, by reducing the FID, and qualitatively, by generating more diverse and more detailed images. Our code is at https://github.com/valeoai/Halton-MaskGIT.
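
As a rough sketch of the core idea only (the authors' actual implementation is in the repository linked above), the Python snippet below derives a spatially well-spread token-unmasking order from the 2-D Halton sequence. The helper names (halton, halton_token_order), the 16x16 token grid, and the oversampling factor are illustrative assumptions, not details taken from the paper.

import numpy as np

def halton(base: int, n: int) -> np.ndarray:
    """First n terms of the Halton (radical-inverse) sequence for a prime base."""
    seq = np.zeros(n)
    for i in range(n):
        f, k = 1.0, i + 1
        while k > 0:
            f /= base
            seq[i] += f * (k % base)
            k //= base
    return seq

def halton_token_order(grid: int = 16) -> list[tuple[int, int]]:
    """Visit order over a grid x grid token map: 2-D Halton points (bases 2 and 3),
    quantized to grid cells and deduplicated so each position appears exactly once."""
    n = 8 * grid * grid  # oversample, then keep the first hit in each cell
    xs = (halton(2, n) * grid).astype(int)
    ys = (halton(3, n) * grid).astype(int)
    order, seen = [], set()
    for xy in zip(xs.tolist(), ys.tolist()):
        if xy not in seen:
            seen.add(xy)
            order.append(xy)
    return order

order = halton_token_order()
assert len(order) == 256  # every position of the 16x16 grid covered exactly once
# Unmasking tokens in this order (a few per step) spreads predictions uniformly
# across the image, rather than clustering them as confidence-based picking can.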