AnoLLM: Large Language Models for Tabular Anomaly Detection

Part of International Conference on Representation Learning 2025 (ICLR 2025) Conference

Bibtex Paper

Authors

Che-Ping Tsai, Ganyu Teng, Phillip Wallis, Wei Ding

Abstract

We introduce AnoLLM, a novel framework that leverages large language models (LLMs) for unsupervised tabular anomaly detection. By converting tabular data into a standardized text format, we further adapt a pre-trained LLM with this serialized data, and assign anomaly scores based on the negative log likelihood generated by the LLM. Unlike traditional methods that can require extensive feature engineering, and often lose textual information during data processing, AnoLLM preserves data integrity and streamlines the preprocessing required for tabular anomaly detection. This approach can effectively handle mixed-type data, especially those containing textual features. Our empirical results indicate that AnoLLM delivers the best performance on six benchmark datasets with mixed feature types. Additionally, across 30 datasets from the ODDS library, which are predominantly numerical, AnoLLM performs on par with top performing baselines.