Part of the International Conference on Learning Representations 2025 (ICLR 2025)
Julian Minder, Kevin Du, Niklas Stoehr, Giovanni Monea, Chris Wendler, Robert West, Ryan Cotterell
When making predictions, a language model must trade off how much it relies on its context versus its prior knowledge. Choosing how sensitive the model is to its context is a fundamental capability, as it enables the model to excel at tasks like retrieval-augmented generation and question answering. In this paper, we search for a knob that controls this sensitivity, determining whether language models answer from the context or from their prior knowledge. To guide this search, we design a task for controllable context sensitivity. In this task, we first feed the model a context ("Paris is in England") and a question ("Where is Paris?"); we then instruct the model to either use its prior or contextual knowledge and evaluate whether it generates the correct answer for both intents (either "France" or "England"). When fine-tuned on this task, instruct versions of Llama-3.1, Mistral-v0.3, and Gemma-2 can solve it with high accuracy (85-95%). Analyzing these high-performing models, we narrow down which layers may be important to context sensitivity using a novel linear-time algorithm. Then, in each model, we identify a 1-D subspace in a single layer that encodes whether the model follows context or prior knowledge. Interestingly, while we identify this subspace in a fine-tuned model, we find that the exact same subspace serves as an effective knob not only in that model but also in non-fine-tuned instruct and base models of that model family. Finally, we show a strong correlation between a model's performance and how distinctly it separates context-agreeing from context-ignoring answers in this subspace. These results suggest that a single fundamental subspace facilitates how the model chooses between context and prior knowledge.
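To make the task format concrete, the following is a minimal sketch, in Python, of how a controllable-context-sensitivity example could be assembled and scored: a counterfactual context, a question, an instruction stating which knowledge source to follow, and a check that the generated answer matches the answer required by that intent. The dataclass, field names, and instruction wording are illustrative assumptions, not the paper's released code or exact prompt templates.

```python
from dataclasses import dataclass

@dataclass
class Example:
    context: str       # counterfactual context, e.g. "Paris is in England."
    question: str      # query, e.g. "Where is Paris?"
    intent: str        # "context" or "prior"
    ctx_answer: str    # answer consistent with the context ("England")
    prior_answer: str  # answer consistent with prior knowledge ("France")

def build_prompt(ex: Example) -> str:
    """Assemble context, question, and an instruction stating which
    knowledge source the model should follow (wording is hypothetical)."""
    instruction = (
        "Answer based only on the context above."
        if ex.intent == "context"
        else "Ignore the context above and answer from your prior knowledge."
    )
    return f"Context: {ex.context}\nQuestion: {ex.question}\n{instruction}\nAnswer:"

def is_correct(ex: Example, model_answer: str) -> bool:
    """An example counts as solved only if the generated answer matches
    the answer required by the stated intent."""
    target = ex.ctx_answer if ex.intent == "context" else ex.prior_answer
    return target.lower() in model_answer.lower()

ex = Example("Paris is in England.", "Where is Paris?", "prior", "England", "France")
print(build_prompt(ex))
print(is_correct(ex, "France"))  # True: the prior-knowledge answer was requested
```

Likewise, the 1-D subspace could in principle be used as a steering knob by shifting the residual stream along a single direction at one layer; the sketch below shows one way to do this with a PyTorch forward hook. The layer index, the direction vector, the scale, and the sign convention are all assumptions for illustration; the paper locates the subspace and layer empirically for each model.

```python
import torch

def make_knob_hook(direction: torch.Tensor, alpha: float):
    """Return a forward hook that shifts hidden states along `direction`.
    Positive alpha pushes toward context-following behavior, negative toward
    prior-knowledge answers (the sign convention here is arbitrary)."""
    unit = direction / direction.norm()

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * unit.to(hidden.dtype).to(hidden.device)
        return (steered, *output[1:]) if isinstance(output, tuple) else steered

    return hook

# Usage sketch, assuming a Hugging Face transformers causal LM is loaded as `model`
# and `direction` is the identified 1-D subspace direction (both hypothetical here):
# layer = model.model.layers[16]   # hypothetical intervention layer
# handle = layer.register_forward_hook(make_knob_hook(direction, alpha=8.0))
# ... run generation ...
# handle.remove()
```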