EdgeRunner now available for DoW users at no cost. Click here to get started for free.

Can Editing 1 Neuron Fix Repetition Loops in LLMs?

Read our new paper here: https://arxiv.org/abs/2606.13705

Download the models here: E2B, E4B, 26A4B, 31B

The inner workings of large language models (LLMs) are notoriously difficult to interpret because of their connectedness. Along one dimension, each token in a sequence can influence the others through the self-attention mechanism, and similarly all elements of each embedding are connected across the multi-layer perceptron in each layer. But what if a specific behavior could be localized to a small number of neurons, and, if that behavior is undesirable, could it be turned off? Such was the underlying motivation of our recent work to remove repetition loops, an error mode, from LLMs.

We focused on the repetition loop error mode because it has a real world impact. Sometimes token costs can rise not only due to heavy reasoning or extended-horizon function-calling, but simply due to the model falling into an underlying failure mode, such as repetition loops. In this case, the model repeats the same word or phrase over and over until its token budget has been exhausted, without ever making progress on solving the problem at hand. The screenshot below shows an example one of our scientists observed while using Claude Sonnet in his IDE, and the Related Work section of our paper provides a sample of literature documenting that repetition loops are a widespread phenomenon. Even in the case of local models like EdgeRunner’s, in which the user is not charged by the token, repetition loops still inhibit the system from arriving at a final answer.

In our paper we identify that repetition loops are present in Google’s latest open weight models, namely the Gemma 4 model family, and we investigate the effectiveness of a simple measure for addressing these repetition loops: weight surgery. Our findings show that editing even a single neuron can mitigate this failure mode, as in the case of the Gemma 4 E2B model, while editing a handful of neurons can have similar effects on the larger models of the family. We explore different editing operations: stripping, amplifying, suppressing or sign-inversion of neurons, and in the case of the Gemma 4 26B-A4B, expert masking. In order to identify the components (i.e. layers/neurons/experts) that play the most important role in loop formation and mitigation without regression in a model’s capabilities, we perform per-layer ablations and per-neuron attribution sweeps.

Our findings become even more interesting when we study the behavior of the edited models. In the cases of the larger models (Gemma 4 26B-A4B and 31B), even though our proposed modifications allow the models to stay out of repetitions, they enter the state of ‘Doom Looping’ instead, i.e. a non-convergent regime in which a model self-corrects in circles over a fact that it cannot recall, exhausting the token budget without committing to a final answer. We argue that doom looping is fundamentally a knowledge-precision problem, potentially formed during pruning or distillation phases, and while weight surgery can delete a loop, it cannot supply a missing fact.

Despite LLM architectural connectedness, our work shows that certain LLM behaviors can be localized to a surprisingly granular level. Though model surgery may not always be the right technique, we believe that more detailed model interpretability and architectural analysis can only improve modelers’ ability to design better aligned LLMs of the future.

Aris Lazaridis is a Research Engineer at EdgeRunner AI. He holds a PhD in Reinforcement Learning from Aristotle University of Thessaloniki (AUTH). His expertise spans Reinforcement Learning methods and their applications in complex real-world domains, as well as the intersection of RL and Large Language Models across both research and industry.

Contact us: research@edgerunnerai.com