VentureBeat
Researchers find that retraining only small parts of AI models can cut costs and prevent forgetting
When enterprises fine-tune large language models (LLMs), the models often lose abilities they had previously learned, a phenomenon known as catastrophic forgetting. Researchers from the University of Illinois Urbana-Champaign have proposed a way to avoid this: retrain only narrow parts of the model. The approach aims to cut computational costs while preserving the model's existing knowledge.

The team argues that catastrophic forgetting is not true memory loss but a side effect of bias drift. To test this, they trained two vision-language models, LLaVA and Qwen 2.5-VL, on specific target tasks and measured their performance on held-out benchmarks. Surprisingly, they found that tuning only the self-attention projection layers let the models learn the new tasks without performance drops on existing ones.

The research also indicates that tuning the multi-layer perceptron (MLP) can bias the model's outputs and cause temporary forgetting. By tuning only selected MLP components while keeping the others frozen, the researchers achieved effective learning with minimal forgetting.

This narrow retraining method offers a cheaper and more controllable way to update LLMs. The current research is limited to vision-language models, but the findings are expected to apply to other LLMs across different modalities.
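To make "retraining only narrow parts" concrete, here is a minimal Python sketch of how one might pick which weights to update and measure what share of the model that leaves trainable. It is not the researchers' code. The parameter names (`self_attn.q_proj.weight`, `mlp.down_proj.weight` and so on) follow a Hugging Face-style naming convention and are an assumption, as are the toy layer sizes. Which MLP components to freeze is also a hypothetical choice here (the down projection), since the article does not say which ones the team froze.

```python
from typing import Dict, Iterable, Set

# Assumed Hugging Face-style names for the self-attention projection weights.
ATTN_PROJ_SUFFIXES = (
    "self_attn.q_proj.weight",
    "self_attn.k_proj.weight",
    "self_attn.v_proj.weight",
    "self_attn.o_proj.weight",
)

# Hypothetical partial-MLP strategy: tune gate/up projections, freeze down_proj.
MLP_PARTIAL_SUFFIXES = ("mlp.gate_proj.weight", "mlp.up_proj.weight")


def toy_layer_sizes(layer: int, hidden: int = 8, intermediate: int = 32) -> Dict[str, int]:
    """Parameter counts for one toy transformer layer (no biases, no grouped-query attention)."""
    prefix = f"model.layers.{layer}."
    sizes: Dict[str, int] = {}
    for proj in ("q_proj", "k_proj", "v_proj", "o_proj"):
        sizes[f"{prefix}self_attn.{proj}.weight"] = hidden * hidden
    for proj in ("gate_proj", "up_proj", "down_proj"):
        sizes[f"{prefix}mlp.{proj}.weight"] = hidden * intermediate
    return sizes


def select_trainable(param_names: Iterable[str], trainable_suffixes: Iterable[str]) -> Set[str]:
    """Return the parameter names to update; every other parameter stays frozen."""
    suffixes = tuple(trainable_suffixes)
    return {name for name in param_names if name.endswith(suffixes)}


def trainable_fraction(param_sizes: Dict[str, int], trainable: Set[str]) -> float:
    """Share of all parameters that would receive gradient updates."""
    total = sum(param_sizes.values())
    tuned = sum(size for name, size in param_sizes.items() if name in trainable)
    return tuned / total


if __name__ == "__main__":
    sizes: Dict[str, int] = {}
    for layer in range(2):
        sizes.update(toy_layer_sizes(layer))

    attn_only = select_trainable(sizes, ATTN_PROJ_SUFFIXES)
    print(len(attn_only), trainable_fraction(sizes, attn_only))  # 8 0.25

    mlp_partial = select_trainable(sizes, MLP_PARTIAL_SUFFIXES)
    print(len(mlp_partial), trainable_fraction(sizes, mlp_partial))  # 4 0.5
```

In a PyTorch training loop, the same name set could drive freezing by setting `param.requires_grad = name in trainable` for each `name, param` yielded by `model.named_parameters()`, so only the selected layers receive gradients. The printed fraction is only a rough proxy for how much of the model is touched; real savings depend on the model's actual layer shapes and the training setup.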