Introducing Nested Learning: A... Note

Introducing Nested Learning: A new ML paradigm for continual learning

The last decade has seen significant progress in machine learning but faces challenges with continual learning, unlike the adaptable human brain. Current large language models struggle with catastrophic forgetting, where learning new information erases old knowledge. Traditional solutions treat model architecture and training algorithms separately, hindering unified learning systems. A paper published at NeurIPS 2025 introduces Nested Learning, which unifies architecture and optimization as interconnected, multi-level problems. This paradigm suggests that model architecture and training rules are different optimization levels with distinct information flows and update rates. Nested Learning allows for deeper computational depth in AI, addressing issues like catastrophic forgetting. A proof-of-concept architecture called "Hope" demonstrates superior performance in language modeling and long-context memory management. The Nested Learning perspective reveals that complex ML models are nested optimization problems, enabling a new design dimension for deeper learning components. This approach allows for multi-time-scale updates for each component, enhancing continual learning capabilities. Experiments show that Nested Learning principles lead to more expressive, capable, and efficient learning algorithms.
CdXz5zHNQW_WpDIEoePOg.png