Source: VentureBeat (RSS)

Attention ISN'T all you need?! New Qwen3 variant Brumby-14B-Base leverages Power Retention technique

In 2017, the transformer architecture revolutionized AI and became the foundation for large language models. However, the quadratic computational cost of its attention mechanism limits scalability to long contexts.

In 2025, Manifest AI introduced Brumby-14B-Base, a model that replaces attention with Power Retention, a recurrent, hardware-efficient mechanism. Instead of comparing each token against all previous tokens, Power Retention maintains a fixed-size memory matrix that compresses past information through a recurrent state update, so per-token computation stays constant regardless of context length, unlike transformers. Brumby's efficiency also stems from how it was built: rather than training from scratch, Manifest AI retrained an existing transformer (Qwen3-14B), retaining its prior knowledge while adapting it to the new architecture, at a reported training cost of roughly $4,000.

Benchmarks show Brumby matching or exceeding transformers such as Qwen3-14B and GLM-4.5-Air on reasoning tasks, particularly those involving long contexts. Power Retention's local matrix operations yield significant hardware efficiency and potential inference speedups. Manifest AI aims to democratize AI development by making it cost-effective to retrain large transformer models into this architecture; its stated longer-term goal is to model intelligent processes themselves, rather than merely the artifacts of intelligence.
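The article does not publish Manifest AI's kernels, so the following is only a minimal sketch of a generic retention-style recurrent state update of the kind described above: a memory matrix accumulates key-value outer products and is read out with the query, giving constant per-token cost. The function name, the scalar `decay` parameter, and the toy dimensions are illustrative assumptions, not Power Retention's actual formulation.

```python
import numpy as np

def retention_step(state, q, k, v, decay=0.99):
    """One recurrent step of a retention-style update (illustrative sketch,
    not Manifest AI's Power Retention): fold the (k outer v) product into a
    fixed-size memory matrix, then read it out with the query.
    Per-token cost is O(d_k * d_v), independent of sequence length."""
    state = decay * state + np.outer(k, v)   # compress the past into the state matrix
    out = q @ state                          # readout: (d_k,) @ (d_k, d_v) -> (d_v,)
    return state, out

# Toy usage: the state stays (d_k, d_v) no matter how long the sequence gets.
d_k = d_v = 64
seq_len = 1024
rng = np.random.default_rng(0)
state = np.zeros((d_k, d_v))
outputs = []
for t in range(seq_len):
    q, k, v = rng.standard_normal((3, d_k))  # stand-ins for per-token projections
    state, o = retention_step(state, q, k, v)
    outputs.append(o)
```

Contrast this with standard attention, where each new token attends over all previous tokens, so per-token cost grows with context length; the fixed-size state is what makes the recurrent formulation attractive for long contexts.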