Researchers say they trained a foundation model from scratch for about $1,500

Training a large language model from scratch is usually prohibitively expensive, often costing millions of dollars and requiring internet-scale data. Sapient has developed HRM-Text, a cheaper approach that uses a Hierarchical Recurrent Model (HRM) instead of a standard Transformer and trains exclusively on instruction-response pairs, mirroring real-world enterprise use cases. This sample-efficient recipe made it possible to train a 1-billion-parameter HRM-Text on a curated dataset at a fraction of the usual cost. The researchers report performance competitive with much larger, established open models on key industry benchmarks. The implication is that foundational pretraining comes within reach of organizations with fewer resources.

According to Sapient, the core inefficiency in current LLMs is their reliance on brute-force next-token prediction, which spends compute on memorizing internet data. Sapient's CEO points to the economic limits of current practice, where scaling up models yields diminishing returns. Fine-tuning an existing model often requires substantial general-purpose data, which makes it computationally intensive and difficult to control. Enterprises with proprietary data, the argument goes, need compact reasoning cores rather than massive general-purpose models.

HRM-Text decouples computation into a strategic layer and an execution layer. The architecture is designed to keep the semantic context stable while refining locally and iteratively. To stabilize training and prevent gradient issues, Sapient introduced MagicNorm and a warm-up method. The switch from next-token prediction to task completion on instruction-response pairs is a key differentiator.

HRM-Text reached its reported benchmark scores with significantly less training data and compute. That efficiency means businesses can deploy specialized reasoning models that draw on external knowledge stores instead of memorizing vast datasets.
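The split between a strategic layer and an execution layer can be pictured as a two-timescale loop. The toy sketch below is not Sapient's implementation: the function names, the tanh update rule, and the step counts are all assumptions made for illustration. It only shows the general idea of holding a slow state fixed as stable context while a fast state is refined several times, then updating the slow state from the result.

```python
import math


def blend(state, signal, weight=0.5):
    """Toy update rule (an assumption): mix two vectors, then squash with tanh."""
    return [math.tanh(weight * s + (1.0 - weight) * x) for s, x in zip(state, signal)]


def hierarchical_forward(inputs, outer_steps=2, inner_steps=3):
    """Run a hypothetical two-timescale recurrence over a single input vector.

    Returns the final slow state, the final fast state, and how many times
    each state was updated.
    """
    high = [0.0] * len(inputs)  # slow "strategic" state
    low = [0.0] * len(inputs)   # fast "execution" state
    high_updates = 0
    low_updates = 0

    for _ in range(outer_steps):
        # The slow state is held fixed here, so the fast state sees a
        # stable context while it iterates.
        context = [h + x for h, x in zip(high, inputs)]
        for _ in range(inner_steps):
            low = blend(low, context)  # local iterative refinement
            low_updates += 1
        # Only after the inner loop finishes does the slow state move.
        high = blend(high, low)
        high_updates += 1

    return high, low, high_updates, low_updates


if __name__ == "__main__":
    high, low, n_high, n_low = hierarchical_forward([0.5, -0.25])
    print("slow updates:", n_high, "fast updates:", n_low)
```

With the default settings the fast state is updated three times for every update of the slow state, which is the sense in which the two layers run at different rates in this sketch.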

https://venturebeat.com/technology/researchers-say-they-trained-a-foundation-model-from-scratch-for-about-1-500

RSS Hunter • Jun 10