VentureBeat
Huawei's new open source technique shrinks LLMs to make them run on less powerful, less expensive hardware
Huawei's Computing Systems Lab has introduced SINQ, an open-source quantization method for large language models (LLMs). SINQ reduces memory usage by 60-70% without compromising output quality, so models that previously required high-end enterprise GPUs can run on consumer-grade hardware. The method is fast, calibration-free, and easy to integrate into existing workflows. It uses dual-axis scaling and Sinkhorn-Knopp-style normalization to minimize quantization error, and it outperforms other calibration-free techniques across various benchmarks. SINQ also supports non-uniform quantization and can be combined with calibration methods for even better performance. The code is available under an Apache 2.0 license on GitHub and Hugging Face. The initiative aims to lower the barrier to LLM deployment by making model shrinking efficient.
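To give a rough sense of what dual-axis scaling means in practice, here is a minimal NumPy sketch. It is not Huawei's implementation: the function names (`quantize_rtn`, `dual_axis_normalize`, `quantize_dual_axis`), the use of standard deviation as the quantity being balanced, and the fixed iteration count are all assumptions made for illustration. The idea is to alternately rescale the rows and columns of a weight matrix, in the spirit of Sinkhorn-Knopp iterations, so that no single row or column dominates the value range before round-to-nearest quantization is applied.

```python
import numpy as np


def quantize_rtn(w, bits=4):
    """Plain round-to-nearest quantization with one scale and offset per row.

    Returns the dequantized matrix so it can be compared with the input.
    """
    qmax = 2**bits - 1
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = np.maximum(hi - lo, 1e-8) / qmax
    q = np.clip(np.round((w - lo) / scale), 0, qmax)
    return q * scale + lo


def dual_axis_normalize(w, iters=20):
    """Alternately rescale columns and rows (illustrative, Sinkhorn-style).

    Returns (m, r, c) with w == m * r * c, where r is a column vector of
    row scales and c is a row vector of column scales.
    """
    r = np.ones((w.shape[0], 1))
    c = np.ones((1, w.shape[1]))
    for _ in range(iters):
        # Fold the current per-column spread into the column scales.
        m = w / (r * c)
        c = c * np.maximum(m.std(axis=0, keepdims=True), 1e-8)
        # Then fold the current per-row spread into the row scales.
        m = w / (r * c)
        r = r * np.maximum(m.std(axis=1, keepdims=True), 1e-8)
    return w / (r * c), r, c


def quantize_dual_axis(w, bits=4):
    """Quantize the balanced matrix, then undo both sets of scales."""
    m, r, c = dual_axis_normalize(w)
    return quantize_rtn(m, bits) * r * c


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 128))
    w[:, 5] *= 40.0  # hypothetical outlier column

    err_plain = np.abs(quantize_rtn(w) - w).mean()
    err_dual = np.abs(quantize_dual_axis(w) - w).mean()
    print(f"mean abs error, per-row scaling only: {err_plain:.4f}")
    print(f"mean abs error, dual-axis scaling:    {err_dual:.4f}")
```

In this toy setup, a single outlier column stretches the value range of every row it touches when only per-row scales are used. Giving columns their own scale factor lets the outlier be absorbed there instead, which is the intuition behind scaling along both axes. The actual SINQ code on GitHub should be treated as the reference for how the method balances the matrix and stores its scales.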