VentureBeat

Huawei's new open source technique shrinks LLMs to make them run on less powerful, less expensive hardware

Huawei's Computing Systems Lab has introduced SINQ, an open-source quantization method for large language models (LLMs). SINQ reduces memory usage by 60-70% without compromising output quality, so models that previously required high-end enterprise GPUs can run on consumer-grade hardware. The method is fast, calibration-free, and easy to integrate into existing workflows. It uses dual-axis scaling (separate scale factors for the rows and the columns of each weight matrix) together with Sinkhorn-Knopp-style normalization to minimize quantization error, and it outperforms other calibration-free techniques across various benchmarks. SINQ also supports non-uniform quantization and can be combined with calibration methods for even better performance. The code is available under an Apache 2.0 license on GitHub and Hugging Face. The aim is to lower the barrier to LLM deployment by making models smaller and cheaper to run.
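To make the idea of dual-axis scaling more concrete, here is a minimal NumPy sketch. It is not the SINQ code. It assumes that "Sinkhorn-Knopp-style normalization" can be approximated by alternately dividing rows and columns by their standard deviations, and it pairs that step with plain per-row round-to-nearest quantization. All function names (`dual_axis_normalize`, `quantize_rtn`, `dequantize`, `roundtrip`) and the iteration count are hypothetical, chosen only for illustration.

```python
import numpy as np


def dual_axis_normalize(W, n_iter=16, eps=1e-8):
    """Alternately rescale rows and columns to even out their spread.

    Returns (W_hat, row_scale, col_scale) such that
    W == row_scale[:, None] * W_hat * col_scale[None, :] up to float error.
    """
    W_hat = np.asarray(W, dtype=np.float64).copy()
    row_scale = np.ones(W_hat.shape[0])
    col_scale = np.ones(W_hat.shape[1])
    for _ in range(n_iter):
        # Divide each row by its standard deviation and remember the factor.
        r = W_hat.std(axis=1) + eps
        W_hat /= r[:, None]
        row_scale *= r
        # Do the same along the other axis.
        c = W_hat.std(axis=0) + eps
        W_hat /= c[None, :]
        col_scale *= c
    return W_hat, row_scale, col_scale


def quantize_rtn(X, bits=4):
    """Uniform round-to-nearest quantization, one step size and offset per row."""
    levels = 2 ** bits - 1
    lo = X.min(axis=1, keepdims=True)
    hi = X.max(axis=1, keepdims=True)
    step = np.maximum(hi - lo, 1e-12) / levels
    codes = np.clip(np.round((X - lo) / step), 0, levels).astype(np.uint8)
    return codes, step, lo


def dequantize(codes, step, lo):
    """Map integer codes back to floating-point values."""
    return codes * step + lo


def roundtrip(W, bits=4):
    """Normalize on both axes, quantize, then undo the scaling."""
    W_hat, s, t = dual_axis_normalize(W)
    codes, step, lo = quantize_rtn(W_hat, bits)
    return s[:, None] * dequantize(codes, step, lo) * t[None, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(64, 128))
    W[3, 7] = 25.0  # a single outlier weight
    W_q = roundtrip(W, bits=4)
    print("mean abs error:", np.abs(W - W_q).mean())
```

The point of the two scale vectors in this sketch is that an outlier's influence can be absorbed partly by its row factor and partly by its column factor, rather than stretching the quantization range of an entire row. Only the integer codes plus the small scale vectors need to be stored, which is where the memory saving comes from.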