Announcing PyTorch/XLA 2.4: A better Pallas and developer experience, plus “eager mode”
PyTorch/XLA 2.4 brings several enhancements for deep learning on TPUs and GPUs. Pallas, a custom kernel language that supports both TPUs and GPUs, gets improvements that let developers write higher-performance kernels in Python. New API calls such as torch_xla.sync() simplify integration into existing PyTorch workflows. An experimental eager mode executes operations on the target hardware immediately; because TPUs run statically compiled instructions, eager mode is emulated there by issuing a "mark_step" call after each operation. The Pallas enhancements include Flash Attention and Paged Attention support, along with built-in support for Megablocks' block sparse kernels for group matrix multiplication. A new TPU command line interface, tpu-info, aids debugging by displaying utilization and device information, much as nvidia-smi does for NVIDIA GPUs. Existing code remains compatible with this release, so upgrading should not require changes.
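As a rough illustration of how torch_xla.sync() might fit into ordinary PyTorch code, here is a minimal sketch. It assumes torch_xla 2.4 or later is installed; the helper name run_sync_sketch and the tensor shapes are hypothetical, and the optional eager flag relies on torch_xla.experimental.eager_mode, which is experimental and may change. The function returns None when torch_xla is not available, so the snippet is safe to run anywhere.

```python
import importlib.util


def run_sync_sketch(eager: bool = False):
    """Hypothetical helper: multiply two matrices on the XLA device.

    Returns the result shape, or None if torch_xla is not installed.
    """
    if importlib.util.find_spec("torch_xla") is None:
        return None  # torch_xla missing; nothing to demonstrate

    import torch
    import torch_xla

    if eager:
        # Experimental: run each op immediately instead of tracing a graph.
        torch_xla.experimental.eager_mode(True)

    device = torch_xla.device()  # default XLA device for this process
    a = torch.randn(4, 4, device=device)
    b = torch.randn(4, 4, device=device)
    c = a @ b  # in the default (lazy) mode this is only recorded

    # Ask XLA to compile and execute the operations recorded so far.
    torch_xla.sync()
    return tuple(c.shape)


result = run_sync_sketch()
print(result)
```

In the default lazy mode, the matrix multiply is only traced until torch_xla.sync() is called, which is where compilation and execution happen. With eager=True, each operation runs as it is issued, which can make debugging easier at some cost in performance.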