PyTorch 2.0 has finally arrived and is now generally available for developers to use. PyTorch is an open-source machine learning (ML) framework that has grown to become one of the field's most widely used tools. The project was originally developed by Facebook (now Meta), and PyTorch 1.0 was released in 2018. Since then, the project has undergone several incremental improvements.
In September 2022, the PyTorch Foundation was established to promote open governance and encourage collaboration and contributions. This effort led to the beta preview of PyTorch 2.0 in December 2022.
The main focus of PyTorch 2.0 is performance, and it includes several new features to improve the speed and efficiency of ML training. One of the most significant is Accelerated Transformers, which is crucial for large-scale language models and generative AI. The feature speeds up the attention computations that allow models to connect different concepts, and it has been integrated into the torch.compile API.
PyTorch 2.0 uses the torch.compile API as its main interface: it wraps a model and returns a compiled version. On NVIDIA and AMD GPUs, the default TorchInductor backend relies on the OpenAI Triton deep learning compiler to generate performant code and hide low-level hardware details. Kernels generated by OpenAI Triton achieve performance comparable to hand-written kernels and specialized CUDA libraries such as cuBLAS.
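As a minimal sketch of that workflow, compiling an existing model is a one-line change; the model definition and tensor shapes below are illustrative assumptions, not part of the release notes:

```python
import torch
import torch.nn as nn

# A small example model; any nn.Module is compiled the same way.
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# torch.compile wraps the model and returns a compiled version.
# By default it uses the TorchInductor backend, which emits Triton
# kernels on supported NVIDIA and AMD GPUs.
compiled_model = torch.compile(model)

# The compiled model is called exactly like the original one; the
# first call triggers compilation, later calls reuse the compiled code.
x = torch.randn(32, 128)
out = compiled_model(x)
print(out.shape)  # torch.Size([32, 10])
```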
PyTorch 2.0 also includes several other features and optimizations, including significant CPU-acceleration improvements for graph neural network (GNN) inference and training, Metal Performance Shaders (MPS) for GPU-accelerated PyTorch training on the Mac platform, and AWS's optimization of PyTorch CPU inference on Graviton3-based C7g instances.
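For example, selecting the MPS backend on an Apple-silicon Mac follows the usual device-selection pattern; the tensor and layer sizes here are arbitrary example values:

```python
import torch

# Fall back to CPU if the Metal Performance Shaders backend is not
# available on this machine.
device = (
    torch.device("mps")
    if torch.backends.mps.is_available()
    else torch.device("cpu")
)

# Tensors and modules are moved to the MPS device like any other backend.
x = torch.randn(64, 128, device=device)
layer = torch.nn.Linear(128, 64).to(device)
y = layer(x)
print(y.device)
```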
The custom kernel architecture for scaled dot product attention (SDPA) is another key feature of PyTorch 2.0 that enables high-performance support for training and inference. SDPA has been integrated into torch.compile(), and model developers can also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operator.
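A minimal sketch of calling the operator directly is shown below; the batch size, head count, sequence length, and head dimension are made-up example values:

```python
import torch
import torch.nn.functional as F

# Example shapes: (batch, num_heads, sequence_length, head_dim).
batch, heads, seq_len, head_dim = 2, 8, 1024, 64

query = torch.randn(batch, heads, seq_len, head_dim)
key = torch.randn(batch, heads, seq_len, head_dim)
value = torch.randn(batch, heads, seq_len, head_dim)

# PyTorch dispatches to the fastest fused kernel available for the
# current hardware and input configuration (e.g. FlashAttention or
# memory-efficient attention), falling back to a math implementation.
out = F.scaled_dot_product_attention(query, key, value, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 1024, 64])
```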
PyTorch 2.0 makes training and deploying state-of-the-art transformer models easier and faster. The Accelerated Transformers feature uses the custom kernel architecture for scaled dot product attention, enabling high-performance training and inference. PyTorch 2.0 supports multiple SDPA custom kernels across the hardware types that can run transformers, dispatching to the best available implementation (a kernel-selection sketch follows below). Together, these improvements let developers train models significantly faster than with PyTorch 1.x.
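As a hedged illustration of choosing among those kernels, PyTorch 2.0 exposes a context manager that restricts which SDPA implementations may be used; the flag combination below is an arbitrary example and assumes a CUDA device is available:

```python
import torch
import torch.nn.functional as F

q = torch.randn(2, 8, 512, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 512, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 512, 64, device="cuda", dtype=torch.float16)

# Restrict dispatch to the FlashAttention kernel only; if it cannot
# handle these inputs, the call errors out instead of silently falling
# back to the slower math implementation.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)
```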