PyTorch 2.0 Released: Accelerating Machine Learning with High-Performance Support for Transformers



PyTorch 2.0 has finally arrived and is now generally available to developers. PyTorch is an open-source machine learning (ML) framework that has grown to become one of the field’s most widely used tools. The project was originally developed by Facebook (now Meta), and PyTorch 1.0 was released in 2018. Since then, the project has undergone a series of incremental improvements.

In September 2022, the PyTorch Foundation was established to promote open governance and encourage collaboration and contributions. This effort led to the beta preview of PyTorch 2.0 in December 2022.

The main focus of PyTorch 2.0 is performance, and it includes several new features to improve the speed and efficiency of ML training. One of the most significant is Accelerated Transformers, which are crucial for large language models and generative AI. Accelerated Transformers provide high-performance implementations of the attention mechanism, which lets a model relate different parts of its input, and they have been integrated with the torch.compile API.

PyTorch 2.0 uses the torch.compile API as its main interface: it wraps a model and returns a compiled version. The default TorchInductor backend, used with NVIDIA and AMD GPUs, relies on the OpenAI Triton deep learning compiler to generate performant code and hide low-level hardware details. Kernels generated by Triton achieve performance comparable to hand-written kernels and specialized CUDA libraries such as cuBLAS.

PyTorch 2.0 also includes several new features and optimizations, including significant optimizations in GNN inference and training for CPU acceleration, Metal Performance Shaders (MPS) for GPU-accelerated PyTorch training on the Mac platform, and AWS’s optimization of PyTorch CPU inference on Graviton3-based C7g instances.

The custom kernel architecture for scaled dot product attention (SDPA) is another key feature of PyTorch 2.0, enabling high-performance support for training and inference. SDPA has been integrated with torch.compile(), and model developers can also use the kernels directly by calling the new scaled_dot_product_attention() operator.

PyTorch 2.0 will make training and deploying state-of-the-art transformer models easier and faster. The Accelerated Transformers feature uses the custom kernel architecture for scaled dot product attention, enabling high-performance training and inference. PyTorch 2.0 supports multiple SDPA custom kernels across the hardware types that can run transformers. Taken together, these improvements allow developers to train models significantly faster than with PyTorch 1.x.

Vishak is a skilled Editor-in-chief at Code and Hack with a passion for AI and coding. He has a deep understanding of the latest trends and advancements in the fields of AI and Coding. He creates engaging and informative content on various topics related to AI, including machine learning, natural language processing, and coding. He stays up to date with the latest news and breakthroughs in these areas and delivers insightful articles and blog posts that help his readers stay informed and engaged.
