Meta AI has announced the release of DINOv2, an upgraded version of its self-supervised vision transformer model DINO, which was introduced two years ago. The model delivers strong performance without fine-tuning and, unlike image-text models such as CLIP, it learns from images alone rather than from captioned image-text pairs.
One of DINOv2’s distinctive features is that its pre-training requires no labelled data, yet the resulting features transfer well to tasks such as depth estimation, image classification, semantic segmentation, and image retrieval. This is made possible by Meta AI’s self-supervised pre-training approach: the model learns directly from large collections of unlabelled images, much as language models are pre-trained on raw text with objectives, such as language modelling, that need no human annotation.
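As an illustration of working entirely without labels, the sketch below ranks a small gallery of images against a query using frozen DINOv2 embeddings and cosine similarity. It assumes the torch.hub entry point published in the facebookresearch/dinov2 repository; the image file names are hypothetical placeholders.

```python
# A minimal sketch of label-free image retrieval with frozen DINOv2 features.
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the smallest pre-trained backbone (ViT-S/14) and keep it frozen.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").to(device).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),  # 224 is a multiple of the 14-pixel patch size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(path: str) -> torch.Tensor:
    """Return an L2-normalised DINOv2 embedding for one image."""
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
    return F.normalize(model(image), dim=-1)

# Rank gallery images by cosine similarity to the query -- no labels involved.
query = embed("query.jpg")                                             # hypothetical file
gallery = torch.cat([embed(p) for p in ["a.jpg", "b.jpg", "c.jpg"]])   # hypothetical files
scores = (gallery @ query.T).squeeze(1)
print(scores.argsort(descending=True))  # indices of the closest matches
```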
DINOv2 is an open-source, self-supervised model pre-trained on 142 million unlabelled images. It produces high-quality features that can be fed directly into simple linear classifiers, which makes it a versatile backbone for a wide range of computer vision tasks.
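A minimal sketch of that "frozen features plus linear classifier" recipe is shown below. Only the linear head is trained; the backbone stays frozen. The feature tensor here is a random placeholder standing in for real DINOv2 embeddings (for example, ones produced by the embed() helper above), and the class count is hypothetical.

```python
# Train a simple linear classifier on top of frozen DINOv2 features.
import torch
import torch.nn as nn

embed_dim, num_classes = 384, 10            # 384 is the ViT-S/14 embedding size
classifier = nn.Linear(embed_dim, num_classes)
optimizer = torch.optim.AdamW(classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# features: (N, 384) frozen embeddings, labels: (N,) class ids -- placeholders here.
features = torch.randn(512, embed_dim)
labels = torch.randint(0, num_classes, (512,))

for epoch in range(10):
    optimizer.zero_grad()
    loss = criterion(classifier(features), labels)
    loss.backward()
    optimizer.step()
```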
This model is a significant advance for computer vision because it greatly reduces the need for costly labelled data, saving developers time and resources. With DINOv2, developers can focus on the core of their application rather than on collecting and labelling data.
The GitHub repository for DINOv2 is now available, allowing developers to access and experiment with the model. Its release has attracted significant attention in the computer vision community and is likely to have a lasting influence on the field.