Apple, in collaboration with Columbia University, recently unveiled ‘Ferret’, a pioneering open-source multimodal large language model (MLLM).
Ferret’s core functionality lies in its ability to interact with images in unprecedented ways. It can refer to and identify regions within images of any shape, making it adept at handling complex queries related to these images. This capability opens up a multitude of possibilities in various fields, such as image search and accessibility. For example, when presented with an image of an animal, Ferret can not only identify the species but also offer contextual information based on the other elements in the image.
Ferret was released under a non-commercial license, so it cannot be commercialized in its present form, although Apple could still integrate it into future products or services.
Ferret was trained on the GRIT dataset, a collection of 1.1 million samples rich in hierarchical spatial knowledge. The dataset includes 95,000 hard negative samples, which were added to make the model more robust and accurate. The researchers behind Ferret reported strong performance on traditional referring and grounding tasks and said the model significantly outperforms existing models in multimodal chat that demands region-based localization and specificity. However, they also acknowledged that the model can produce harmful or counterfactual responses, which they flagged as an area for future improvement.
Notably, Ferret was trained on eight A100 GPUs with 80GB of memory each, a setup that reflects Apple’s investment in advanced computing resources. This detail is particularly striking given Apple’s history with Nvidia, the maker of these GPUs.
The release of Ferret matters to researchers because it signals Apple’s growing interest in transparency and collaboration in AI development. It comes as Apple works to expand its AI server infrastructure, which may not currently match the scale of the operations behind services like ChatGPT.
Coinciding with Ferret’s release, Apple has also been making strides in other AI-related areas. The company recently introduced Apple MLX, a suite of open-source tools designed to enhance machine learning on Apple Silicon. Moreover, Apple has been working on its own generative AI model, internally referred to as ‘Apple GPT’.