Apple unveils its open-source multimodal language model Ferret



Apple, in collaboration with Cornell University, recently unveiled ‘Ferret’, a pioneering open-source multimodal large language model (MLLM).

Ferret’s core functionality lies in its ability to interact with images in unprecedented ways. It can refer to and identify regions within images of any shape, making it adept at handling complex queries related to these images. This capability opens up a multitude of possibilities in various fields, such as image search and accessibility. For example, when presented with an image of an animal, Ferret can not only identify the species but also offer contextual information based on the other elements in the image.

Ferret was released under a non-commercial license, so it cannot be commercialized in its present form, although it could still be integrated into future Apple products or services.

Ferret’s training involved the GRIT dataset, a comprehensive collection of 1.1 million samples rich in hierarchical spatial knowledge. This dataset, including 95,000 hard negative data samples, was designed to enhance the model’s robustness and accuracy. The researchers behind Ferret noted its exceptional performance in traditional referring and grounding tasks, and reported that it significantly outperforms existing models in multimodal chatting that demands region-based localization and specificity. However, they also acknowledged the model’s potential to produce harmful or counterfactual responses, indicating areas for future improvement.

Notably, Ferret was trained on eight A100 GPUs with 80GB memory each, a specification that highlights the model’s high-level capabilities and Apple’s investment in advanced computing resources. This aspect of Ferret’s development is particularly striking given Apple’s history with Nvidia, the producer of these GPUs.
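To put the figures quoted above in proportion, here is a minimal Python sketch of the arithmetic. It uses only the numbers given in this article (1.1 million GRIT samples, 95,000 hard negatives, eight GPUs with 80GB each). The constant and function names are illustrative and do not come from Apple's code, and the calculation assumes the hard negatives are counted within the 1.1 million total, as the wording above suggests.

```python
# Figures as quoted in the article; all names here are illustrative only.
GRIT_TOTAL_SAMPLES = 1_100_000
GRIT_HARD_NEGATIVES = 95_000   # assumed to be part of the total above
NUM_GPUS = 8
MEMORY_PER_GPU_GB = 80


def hard_negative_share(hard_negatives: int, total: int) -> float:
    """Fraction of the dataset made up of hard negative samples."""
    return hard_negatives / total


def total_gpu_memory_gb(num_gpus: int, memory_per_gpu_gb: int) -> int:
    """Aggregate memory across all GPUs, in GB."""
    return num_gpus * memory_per_gpu_gb


share = hard_negative_share(GRIT_HARD_NEGATIVES, GRIT_TOTAL_SAMPLES)
total_memory = total_gpu_memory_gb(NUM_GPUS, MEMORY_PER_GPU_GB)

print(f"Hard negatives: {share:.1%} of GRIT")
print(f"Aggregate GPU memory: {total_memory} GB")
```

In other words, hard negatives make up a little under a tenth of the training data, and the training setup had 640GB of GPU memory in aggregate.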

The release of Ferret is crucial for researchers, as it demonstrates Apple’s growing interest in transparency and collaboration in AI development. This is particularly relevant given Apple’s ongoing efforts to expand its AI server infrastructure, although that infrastructure may not currently match the scale of operations like ChatGPT.

Coinciding with Ferret’s release, Apple has also been making strides in other AI-related areas. The company recently introduced Apple MLX, a suite of open-source tools designed to enhance machine learning on Apple Silicon. Moreover, Apple has been working on its own generative AI model, internally referred to as ‘Apple GPT’.

Vishak is a skilled Editor-in-chief at Code and Hack with a passion for AI and coding. He has a deep understanding of the latest trends and advancements in the fields of AI and Coding. He creates engaging and informative content on various topics related to AI, including machine learning, natural language processing, and coding. He stays up to date with the latest news and breakthroughs in these areas and delivers insightful articles and blog posts that help his readers stay informed and engaged.
