Meta AI Unveils CM3leon: A Groundbreaking Multimodal Model for Text and Image Generation


Meta AI has developed a new multimodal model named CM3leon (pronounced “chameleon”). A single model works in both directions: it can create images from text descriptions and compose text based on images.

CM3leon represents a major step forward in multimodal AI. Its architecture is a token-based, decoder-only transformer, similar to text-only language models. Building on previous work with RA-CM3, CM3leon is also “retrieval-augmented”: during training it uses an external database to actively seek diverse and relevant data.
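The retrieval idea described above can be illustrated with a small sketch. It is not Meta's implementation: the embeddings, the entry names, the `retrieve` helper, and the `max_sim` cutoff are all hypothetical, chosen only to show how a lookup can favor entries that are relevant to a training example while skipping near-duplicates to keep the results diverse.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query, memory, k=2, max_sim=0.9):
    """Pick up to k memory entries that are relevant to the query but not redundant.

    query   -- embedding of the current training example (hypothetical)
    memory  -- list of (name, embedding) pairs standing in for the external database
    max_sim -- hypothetical cutoff: skip near-duplicates of the query or of
               an entry that was already picked
    """
    # Most relevant candidates first.
    ranked = sorted(memory, key=lambda item: cosine(query, item[1]), reverse=True)
    picked = []
    for name, emb in ranked:
        if cosine(query, emb) > max_sim:
            continue  # a near-copy of the query adds little new information
        if any(cosine(emb, other) > max_sim for _, other in picked):
            continue  # a near-copy of an already picked entry hurts diversity
        picked.append((name, emb))
        if len(picked) == k:
            break
    return [name for name, _ in picked]

# Toy database with made-up three-dimensional embeddings.
memory = [
    ("near_duplicate", [0.99, 0.1, 0.0]),
    ("dog_sketch", [0.8, 0.6, 0.0]),
    ("dog_sketch_copy", [0.79, 0.61, 0.0]),
    ("dog_painting", [0.7, 0.0, 0.7]),
    ("city_skyline", [0.0, 0.0, 1.0]),
]
query = [1.0, 0.0, 0.0]

selected = retrieve(query, memory)
print(selected)  # ['dog_sketch', 'dog_painting']
```

In this toy run, the entry that almost copies the query and the copy of the sketch are both skipped, so the two results are related to the query yet different from each other.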

Large-scale multitask instruction tuning enables CM3leon to perform a range of tasks, including text-to-image generation, text-guided image editing, caption generation, visual question answering, and structure-guided image editing.

To train CM3leon, Meta used a dataset of millions of licensed images from Shutterstock. The best-performing version has over 7 billion parameters, more than double the size of DALL-E 2. On the MS-COCO image generation benchmark, CM3leon achieved a new state-of-the-art zero-shot Fréchet Inception Distance (FID, where lower is better) of 4.88, surpassing Google’s Parti model.
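For readers unfamiliar with the metric, the Fréchet Inception Distance compares the statistics of image features from generated and real images, modeling each set as a Gaussian. The sketch below is a simplified illustration, not the benchmark code: it assumes diagonal covariances (real FID uses full covariance matrices over Inception-v3 features), and the function name and toy numbers are made up.

```python
import math

def frechet_distance_diag(mu1, var1, mu2, var2):
    """Fréchet distance between two Gaussians with diagonal covariances.

    The general formula is
        ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * (S1 S2)^(1/2)).
    With diagonal S1 and S2 (an assumption made here for simplicity),
    the trace term reduces, per dimension, to
        (sqrt(var1) - sqrt(var2))^2.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var1, var2))
    return mean_term + cov_term

# Toy feature statistics (means and variances) for a "real" image set.
real_mu, real_var = [0.0, 1.0], [1.0, 4.0]

# Identical statistics give a distance of zero.
same = frechet_distance_diag(real_mu, real_var, real_mu, real_var)

# Shifting the mean and shrinking the variance increases the distance.
shifted = frechet_distance_diag(real_mu, real_var, [1.0, 1.0], [1.0, 1.0])

print(same, shifted)  # 0.0 2.0
```

A score of zero means the two feature distributions match exactly under this model, which is why a lower FID indicates generated images that are statistically closer to real ones.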

According to Meta, a key advantage of CM3leon is its ability to produce coherent images that closely follow complex prompts, both in text-to-image generation and in editing. It also excels at detailed image captioning and visual question answering, showing versatility across vision-language tasks.

CM3leon was trained using only licensed image data, avoiding concerns about image ownership while maintaining high performance. Meta states this brings them closer to enabling creativity and enhanced applications for the metaverse.

The closed-source nature of CM3leon has drawn some criticism, though, as Meta and other tech giants benefit greatly from open-source AI while keeping their own models private. Still, with its revolutionary multimodal abilities, CM3leon represents a significant step forward for AI.

Vishak is Editor-in-chief at Code and Hack, where he writes about AI and coding, including machine learning and natural language processing. He follows the latest news and breakthroughs in these areas and covers them in articles and blog posts.
