Florence: Microsoft Releases Multimodal Vision AI Model For Improved Image And Video Analysis



Microsoft’s Florence AI model has been released for public preview, two years after it was announced as part of “Project Florence”. Florence is a “unified”, “multimodal”, state-of-the-art vision AI model that understands multiple modalities, such as language, images, video, and audio.

Florence has been added to the Vision APIs in Azure Cognitive Services and Microsoft Vision Services, with features ranging from automatic captioning, background removal, and video summarization to image search, available to existing Azure customers.

Multimodal models are believed to be the best route to higher-performing AI systems. Thanks to contextual information from additional modalities, they can perform in a single pass tasks that unimodal models cannot. Because Florence understands the relationships between images, video, and language, it is computationally efficient, which leads to faster processing and lower backend costs.

Florence’s improved image tagging, image search, and background generation capabilities will benefit platforms, products, and services, including LinkedIn, Microsoft Teams, PowerPoint, Outlook, and Word. The new features added to Vision Studio, such as high-density captions, image search, background removal, and model customization, will improve search recommendations, make digital content accessible, and support custom models that match unique business needs with high accuracy and just a handful of images.

Reddit is one of the platforms that will use the Florence-powered API to generate captions for images, creating “alt text” that helps visually impaired users follow threads. Tiffany Ong, Product Manager for Consumer Products at Reddit, said that Microsoft’s Vision technology makes it easier for users to find and understand content. The newly created image captions make Reddit more accessible for everyone and encourage users to explore images, providing more opportunities to participate in conversations and ultimately building a sense of connection and community.
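To make the captions-to-alt-text idea concrete, here is a minimal Python sketch of how a platform might request a caption and turn the result into alt text. It is not Reddit's or Microsoft's implementation. The request path, the API version string, and the `captionResult` response shape are assumptions based on the preview Image Analysis REST API and should be checked against the current Azure documentation. The helper names and the 0.5 confidence threshold are hypothetical.

```python
import html
import json
from urllib.parse import urlencode

# Assumed values for illustration only; verify the path and version
# against the current Azure documentation before relying on them.
ANALYZE_PATH = "/computervision/imageanalysis:analyze"
API_VERSION = "2023-02-01-preview"


def build_caption_request(endpoint: str, key: str, image_url: str) -> dict:
    """Describe the HTTP POST that asks the service for an image caption."""
    query = urlencode({"api-version": API_VERSION, "features": "caption"})
    return {
        "url": f"{endpoint.rstrip('/')}{ANALYZE_PATH}?{query}",
        "headers": {
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        "body": json.dumps({"url": image_url}),
    }


def caption_to_alt_text(response: dict, min_confidence: float = 0.5) -> str:
    """Return HTML-safe alt text, or "" when the caption is missing or weak.

    Assumes a response shaped like
    {"captionResult": {"text": "...", "confidence": 0.0-1.0}}.
    """
    caption = response.get("captionResult") or {}
    text = caption.get("text", "").strip()
    if not text or caption.get("confidence", 0.0) < min_confidence:
        return ""
    # Escape quotes and ampersands so the text is safe inside alt="...".
    return html.escape(text, quote=True)
```

The dictionary returned by `build_caption_request` can be passed to any HTTP client. Calling `caption_to_alt_text` on the parsed JSON response yields a string suitable for an image's `alt` attribute. Returning an empty string for low-confidence captions is a design choice in this sketch, so that a human-written description can be used instead.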


Vishak is a skilled Editor-in-chief at Code and Hack with a passion for AI and coding. He has a deep understanding of the latest trends and advancements in the fields of AI and Coding. He creates engaging and informative content on various topics related to AI, including machine learning, natural language processing, and coding. He stays up to date with the latest news and breakthroughs in these areas and delivers insightful articles and blog posts that help his readers stay informed and engaged.
