Florence: Microsoft Releases Multimodal Vision AI Model For Improved Image And Video Analysis

Microsoft’s Florence AI model has been released in public preview, two years after it was announced as part of “Project Florence”. Florence is a “unified”, “multimodal”, state-of-the-art vision AI model that understands multiple modalities, including language, images, video, and audio.

Florence has been added to the Vision APIs in Azure Cognitive Services and Microsoft Vision Services. Its features, which include automatic captioning, background removal, video summarization, and image search, are available to existing Azure customers.

Multimodal models are believed to be the best route to higher-performing AI systems. Because they draw on contextual information from additional modalities, they can perform in one go tasks that unimodal models cannot. Florence understands the relationships among images, video, and language, which makes it computationally efficient and leads to faster processing and lower backend costs.

Florence’s improved image tagging, image search, and background generation capabilities will benefit platforms, products, and services including LinkedIn, Microsoft Teams, PowerPoint, Outlook, and Word. The new features added to Vision Studio, such as high-density captions, image search, background removal, and model customization, will improve search recommendations, make digital content more accessible, and support custom models that match unique business needs with high accuracy from just a handful of images.

Reddit is among the platforms that will use a Florence-powered API to generate captions for its images, creating “alt text” that helps visually impaired users follow threads. Tiffany Ong, Product Manager for Consumer Products at Reddit, said that Microsoft’s Vision technology makes it easier for users to find and understand content. Reddit’s newly created image captions make the site more accessible for everyone and encourage users to explore images, providing more opportunities to participate in conversations and ultimately building a sense of connection and community.

