Microsoft has unveiled a new AI model, NUWA-XL, that can generate up to 11 minutes of video content from just 16 text descriptions. NUWA-XL is an upgraded version of the NUWA model announced earlier this year.
Using the Diffusion over Diffusion architecture, it generates image and video content from natural language text descriptions. The Global Diffusion model first generates keyframes across the entire time axis of the video, and the Local Diffusion model then fills in the content between neighboring keyframes. This keeps the generated content continuous and consistent while speeding up generation overall.
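To make the coarse-to-fine idea concrete, here is a minimal Python sketch of how frame indices could be split between a global keyframe stage and local fill-in stages. It is a toy scheduling illustration only, not Microsoft's implementation: the function names (`global_keyframes`, `local_fill`, `coarse_to_fine`), the even keyframe spacing, and the single level of local filling are all assumptions made for this example, and no diffusion model is involved.

```python
from __future__ import annotations


def global_keyframes(total_frames: int, num_keyframes: int) -> list[int]:
    """Pick evenly spaced keyframe indices spanning the whole time axis."""
    if total_frames < 2 or num_keyframes < 2:
        raise ValueError("need at least two frames and two keyframes")
    last = total_frames - 1
    # A set removes duplicates if rounding maps two keyframes to one index.
    return sorted({round(i * last / (num_keyframes - 1)) for i in range(num_keyframes)})


def local_fill(start: int, end: int) -> list[int]:
    """Indices strictly between two keyframes, left for a local stage to fill."""
    return list(range(start + 1, end))


def coarse_to_fine(total_frames: int, num_keyframes: int) -> list[tuple[str, list[int]]]:
    """Return a (stage, frame indices) schedule: one global pass, then local passes."""
    keys = global_keyframes(total_frames, num_keyframes)
    schedule = [("global", keys)]
    for a, b in zip(keys, keys[1:]):
        schedule.append(("local", local_fill(a, b)))
    return schedule


# Hypothetical example: a 17-frame clip with 5 keyframes.
for stage, frames in coarse_to_fine(17, 5):
    print(stage, frames)
```

In this toy version each local segment depends only on the two keyframes that bound it, so the segments could be processed independently of one another. That independence is one plausible reading of why a keyframe-first design can be faster than generating every frame in sequence.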
Microsoft’s technology has significantly improved the speed of generating content. Generating 1,024 images previously took an average of 7.55 minutes; it now takes just 26 seconds, a 94.26% reduction in generation time.
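The 94.26% figure can be checked directly from the two timings quoted above. The short Python calculation below is only an arithmetic check; the variable names are chosen for this example. It also shows the equivalent speedup factor, which is a separate way of stating the same result.

```python
# Timings quoted in the article, converted to seconds.
before_s = 7.55 * 60   # 7.55 minutes
after_s = 26.0         # 26 seconds

# Fraction of the original generation time that was eliminated.
reduction = (before_s - after_s) / before_s

# How many times faster the new timing is than the old one.
speedup = before_s / after_s

print(f"time reduction: {reduction:.2%}")   # time reduction: 94.26%
print(f"speedup factor: {speedup:.1f}x")    # speedup factor: 17.4x
```

Put another way, a 94.26% cut in generation time corresponds to roughly a 17-fold speedup.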
The NUWA-XL model is modeled on the professional animation production process, in which keyframes are created first and the content between them is then generated step by step to form a complete animated video. This approach is meant to preserve quality while speeding up content generation.
With this new AI model, Microsoft is making it easier and faster for creators to generate video content from natural language text descriptions. This has substantial consequences for various industries, including media and entertainment, where creating video content is time-consuming and resource-intensive.
While NUWA-XL is still in the early phases of development, it could change how video content is created. Microsoft’s investment in expanding its AI capabilities reflects the company’s ongoing efforts to push the frontiers of technology and promote innovation. NUWA-XL is only one example of how artificial intelligence may accelerate content creation and make it more accessible to a broader spectrum of artists.