Stability AI has introduced Stable Cascade, a text-to-image architecture designed to set new standards in the quality, flexibility, and efficiency of AI-driven image creation.
Stable Cascade builds upon the success of its predecessor, Stable Diffusion, which has been a foundational tool for text-to-image AI applications. Stable Cascade distinguishes itself with additional capabilities: generating variations of an image, increasing image resolution, and specialized editing features such as inpainting, outpainting, and generation guided by Canny edges. These features allow users to modify specific parts of an image or create new images from the outlines of existing ones.
Unlike its predecessors, which rely on a single large diffusion model, Stable Cascade chains three models (Stages A, B, and C) based on the Würstchen architecture. The process starts with Stage C, which turns the text prompt into latents, a small and highly compressed representation of the image; Stages B and A then decode those latents into the full-resolution picture. Because the text-conditioned generation runs in this compact latent space, the method reduces memory requirements and computation time, making the model both faster and more resource-efficient. In practical terms, Stable Cascade is reported to generate an image in about 10 seconds, compared with the 22 seconds required by the current SDXL model.
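To give a rough sense of why a more compressed latent space saves work, the following Python sketch compares how many latent positions each model has to process for a 1024x1024 image. The compression factors are assumptions not stated in this article: Stable Diffusion and SDXL are taken to downsample each side by 8, and Stable Cascade is taken to map 1024 pixels to 24 latent positions (a factor of roughly 42), as described in the model's public documentation. The function names are hypothetical, and latent count is only one contributor to runtime, so this is an illustration rather than a benchmark.

```python
import math
from fractions import Fraction

# Assumed per-side compression factors (not given in the article itself).
SD_FACTOR = Fraction(8)                # Stable Diffusion / SDXL: 1024 px -> 128
CASCADE_FACTOR = Fraction(1024, 24)    # Stable Cascade: 1024 px -> 24 (about 42x)


def latent_side(pixels: int, factor: Fraction) -> int:
    """Latent positions along one side of an image, rounded up."""
    return math.ceil(Fraction(pixels) / factor)


def latent_positions(pixels: int, factor: Fraction) -> int:
    """Total latent positions for a square image of the given side length."""
    side = latent_side(pixels, factor)
    return side * side


sd_positions = latent_positions(1024, SD_FACTOR)
cascade_positions = latent_positions(1024, CASCADE_FACTOR)

# Timing figures quoted in the article, in seconds per image.
sdxl_seconds = 22
cascade_seconds = 10
speedup = sdxl_seconds / cascade_seconds

print(f"SD-style latent grid:      {sd_positions} positions")
print(f"Stable Cascade latent grid: {cascade_positions} positions")
print(f"Reported speedup over SDXL: {speedup:.1f}x")
```

Under these assumptions the text-conditioned stage works on a grid with far fewer positions than an SDXL-style latent, which is consistent with the lower memory and time figures quoted above, although the decoding stages add their own cost.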
Stability AI has released the code for training, fine-tuning, and customization on GitHub, including scripts for specialized applications such as inpainting/outpainting, Canny edge-guided generation, and super-resolution. For now, Stable Cascade is available for non-commercial use only, with further guidelines pending policy development.