Apple has introduced MGIE, an innovative open-source AI model that is set to revolutionize the way we approach image editing. MGIE, which stands for MLLM-Guided Image Editing, leverages the power of multimodal large language models (MLLMs) to interpret and execute natural language instructions for image manipulation.
Developed through a collaboration between Apple and researchers from the University of California, Santa Barbara, MGIE was unveiled in a paper presented at the International Conference on Learning Representations (ICLR) 2024. The paper highlights MGIE’s ability to significantly improve both automatic metrics and human evaluations of edited images, all while maintaining competitive inference efficiency.
As VentureBeat reports, MGIE is versatile, capable of handling a wide range of editing tasks, from simple color adjustments to complex object manipulations. It excels at Photoshop-style modifications, including cropping, resizing, rotating, and applying filters, as well as more advanced edits like changing backgrounds, adding or removing objects, and blending images. Additionally, MGIE can optimize the overall quality of photos by adjusting brightness, contrast, sharpness, and color balance, and can even apply artistic effects that transform photos into sketches or paintings.
Local editing is another area where MGIE shines, allowing for precise modifications to specific regions or objects within an image. This includes altering attributes like shape, size, color, texture, and style, further enhancing the model’s utility in creating tailored and expressive imagery.
As mentioned earlier, MGIE’s approach to image editing is built on MLLMs, which are adept at processing both text and images. These models enable MGIE to understand editing instructions in natural language and translate them into specific, pixel-level changes.
The model operates in two main stages. Initially, it uses MLLMs to generate expressive and clear instructions from user input, providing a detailed guide for the editing process. Subsequently, MGIE employs these models to create a visual imagination of the desired outcome, which then informs the pixel-level manipulation of the image. This process is facilitated by a novel end-to-end training scheme that optimizes instruction derivation, visual imagination, and the editing modules simultaneously.
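To make the two-stage flow concrete, here is a minimal, hypothetical sketch in Python. The class names (`InstructionDeriver`, `EditHead`), the `EditRequest` structure, and the canned example instruction are illustrative assumptions, not the actual MGIE code or API; a real implementation would run an MLLM for stage one and a diffusion-based editor for stage two.

```python
from dataclasses import dataclass


@dataclass
class EditRequest:
    image_path: str   # source image on disk
    instruction: str  # terse user input, e.g. "make it healthier"


class InstructionDeriver:
    """Stage 1: an MLLM expands a terse instruction into an expressive one."""

    def derive(self, request: EditRequest) -> str:
        # A real MLLM would also inspect the image; this canned expansion
        # simply mirrors the kind of explicit guidance the paper describes.
        return (f"{request.instruction}: replace the fried items with a "
                f"vegetable salad and brighten the plate")


class EditHead:
    """Stage 2: the MLLM's visual 'imagination' of the desired outcome
    conditions a pixel-level editor (typically a diffusion model)."""

    def apply(self, image_path: str, expressive_instruction: str) -> str:
        # Placeholder: a real editor would generate the edited image from
        # the instruction and write the result to disk.
        out_path = image_path.rsplit(".", 1)[0] + "_edited.jpg"
        print(f"Editing {image_path} per: {expressive_instruction}")
        return out_path


def edit(request: EditRequest) -> str:
    # Derive an explicit instruction, then hand it to the editing module.
    expressive = InstructionDeriver().derive(request)
    return EditHead().apply(request.image_path, expressive)


if __name__ == "__main__":
    print("Saved:", edit(EditRequest("photo.jpg", "make it healthier")))
```

The design point the paper stresses is that instruction derivation, visual imagination, and the editing module are trained together end-to-end rather than as separate components, which keeps the derived instructions aligned with edits the model can actually carry out.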
MGIE is available as an open-source project on GitHub, providing access to the code, data, and pre-trained models. A demo notebook and an online web demo hosted on Hugging Face Spaces are also available, showcasing MGIE’s capabilities and ease of use.