Google has launched Gemini, a new AI model that promises to revolutionize how technology understands and processes diverse types of information. Gemini stands out for its multimodal capabilities, seamlessly integrating text, code, audio, image, and video data.
The development of Gemini is the result of collaborative efforts by teams across Google, including Google Research and DeepMind. Demis Hassabis, CEO and Co-Founder of Google DeepMind, emphasized the collective endeavor in creating this innovative model. Sundar Pichai, Google's CEO, described the project as one of the most extensive science and engineering efforts undertaken by the company.
Designed for versatility, Gemini is optimized to run efficiently across a range of platforms, from large data centers to mobile devices. The model is available in three versions, each tailored to specific needs: Gemini Ultra for highly complex tasks, Gemini Pro for scaling across a wide range of tasks, and Gemini Nano for efficient on-device applications.
The integration of Gemini into Google’s ecosystem has already begun. Gemini Pro is being incorporated into Google’s Bard for enhanced reasoning and planning capabilities, while Gemini Nano is powering features in the Pixel 8 Pro smartphone, such as the Summarize function in the Recorder app and Smart Reply in Gboard.
Developers and enterprise customers will have access to Gemini Pro through the Gemini API in Google AI Studio or Google Cloud Vertex AI starting December 13. Gemini Ultra, which is currently undergoing extensive trust and safety checks, is expected to be available for early experimentation and feedback before a broader release in early 2024.
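For developers curious what a call to the Gemini API might look like, the sketch below builds the kind of JSON request body the `generateContent` REST endpoint accepts and sends it with only the standard library. The endpoint URL, payload shape, and response structure here are assumptions drawn from the public API docs at launch, not part of this announcement; the `GEMINI_API_KEY` environment variable is a hypothetical convention for supplying a key.

```python
import json
import os
import urllib.request

# Assumed REST endpoint for the Gemini Pro model (verify against current docs).
ENDPOINT = "https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent"


def build_request(prompt: str) -> dict:
    """Build the JSON body for a single-turn text request."""
    return {"contents": [{"parts": [{"text": prompt}]}]}


def generate(prompt: str, api_key: str) -> str:
    """POST the request and return the first candidate's text."""
    body = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # Assumed response shape: candidates -> content -> parts -> text.
    return data["candidates"][0]["content"]["parts"][0]["text"]


if __name__ == "__main__":
    key = os.environ.get("GEMINI_API_KEY")  # hypothetical env-var convention
    if key:
        print(generate("Summarize the Gemini launch in one sentence.", key))
```

The network call only runs when an API key is present; `build_request` can be inspected on its own to see the payload layout.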
Gemini’s impact is set to extend across various Google products and services, including Search, Ads, Chrome, and Duet AI. In Search, the model has already delivered a 40% reduction in latency for English-language queries in the U.S., alongside quality improvements in the Search Generative Experience (SGE).
In testing, Gemini Ultra has achieved state-of-the-art performance, surpassing human experts on the Massive Multitask Language Understanding (MMLU) benchmark with a score of 90.0%. This benchmark covers a wide array of subjects, demonstrating Gemini’s comprehensive understanding and problem-solving capabilities.
The model’s native multimodality allows it to excel in various benchmarks, including text, coding, and multimodal tasks. Its sophisticated reasoning capabilities enable it to process vast amounts of written and visual information, facilitating advancements in multiple fields.
Gemini also shows exceptional proficiency in advanced coding, capable of understanding, explaining, and generating high-quality code in popular programming languages. It serves as the engine for advanced coding systems like AlphaCode 2, which has shown improved performance in solving complex programming problems.
Trained using Google’s latest Tensor Processing Units, TPU v4 and TPU v5e, Gemini 1.0 is both powerful and efficient. The accompanying announcement of Cloud TPU v5p, Google’s most powerful TPU system to date, highlights the company’s ongoing commitment to AI infrastructure.
Google has also prioritized responsibility and safety in the development of Gemini. The company has conducted extensive safety evaluations to address potential risks such as bias and toxicity. Collaborations with external experts and the implementation of safety measures, including classifiers to filter out harmful content, reflect Google’s commitment to ethical AI practices.