Google unveils Gemini, its multimodal AI model that beats ChatGPT

Thu, 7th Dec 2023


Google has unveiled Gemini, its latest leap in AI technology and the product of large-scale cooperation amongst teams across Google, Google DeepMind, and Google Research. The company describes it as a multimodal AI model and the most capable, flexible, and general model it has built to date.

The first incarnation of Gemini has been optimised in three forms – Gemini Ultra, Gemini Pro and Gemini Nano. Gemini Ultra focuses on highly complex tasks, whilst Gemini Pro is designed for scaling across a wide range of tasks. The streamlined Gemini Nano specialises in on-device tasks.

As a multimodal AI model, Gemini is capable of generalising and seamlessly integrating information from various modalities, including text, images, audio, video, and code. It runs efficiently across a range of platforms, from mobile devices to data centres, which gives developers and enterprise clients more flexibility when building and scaling with AI.

In a significant evolution for AI, Gemini 1.0 simultaneously recognises and interprets text, images, audio, and more. This allows for a more nuanced understanding of complex queries, particularly those related to maths and physics. The model can also understand, explain, and generate high-quality code in widely adopted programming languages, such as Python, Java, C++, and Go.

Gemini's coding proficiency was tested on HumanEval, an industry-standard benchmark for coding tasks, where it solved 74.4% of the tasks. Additionally, a specialised version of Gemini was used to develop an advanced code generation system, AlphaCode 2, which excels at competitive programming problems that involve not only coding but also complex maths and theoretical computer science.

Google DeepMind subjected the Gemini models to industry-standard benchmarks. The tests found that Gemini Ultra performs exceptionally, exceeding current state-of-the-art results on 30 of 32 widely used benchmarks. These include MMLU (massive multitask language understanding), where Gemini Ultra scored 90.04%.

Gemini Ultra is being offered to select customers, developers, partners, and safety and responsibility experts for preliminary testing and feedback. The goal is to release it for broad use by developers and enterprise clients early next year.

Bard, Google's conversational AI chatbot, will use a specifically tuned version of Gemini Pro, significantly improving its abilities in understanding, summarising, reasoning, and planning. The upgraded Bard is initially available in English in over 180 countries, with availability expected to expand in the near future.

The Pixel 8 Pro is the first smartphone engineered to run Gemini Nano, which powers new features like Summarise in the Recorder app. The technology will also be included in Smart Reply for Gboard, starting with WhatsApp and extending to other messaging apps in the coming year. In addition, Google plans to integrate Gemini into more of its core products and services, such as Search, Ads, Chrome, and Duet AI.
