Is Gemini AI The Google's Next True Big Thing?

A few days back, we so how much Artificial Intelligence (AI) has become powerful. Since the day Pichai first teased at the I/O developer conference in June to the day Google first announced Gemini AI, people have had only questions: What is this cutting-edge conversational artificial intelligence software set to change the game?

But what exactly is Gemini, and why should you care? Google has been an 'AI-first company' for nearly a decade, and they finally made a master move. Dec 6, 2023, Google officially launched its most powerful AI ever. I'll explain everything that you have to know and why you should care about this Gemini.

What is the Gemini AI model?

Gemini, the latest and most robust artificial intelligence model developed by Google, possesses the capability to understand not only text but also images, videos, and audio. Gemini is praised as a multimodal model for its proficiency in handling intricate tasks in fields like math, physics, and beyond.

Additionally, it showcases prowess in understanding and generating top-notch code across diverse programming languages. Other language models like GPT still need to be proficient.

gemini-three-sizes-versatility — Gemini offers versatility with three distinct sizes for varied purposes. / Google

There are some AI that should be proficient enough, like Elon Musk[s x.AI, but most of those tools are yet to be released to the general public. So, we can say Google Gemini is the most capable AI model in the world.

Gemini is basically a set of large language models (LLMs) that take huge advantage of the training methods and techniques abode from AlphaGo. This especially includes reinforcement learning and tree search. Ok, these truly have the potential to beat ChatGPT as the planet's most dominant generative AI solution.

Right now, Gemini is accessible through integrations with the Google Bard chatbot and the Google Pixel 8. According to them, they are planning to let users Integrate Gemini models into applications with Google AI Studio and Google Cloud Vertex AI—available Dec 13.

In a Wired interview, Demis Hassabis, the brain behind Google DeepMind, said Gemini will be "combining some of the strengths of AlphaGo type systems with the amazing language capabilities of the large models."

Who made Gemini?

Google and Alphabet, Google's parent company, jointly built Gemini, marking it as the most advanced AI model in the company's repertoire.

The collaborative efforts of Google DeepMind, led by its CEO and Co-Founder, Demis Hassabis, played a key role in contributing significantly to the development of Gemini.

What Are The Different Versions of Gemini

gemini-three-versions-explored — Explore the Trio of Gemini Versions by Google / Google

There are three different versions of Gemini that were announced. As Google says, "Gemini comes in three sizes". And why are there three different versions of Gemini?

Well, with its advanced capabilities, Gemini is designed to operate from data centers to mobile devices. This flexibility is a huge advantage for developers and businesses of all sizes, offering a smooth avenue to seamlessly integrate the power of AI into their diverse array of products and services.

Gemini significantly transforms how developers and enterprise customers approach building and scaling with AI.

Google's vision with Gemini is to make AI more accessible and affordable for all. Now, while the impressive capabilities of Gemini Pro and Nano will allow them to respond to both text and code deftly, we eagerly anticipate the imminent release of their groundbreaking multimodality features. Look at the tables and columns down below you'll understand these three Gemini versions better.

Version	Description	Availability	Use Cases	Release Information
Gemini Ultra	The most potent version of Gemini, purpose-built for handling complex tasks such as scientific research and drug discovery. Not yet available for public use, with Google announcing its release in 2024 (specific date pending).	Not publicly available	Scientific research, drug discovery	Expected release in 2024
Gemini Pro	Considered the best version for scaling across diverse tasks, including chatbots, virtual assistants, and content generation. It serves as the backbone of the Bard platform and will be accessible to developers and enterprise customers through Google Generative AI Studio or Vertex AI in Google Cloud, starting December 13th, 2023.	Available starting December 13th, 2023	Chatbots, virtual assistants, content generation	Released on December 13th, 2023
Gemini Nano	The most efficient version of Gemini, designed for on-device tasks, making it ideal for applications on mobile phones or smart home devices. Currently available on Pixel 8 Pro devices, it powers features like Summarise in the Recorder app and Smart Reply on Gboard (initially in WhatsApp).	Currently available on Pixel 8 Pro devices	On-device tasks, Summarise, Smart Reply	Available on Pixel 8 Pro devices

Is Gemini better than ChatGPT?

Gemini beats ChatGPT on 30 out of 32 (around 90%) academic benchmarks, breaking down into 10 texts and reasoning, 9 image understanding, 6 video understanding, and 5 speech recognition and translation benchmarks. So the simple answer is Yes.

And let me answer you to the million-dollar question more elaborately.

Technical Capabilities of Gemini

ChatGPT hit over 100 million monthly active users this year with a new record of the fastest 100 million users. Google initially relied on Gemini's text and image to stand out from GPT-4, But to fight back, OpenAI started letting users input voice and image queries into ChatGPT.

Gemini is armed with Google's proprietary data arsenal, pulling insights from services like Google Search, YouTube, Google Books, and Google Scholar. Reports hint that Gemini might be the champ with its twice-as-many-token training.

Google DeepMind and Brain teams, featuring stars like Sergey Brin and Paul Barham, use their expertise in reinforcement learning and tree search to Gemini.

gemini-multimodal-family — Gemini showcasing advanced multimodal capabilities for the AGemini family / DeepMind

The MMLU (Massive Multitask Language Understanding) test explores 57 subjects from math, physics, and law. Gemini AI stole the spotlight with an impressive score of 90%, outshining ChatGPT's 86.4%.

What is MMLU (Massive Multitask Language Understanding)?

MMLU (Massive Multitask Language Understanding) ranges in difficulty from an elementary level to an advanced professional level, and it tests both world knowledge and problem-solving ability.

Subjects range from traditional areas, such as mathematics and history, to more specialized areas, like law and ethics.
MMLU benchmark measures the model's multitask accuracy.

Gemini uses Chain of Thoughts (CoT@32), while ChatGPT stuck to the 5-shots technique. Here's the plot twist: the less powerful Pro model of Gemini AI surpasses the GPT-3.5 (the free version of ChatGPT) in six out of eight tests. Here is the full technical report.

The Ulitmate Comparison Between GPT Vs. Gemini

ChatGPT is readily available and accessible through various platforms and APIs. It's got free access tiers with capabilities to paid plans like Chat GPT Plus and ChatGPT Enerprice versions. But Gemini is in the testing phase; rumors suggest it will offer various access options, with hints of limited free access and tiered paid plans.

gemini-ultra-pro-openai-gpt4-whisper-chart — Gemini Ultra and Pro compared to OpenAI's GPT-4 and Whisper in Google's chart / Google/ZDNET

Regarding ease of use, ChatGPT is simple with a user-friendly interface, and APIs are available to purchase. On the flip side, with its advanced capabilities, Gemini might demand more prowess, though the details of its interface and API are still not revealed to the public.

ChatGPT seamlessly associates with platforms like Discord and Telegram. But Gemini is still a newer Language Model (LLM); integration might take some time, but given Google's extensive playground, we can anticipate this in various Google products and services very soon.

Regarding accessibility, ChatGPT proponents inclusivity with features like text-to-speech and speech-to-text options. While Gemini is yet to debut, the commitment from Google suggests it won't skimp on accessibility tools.

Unlike other general models currently powering AI chatbots, Gemini differentiates itself with its intrinsic multimodal feature.

However, popular models such as GPT-4 depend on plugins and integrations to achieve true multimodality.

Google talks about ensuring safety and responsibility in Gemini, subjecting it to rigorous internal and external testing along with thorough red-teaming.

Sundar Pichai, Google's CEO, highlights the importance of guaranteeing data security and reliability, especially in enterprise-first products where generative AI often generates revenue.

Acknowledging the risks of launching a cutting-edge AI system, Demis Hassabis describes the Ultra release as akin to a controlled beta, creating a "safer experimentation zone" for Google's most capable model.

This cautious release strategy aims to uncover unforeseen issues and potential attack vectors, learning and adapting throughout the process. To be honest first version of the Gemini model might not be a world-changer, but at least it will help to catch up with OpenAI.