Google & Microsoft Strike Back!

Microsoft and Google with the 1-2 Punch

Just like ChatGPT, Microsoft's Copilot has completed its first year, and Microsoft is celebrating by promising more AI features coming to Copilot, some old and some new.

What is going on here?

Microsoft Copilot is going to get some “new” AI features in its second year.

What does this mean?

To start with, Copilot will also be using OpenAI's newest models: GPT-4 Turbo and an updated version of DALL-E 3. Then Code Interpreter will slowly roll out to Copilot (formerly Bing Chat), at first just using code to answer queries and later with the ability to add your own files, just like ChatGPT Plus. Similarly, Microsoft is combining GPT-4V with its web image and search data to make Copilot multimodal.

There are two "new" product-specific features as well (not taken from ChatGPT): Inline Compose and Deep Search. Inline Compose lets Copilot rewrite text on any website (it's unclear exactly what this means), and Deep Search refines queries with extra information and specific requests (like Perplexity does).

Why should I care?

GPT-4 Turbo is not the most impressive addition when everyone is complaining that it's worse. But soon, many ChatGPT Plus capabilities will be free in 169 countries.

Another interesting point is that Copilot is evolving beyond the chatbox and adding more product-specific features in Edge and Search. These potentially have the power to take users away from Google products.

Google launched Gemini out of the blue yesterday. The Information first reported that Google was postponing the launch to January, then updated the report to say nope, Google's gonna do it this week.

And Google launched, with a bang.

All I see are blue numbers under Gemini with GPT-4 (and GPT-4 Vision) greyed out on the side. Impressive stuff from Google. (TLDR at the bottom)

This looks pretty bad, right?

But for whom…

As the demo-induced excitement settles, AI Twitter (X) dives into the post and finds some troubling things.

  • The MMLU performance that Google claims surpasses GPT-4 (and even humans) comes from a clever prompting technique. Without that trick, Gemini still loses to GPT-4: almost there, but losing.

  • The flagship demo is a post-processed video (expected), with the prompts read out in the video differing from the actual prompts sent to the system (unexpected). Google reveals this on its own by releasing a dev post, "How it's made", breaking down how they created the video.

  • The blue numbers are for Gemini Ultra, which is coming next year. The model live in Bard right now is Pro, one version down. Developer access to even the Pro model is a week away (December 13th).

What exactly did Google launch then?

The biggest breakthrough in the Gemini announcements is that Gemini models are trained on multimodal data from the ground up, including text, images, videos and audio. This change might give Google a lead and leave OpenAI playing catch-up in 2024.

Gemini comes in three sizes: Ultra, Pro and Nano. Ultra beats GPT-4 on many benchmarks and is comparable on the rest. But we are not getting Ultra anytime soon; we're getting Pro in Bard, starting now. The kids in this party of giants, the Nano models, will run on mobile devices starting with the Pixel 8 Pro.

Gemini Pro will be in developers' hands next week. Android devs will also get access to Gemini Nano. Gemini Ultra's first appearance next year will be in a different product called Bard Advanced, which will likely combine these features and be paid.

The highlight of the benchmark performance is MMLU, where Gemini beats GPT-4 and human experts. Google uses a new prompting technique, uncertainty-routed chain-of-thought over 32 samples (CoT@32), to get to 90% on MMLU [technical report].
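For intuition, here's a minimal sketch of that kind of uncertainty-routed chain-of-thought voting. The sample_cot_answer and greedy_answer callables and the consensus threshold are assumptions for illustration, not Google's actual implementation or numbers:

```python
from collections import Counter

def uncertainty_routed_cot(question, sample_cot_answer, greedy_answer,
                           k=32, consensus_threshold=0.6):
    """Sketch of uncertainty-routed chain-of-thought (CoT@k).

    sample_cot_answer(question) is a hypothetical call returning the final
    answer extracted from one sampled chain-of-thought; greedy_answer(question)
    returns the model's single greedy-decoded answer. The threshold here is
    illustrative only.
    """
    samples = [sample_cot_answer(question) for _ in range(k)]
    answer, votes = Counter(samples).most_common(1)[0]

    # If enough of the k sampled chains agree, trust the majority vote...
    if votes / k >= consensus_threshold:
        return answer
    # ...otherwise fall back to the plain greedy answer.
    return greedy_answer(question)
```

The headline 90% depends on this sampling-and-voting setup; with plain few-shot prompting, both models score lower.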

Google's performance on vision and audio benchmarks is more impressive. Gemini Pro with vision is comparable to GPT-4 with Vision, and Gemini beats Whisper by a huge margin on speech recognition. But, as we all know, those are just benchmarks: when people replicated the Gemini tests in ChatGPT, GPT-4 did as well as the demo videos.

Ah! The demos. Tell me more

The key demo, Hands-on with Gemini, shows Gemini using image and audio inputs, working in multiple languages, writing code, and reasoning using images or videos as context. Obviously, the demo is cherry-picked and sped up with post-production (like audio outputs). Google’s behind-the-scenes article explains how much.

But there are about a dozen other demos buried below this one. A quick recap of the interesting ones:

  • Gemini allows scientists to scan through 200,000 papers, find ~250 relevant ones and extract data from those papers.

  • A special version of Gemini, AlphaCode 2, performs better than 85% of human competitors in competitive programming.

  • It can check your kid's science homework or help you listen to French podcasts.

  • Gemini can create UIs on the fly. They are calling it Bespoke UI in the demo. It’ll be interesting to see if this comes out as a product early next year.

So what?

Google proved that a model beating GPT-4 is possible, but again it ships a waitlist. How well Gemini integrates into Google products remains to be seen.

Bard with Gemini Pro is likely better than the ChatGPT free version. ChatGPT Plus with GPT-4 or Bing in Creative mode (using GPT-4 under the hood) is still better.

That’s it, the rest of it is noise.

*Credit to Ben's Bites for consolidating such valuable information daily.