Gemini, Google’s most advanced AI model, has been in the news for beating the classic 1996 Game Boy game Pokémon Blue.
This achievement marks a significant step for artificial intelligence in gaming, showcasing how far technology has come in tackling complex, open-ended challenges that require reasoning and adaptability.
Gemini’s Journey through Pokémon Blue
Gemini’s run through Pokémon Blue was not a solo effort. The project was led by Joel Z., a software engineer unaffiliated with Google, who streamed the entire playthrough on Twitch.
Joel Z occasionally intervened during the run, but he made it clear that his help was limited and did not amount to cheating. His interventions were meant to improve Gemini’s overall decision-making and reasoning abilities, with occasional technical clarifications, such as informing Gemini about a bug that requires talking to a Rocket Grunt twice to obtain a key, an issue later fixed in Pokémon Yellow.
Gemini played through an “agent harness,” a system that fed it fresh screenshots and other game data as the playthrough progressed.
This gave Gemini the chance to analyze each situation, call on specialized sub-agents when needed, and choose which inputs to send to the emulator.
Such a setup highlights the importance of external tools in helping AI models navigate complex environments like Pokémon Blue.
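To make that setup concrete, the sketch below shows what a minimal harness loop of this kind could look like. It is an illustration only, assuming a hypothetical Emulator interface, a query_model placeholder, and a standard Game Boy button vocabulary; none of these names come from the actual project.

```python
# Hypothetical sketch of an agent-harness loop for a Game Boy emulator.
# The Emulator protocol, query_model placeholder, and HarnessStep type are
# illustrative assumptions, not the interfaces used in the real project.
from dataclasses import dataclass
from typing import Protocol

BUTTONS = {"UP", "DOWN", "LEFT", "RIGHT", "A", "B", "START", "SELECT"}

class Emulator(Protocol):
    def screenshot(self) -> bytes: ...          # current frame, e.g. PNG bytes
    def read_state(self) -> dict: ...           # extracted data: map ID, party, items
    def press(self, button: str) -> None: ...   # send a single button press

@dataclass
class HarnessStep:
    action: str      # one entry from BUTTONS
    rationale: str   # the model's stated reasoning for the action

def query_model(image: bytes, state: dict, history: list[HarnessStep]) -> HarnessStep:
    """Send the screenshot, game state, and recent history to a multimodal
    model and parse its reply into one button action. A real harness would
    call an LLM API here and could route hard sub-problems to sub-agents."""
    raise NotImplementedError("wire a model client into this placeholder")

def run_harness(emu: Emulator, max_steps: int = 10_000) -> None:
    history: list[HarnessStep] = []
    for _ in range(max_steps):
        # Observe and decide: pass the frame, structured game data, and
        # a short window of recent steps to the model.
        step = query_model(emu.screenshot(), emu.read_state(), history[-20:])
        # Act: skip malformed actions rather than crash the run.
        if step.action not in BUTTONS:
            continue
        emu.press(step.action)
        history.append(step)
```

The loop captures the observe-decide-act cycle described above; by the project’s own description, the real harness layers more on top, such as routing harder sub-problems to specialized sub-agents before committing to an input.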
Google executives, including CEO Sundar Pichai, celebrated the achievement, with Pichai humorously referring to the project as “Artificial Pokémon Intelligence.”
Logan Kilpatrick, product lead for Google AI Studio, also highlighted Gemini’s rapid progress, noting that it earned its fifth badge much faster than competing models had.
Gemini vs. Other AI Models: The Benchmark Debate
Gemini’s success has naturally led to comparisons with other AI models, particularly Anthropic’s Claude, which is still working its way through Pokémon Red.
While Gemini has completed Pokémon Blue, Claude has yet to finish its game, despite making steady progress and serving as the inspiration for the Gemini project.
Direct comparisons between Gemini and Claude are difficult, however: each model plays inside a different harness, with its own tools, sub-agents, and view of the game state.

As Joel Z emphasized, these differences mean that performance in Pokémon Blue shouldn’t be considered a definitive benchmark for AI capabilities.
The unique setups and developer interventions make it impossible to declare one model categorically superior to another in this context.
Despite the limitations in benchmarking, using classic games like Pokémon Blue has become a popular way to test and showcase AI’s problem-solving skills.
These games present open-ended goals and require the AI to adapt to unexpected scenarios, making them ideal for evaluating reasoning and adaptability.
Final Words
Gemini’s completion of Pokémon Blue is more than just a technical achievement: it represents a new era in AI development, where models are not only solving structured problems but also navigating complex, nostalgic worlds that many people grew up with.
The project continues to evolve, with ongoing improvements to the framework and growing interest from both the AI and gaming communities.
By demonstrating advanced reasoning, adaptability, and productive collaboration with human developers, Gemini is setting a new standard for what artificial intelligence can achieve in interactive environments.