At Its Latest I/O Conference, Google Unveils a Major Evolution of Its AI Assistant Gemini, Spanning Phone Tasks, Image Creation, and Code Modification
Google I/O Conference Kicks Off on May 21, CEO Sundar Pichai Announces Major AI Upgrades
At the Google I/O conference on May 21, CEO Sundar Pichai announced several major upgrades to the company’s AI offerings. Google launched the Ironwood TPU, which is 10 times faster than its predecessor; introduced Google Beam, which focuses on a 3D immersive calling experience; and unveiled the Gemini App’s agent mode, which can help with bookings, property viewings, and itinerary planning. Together, these showcase Google’s ambition to create a “universal AI assistant” that is woven into people’s daily lives.
Ironwood TPU Makes a Strong Debut, 10x Faster than Previous Generation
Pichai first introduced the company’s seventh-generation TPU, “Ironwood,” which features:
- Performance that is 10 times faster than the previous generation
- A complete TPU Pod capable of processing 42.5 million trillion (42.5 × 10^18) calculations per second
- Availability to Google Cloud users by the end of the year.
AI-Driven 3D Video Device, Google Beam Launched
Google Beam, an AI-driven 3D video device:
- Uses 6 cameras
- Synthesizes 3D light field images from the captured footage
- Aims to make remote video calls feel like face-to-face conversations.
The first batch of devices was co-developed with HP and will be available to early users this year.
Real-Time Translation and Screen Sharing Now Live, Major Evolution for Gemini
Gemini Live, which Google is actively developing, has received significant upgrades:
- Real-time voice translation: Currently supports English and Spanish, with more languages to follow
- Supports screen sharing and visual analysis: Can instantly analyze the user’s current screen; for example, if a streetlight is mistaken for a person, Gemini will respond, “That’s your shadow!”
- Available for Android and iOS users starting May 21.
AI Multitasking Agent Project Mariner Launched, Available to Developers via Gemini API
Pichai also announced that Google will soon open its multitasking agent Project Mariner, which can:
- Handle 10 tasks simultaneously
- Learn and replicate task processes
- Be accessible to developers through the Gemini API.
Gemini App’s New Agent Mode Can Help You Find Properties and Arrange Itineraries
Google’s flagship AI application, the Gemini App, has gained notable new capabilities:
- An AI agent mode that can automatically search for properties and arrange viewing schedules
- Assistance with making calls and booking itineraries.
In practice, the Gemini App’s “agent mode” will search for listings on platforms like Zillow, arrange viewing schedules, and even help you make calls and book itineraries. It also supports MCP (Model Context Protocol) integration with other services. MCP acts like a connector that lets Gemini interface with various websites, apps, and service systems, turning it from an assistant that merely talks into “an agent that helps get things done.”
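The connector idea can be made a little more concrete with a small sketch. MCP messages are JSON-RPC 2.0, and a client asks a server to run a tool by sending a `tools/call` request. The tool name `search_listings` and its arguments below are hypothetical and are not taken from any Google or Zillow integration; the sketch only illustrates the general message shape, not how the Gemini App implements agent mode.

```python
import json
from itertools import count

# Simple counter for JSON-RPC request ids.
_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP-style JSON-RPC 2.0 "tools/call" request."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical tool and arguments, for illustration only.
message = build_tool_call("search_listings", {"city": "Austin", "max_price": 2500})
print(message)
```

The agent would send a message like this to whichever service exposes the tool and then act on the result, which is what lets one assistant reach many different sites and apps through a common interface.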
Gmail Begins Integrating Gemini, Automatically Assisting Users with Email Replies
Gmail has also begun integrating Gemini, which can:
- Read users’ past email writing styles, documents, and calendars
- Automatically generate reply content.
The feature will be available to Gmail subscribers this summer.
Gemini Flash and 2.5 Pro Major Upgrades, AI Coding Assistant Jules Assists with Code Changes
The new version of the Gemini Flash model is faster and more powerful than before. The upgrades include:
- A “Deep Think” mode for 2.5 Pro, capable of handling complex math problems and lengthy tasks
- An official release scheduled for June
- Support for 24 languages, natural tone adjustment, and bilingual modes, all integrated into the Gemini API.
Working from a screenshot of code fed to 2.5 Pro, the developer-focused AI assistant Jules can help make code changes. Public testing begins on May 21.
AI Models for Music and Video Released, AI Video Creation Platform Project Flow Debuts
- Imagen 4: A new-generation AI image generation model with more accurate text processing, typography handling, and 10 times faster generation.
- Veo 3: A new video generation model that can integrate narration and ambient sound.
- Lyria 2: A generative AI music model capable of producing high-quality music.
- Project Flow: A new AI video creation platform that allows users to freely generate or upload characters and scenes, enabling AI to create visuals through text commands.
Full Integration with Chrome, Wear, and TV Turns Search AI into a Real Assistant
Search AI has undergone a comprehensive evolution: “AI Mode” has transformed into a real assistant.
- AI Mode: Capable of answering complex questions using charts, tables, and summary reports.
- Search Live: Enables interactive searches similar to video calls.
- Try-On Feature: Upload a photo to simulate and compare how clothes would look on you.
- One-Click Checkout: Price-change reminders, adding items to the shopping cart, and automatic checkout are all handled seamlessly.
- Gemini in Chrome: Can directly read page content to provide answers.
- Deep Research + Canvas: Allows users to upload reports and transform them into web pages, podcasts, or quizzes with one click.
- Gemini Live: Integrating with Keep, Maps, and Calendar.
Gemini Enters the XR Field, Collaborating with Samsung to Create AI Glasses and Headsets
Google is also collaborating with Samsung to create XR smart glasses, Project Moohan, expected to launch this year. Project Moohan will support voice, visual search, translation, navigation, and real-time responses, and is being developed in partnership with Warby Parker.
Ultra Subscription Plans and Global Expansion
Google AI Pro / Ultra: Pro offers higher usage limits, while Ultra allows early access to new features, along with YouTube Premium and expanded cloud storage.
Features like 2.5 Pro Deep Think, Veo 3, and Flow will be prioritized for Ultra subscribers.
As CEO Pichai concluded, Gemini is evolving from a multimodal model into an “AI world model,” and Google’s vision is to create a true “universal AI agent” capable of assisting humans in writing emails, solving problems, editing videos, coordinating outfits, auditioning, and even walking to find coffee shops, fully integrating into human daily life.