Can Text or Images Create Games? Google Launches Generative Interactive Environment AI Model “Genie”
Google DeepMind recently launched the generative interactive environment AI model “Genie”, which can generate interactive, playable game environments from text or image prompts without being explicitly trained on game mechanics or controls.
Google DeepMind, the artificial intelligence company Google acquired in 2014, said in a paper submitted on the 23rd that it has launched “Genie”, a generative interactive environment AI model that can generate controllable, interactive virtual environments from text, images, or sketches. According to the paper, Genie is trained on a large volume of publicly available online videos rather than on data from specific games or scenes, which makes it more widely applicable in game development and creative entertainment.
The DeepMind team describes Genie as a new paradigm for generative AI: generative interactive environments, in which interactive, playable environments are generated from a single image prompt.
Multi-Model Architecture
The paper describes Genie as a foundation world model with a total of 11 billion parameters, made up of three components: a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple, scalable latent action model.
As a result, Genie can be trained in an unsupervised manner on internet videos of 2D platform games and robotics, without action labels or other explicit instructions. It infers a consistent set of latent actions from those videos, and it can also be prompted with images from outside its training data, including real-world photos or sketches, to create virtual environments that can be controlled and interacted with.
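To make the division of labor among the three components more concrete, here is a deliberately tiny Python sketch. It is not Genie's code or API: the function names, the four-level token vocabulary, the three-value action set, and the "shift the tokens" dynamics are all assumptions made up for illustration. The real components are large learned neural networks. The sketch only shows the idea that a latent action can be recovered from two consecutive frames by asking which action best explains the change, with no action labels involved.

```python
from typing import List, Tuple

Frame = List[float]      # toy "frame": a short row of pixel intensities in [0, 1]
Tokens = Tuple[int, ...]

NUM_TOKEN_LEVELS = 4          # assumed vocabulary size for the toy tokenizer
LATENT_ACTIONS = (-1, 0, 1)   # assumed tiny discrete latent action set


def tokenize(frame: Frame) -> Tokens:
    """Video tokenizer stand-in: quantize each pixel to a discrete token."""
    return tuple(
        min(int(p * NUM_TOKEN_LEVELS), NUM_TOKEN_LEVELS - 1) for p in frame
    )


def predict_next(tokens: Tokens, action: int) -> Tokens:
    """Dynamics model stand-in: shift the tokens circularly by `action` cells."""
    n = len(tokens)
    return tuple(tokens[(i - action) % n] for i in range(n))


def infer_latent_action(prev: Tokens, nxt: Tokens) -> int:
    """Latent action model stand-in: pick the action that best explains
    the transition from `prev` to `nxt` (fewest mismatched tokens)."""
    def mismatch(action: int) -> int:
        return sum(a != b for a, b in zip(predict_next(prev, action), nxt))
    return min(LATENT_ACTIONS, key=mismatch)


# An unlabeled three-frame "video" in which a bright pixel moves right.
video = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
]
tokens = [tokenize(f) for f in video]
actions = [infer_latent_action(a, b) for a, b in zip(tokens, tokens[1:])]
print(actions)  # [1, 1]
```

In this toy, the same latent action (1) is recovered for both transitions, which is the kind of consistency the paper attributes to Genie's learned latent actions.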
What sets Genie apart is its ability to learn and identify controllable parts from videos and generate interactive scenarios.
Additionally, Genie can create an entirely new interactive environment from just one image. In DeepMind's examples, the text-to-image generative model Imagen 2 first generates a keyframe, and Genie then brings that image to life as a playable scene.
Genie can generate interactive animated environments through synthesized images.
Genie can also accept image prompts it has never seen before, including real-world photos or simple sketches, letting people interact with real objects that were static in the original image.
Genie can generate interactive animated environments through real photos and hand-drawn sketches.
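The prompt-then-play flow described above can be sketched as a simple autoregressive loop: start from a single prompt frame, then repeatedly ask a dynamics model for the next frame given the frames so far and the player's chosen action. The Python below is a toy under stated assumptions, not Genie's interface. `toy_dynamics` and `play` are hypothetical names, frames are one-line strings, and the movement rule is hard-coded, whereas in Genie the dynamics would be learned from video.

```python
from typing import Callable, List, Sequence

Frame = str  # toy frame: one row of characters, '@' marks the character


def toy_dynamics(history: Sequence[Frame], action: int) -> Frame:
    """Hypothetical stand-in for a learned dynamics model.

    Moves '@' by `action` cells, clamped to the edges of the frame.
    """
    frame = history[-1]
    pos = frame.index("@")
    new_pos = max(0, min(len(frame) - 1, pos + action))
    cells = list(frame.replace("@", "."))
    cells[new_pos] = "@"
    return "".join(cells)


def play(
    prompt: Frame,
    actions: Sequence[int],
    dynamics: Callable[[Sequence[Frame], int], Frame] = toy_dynamics,
) -> List[Frame]:
    """Roll out frames one at a time from a single prompt frame."""
    frames = [prompt]
    for action in actions:
        # Each new frame depends on everything generated so far plus the action.
        frames.append(dynamics(frames, action))
    return frames


rollout = play("@....", [1, 1, -1])
print(rollout)  # ['@....', '.@...', '..@..', '.@...']
```

Because `play` only needs a starting frame and a function that maps history plus action to the next frame, the prompt could in principle come from a generated image, a photo, or a sketch, which matches the three prompt types the article describes.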
The Genie blog post states:
Genie’s features allow anyone, even children, to create and enter controlled simulated environments or interactive generated worlds.
The end of the article also mentions the ambitious goal of Genie:
The applications of Genie are not limited to entertainment or creative development. It can also serve as an excellent testing platform for training intelligent agents, thus driving the development of the AI field.
An intelligent agent is an autonomous entity that observes its surrounding environment and takes actions to achieve goals. It is a core concept, and an important goal, of current AI research.
In recent months, Google has released several generative AI models and tools, including the AI assistant “Gemini”, the text-to-video generation tool “Lumiere”, and the text-to-image generation tool “ImageFX”, all of which have attracted public attention.
Meanwhile, OpenAI’s text-to-video tool Sora, the company’s first video generation product, also sparked an AI frenzy a few weeks ago.
However, Gemini’s image generation has recently drawn controversy, and the stock price of Alphabet, Google’s parent company, fell more than 4% in a single day on the 26th.
Demis Hassabis, CEO of Google DeepMind, stated at the Mobile World Congress (MWC Barcelona 2024) yesterday:
“We have taken down that feature of Gemini and will fix the issue and restore it in the coming weeks.”