The Era of “Ultra-realistic Conversations” and Robots Falling in Love: OpenAI’s Latest Model GPT-4o
At yesterday’s press conference, OpenAI announced its new language model GPT-4o, which accepts user input in the form of text, voice, and images, including cues such as laughter and emotion, giving users a more human-like chat experience.
Table of Contents:
The Real Chatbot GPT-4o
Advantages of the GPT-4o Model
Can be Used as a Real-time Chatbot
Goal to be Available to All Users for Free
OpenAI Continues to Compete with Google
According to the team, GPT-4o is a step towards more natural human-machine interaction: it accepts any combination of text, audio, and image inputs and generates any combination of text, audio, and image outputs. Compared with existing models, GPT-4o understands visual and audio information faster and more accurately.
GPT-4o matches GPT-4 Turbo’s performance on English text and code, and it responds to audio in an average of 320 milliseconds, close to human response times in conversation. By comparison, the earlier voice mode averaged 2.8 seconds of latency with GPT-3.5 and 5.4 seconds with GPT-4.
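For developers, the same multimodal flexibility is exposed through OpenAI’s API. The snippet below is only a minimal sketch, assuming the official openai Python SDK (v1.x) and a placeholder image URL; it is not code from the announcement. It sends a single request mixing text and an image and prints the text reply:

```python
# Minimal sketch: sending mixed text + image input to GPT-4o via the
# Chat Completions API (openai Python SDK v1.x). The image URL below is
# a placeholder, not a real asset.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What breed is this dog, and what would be a good name for it?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/dog.jpg"}},  # placeholder
            ],
        }
    ],
)

print(response.choices[0].message.content)  # text reply describing the image
```

Note that at launch, audio input and output were not yet available through this endpoint, so the sketch covers text and image only.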
But what do these numbers mean in practice?
By analyzing speech and real-time images, GPT-4o enables much more lifelike interaction: to get started, users simply open their phone camera or talk to it directly.
It can, for example, translate in real time, sing a birthday song, act as a customized language-learning tutor, analyze the surrounding environment, get human jokes and respond with cheerful laughter, or pick up on the sarcasm behind someone’s words.
Like a real friend, GPT-4o might enviously praise how cute the user’s dog is and curiously ask for its name. Talking with GPT-4o feels more like a conversation than a question-and-answer session.
GPT-4o is a single new model trained end-to-end across text, vision, and audio. Beyond the user’s primary voice or text input, it automatically takes in the user’s expressions, laughter, and surroundings, making its responses more realistic and accurate. If the user interrupts it mid-sentence, GPT-4o also knows how to handle it.
GPT-4o learning math
The “o” in GPT-4o stands for “omni,” meaning “all.” The team hopes to give users a model that can respond to anything, rather than just text input or single-modality questions.
Currently, GPT-4o is open to paying users, though for now only text and voice input appear to be available; the real-time image input shown at the event will take a while longer. OpenAI’s goal is to eventually make it free for all users.
Paying users can try GPT-4o in advance.
In my experience, many of the features the team demonstrated are still far from perfect: it struggles with jokes told in Chinese, its casual chat can feel hollow, and its responses are on the slow side. I look forward to further updates from the team.
OpenAI chose to release the new product just ahead of the Google I/O developer conference, underscoring the fierce competition between the two companies. Earlier rumors suggested that both OpenAI’s ChatGPT and Google’s Gemini models might be brought to Apple’s iOS 18 through a partnership.
Further Reading:
Vitalik: GPT-4 Has Already Passed the Turing Test, and It’s Best to Remember This
Is GPT-4o Not Far from “Her”? The Potential Applications of GPT-4o’s Multimodal Voice Interaction