Is GPT-4o Close to “Her” in Its Potential for Integrated, Multidimensional Voice Interaction?
“Her” is a science-fiction romance film written and directed by Spike Jonze, released in 2013. Set in the near future, the film follows the protagonist Theodore (played by Joaquin Phoenix), a lonely writer who has just gone through a failed marriage. To alleviate his loneliness, he purchases a state-of-the-art artificial intelligence operating system (OS) with self-learning and emotional-cognition capabilities.
The operating system has a female voice and calls herself Samantha (voiced by Scarlett Johansson). Over time, Theodore develops a deep emotional relationship with Samantha and gradually realizes that she is not just a program, but a unique entity with personality and emotions. The film explores the relationship between human emotions, loneliness, love, and technology.
After the release of OpenAI’s latest model, GPT-4o, CEO Sam Altman alluded to the film “Her” in his response to the product.
The connection between “Her” and GPT-4o lies in both being AI-based conversational systems. The AI Samantha depicted in “Her” and GPT-4o are both designed to engage in natural, fluent conversations with humans. The story of “Her,” however, goes further, exploring whether artificial intelligence can possess genuine emotions and consciousness, and whether it can establish deep emotional connections with humans. Although GPT-4o has made significant progress in natural language processing and conversation generation, it still lacks genuine emotions and consciousness; its primary function is to generate meaningful conversations and answer questions based on its training data.
As AI technology continues to advance, the film “Her” reminds us to reflect on the boundaries between humans and technology, and on how we can use technology to improve our lives without losing our humanity and emotions.
OpenAI’s Chief Technology Officer, Mira Murati, elaborated on how GPT-4o builds on GPT-4’s intelligence foundation to encompass additional media formats. Unlike its predecessor, GPT-4 Turbo, which was limited to text and images, GPT-4o integrates voice and enhances the multidimensional interaction between users and AI. This includes a more dynamic ChatGPT that supports voice interaction, real-time conversation, and responses to subtle nuances in human speech.
The improvements in ChatGPT are particularly notable. With GPT-4o, users can now interrupt the AI mid-response and receive rich answers that adapt to subtle differences in queries. In addition, the model’s enhanced visual capabilities enable it to quickly analyze images and provide relevant information, ranging from code analysis to brand recognition in photos.
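As a rough illustration of what such mixed text-and-image queries look like in practice, the sketch below assembles a request payload in the shape used by OpenAI’s chat-completions-style API for GPT-4o. This is a minimal sketch, not the app’s internal mechanism: the image URL is a placeholder, and the actual network call (which would require the `openai` package and an API key) is shown only in a comment.

```python
def build_multimodal_request(question: str, image_url: str) -> dict:
    """Assemble a chat request that mixes text and an image for gpt-4o.

    The message content is a list of typed parts, so a single user turn
    can carry both a text question and an image to analyze.
    """
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


request = build_multimodal_request(
    "What brand appears in this photo?",
    "https://example.com/photo.jpg",  # placeholder image URL
)

# With the `openai` package installed and OPENAI_API_KEY set, this payload
# would be sent roughly as: client.chat.completions.create(**request)
```

Building the payload separately from sending it keeps the snippet self-contained and makes the request structure easy to inspect or test.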
Looking ahead, OpenAI plans to expand the capabilities of GPT-4o, including real-time translation of foreign menus and potentially even live sports commentary. The new model also has multilingual capabilities, supporting around 50 languages, with improved efficiency and scalability compared to previous versions. Initially, the voice capabilities of GPT-4o will be limited to a few partners to address potential misuse issues.
Reportedly, GPT-4o is available in the free version of ChatGPT (not yet available in Taiwan at the time of writing), while paid subscribers receive higher usage limits. A revamped ChatGPT user interface promises more interactive communication, and the macOS desktop version has already begun rolling out, with the Windows version expected later this year.